Though smartphones can be used to listen to music, they can't compete with high-end music players. Toward the top of that list is Sony's NW-ZX707 Walkman.
This feature ensures the NW-ZX707 can transform standard MP3 or PCM audio to the ultra-high frequency 11.2 Mhz DSD audio stream.
That doesn’t make a lot of sense to me.
Humans can only hear up to about 20kHz, so you’re not getting much benefit above about double that.
Even assuming that humans could hear frequencies hundreds of times higher, audio isn’t generally available sampled at 11.2 MHz. If you’re getting music, the recording and audio engineering work, the microphones, etc., aren’t designed to accurately capture data at such high frequencies.
Even assuming that none of that were the case, the audio engineers and artists weren’t trying to make audio that sounds good at that frequency (which they can’t hear either). The music doesn’t intrinsically have some aesthetically-pleasing quality that you can extract; they were the ones who added it, and they did that by making judgments using their own senses, which can’t hear this.
Even aside from that, it doesn’t look like this comes with headphones. Whatever you are plugging into this has to induce vibration in the air for it to make it to your ears, and probably does not have a meaningful frequency response at that frequency.
The NW-ZX707 also gets Sony’s proprietary digital music processing technologies, including the DSEE Ultimate technology, developed in-house to restore compressed music files to the quality of a CD by interpolating sound algorithms.
And it makes even less sense if your starting audio has actually thrown out data in frequencies that humans can hear by using lossy compression there, even if we aren’t terribly sensitive to those.
Yeah, the entire article smells like gold-plated HDMI cables from Monster, as if that somehow improves the quality of digital signals.
Sony has judiciously used gold across the internals of the NW-ZX707, including its solder and reflow solder elements, to further improve sound localization.
Gold has a higher resistivity than copper. Resistance adds noise. It’s probably just for corrosion resistance.
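To put rough numbers on the resistivity point, here is a minimal sketch. It is not a model of the player's internals: the conductor length, cross-section, temperature, and bandwidth are made-up illustrative values, the resistivities are the usual approximate room-temperature figures, and the noise estimate is the standard Johnson-Nyquist formula for thermal noise in a resistor.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23

# Approximate resistivities at about 20 degrees C, in ohm-metres.
RESISTIVITY_OHM_M = {"copper": 1.68e-8, "gold": 2.44e-8}


def conductor_resistance_ohm(material, length_m, area_m2):
    """Resistance of a uniform conductor: R = rho * L / A."""
    return RESISTIVITY_OHM_M[material] * length_m / area_m2


def thermal_noise_vrms(resistance_ohm, bandwidth_hz=20_000.0, temperature_k=293.15):
    """Johnson-Nyquist thermal noise voltage: sqrt(4 * k * T * R * B)."""
    return math.sqrt(
        4.0 * BOLTZMANN_J_PER_K * temperature_k * resistance_ohm * bandwidth_hz
    )


# Hypothetical conductor: 5 cm long, 0.1 mm^2 cross-section (assumed values).
LENGTH_M = 0.05
AREA_M2 = 1e-7

for material in ("copper", "gold"):
    r = conductor_resistance_ohm(material, LENGTH_M, AREA_M2)
    print(f"{material}: {r:.3e} ohm, {thermal_noise_vrms(r):.3e} V rms over 20 kHz")
```

For the same geometry the gold conductor always comes out with the higher resistance and therefore slightly more thermal noise, although both figures are tiny at these dimensions.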
Another reason audiophiles have come to appreciate the NW-ZX707 is something called the vinyl processor that lends the unmistakable character of vinyl discs back to their digital tracks.
So they further distort the sound to replicate lower quality equipment? They’re definitely not making it sound more like the original by introducing vinyl artifacts.
This is some serious hobbyist pricing bait, but I can’t judge since I’ve got my own dumb expensive hobbies.
MHz refers to the samples per second, not the pitch. CD audio, for example, is 16-bit/44.1 kHz. What that means is that 16-bit samples are taken 44,100 times per second. DSD, on the other hand, uses 1-bit samples; at about 11.2 million samples per second it is referred to as DSD256. What that translates to is a digital wave that looks a lot closer to an analog wave than a CD does. It has nothing to do with the frequency of listening in this case.
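The numbers in that comment can be checked with a little arithmetic. This is a small sketch assuming the usual naming convention, in which DSD64, DSD128, and DSD256 are 64, 128, and 256 times the 44.1 kHz CD sample rate; the function names are made up for illustration.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BIT_DEPTH = 16


def dsd_sample_rate_hz(multiple):
    """Sample rate of a DSD stream named after its multiple of the CD rate."""
    return multiple * CD_SAMPLE_RATE_HZ


def bits_per_second(bit_depth, sample_rate_hz):
    """Raw, uncompressed bit rate of a single channel."""
    return bit_depth * sample_rate_hz


for multiple in (64, 128, 256):
    rate = dsd_sample_rate_hz(multiple)
    print(f"DSD{multiple}: {rate:,} one-bit samples per second ({rate / 1e6:.4f} MHz)")

cd_bps = bits_per_second(CD_BIT_DEPTH, CD_SAMPLE_RATE_HZ)
dsd256_bps = bits_per_second(1, dsd_sample_rate_hz(256))
print(f"CD: {cd_bps:,} bit/s per channel; DSD256: {dsd256_bps:,} bit/s per channel")
```

So the "11.2 MHz" in the article is 256 × 44.1 kHz, a count of 1-bit samples per second rather than an audio frequency.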
This feature ensures the NW-ZX707 can transform standard MP3 or PCM audio to the ultra-high frequency 11.2 Mhz DSD audio stream.
I think the article is just incorrect. Sony probably means it can just decode .dsf files. And you are confusing 1-bit DSD with 16-bit PCM. The most common DSD format is DSD64 at 2.8 MHz, which in raw bit rate is equivalent to 16-bit/176.4 kHz, 24-bit/117.6 kHz, or 32-bit/88.2 kHz PCM. And the microphones and instruments do work at these high frequencies.
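The equivalence quoted there is a raw bit-rate comparison, which can be reproduced as follows. This is only a sketch: matching bit rates does not by itself mean matching fidelity, and the helper name is made up for illustration.

```python
DSD64_RATE_HZ = 64 * 44_100  # 2,822,400 one-bit samples per second


def pcm_rate_at_same_bit_rate(dsd_rate_hz, pcm_bit_depth):
    """PCM sample rate that uses the same number of bits per second as a 1-bit DSD stream."""
    return dsd_rate_hz / pcm_bit_depth


for depth in (16, 24, 32):
    rate_khz = pcm_rate_at_same_bit_rate(DSD64_RATE_HZ, depth) / 1000
    print(f"{depth}-bit PCM at {rate_khz:.1f} kHz matches DSD64 in raw bit rate")
```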
No, the product page mentions the “DSD Remastering Engine”, which says the same thing as the article. They probably just mean they’re using a 1-bit DAC, and are trying to pass that off as a selling point. Although the article did lose the “1-bit” part.
Then I stand corrected, although the article does conflate DSD decoding with the “remastering engine”. Just because it can decode it doesn’t mean it can resample PCM into DSD. Those are two separate features.
Let me add that I don’t think that we are at the end-all-and-be-all of audio. I can hypothetically imagine things that might be done if one threw more money at audio playback that would create a better experience than one can get today.
When you hear audio from a given point, some of how you detect the location of the audio source is due to the effect your ears have on the sound as it hits them. Ears have a distinct shape, which means that what actually reaches your inner ear is slightly unique to each individual person. Currently, if you’re listening to a static audio file, it’s the same for everyone. One could hypothetically ship hardware which fits inside the ear of a given individual and can build an audio model of that ear, to make audio which reflects their specific ears. Then audio could be played back that sounds as if it’s actually coming from a given point in space relative to someone’s ears. That’s not a drop-in improvement for existing audio, because you’d need to have 3D location information available about the individual sources in the audio. But if audio companies wanted to sell a fancier experience for audio that does have that information, they could leverage that.
For decades, audio playback devices have tried to produce visual effects that synchronize with music. They haven’t done a phenomenal job, at even basic stuff like beat detection, in my opinion, and so clubs and the like have people that have to rig up DMX512 gear with manually-created annotations to have effects happen at a given point. Audio tracks today don’t have a standard format for annotations; if I go buy an album, it doesn’t come with something like that. One could produce a standard for it and rig up various gear, like strobes or colored light or even do this in VR, to stimulate the other senses in time with the audio.
I suspect that very few people listen to audio in an environment where they can hear absolutely zero detectable background sound when they don’t have their audio playing. You can get decent passive sound cancellation devices, but they only go so far; even good passive sound cancellation headphones are something that one can probably hear fairly quiet sound through. Right now, active sound cancellation devices are being worked on, but that doesn’t get one to the point of inaudibility either, and I haven’t seen anything that does both good active and passive cancellation, so using active noise cancellation means giving up good passive noise cancellation.
My point is that I think that there are remaining areas for audio hardware companies to explore to try to create better experiences. I just don’t think that playing audio at a sampling frequency hundreds of times above the frequencies that humans can hear is really a fantastic area to be banging on.
the benefit of sampling above 20khz is that you can even out the signal over a period of time which will make it more accurate for frequencies up to 20khz. you will get a noisy signal but all the noise is in frequencies you can’t hear.
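A minimal sketch of that averaging idea, using a generic first-order delta-sigma modulator (the textbook scheme behind 1-bit streams, not Sony's actual processing). Each output is only +1 or -1, yet the average over many samples tracks the input level; the input value and sample count are arbitrary choices for illustration.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: inputs in [-1, 1] become a +1/-1 stream."""
    integrator = 0.0
    feedback = 0.0  # previous 1-bit output, fed back and subtracted from the input
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits


# A constant level of 0.25, heavily "oversampled" as 1000 one-bit samples.
level = 0.25
stream = delta_sigma_1bit([level] * 1000)
average = sum(stream) / len(stream)
print(f"input {level}, average of 1-bit stream {average:.4f}")
```

Each individual sample is badly wrong, but the error flips sign rapidly, so low-pass filtering (here, a plain average) removes most of it. That is the sense in which the noise ends up at frequencies you can't hear.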
you also need to consider how the voltage is generated. in general there are limits on how quickly the voltage can change (the slew rate). e.g. you can’t reproduce a square wave properly in most cases after amplification. in the end this makes DSD much less relevant.
you also need to consider that the reproduction is not perfect and neither is the recording. e.g. a square wave will not be captured properly
edit: I forgot to mention that the slew rate limit has a parallel on the speaker/headphone membrane but it’s much worse than the amp since it’s a physical object with momentum.
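The slew-rate point can be illustrated with a toy limiter. This is only a sketch with made-up numbers: the output is allowed to move at most `max_step` per sample, which is a crude stand-in for an amplifier or a driver membrane that cannot change instantly.

```python
def slew_limit(samples, max_step):
    """Follow the input, but change by at most max_step between consecutive samples."""
    if not samples:
        return []
    out = [samples[0]]
    for target in samples[1:]:
        # Clamp the requested change to the allowed step size.
        delta = max(-max_step, min(max_step, target - out[-1]))
        out.append(out[-1] + delta)
    return out


# One period of an ideal square wave: four samples low, four samples high.
square = [-1.0] * 4 + [1.0] * 4
limited = slew_limit(square, max_step=0.5)
print(limited)
```

The instantaneous edge of the square wave turns into a ramp spread over several samples, so the reproduced waveform is no longer square.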
If you’d like to learn more, check this out.
You should also check this out: https://www.youtube.com/watch?v=cD7YFUYLpDc
Here is an alternative Piped link(s): https://piped.video/watch?v=cD7YFUYLpDc
I think this article covers it more succinctly https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
edit:
also relevant: https://en.wikipedia.org/wiki/Delta-sigma_modulation#
Isn’t positioning audio in 3D space what Atmos is supposed to do? Although currently we don’t have personalized HRTFs for it.