
This is a great post, but it's a little misleading when it talks about MP3s and lossy compression, and it conflates analog Fourier analysis with discrete analysis.

When you're talking about a digital signal, it is the sample rate that determines the maximum frequency you can represent. It's not MP3s that "throw out the really high notes" -- it's any digital signal. A discrete Fourier transform actually is lossless, but it is band-limited.
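
To make both halves of that concrete, here is a minimal sketch (plain Python, a naive O(n^2) DFT, and an arbitrary 8 kHz sample rate picked purely for illustration). It checks that a DFT followed by its inverse gives back the original samples up to floating-point rounding, and that a tone above half the sample rate produces exactly the same samples as a lower tone, so the sampled signal can't represent it as something distinct.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; arbitrary value chosen for this illustration
N = 64              # number of samples


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for tiny inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, the exact inverse of dft() up to rounding."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def tone(freq_hz):
    """N samples of a cosine at freq_hz, sampled at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


# 1) The transform itself loses nothing: forward then inverse returns the input.
signal = tone(1000)
roundtrip = [value.real for value in idft(dft(signal))]
max_roundtrip_error = max(abs(a - b) for a, b in zip(signal, roundtrip))

# 2) The sample rate is what limits bandwidth: 7000 Hz is above the
#    4000 Hz limit (SAMPLE_RATE / 2) and yields the same samples as 1000 Hz.
max_alias_difference = max(abs(a - b) for a, b in zip(tone(1000), tone(7000)))

print(max_roundtrip_error, max_alias_difference)
```

Both printed numbers should be tiny (rounding noise only). Nothing here is specific to MP3; it only illustrates the sampling limit and the invertibility of the transform.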

The reason audiophiles prefer FLAC to MP3s, for instance, is that MP3s do more than just "throw out the high notes." Both are band-limited, but MP3s also throw out other information based on psychoacoustic principles.



I'm an audiophile, and I can tell you from looking at the FFT which settings you used in LAME to make that MP3* :)

Different presets set different lowpass filter values, opting for the best balance between preserving frequencies, achieving a small file size, and perceived quality. Perceived quality is often measured with various approximations of how humans perceive sound - psychoacoustics. This is why you can't compare different encoders using one of those approximations - the encoders themselves are built around them. It's the Dunning-Kruger effect, but for computers.

You can actually change those presets, so you can hear for yourself what effect changing them has: http://lame.cvs.sourceforge.net/viewvc/lame/lame/USAGE

* Why would anyone bother to learn this skill? Well, I don't mind MP3s (or AACs or OGGs for that matter), and I certainly can't tell the difference between them and lossless formats, but an MP3 that has been re-encoded several times is really atrocious. It's like re-compressing a JPEG a few times: it gets messy. This helps definitively figure out what an MP3 really is or was. Sometimes the tags on the MP3 lie about which encoder was used or what the compression settings were. Other times the file used to be something else, such as a 128kbps MP3 that was re-encoded with the LAME V0 preset. Looking at the frequency plot in a sound editor makes this rather obvious, as the lower presets and crummier encoders have much lower lowpass filters.
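
If you'd rather compute the cutoff than eyeball it, a rough sketch of the idea follows. Everything in it is an assumption for illustration: the signal is synthetic (tones every 1 kHz, with one version stopping at 16 kHz to stand in for a lowpassed encode), the function names are made up, and the -60 dB floor is an arbitrary threshold. A real file would need decoding first, plus windowing and averaging over many frames.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz
N = 441              # frame length chosen so DFT bins fall exactly 100 Hz apart


def synth(max_freq_hz):
    """Synthetic test frame: equal-level tones every 1 kHz up to max_freq_hz."""
    freqs = range(1000, max_freq_hz + 1, 1000)
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
            for t in range(N)]


def magnitude_spectrum(x):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def estimate_cutoff_hz(x, floor_db=-60.0):
    """Highest frequency whose level is within floor_db of the spectral peak."""
    mags = magnitude_spectrum(x)
    threshold = max(mags) * 10 ** (floor_db / 20)
    top_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return top_bin * SAMPLE_RATE / len(x)


full_band_cutoff = estimate_cutoff_hz(synth(21000))  # content up to 21 kHz
lowpassed_cutoff = estimate_cutoff_hz(synth(16000))  # content stops at 16 kHz
print(full_band_cutoff, lowpassed_cutoff)
```

A file that claims to be a high-quality encode but shows the lower cutoff is the kind of mismatch described above.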


Thanks for your feedback. Sure, any digital signal is by definition finite in its resolution (the sample rate or bit depth). I was trying to address the distinction between wave files of the type stored on audio CDs, and MP3s - both digital signals. I agree that the Fourier transform is in principle lossless, but it's particularly useful to use it in a lossy way, i.e. to throw out the least important (to us) components of the signal. If I was particularly misleading about this, I'd like to know.


Well, if you want to be 100% accurate, I think the section talking about how the high notes aren't important could be clarified. The really high "notes" have already been lost when you recorded the wav file digitally. The lossy step of mp3 encoding is not a result of the transform but of what you do with that information, and it is more complex than just discarding high frequency components.

Also, the word "note" is confusing in the context of music, since really low notes usually contain a lot of high frequency information.


Correct. (For the author's benefit) We usually call them harmonics, in the context of pitched sounds, but more accurately, the sinusoidal components of any sound are called its partials. Partials differ from harmonics in that harmonics are restricted to be sinusoids with frequencies that are integer multiples of a fundamental frequency. Real world musical notes don't often exactly fit this paradigm [1].

In any case, if discarding high frequency information were all you needed to do to compress, you could simply low-pass filter the time-domain signal. A better description of what goes into MP3 compression is that it omits frequency components in sound that we can't hear because they are masked by nearby (in time and/or frequency) components that are louder.

[1] http://en.wikipedia.org/wiki/Piano_tuning#Stretch
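
To show how different those two strategies are, here is a deliberately crude toy. It is not the MP3 psychoacoustic model; the function names, the one-bin neighborhood, and the 30 dB margin are all invented for illustration. It takes a list of per-component levels in dB and compares "drop everything above a cutoff" with "drop anything much quieter than a loud neighbor".

```python
def lowpass_components(levels_db, cutoff_index):
    """Keep every component below cutoff_index, however quiet it is."""
    return [i for i in range(len(levels_db)) if i < cutoff_index]


def masked_components(levels_db, neighborhood=1, margin_db=30.0):
    """Keep a component unless a nearby one is louder by more than margin_db.

    Toy stand-in for frequency masking: 'nearby' means within `neighborhood`
    positions in the list, and the margin is a made-up constant.
    """
    kept = []
    for i, level in enumerate(levels_db):
        lo = max(0, i - neighborhood)
        hi = min(len(levels_db), i + neighborhood + 1)
        neighbors = [levels_db[j] for j in range(lo, hi) if j != i]
        loudest_neighbor = max(neighbors, default=float("-inf"))
        if level >= loudest_neighbor - margin_db:
            kept.append(i)
    return kept


# Hypothetical component levels in dB, ordered from low to high frequency.
levels = [60, 20, 55, 58, 10, 50, 15, 52]

kept_by_lowpass = lowpass_components(levels, cutoff_index=5)
kept_by_masking = masked_components(levels)
print(kept_by_lowpass)
print(kept_by_masking)
```

The low-pass keeps the quiet components at indices 1 and 4 while discarding the loud ones at 5 and 7. The masking rule does the opposite: it drops the quiet components wherever they sit and keeps the loud high-frequency ones.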


For what it's worth, mp3s do not store the signal as a DFT. The frequency domain data is produced by an MDCT (modified discrete cosine transform). A DFT is performed during MP3 encoding, but it's used to apply the psychoacoustic model (which is much more complex than just throwing out high frequencies) to figure out how to lay out the frequency bands in the MDCT. I don't believe JPEG uses a DFT at any point, just a DCT.


I think you were kind of hand-wavy, but that's just how it has to be for this kind of article. If you're trying to explain this stuff to a general audience, you can't just jump right into talking about Nyquist limits and stuff.

But maybe you could allude to the fact that there's more going on.

Anyways, I liked the article.


Yes, any digital signal has a limited bandwidth.

However, MP3s throw out more high frequencies than the digital signal at that given sampling rate allows.


Well, sort of. If you have a wav file and an mp3 file, both of which have a sample rate of 44.1kHz, they will both be able to represent the same maximal frequency of about 22kHz. The mp3 wouldn't necessarily discard the high frequency information, but it may do so when it is deemed that the sound wouldn't be perceivable.


But a lot of MP3s do have a 16kHz shelf and higher frequencies are aggressively discarded.


Of course, analog or discrete, without the Fourier (and related) transforms, you can't even do the analysis that allows you to "throw out information".



