Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

OK, since you asked:

> The sound wave produced by a piano note is a simple sine wave.

No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is part of why things like Shazam are so difficult: musical notes aren't just a single frequency in the FFT, they are a stack of them.

> You could just tell them a handful of numbers—the sizes of the different circles in the picture above.

This is actually the exact same number of numbers as in the time-domain series. Taking the FFT on its own doesn't reduce the amount of data, it's about discarding or compressing some of the frequencies after you do.

> The really high notes aren’t so important (our ears can barely hear them), so MP3s throw them out, resulting in added data compression.

This is only part of what MP3's do, and at high bitrates this really is inaudible. The main source of compression is that the precision of the numbers used to represent the amplitude of the sine waves is reduced.

When you have a loud sound and a quiet sound at the same time, the loud sound will "drown out" the quiet one (called "auditory masking"). You won't be able to hear the quiet one, or one be able to hear it precisely. MP3 and other audio codecs take advantage of that by encoding quieter frequencies with less fidelity when there are other louder frequencies at the same time. You don't notice the loss of precision since it's buried under louder sounds.

> Just as MP3s throw out the really high notes, JPEGs throw out the really tiny circles.

This is off too. If JPEG discarded high-frequency signals, you would just be blurring the entire image. It would be exactly like saving it scaled down and then scaling it back up with some smooth interpolation.

Obviously, JPEGs don't appear to be stretched out thumbnails, so that isn't what happens. Instead, it's not that high-frequency signals are discarded, it's that their precision is reduced.

Human eyes are quite good at detecting sharp edges and fine details. What they aren't good at is detecting how sharp an edge is. We can definitely see a break between two colors, but we can't accurately detect the magnitude of that the difference.

JPEG takes advantage of that by rounding off those high-frequency variances to nearby values. That means there are fewer possible values at high frequencies, so fewer bits are needed to encode them.

I realize I'm being a negative Nancy here. I really liked your post and I agree 100% on how awesome the Fourier transform is. It's also quite hard to describe it in an approachable way, and you've done an admirable job. I just get bugged when simplifications for a lay audience are actually off the mark.



>> The sound wave produced by a piano note is a simple sine wave.

> No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is really such a fundamental* aspect of frequencies and sound, it should have been the starting point of the article. "Look how noisy and squiggly this wave is, but with FFT we can see that it really only consists of waves at multiples of the same frequency".

* no pun intended


This is actually really valuable feedback and one of the things I like about the HN audience. I struggle to strike a balance between precision and simplicity, and your comments raised many interesting subtleties that I had overlooked. So thanks for helping me get a better handle on this.


Auditory masking is more than that. The psychoacoustic model used for MP3s will discard some frequencies during others, like you said, but it will also discard "information" proceeding and following an acoustic event.

While not a perfect analog, it is similar to how vision works. If you are in a bright room and the lights go out, it takes you a little time to readjust to the darker room before you can see things again. While your irises were adjusting, the chair and take were still there, and light reflecting from those objects were still reaching your eyes, but you were unable to see them. It works the other way too. If you are still in that dark room and the lights come on, it will take your eyes a little time to readjust, less time then it took your eyes to adjust to the dark, but still your eyes are effectively discarding that information.

The psychoacoustic model in MP3s is even more bizarre. It turns out that some frequencies, when heard BEFORE another frequency, your brain will throw that first frequency away. It is unintuitive, but that has been proven in laboratory settings. Knowing how the majority of humans discard the same auditory information under different conditions, MP3s are compressed by throwing away the same information that the brain would normally discard. The Fourier transform is a significant part, but it isn't the whole story.


I love the FFT even more than you and enjoyed that it is getting lauded, but would have found more algorithmic details even more interesting. Breaking down DFT, etc. and then showing the performance magic of FFT is a great way to approach discussion of many issues in problem analysis and algorithm design.

So, Nice enough article for slipping into the topic - now give me more! harder! faster!


Sparse fast Fourier transform is even more "magical" than fast Fourier transform (FFT). If you assume that the discrete Fourier transform (DFT) has only k non zero coefficients, then, there exists an algorithm to compute it in O(k log(n)). That's right, you do not have to see the entire signal to compute the DFT, which is pretty awesome.

If you are interested, see http://groups.csail.mit.edu/netmit/sFFT/ .


I work in this field (though I'm none of the authors). If anybody has any questions about the sFFT, let me know.


> This is actually the exact same number of numbers as in the time-domain series.

In a way, it's twice as many, since they have both real and imaginary components.


Nope - the amount of numbers is in fact identical. The discrete Fourier transform has redundant information in bins above NFFT/2 (where NFFT is the size of the transform/number of samples in the signal) for purely real valued input. This means the "number of numbers" is the same - NFFT/2 bins of necessary complex values after transformation, versus NFFT real values in the original series.


Ah yeah, forgot about that. Thank you.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: