Ear Does Fourier Analysis

Perhaps most important, from the point of view of computer music research, is that the human
ear is a kind of spectrum analyzer. That is, the cochlea of the inner ear physically splits sound
into its (quasi) sinusoidal components. This is accomplished by the basilar membrane in the
inner ear: a sound wave injected at the oval window (which is connected via the bones of the
middle ear to the ear drum) travels along the basilar membrane inside the coiled cochlea. The
membrane starts out thick and stiff, and gradually becomes thinner and more compliant toward
its apex (the helicotrema). A stiff membrane has a high resonance frequency while a thin,
compliant membrane has a low resonance frequency (assuming comparable mass per unit length,
or at least less of a difference in mass than in compliance). Thus, as the sound wave travels, each
frequency in the sound resonates at a particular place along the basilar membrane. The highest
audible frequencies resonate right at the entrance, while the lowest frequencies travel the farthest
and resonate near the helicotrema. The membrane resonance effectively "shorts out" the signal
energy at the resonant frequency, and it travels no further. Along the basilar membrane there are
hair cells which "feel" the resonant vibration and transmit an increased firing rate along the
auditory nerve to the brain. Thus, the ear is very literally a Fourier analyzer for sound, albeit
nonlinear and using "analysis" parameters that are difficult to match exactly. Nevertheless, by
looking at spectra (which display the amount of each sinusoidal frequency present in a sound),
we are looking at a representation much more like what the brain receives when we hear.


Fast Fourier Transform

Murray Bourne

1. Digital Audio


Pulse code modulation (PCM) is the most common type of digital audio recording, used to
make compact discs and WAV files.
In PCM recording hardware, a microphone converts sound waves into a varying voltage. Then an
analog-to-digital converter samples the voltage at regular intervals of time. For example, in a
compact disc audio recording, there are 44100 samples taken every second.

The data that results from a PCM recording is a function of time. How does this work?

Imagine that you were very small and could fit into your friend's ear drum. Suppose also that you
could see things in very slow motion and that you could record the position of the ear drum once
every 1/44100 of a second. Your eyes are so good that you can notice 65536 distinct positions of
the ear drum's surface as it moves back and forth in response to incoming sound waves.

If your friend is listening to the sound of a flute, and you write down the positions of the ear
drum that you notice, then you would have a digital PCM recording - a series of numbers.
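The thought experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "flute" is replaced by a pure 440 Hz tone, and the pressure is assumed to be scaled to the range -1 to 1 before being mapped onto the 65536 positions.

```python
import math

SAMPLE_RATE = 44100          # samples per second, as on a compact disc
LEVELS = 65536               # 2**16 distinct positions of the ear drum

def record_pcm(pressure, duration_s):
    """Sample pressure(t) (assumed to lie in -1..1) and round to integer codes."""
    n_samples = int(round(duration_s * SAMPLE_RATE))
    half = LEVELS // 2       # integer codes run from -32768 to 32767
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        code = int(round(pressure(t) * (half - 1)))
        samples.append(max(-half, min(half - 1, code)))
    return samples

def flute_like(t):
    # Hypothetical stand-in for the flute: a pure 440 Hz tone.
    return math.sin(2 * math.pi * 440 * t)

pcm = record_pcm(flute_like, 0.01)   # 10 ms of sound
print(len(pcm), pcm[:5])
```

The list `pcm` is the "series of numbers": one integer per sampling instant, each chosen from 65536 possible values.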

If you could later make your own ear drum move back and forth in accordance with the
thousands of numbers you had written down, you would hear the flute exactly as it originally
sounded. We have gone from:

rich sound with fundamentals and harmonics

→ numbers
→ rich sound with fundamentals and harmonics

To be able to convert from the series of numbers to sound, we need to apply the Fourier
Transform.

2. Frequency Information as a Function of Time


One analogy for the type of thing a Fourier Transform does is a prism which splits white light
into a spectrum of colors.

White light consists of all visible frequencies (red, orange, yellow, green, blue, indigo and violet)
mixed together (much like the information on a CD has sounds of all frequencies mixed
together) and the prism breaks them apart so we can see the separate frequencies (much like the
CD player splits apart the sound frequencies so they can be amplified and sent to the speakers).
White light is split into individual frequencies by a prism


In our inner ears, the cochlea enables us to hear subtle differences in the sounds coming to our
ears. The cochlea consists of a spiral of tissue filled with liquid and thousands of tiny hairs which
gradually get smaller from the outside of the spiral to the inside. Each hair is connected to a
nerve which feeds into the auditory nerve bundle going to the brain. The longer hairs resonate
with lower frequency sounds, and the shorter hairs with higher frequencies. Thus the cochlea
serves to transform the air pressure signal experienced by the ear drum into frequency
information which can be interpreted by the brain as tonality and texture.

The cochlea transforms sound waves into electrical signals.

The Fourier Transform is a mathematical technique for doing a similar thing - resolving any
time-domain function into a frequency spectrum. The Fast Fourier Transform is a method for
doing this process very efficiently.
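To make the difference between the two concrete, here is a minimal sketch comparing a direct discrete Fourier transform with a recursive radix-2 FFT. The function names `dft` and `fft` and the eight-sample test signal are assumptions chosen for illustration; a real application would normally use a tested library routine instead.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])          # transform of the even-indexed samples
    odd = fft(x[1::2])           # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]   # two cycles in 8 samples
slow, fast = dft(signal), fft(signal)
print(max(abs(a - b) for a, b in zip(slow, fast)))      # close to zero
print([round(abs(v), 6) for v in fast])
```

Both functions produce the same spectrum; the FFT simply reuses the results for the even and odd samples instead of recomputing every sum from scratch.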

3. The Fourier Transform

As we saw earlier in this chapter, the Fourier Transform is based on the discovery that it is
possible to take any periodic function of time f(t) and resolve it into an equivalent infinite
summation of sine waves and cosine waves with frequencies that start at 0 and increase in
integer multiples of a base frequency.

The job of a Fourier Transform is to figure out all the a_n and b_n values (the amplitudes of the
cosine and sine terms) to produce a Fourier Series, given the base frequency and the function f(t).
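For reference, the series described above is usually written as follows. This is a sketch in one common convention, not necessarily the notation used earlier in the chapter; the symbols T (the period of f(t)) and f_0 = 1/T (the base frequency) are assumptions introduced here.

```latex
f(t) = \frac{a_0}{2}
     + \sum_{n=1}^{\infty} \left[ a_n \cos(2\pi n f_0 t) + b_n \sin(2\pi n f_0 t) \right],
\qquad f_0 = \frac{1}{T}

a_n = \frac{2}{T} \int_0^T f(t) \cos(2\pi n f_0 t)\, dt,
\qquad
b_n = \frac{2}{T} \int_0^T f(t) \sin(2\pi n f_0 t)\, dt
```

With this convention the n = 0 cosine term gives the constant (average) part of the signal, and each larger n contributes one sine and one cosine at n times the base frequency.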

In our CD example, which has a sampling rate of 44100 samples/second, if the length of our
recording is 1024 samples, then the amount of time represented by the recording is
1024/44100 ≈ 0.0232 seconds, so the base frequency is 44100/1024 ≈ 43.07 Hz.

If you process these 1024 samples with the FFT (Fast Fourier Transform), the output will be the
sine and cosine coefficients a_n and b_n for the frequencies 43.07 Hz, 2 × 43.07 ≈ 86.13 Hz,
3 × 43.07 ≈ 129.20 Hz, and so on.
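The arithmetic for a 1024-sample block at the CD sampling rate can be checked with a short sketch. The variable names are hypothetical; the only inputs are the two numbers given in the text.

```python
SAMPLE_RATE = 44100      # samples per second (CD audio)
N_SAMPLES = 1024         # length of the recording in samples

duration = N_SAMPLES / SAMPLE_RATE          # seconds covered by the block
base_frequency = SAMPLE_RATE / N_SAMPLES    # 1 / duration, in Hz

# Frequencies of the first few sine/cosine pairs: integer multiples of the base.
frequencies = [n * base_frequency for n in range(1, 4)]

print(round(duration, 5))                    # 0.02322
print([round(f, 2) for f in frequencies])    # [43.07, 86.13, 129.2]
```

A longer block gives a lower base frequency and therefore a finer frequency grid, at the cost of covering a longer stretch of time.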

Let's say that we use the FFT to process a series of numbers from a CD into a sound.

So the Fourier Series would be:

We have reconstructed a sound wave from the digital data fed from the CD into the sound
system of the CD player.


Hearing Lecture notes (1): Introductory Hearing
1. What is hearing for?
2. Waveforms and Frequency Analysis
3. Why does the auditory system analyse sound by frequency?
4. Sine waves
5. Complex periodic sounds
6. Linearity
7. Filters
8. Resonance
9. What you should know

1. What is hearing for?

* (i) Indicate direction of sound sources (better than eyes since omni-directional, no eye-lids; but poorer
resolution of direction).

* (ii) Recognise the identity and content of a sound source (such as speech or music or a car).

* (iii) Give information on the nature of the environment via echoes, reverberation (normal room,
cathedral, open field).

2. Waveforms and Frequency Analysis

Sound is a change in the pressure of the air. The waveform of any sound shows how the pressure
changes over time. The eardrum moves in response to changes in pressure.

Any waveform shape can be produced by adding together sine waves of appropriate frequencies and
amplitudes. The amplitudes (and phases) of the sine waves give the spectrum of the sound. The
spectrum of a sine wave is a single point at the frequency of the sine wave. The spectrum of white noise
is a line covering all frequencies.
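As an illustration of building a waveform from sine waves, the sketch below approximates a square wave by adding odd harmonics with amplitudes 1, 1/3, 1/5 and so on (the standard Fourier series of a square wave). The 100 Hz fundamental, the function names and the error measure are assumptions chosen for the example.

```python
import math

def partial_sum(t, n_terms, f0=100.0):
    """Sum of the first n_terms odd harmonics of a square wave of frequency f0."""
    total = 0.0
    for k in range(n_terms):
        harmonic = 2 * k + 1
        total += math.sin(2 * math.pi * harmonic * f0 * t) / harmonic
    return 4 / math.pi * total

def square(t, f0=100.0):
    """Ideal square wave: +1 in the first half of each cycle, -1 in the second."""
    return 1.0 if (t * f0) % 1.0 < 0.5 else -1.0

def mean_squared_error(n_terms, n_points=1000):
    # Sample one 10 ms period at midpoints to stay clear of the jumps.
    times = [(i + 0.5) / n_points / 100.0 for i in range(n_points)]
    return sum((partial_sum(t, n_terms) - square(t)) ** 2 for t in times) / n_points

for n in (1, 5, 25):
    print(n, round(mean_squared_error(n), 4))
```

The error shrinks as more sine waves are added, which is the sense in which the set of amplitudes (the spectrum) describes the waveform.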

The cochlea breaks the waveform at the ear down into its component sine waves - frequency analysis.
Hair cells in the cochlea respond to these component frequencies. This process of frequency analysis is
impaired in sensori-neural hearing loss. It cannot be compensated for by a conventional hearing aid.

3. Why does the auditory system analyse sound by frequency?

Some animals do not analyse sound by frequency, but simply transmit the pressure waveform at the ear
directly. We could do this by having hair cells on the eardrum. But instead we have an elaborate system
to analyse sound into its frequency components. We do this because, since almost all sounds are
structured in frequency, we can detect them, especially in the presence of other sounds, more easily by
"looking" at the spectrum than at the waveform.

In the six panels below, the left-hand column shows plots of the waveform of a sound - the way that
pressure changes over time. The right-hand column shows the spectrum of the sound - how much of
each sine-wave you have to add together in order to make that particular waveform.

The upper panel is a sine wave tone with a frequency of 1000 Hz. A sine wave has energy at just one
frequency, so the spectrum is just one point.

Left column: waveform. Right column: spectrum.

The middle panel is white noise (like the sound of a waterfall). White noise has equal energy at
all frequencies, so the spectrum is a horizontal line.

The lower panel is the sine tone added to the noise. The spectrum of the sum is just the sum of
the spectra of the two components.

Notice that you can see the tone very easily in the spectrum, but it is completely obscured by the
noise in the waveform.
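The same effect can be reproduced numerically. The following is a rough sketch, not the procedure used to make the figures: the block length, the tone level, the Gaussian noise and the fixed random seed are all assumptions, and the spectrum is computed with a direct, unoptimised DFT.

```python
import cmath
import math
import random

N = 1024              # number of samples analysed
TONE_BIN = 64         # the tone completes exactly 64 cycles in the block
random.seed(1)        # fixed seed so the sketch is repeatable

tone = [0.5 * math.sin(2 * math.pi * TONE_BIN * n / N) for n in range(N)]
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
mixture = [s + w for s, w in zip(tone, noise)]

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0 .. N/2 - 1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

spectrum = magnitude_spectrum(mixture)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)

print("tone rms:", round(rms(tone), 3), "noise rms:", round(rms(noise), 3))
print("largest spectral component is in bin", peak_bin)
```

In the waveform the noise is several times larger than the tone, yet in the spectrum the tone's energy is concentrated in one bin while the noise is spread thinly over all of them.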

4. Sine waves
A sine wave has three properties which appear in the basic equation:

p(t) = a * sin(2*pi*f*t + phase)

(i) frequency (f) - measured in Hertz (Hz), cycles per second.


(ii) amplitude (a) - is a measure of the pressure change of a sound. It is usually measured in decibels (dB)
relative to another sound; the dB scale is a logarithmic scale: if we have two sounds with pressures p1 and p2,
then p1 is 20*log10(p1/p2) dB greater than p2. Doubling pressure (amplitude) gives an increase of 6 dB: 20 *
log10(2/1) = 20 * 0.3 = 6.

Amplitude squared is proportional to the energy, or level, or intensity (I) of a sound. The decibel
difference between two sounds can also be expressed in terms of intensity changes: 10*log10(I1/I2).
Doubling intensity gives an increase of 3dB (10 * 0.3). The just noticeable difference (jnd) in intensity
between two sounds is about 1dB.
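The two decibel formulas above can be checked directly. The helper names below are hypothetical; the formulas are the ones given in the text.

```python
import math

def db_from_pressure(p1, p2):
    """Level difference in dB between two pressures (amplitudes)."""
    return 20 * math.log10(p1 / p2)

def db_from_intensity(i1, i2):
    """Level difference in dB between two intensities."""
    return 10 * math.log10(i1 / i2)

print(round(db_from_pressure(2, 1), 2))    # 6.02
print(round(db_from_intensity(2, 1), 2))   # 3.01
# Doubling pressure quadruples intensity, so both formulas agree:
print(round(db_from_intensity(4, 1), 2))   # 6.02
```

The factor of 20 for pressure and 10 for intensity is what keeps the two scales consistent, since intensity is proportional to amplitude squared.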

(iii) phase (phase) - measured in degrees or radians, indicates the relative time of a wave.

The sine wave shown above has an amplitude of 1, a frequency of 1000 Hz, and it starts in zero sine
phase, phase = 0.

5. Complex periodic sounds

A sound which has more than one (sine-wave) frequency component is a complex sound. A periodic
sound is one which repeats itself at regular intervals. A sine wave is a simple periodic sound. Musical
instruments or the voice produce complex periodic sounds. They have a spectrum consisting of a series
of harmonics. Each harmonic is a sine wave that has a frequency that is an integer multiple of the
fundamental frequency.

For example, the note 'A' played by the oboe to tune the orchestra has a fundamental frequency of 440
Hz, giving harmonics at 440, 880, 1320, 1760, 2200, 2640, etc. If the oboe played a higher pitch, the
fundamental frequency (and so all the harmonic frequencies of the note) would be higher. The period of
a complex sound is 1/fundamental frequency (in this case 1/440 = 0.0023s = 2.3ms). A different
instrument, with a different timbre, playing the same pitch as the oboe, would have harmonics at the
same frequencies, but the harmonics would have different relative amplitudes. The overall timbre of a
natural instrument is partly determined by the relative amplitudes of the harmonics, but the attack of
the note is also important. Different harmonics start at different times in different instruments, and the
rate at which they start also differs markedly across instruments. Cheap synthesisers cannot imitate the
attack, and so they do not make very lifelike sounds. Expensive synthesisers (like Yamaha's Clavinova)
store the whole note including the attack and so sound very realistic.

Here is one second of the waveform and also the spectrum of a complex periodic sound consisting of the
first four harmonics of a fundamental of 100 Hz. Notice that there are 100 cycles of the waveform in 1s,
and all the frequency components are integer multiples of 100 Hz.
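A sound of this kind can be sketched as follows. The equal amplitudes of the four harmonics and the 8 kHz sampling grid are assumptions made for the example; the text does not state the amplitudes used in the figure.

```python
import math

F0 = 100.0                       # fundamental frequency in Hz
HARMONICS = (1, 2, 3, 4)         # first four harmonics, all in sine phase

def complex_tone(t):
    """Equal-amplitude sum of the first four harmonics of F0 (an assumption)."""
    return sum(math.sin(2 * math.pi * h * F0 * t) for h in HARMONICS)

period = 1.0 / F0                # 0.01 s, so 100 cycles fit in one second
times = [i / 8000.0 for i in range(80)]          # one period on an 8 kHz grid
repeat_error = max(abs(complex_tone(t) - complex_tone(t + period)) for t in times)

print([h * F0 for h in HARMONICS])   # [100.0, 200.0, 300.0, 400.0]
print(period, repeat_error < 1e-9)
```

Because every component is an integer multiple of 100 Hz, the sum repeats exactly every 1/100 s, which is the period of the complex sound.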

Here is a sound with the same period, but a different timbre. Notice that the waveform has a different
shape, but the same period. The change in timbre is produced by making the higher harmonics lower in
amplitude.

We can also change the shape of the waveform by changing the relative phase of the different
frequencies. In the first example the four components were all in sine phase; in the next example the odd
harmonics are in sine phase and the even in cosine phase. This change produces very little change in
timbre.
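The effect of a phase change on the waveform and on the amplitude spectrum can be sketched numerically. The equal harmonic amplitudes and the 80-sample period are assumptions for illustration; the phase pattern (odd harmonics in sine phase, even harmonics in cosine phase) follows the text.

```python
import cmath
import math

N = 80                                   # one 10 ms period sampled at 8 kHz

def tone(phases):
    """Harmonics 1-4 of 100 Hz with the given starting phases (radians)."""
    return [sum(math.sin(2 * math.pi * h * n / N + phases[h - 1])
                for h in (1, 2, 3, 4)) for n in range(N)]

def harmonic_magnitudes(x):
    """DFT magnitude at bins 1-4, scaled so a unit sine gives 1."""
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))) * 2 / N for k in (1, 2, 3, 4)]

all_sine = tone([0, 0, 0, 0])
mixed = tone([0, math.pi / 2, 0, math.pi / 2])   # even harmonics in cosine phase

shape_difference = max(abs(a - b) for a, b in zip(all_sine, mixed))
print(round(shape_difference, 3))                     # clearly non-zero
print([round(m, 6) for m in harmonic_magnitudes(all_sine)])
print([round(m, 6) for m in harmonic_magnitudes(mixed)])
```

The two waveforms differ sample by sample, but the amplitude of each harmonic is unchanged; only the phases differ.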


6. Linearity
Most studies of the auditory system have used sine waves. If we know how a system responds to sine
waves, then we can predict exactly how it will behave to complex waves (which are made up of sine
waves), provided that the system is linear.

* The output of a linear system to the sum of two inputs, is equal to the sum of its outputs to the two
inputs separately.

* Equivalently, if you double the input to a linear system, then you double the output.

* A linear system can only output frequencies that are present in the input; non-linear systems generally
add extra frequency components.
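The properties listed above can be tested on two toy systems. Both systems are assumptions picked for the sketch: a circular two-point moving average stands in for a linear system and a sample-by-sample squarer for a non-linear one.

```python
import math

def smooth(x):
    """A linear system: circular two-point moving average (x[-1] wraps around)."""
    return [(x[n] + x[n - 1]) / 2 for n in range(len(x))]

def square_law(x):
    """A non-linear system: squares every sample."""
    return [v * v for v in x]

def amplitude_at(x, cycles):
    """Amplitude of the component completing `cycles` cycles in the block."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * cycles * i / n) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * cycles * i / n) for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / n

N = 200
a = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]    # 5 cycles
b = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]    # 8 cycles
both = [u + v for u, v in zip(a, b)]

def superposition_error(system):
    """Largest gap between system(a + b) and system(a) + system(b)."""
    together = system(both)
    separate = [u + v for u, v in zip(system(a), system(b))]
    return max(abs(p - q) for p, q in zip(together, separate))

print(superposition_error(smooth))        # essentially zero
print(superposition_error(square_law))    # large: superposition fails
print(round(amplitude_at(square_law(a), 10), 3))   # squarer: component at 10 cycles
print(round(amplitude_at(smooth(a), 10), 3))       # moving average: none
```

Squaring a 5-cycle sine wave produces a component at 10 cycles that was not in the input, while the moving average only changes the level of what was already there.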

The filters we describe below are linear. The auditory system is only linear to a first approximation.

7. Filters
A filter lets through some frequencies but not others. A treble control acts as a low-pass filter, letting
less of the high frequencies through as you turn the treble down. A bass control acts as a high-pass filter,
letting less of the low frequencies through as you turn the bass down. A band-pass filter only lets
through frequencies that fall within some range. A slider on a graphic equalizer controls the output level
of a band-pass filter. In analysing sound into its frequency components, the ear acts like a set of
band-pass filters.

We can represent the action of a filter with a diagram like a spectrum which shows by how much each
frequency is attenuated (or reduced in amplitude) when it passes through the filter.
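Such a diagram can be tabulated for a simple filter. The sketch below uses a one-pole digital low-pass filter (roughly what turning the treble down does); the filter, its coefficient `a = 0.9` and the 44100 Hz sampling rate are assumptions for illustration and are not taken from the notes.

```python
import cmath
import math

SAMPLE_RATE = 44100.0    # assumed sampling rate in Hz

def lowpass_gain_db(freq_hz, a=0.9):
    """Gain in dB of the one-pole low-pass y[n] = (1 - a) x[n] + a y[n-1]."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    h = (1 - a) / (1 - a * cmath.exp(-1j * w))
    return 20 * math.log10(abs(h))

for f in (100, 1000, 5000, 10000, 20000):
    print(f, "Hz:", round(lowpass_gain_db(f), 1), "dB")
```

Low frequencies pass almost unchanged (gain near 0 dB), and the attenuation grows steadily as the frequency rises.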

Input sound and output sound of the filter

8. Resonance
A resonant system acts like a band-pass filter, responding to a narrow range of frequencies. Examples
are: a tuning fork, a string of a harp or piano, a swing. Helmholtz was almost right in thinking that the
ear consisted of a series of resonators - like a grand-piano with the sustaining pedal held down. Here is
what happens when a complex sound is passed through a sharply-tuned band-pass filter. Notice that a
complex wave goes in, but a sine wave comes out. Each part of the basilar membrane acts like a
band-pass filter tuned to a different frequency.
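A sharply-tuned resonance of this kind can be imitated with a two-pole digital resonator. This is only a sketch under stated assumptions: the 8 kHz sampling rate, the 300 Hz tuning, the pole radius and the equal-amplitude four-harmonic input are all chosen for illustration, and the filter is not a model of the basilar membrane.

```python
import math

FS = 8000                      # sample rate in Hz (an assumption of this sketch)
TUNED_HZ = 300.0               # resonant frequency of the filter
R = 0.999                      # pole radius; closer to 1 means sharper tuning

def resonate(x):
    """Two-pole resonator: y[n] = x[n] + 2 R cos(w0) y[n-1] - R^2 y[n-2]."""
    w0 = 2 * math.pi * TUNED_HZ / FS
    c1, c2 = 2 * R * math.cos(w0), -R * R
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = v + c1 * y1 + c2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def amplitude_at(x, freq_hz):
    """Amplitude of the freq_hz component of x (block must hold whole cycles)."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * i / FS) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq_hz * i / FS) for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

# Complex periodic input: harmonics 1-4 of 100 Hz, one second long.
x = [sum(math.sin(2 * math.pi * h * 100 * n / FS) for h in (1, 2, 3, 4))
     for n in range(FS)]
steady = resonate(x)[-800:]    # last 0.1 s, after the start-up transient

levels = {f: amplitude_at(steady, f) for f in (100, 200, 300, 400)}
share = levels[300] / sum(levels.values())
print({f: round(a, 1) for f, a in levels.items()})
print("share of output amplitude at 300 Hz:", round(share, 3))
```

All four harmonics go in at equal amplitude, but the output is dominated by the one harmonic that matches the resonant frequency, so it looks very nearly like a 300 Hz sine wave.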

Input sound and output sound of the sharply-tuned filter

9. What you should know

* You should understand the meaning of all the terms shown in italics.

* You should also be able to explain all the diagrams in this handout.

If you do not understand any of the terms or diagrams, first try asking someone else in the class who you
think might.

If you still don't, then ask me either in a lecture, after a lecture or in my office.

Chris Darwin

