
Sound and Hearing

Norden E. Huang
Research Center for Adaptive Data Analysis
Center for Dynamical Biomarkers and Translational Medicine
National Central University
Zhongli, Taiwan, ROC
All sound, no matter how complex, can be
mathematically broken down into constituent
sine waves. Whether you know it or not.
Helmholtz 1863


Are we really hearing in terms of

sinusoidal waves?
Surrogate Signal

For example, a delta function and white noise

have the same white (flat) Fourier spectrum.
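
The point about surrogate signals can be illustrated with a minimal NumPy sketch. It builds a delta function, keeps its Fourier amplitudes, and randomizes the phases to obtain a noise-like surrogate. The function name, the signal length, and the seed are illustrative choices, not taken from the slides.

```python
import numpy as np

N = 1024  # number of samples (arbitrary choice for this sketch)

# A delta function: a single unit impulse at the first sample.
delta = np.zeros(N)
delta[0] = 1.0

def phase_randomized_surrogate(x, seed=0):
    """Return a real signal with the same Fourier amplitudes as x but random phases."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0            # keep the DC term real
    if len(x) % 2 == 0:
        phases[-1] = 0.0       # keep the Nyquist term real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(x))

surrogate = phase_randomized_surrogate(delta)

# Both signals have a flat amplitude spectrum, yet they look nothing alike in time.
delta_amplitudes = np.abs(np.fft.rfft(delta))
surrogate_amplitudes = np.abs(np.fft.rfft(surrogate))
print(np.allclose(delta_amplitudes, surrogate_amplitudes))
```

The amplitude spectrum alone therefore cannot tell an impulse from a noise-like signal; the difference lives entirely in the phases.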
The original data : Hello
The surrogate data : Hello
The Fourier Spectra : Hello
The original data : Hello
Uncertainty Principle
Time-frequency trade-off

Uncertainty Principle: $\Delta t \, \Delta f \ge \frac{1}{2}$

Therefore, for 10 Hz: $\Delta t \ge \frac{1}{2 \times 10\ \text{Hz}} = 50$ msec.
Speech Analysis
Hello : Data
Pitch perception
A trained ear, under optimal conditions, can have
a just noticeable difference (JND) threshold
(71% correct) as small as 0.2%.

At 1000 Hz, a 0.2% JND is 2 Hz, so the time needed to integrate for

Fourier analysis would be
$\Delta t \ge \frac{1}{2 \times 2\ \text{Hz}} = 0.25$ sec.
But at 100 Hz, the time would be 2.5 sec. That is
too long.
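
The arithmetic on this slide can be checked with a short Python sketch. It assumes the uncertainty relation in the form "time window is at least 1 / (2 × frequency resolution)", which is the form the numbers here are consistent with; the function names are illustrative only.

```python
def min_integration_time(delta_f_hz):
    """Shortest window (sec) that resolves delta_f_hz, assuming dt >= 1 / (2 * df)."""
    return 1.0 / (2.0 * delta_f_hz)

def jnd_integration_time(freq_hz, jnd_fraction=0.002):
    """Window needed to resolve a just noticeable difference of jnd_fraction * freq_hz."""
    return min_integration_time(freq_hz * jnd_fraction)

for f in (1000.0, 100.0):
    jnd_hz = f * 0.002
    print(f"{f:.0f} Hz: JND = {jnd_hz:.1f} Hz, window = {jnd_integration_time(f):.2f} sec")
```

Because the JND is a fixed fraction of the frequency, the required window grows in inverse proportion to the frequency, which is why the low-frequency case becomes implausibly long.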
Missing Fundamental

Certainly not strictly frequency determined
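
A minimal NumPy sketch of the missing-fundamental effect follows. The fundamental (128 Hz) and the sampling rate (4096 samples per sec) are hypothetical values chosen for convenience, and the autocorrelation peak is used only as a simple stand-in for waveform periodicity; it is not the Holo-Hilbert analysis used in these slides.

```python
import numpy as np

FS = 4096    # samples per second (assumed)
F0 = 128.0   # hypothetical fundamental in Hz, chosen so one period is 32 samples

def harmonic_complex(harmonics, f0=F0, fs=FS, duration=1.0):
    """Sum of unit-amplitude cosine harmonics k * f0 for k in harmonics."""
    t = np.arange(int(fs * duration)) / fs
    return sum(np.cos(2.0 * np.pi * k * f0 * t) for k in harmonics)

def amplitude_at(x, freq_hz, fs=FS):
    """Fourier amplitude of x at freq_hz (freq_hz must fall on an FFT bin)."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2.0)
    return spectrum[int(round(freq_hz * len(x) / fs))]

def waveform_repetition_rate(x, fs=FS, min_lag=8):
    """Repetition rate (Hz) from the largest autocorrelation peak beyond min_lag."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0, 1, 2, ...
    lag = min_lag + int(np.argmax(ac[min_lag:len(x) // 2]))
    return fs / lag

full = harmonic_complex(range(1, 11))      # harmonics 1 to 10
missing = harmonic_complex(range(4, 11))   # harmonics 4 to 10, fundamental removed

for name, x in (("10 harmonics", full), ("7 harmonics", missing)):
    print(name, amplitude_at(x, F0), waveform_repetition_rate(x))
```

In this sketch the second signal has essentially no Fourier energy at the fundamental, yet its waveform still repeats at the fundamental rate.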

Data: Missing Fundamental
(Figure: waveforms with 10 harmonics and with 7 harmonics; signal magnitude versus time in sec, shown from 0.2 to 0.4 sec.)
Spectra Missing Fundamental: 10 and 7 Harmonics
(Figure: spectral density versus frequency in Hz for 10 harmonics and 7 harmonics; logarithmic frequency axis from 10^0 to 10^2.)
IMF Cosine 10 Harmonics
(Figure: IMF components versus time; time axis in units of sec × 4096, from 0 to 4000.)
IMF Cosine 7 Harmonics
(Figure: IMF components versus time; time axis in units of sec × 4096, from 0 to 4000.)
Holo-Hilbert Spectrum: Cosine 10 Harmonics
(Figure: FM frequency in Hz on the vertical axis versus frequency in Hz on the horizontal axis, 50 to 250 Hz.)
Holo-Hilbert Spectrum: Cosine 7 Harmonics (4-10)
(Figure: FM frequency in Hz on the vertical axis versus frequency in Hz on the horizontal axis, 50 to 250 Hz.)
Without the first three harmonics, the
Holo-Hilbert spectra are almost identical.

The two sounds should be the same.

Periodicity of Sounds and of
Envelope Cues: which frequency, instantaneous
frequency or the modulating frequency, is more
Envelope Cue
Pitch is determined in most cases by the
periodicity of the sound waveform.

However, some sounds have other, more

subtle periodicities evoked by the
envelope. In some cases, these
periodicities may determine the pitch, but
in other cases they don't.
Mixed Harmonics: Cosine(odd) Sine(even)
(Figure: 10-harmonic waveforms for Cosine(odd) Sine(even) and Cosine(even) Sine(odd); signal magnitude versus time in sec, shown from 0.2 to 0.4 sec.)
IMF Mixed Cosine(odd) Sine(even)
(Figure: IMF components versus time; time axis in units of sec × 4096, from 0 to 4000.)
Holo-Hilbert Spectrum: Cosine(odd) Sine(even)
(Figure: FM frequency in Hz on the vertical axis versus frequency in Hz on the horizontal axis, 50 to 250 Hz.)
Though the frequency of the sound is still
similar to that of the pure cosine harmonics, the
envelope (AM) frequency is doubled.

Therefore, the basic pitch of the sound

should not change much, but the envelope
periodicity and the sound are different from
those of the pure cosine harmonics.
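
The doubling of the envelope frequency can be checked with a small NumPy sketch. It uses the squared Hilbert envelope as a simple stand-in for the AM analysis; the fundamental (128 Hz), the sampling rate, and the helper names are assumptions made for this illustration, not values from the slides.

```python
import numpy as np

FS = 4096    # samples per second (assumed)
F0 = 128.0   # hypothetical fundamental in Hz
N = FS       # one second of signal, so FFT bins fall on whole Hz

t = np.arange(N) / FS
odd = range(1, 11, 2)
even = range(2, 11, 2)

# Ten harmonics, all in cosine phase.
pure_cosine = sum(np.cos(2 * np.pi * k * F0 * t) for k in range(1, 11))
# Ten harmonics, odd ones in cosine phase and even ones in sine phase.
mixed = (sum(np.cos(2 * np.pi * k * F0 * t) for k in odd)
         + sum(np.sin(2 * np.pi * k * F0 * t) for k in even))

def analytic_signal(x):
    """FFT-based analytic signal (x plus i times its Hilbert transform); even length only."""
    spectrum = np.fft.fft(x)
    weights = np.zeros(len(x))
    weights[0] = 1.0                   # keep DC
    weights[1:len(x) // 2] = 2.0       # double the positive frequencies
    weights[len(x) // 2] = 1.0         # keep Nyquist
    return np.fft.ifft(spectrum * weights)

def dominant_envelope_frequency(x, fs=FS):
    """Frequency (Hz) of the largest non-DC component of the squared envelope."""
    power = np.abs(analytic_signal(x)) ** 2
    spectrum = np.abs(np.fft.rfft(power - power.mean()))
    return np.argmax(spectrum) * fs / len(x)

print(dominant_envelope_frequency(pure_cosine), dominant_envelope_frequency(mixed))
```

Both signals contain exactly the same harmonic frequencies with the same amplitudes; only the phases differ, and that alone moves the dominant envelope periodicity from the fundamental to twice the fundamental.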
Clicking Sounds

Based on Fourier analysis, clicking sounds

should be broad band.
Click sounds
According to Fourier analysis, clicks are broadband
sounds; therefore, clicks should excite all parts of the
basilar membrane, but they do not excite all parts at
the same time.
Each point on the basilar membrane has its own
"impulse response", and each will resonate at its own
characteristic frequency. The response of the high
frequency part of the basilar membrane starts earlier and
dies away quicker than that of the low frequency part.
This allows the basilar membrane to carry out
a multiresolution analysis.
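
The timing difference between high- and low-frequency places can be sketched with a gammatone filter, a common engineering approximation of an auditory filter's impulse response. This is an assumed stand-in, not the model behind these slides; the bandwidth formula is the Glasberg and Moore ERB approximation, and the function names are illustrative.

```python
import numpy as np

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_envelope(t, fc_hz, order=4):
    """Envelope t^(order-1) * exp(-2*pi*b*t) of a gammatone impulse response."""
    b = 1.019 * erb_hz(fc_hz)
    return t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)

def response_timing(fc_hz, fs=44100, duration=0.1):
    """Return (time of envelope peak, last time the envelope is above 1% of its peak)."""
    t = np.arange(int(fs * duration)) / fs
    env = gammatone_envelope(t, fc_hz)
    peak = int(np.argmax(env))
    above = np.nonzero(env >= 0.01 * env[peak])[0]
    return t[peak], t[above[-1]]

for fc in (500.0, 1000.0, 4000.0):
    peak_time, decay_time = response_timing(fc)
    print(f"{fc:.0f} Hz: peak at {1000 * peak_time:.2f} ms, below 1% after {1000 * decay_time:.2f} ms")
```

Under these assumptions the 4000 Hz channel peaks and decays several times faster than the 500 Hz channel, so a single click is spread out in time across channels.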
If indeed this is so, the click sound should
disintegrate by itself.
Click sounds
This animation may help illustrate the important concept of resolved
and unresolved harmonics.
Note that both the 500 Hz and the 1000 Hz points of the basilar
membrane respond with regular vibrations, while the region in
between (around ca 700 Hz) appears to be more or less at rest.
Place along the basilar membrane is therefore said to "resolve" the
first two harmonics (500 and 1000 Hz) of the click train stimulus, as
each produces a distinct region of high amplitude vibration, clearly
separated by regions of low vibration amplitude.
However, the points on the basilar membrane tuned to higher
harmonics (say 2000, 2500 or 3000 Hz...) do not show distinctly
separate peaks in their vibration amplitude, i.e. these higher
harmonics are "not resolved".
Why not? How can a membrane vibrate according to
Fourier harmonics?
Sound perceptions

From objective to subjective views

Audible sound properties are mostly described
in the following terms:

Scientifically, they should all be described in terms of

modulations, both AM and FM.
Properties of Sounds


Pitch: an auditory sensation that can be ordered on a

(musical) scale extending from high to low.

The pitch of any sound can be described in terms of a

single tone with the frequency of a sinusoidal
wave giving the same auditory sensation.

Pitch exists only in the audible frequency range.

Pitch Formation
The main determinant of pitch is sound periodicity (or
frequency). A sound is periodic when it is composed of
consecutive repetitions of a single short segment (the
period).
Only a small number of repetitions of a period are
required to generate the perception of pitch. A single
repeat results in no pitch sensation at all, since there is
nothing periodic. With eight repeats (8), a clear pitch
can be heard.
There are, however, examples of sounds that evoke pitch
without being strictly periodic, only nearly so.
Pitch Formation
There is a lot of confusion
concerning pitch.

I think that all the problems arise from the frequency

definition based on Fourier analysis.

It leads people to conclude that pitch is a subjective

perception rather than an objective quantity.
Tone: a sound wave capable of exciting an auditory sensation
having pitch, or a sound sensation having pitch.

A tone has to have repeating patterns. Again, the term is used

only for audible sounds.

Tones of standard musical instruments are poorly

simulated by summation of steady component frequencies,
since such a synthesis cannot produce the dynamic variation
with time that is characteristic of the instrument.
Tone formation
Regular click trains at a rate of less than about 40 Hz
sound like individual regular events, a bit like
machine-gun fire. Click trains with rates faster than about 40 Hz
merge into a continuous "buzz": the faster the rate, the
higher the pitch.
Examples: The first example consists of a click train with
a rate that doubles every 3 seconds. During the first 3 seconds, the
rate of the clicks is about 10.7 Hz, then it increases to
21.4 Hz, 43 Hz, 86 Hz, 172 Hz, 344 Hz, 689 Hz, 1378
Hz, 2756 Hz, and 5512 Hz. Clear pitch emerges
between 43 Hz and 86 Hz, although at 86 Hz there is still
'flutter', and full smoothness of the resulting percept
occurs only at rates of a few hundred Hz.
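
A click train of this kind can be sketched in a few lines of NumPy. The sketch assumes a 44.1 kHz sampling rate, because the listed rates are consistent with click intervals of 4096, 2048, ..., 8 samples at that rate (44100 / 4096 is about 10.77 Hz and 44100 / 8 is 5512.5 Hz). That sampling rate and the names below are assumptions, not something stated on the slide.

```python
import numpy as np

FS = 44100             # assumed sampling rate (Hz)
SEGMENT_SECONDS = 3    # the click rate changes every 3 seconds
INTERVALS = [4096 >> k for k in range(10)]   # 4096, 2048, ..., 8 samples between clicks

def click_train(intervals=INTERVALS, fs=FS, segment_seconds=SEGMENT_SECONDS):
    """Concatenate segments of unit clicks; each segment uses one click interval."""
    segments = []
    for interval in intervals:
        segment = np.zeros(fs * segment_seconds)
        segment[::interval] = 1.0      # one click every `interval` samples
        segments.append(segment)
    return np.concatenate(segments)

rates_hz = [FS / interval for interval in INTERVALS]
signal = click_train()
print([round(r, 1) for r in rates_hz])
```

The resulting array can be written to a sound file with any audio library to listen for the transition from separate clicks to a continuous buzz.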
Timbre enables us to differentiate sounds of the same
pitch and intensity but different quality. Timbre enables us
to differentiate a piano from a horn or a violin. It reflects their
harmonic compositions.

Timbre is multi-dimensional; there is no single scale to

define it. Timbre describes the nonlinear distortion of the

There are also time elements. For example, recognition of

instruments depends on the onset transients. That is
also the reason why tones of musical instruments (or even
speech) are poorly synthesized by summation of
steady sinusoidal waves.
Magnitude or Loudness.
Intensity is measured in dB SPL:
$y\ \text{dB} = 10 \log_{10} \frac{x^2}{x_{\text{ref}}^2} = 20 \log_{10} \frac{x}{x_{\text{ref}}}$

0 dB is the minimum audible sound. 120 dB is

very loud, verging on causing damage. Its intensity is
10^12 times that of a barely audible sound.
Total power on the eardrum, however, is still very
small: 0.05 mW at 120 dB (snail pace: 1.5
mm/s; at 0 dB, mm/2000 yrs).
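
The decibel figures can be verified with a short Python sketch. It takes the reference to be the standard 20 micropascal sound pressure used for dB SPL in air; the function names are illustrative.

```python
import math

P_REF = 20e-6   # standard reference pressure for dB SPL in air: 20 micropascals

def spl_db(pressure_pa, p_ref=P_REF):
    """Sound pressure level in dB SPL: 20 * log10(p / p_ref)."""
    return 20.0 * math.log10(pressure_pa / p_ref)

def intensity_ratio(level_db):
    """Ratio of intensities (proportional to pressure squared) for a level difference in dB."""
    return 10.0 ** (level_db / 10.0)

def pressure_ratio(level_db):
    """Ratio of pressures for a level difference in dB."""
    return 10.0 ** (level_db / 20.0)

print(spl_db(20.0), intensity_ratio(120.0), pressure_ratio(120.0))
```

A 120 dB difference corresponds to a factor of 10^12 in intensity but only 10^6 in pressure, which is why the two forms of the formula use the factors 10 and 20 respectively.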
Organs for hearing

The ears and the nervous system

Translate the mechanical energy of sound
waves into electric signals for the brain
The Ear
Osseous labyrinth
Interior of the Osseous labyrinth
From Wikipedia, the free encyclopedia
Schematic of basilar membrane
Basilar membrane model
BM dynamics model : Low frequency
BM dynamics model : Medium frequency
BM dynamics model : High frequency
Endocochlear Potential

In 1952, von Bekesy determined that the

scala vestibuli has a potential of +5 mV
with respect to the scala tympani and the
scala media has a relatively large positive
potential of 80 mV. This positive 80-mV
potential is known as the endocochlear
potential, and it serves as the major
driving force for signal transduction.
Bekesy's Nobel lecture, 1961

To my surprise, although the travelling waves ran along

the whole length of the membrane with almost the same
amplitude, and only a quite flat maximum at one spot,
the sensations along my arm were completely different.

My surprise was even greater when it turned out that two

cycles of sinusoidal vibration are enough to produce a
sharply localized sensation on the skin, just as sharp as
for continuous stimulation. This was in complete
agreement with the observations of Savart, who found
that two cycles of tone provide enough cue to determine
the pitch of the tone.
Sound signals from mechanical
energy to electric pulses

Cochlear and auditory nerves

Hair Cells
Types of Hair Cells
Hair Cells

Together, the cochlear microphonic, the

summating potential, and the compound
action potential constitute the potentials
recorded in the form of evoked potential
audiometry known as electrocochleography.
Bekesy Place Theory : passive

Classic studies, such as those by von Bekesy in

1960, were performed to investigate cadaveric
specimens. They revealed that the amplitude of
a sine wave traveling along the basilar
membrane increased until it reached a
maximum and then abruptly declined.
Physical properties of the basilar membrane,
related to changes in its stiffness and mass
along its length, can account for these passive-
tuning properties.
Georg von Bekesy's cochlea model
Bekesy Place Theory : active

Later studies revealed that, in vivo, a gradual rise

in amplitude of the wave does not occur as it
travels to its point of maximal amplitude. Rather,
the wave travels along the basilar membrane,
causing minimal displacement until it reaches the
site of the membrane that is maximally sensitive
to a stimulus of that particular frequency. At this
site, the basilar membrane vibrates at the
frequency of the stimulus, with the vibration
abruptly declining in the region apical to the
position of maximal sensitivity.
Bekesy Place Theory : active

The basilar membrane behaves as a finely

tuned mechanical band-pass filter, with each
location along its length responding to a specific
or characteristic frequency. This fine-tuning
mechanism is dependent on active, or energy
dependent, processes and therefore not evident
in cadaveric studies. The outer hair cells appear
to mediate (amplifying) the active processes that
create the fine-tuning properties of the cochlea.
Their role is further delineated in the discussion
of the differential role of the inner and outer hair
cell in the sensorineural transduction process.
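
The idea that each location has its own characteristic frequency can be made concrete with the Greenwood place-frequency function, a widely used empirical fit for the human cochlea. It is brought in here only as an illustration and is not part of these slides; the constants are the commonly quoted human values, and the function name is hypothetical.

```python
def greenwood_frequency(x, A=165.4, a=2.1, k=0.88):
    """Characteristic frequency (Hz) at relative position x, with 0 = apex and 1 = base.

    Uses the Greenwood form f = A * (10 ** (a * x) - k) with commonly quoted human constants.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction of basilar membrane length between 0 and 1")
    return A * (10.0 ** (a * x) - k)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}: {greenwood_frequency(x):.0f} Hz")
```

With these constants the map runs from about 20 Hz at the apex to about 20 kHz at the base, roughly logarithmic over most of its length.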
A lot of unanswered questions
How exactly does the cochlea translate
the mechanical energy of the sound waves
into electric signals for the brain?

Is Fourier expansion necessary? Or

should it be instantaneous frequency?

How important are the Amplitude

