Voice analysis and resynthesis for Psychologists

Summer 2012
Lecture 1
Essential definitions
Basic acoustics

Course schedule
Lecture 1: voice production
Lecture 2: voice structure
Workshop 1: voice analysis
Workshop 2: speech analysis
Lecture 3: Voice & Psychology
Workshop 3: voice synthesis

Language / Natural language

Course Assessment
Exercise (100%, 1000 words)
Produce a spectrogram of your voice, analyse and report values, and
discuss their relevance.
See instructions for more details.

Deadline: week 10 (check SD)

Open communication system that uses a set of written, gestural, or spoken
symbols that refer to people, objects or ideas.
Open-ended system of communication in which the grammatical structure allows
information of great cognitive complexity to be passed from one individual (the
speaker) to another (the listener).

Natural Language:
Spoken or signed language (as opposed to written languages, computer
programming languages). ASL, Spanish, English & French are natural
languages.

Linguistics:
The study of human language.

Speech:
Human spoken language (as opposed to sign language).

Voice:
The sounds made by a person using the vocal folds for talking,
singing, screaming or crying.
The voice results from the act of phonation = the use of the
laryngeal system to generate an audible source of acoustic

Not just speech - but more generally vocal communication, including
animal vocal communication.

The term voice refers to the form and to the quality of the vocal
signal rather than to its content.

Phonetics & Phonology

Phonetics is about the physical production and
perception of speech sounds.
How vowels and consonants are produced, their acoustic structure, and
how they are perceived.
A course in Phonetics, Ladefoged 2000
Speech Physiology, Speech perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988
Principles of voice production Titze, 1994

Phonology describes the way sounds function - within a given language
or across languages.



Bioacoustics: how animals use sound for communication and echolocation.


How animals produce sounds, the physical structure of these sounds, how
animals perceive them, what their function is and how they evolved.

Psychoacoustics:
The study of subjective human perception of sounds.

The Evolution of Communication, Hauser 1996
The Principles of Animal Communication, Bradbury & Vehrencamp, 1998
Animal Signals, Maynard Smith & Harper, 2003.

the psychology of acoustical perception

Study of the relations between the sound stimuli and their auditory
perception in terms of hearing sensations.
These relationships are not simple and linear.
Different people will hear different things when they listen to the
same sound.

Speech Physiology, Speech perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988

Introduction to Acoustics:

A sound wave is caused by an increase in pressure at a certain point
which causes a "domino effect" outward.

What is sound?
Vibration as perceived by the sense of hearing (Wikipedia, psychoacoustics definition)
A disturbance of the equilibrium of density (or pressure of a gas,
liquid or solid) (Titze, physics definition)
A local pressure disturbance in a continuous medium that contains
frequencies in the range of 20 to 20,000 Hz (the audible
range) (Titze, a compromise between physics and psychophysics)
Small variations in air pressure that occur very rapidly one after
another (Ladefoged)

If the perturbation is repeated periodically, then it generates a
series of sound waves:

Propagation and speed of Sound


In a homogeneous medium (~ the atmosphere), sound propagates from the
source at equal speed in all three dimensions, therefore sound waves
are spherical waves.


The speed at which sound propagates depends on the type, temperature
and pressure of the medium through which it propagates.


The crests correspond to the high pressure points and the troughs
correspond to the low pressure points.

In dry air at 20 °C the speed of sound is approximately 343 m/s.
That's approximately 1 meter every 2.9 milliseconds.
In the human vocal tract, which is more humid and warmer,
the speed of sound is higher, at 355 m/s.
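The travel-time figure can be checked with a few lines of Python. This is only a sketch: the linear temperature formula is a common textbook approximation for dry air and is an assumption added here, not part of the lecture; the function names are illustrative.

```python
# Sketch: speed of sound in dry air and time needed to travel one metre.
# ASSUMPTION: c = 331.3 + 0.606 * T (T in degrees Celsius) is a standard
# linear approximation for dry air; it does not come from the slides.

def speed_of_sound_dry_air(temp_celsius):
    """Approximate speed of sound (m/s) in dry air near room temperature."""
    return 331.3 + 0.606 * temp_celsius

def travel_time_ms(distance_m, speed_m_per_s):
    """Time in milliseconds for sound to cover distance_m."""
    return 1000.0 * distance_m / speed_m_per_s

c_air = speed_of_sound_dry_air(20.0)
print(round(c_air, 1), "m/s in dry air at 20 degrees C")
print(round(travel_time_ms(1.0, 343.0), 2), "ms per metre in air (343 m/s)")
print(round(travel_time_ms(1.0, 355.0), 2), "ms per metre in the vocal tract (355 m/s)")
```

The 355 m/s value is taken directly from the lecture rather than computed, because the approximation above does not account for humidity.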

Waveform / Oscillogram
Sound waves can be represented as the temporal variation of sound
pressure at a fixed point in space - for example the membrane of
a microphone.
When we record a sound - we record (analogically or digitally) this
temporal variation.

Periodic Sounds
Most sounds are generated by oscillators (strings, vocal folds,
resonators, etc)
Why oscillators?

This can be represented as a waveform or oscillogram:

Therefore most natural sounds are periodic (or quasi-periodic).

The pressure variation of a periodic sound is an oscillation with a
given period and a given amplitude.

The period of a sound wave is the duration of
an oscillation cycle.
It can be measured as the time between two peaks.

The frequency of a sound is the number of air
pressure oscillation cycles per second. It is
the multiplicative inverse (or reciprocal) of the
period: F = 1/T
Example: T = 7.5 ms,
F = 1/0.0075 = 133 Hz


One single oscillatory cycle per second corresponds to 1 Hz. This is not audible.
125 oscillations per second (a typical fundamental frequency of the male voice) is 125 Hz.
200 oscillations per second (a typical fundamental frequency of the female voice) is 200 Hz.
2000 oscillations per second (some bird calls) is 2000 Hz.
15000 oscillations per second (some bat calls) is 15000 Hz, etc.
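The relation F = 1/T can be tried out directly. The sketch below converts in both directions; the function names are illustrative, and the 7.5 ms period is the value marked on the waveform figure.

```python
# Sketch: converting between period (T) and frequency (F) with F = 1 / T.

def frequency_hz(period_s):
    """Frequency in Hz for a period given in seconds."""
    return 1.0 / period_s

def period_ms(freq_hz):
    """Period in milliseconds for a frequency given in Hz."""
    return 1000.0 / freq_hz

# Period of 7.5 ms read from the waveform
print(round(frequency_hz(0.0075)), "Hz")

# Periods of the example frequencies listed in the lecture
for f in (125, 200, 2000, 15000):
    print(f, "Hz ->", round(period_ms(f), 3), "ms")
```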


The wavelength of a periodic sound is the distance (in space) between
two successive crests (and is the distance that a wave travels in the
time of one oscillatory cycle).

It is a function of the frequency of the sound and of the speed of
sound in the medium in which the sound is propagated.
The wavelength of a sound of frequency F traveling at speed c is
given by d = c/F.

For c = 343 m/s (speed of sound in the atmosphere):

a 20 kHz sound wave has a wavelength of 343/20000 = 17 mm,
a 440 Hz wave has a wavelength of 343/440 = 78 cm,
a 20 Hz wave (an elephant rumble) has a wavelength of 343/20 = 17 m.
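The three wavelengths above follow from d = c/F. A minimal sketch, assuming the lecture's value of 343 m/s for air (the function and constant names are illustrative):

```python
# Sketch: wavelength d = c / F for the three example frequencies.

SPEED_OF_SOUND_AIR = 343.0  # m/s, dry air at 20 degrees C (lecture value)

def wavelength_m(freq_hz, speed_m_per_s=SPEED_OF_SOUND_AIR):
    """Wavelength in metres of a sound of frequency freq_hz."""
    return speed_m_per_s / freq_hz

for freq in (20000, 440, 20):
    print(freq, "Hz ->", round(wavelength_m(freq), 3), "m")
```

Passing a different speed (for example 355 m/s for the vocal tract) shows how the same frequency has a slightly longer wavelength in a faster medium.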

Amplitude, SPL and loudness

The intensity contour

The amplitude is the magnitude of the change in sound pressure within
the wave. It corresponds to the maximum change in pressure at any
point in the sound wave.
Expressed in decibels (dB), a logarithmic (perceptual) scale, it is
called the Sound Pressure Level (SPL).

The amplitude envelope of a sound is the smooth curve that passes
through the peaks of the amplitude.
It is also called the intensity contour.
It determines the temporal structure of animal calls / speech.
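One simple way to obtain such a curve is to follow the largest absolute sample value in a short sliding window. The sketch below is only one possible approach (analysis software may use other methods); the test signal, the sampling rate and the window length are assumptions chosen for illustration.

```python
import math

# Sketch: a peak-following amplitude envelope.
# ASSUMPTION: the envelope is approximated by the maximum absolute sample
# value inside a sliding window about one period of the tone long.

def amplitude_envelope(samples, window):
    """Return max |x| in a window of about `window` samples centred on each sample."""
    half = window // 2
    env = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        env.append(max(abs(s) for s in samples[lo:hi]))
    return env

fs = 8000    # sampling rate in Hz (assumed)
n = 1600     # 0.2 s of signal

# A 200 Hz tone whose amplitude rises and then falls (half-sine shape)
signal = [math.sin(math.pi * i / n) * math.sin(2 * math.pi * 200 * i / fs)
          for i in range(n)]

envelope = amplitude_envelope(signal, window=fs // 200)  # one 200 Hz period

print("envelope at start :", round(envelope[0], 3))
print("envelope at middle:", round(envelope[n // 2], 3))
```

The envelope is low at the start, close to 1 in the middle, and never falls below the waveform itself.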


Examples of dB levels: ambient speech in an office/restaurant: 60 dB,
vacuum cleaner at 1 m: 80 dB, red deer roar at 1 m: 104 dB,
jet aircraft at 100 m: 120 dB, blue whale at 1 m: 180 dB.
Loudness is the perceptual correlate of amplitude: it is a
subjective, non-linear perceptual attribute of sound (varies with
people, frequency, distance).
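To give a feel for the logarithmic decibel scale, the sketch below converts between sound pressure and level. It assumes the standard definition SPL = 20 log10(p/p0) with the usual reference pressure in air of 20 micropascals; neither the formula nor the reference value appears on the slides.

```python
import math

# Sketch: sound pressure (Pa) <-> sound pressure level (dB SPL).
# ASSUMPTION: standard definition with the reference pressure for air.

P_REF = 20e-6  # reference pressure in air, 20 micropascals

def spl_db(pressure_pa):
    """Sound pressure level in dB for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def pressure_from_spl(level_db):
    """RMS pressure in pascals for a level in dB SPL."""
    return P_REF * 10 ** (level_db / 20.0)

print(round(spl_db(0.02), 1), "dB for a pressure of 0.02 Pa")

# Each 20 dB step corresponds to a tenfold change in sound pressure
ratio = pressure_from_spl(80) / pressure_from_spl(60)
print("80 dB vs 60 dB pressure ratio:", round(ratio, 3))
```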

How can we study the frequency structure of sounds?

Spectrums: frequency / amplitude representation. The time dimension is
removed.
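A spectrum can be computed from a sampled waveform with a discrete Fourier transform (DFT). The sketch below uses a deliberately plain and slow DFT so that nothing is hidden; the sampling rate, the test signal and the function name are assumptions made for illustration, and real analysis software uses a fast Fourier transform instead.

```python
import cmath
import math

# Sketch: magnitude spectrum of a short signal using a plain DFT.

def magnitude_spectrum(samples, fs):
    """Return (frequencies in Hz, magnitudes) for bins from 0 Hz to fs/2."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * fs / n)
        mags.append(2 * abs(acc) / n)  # a sine of amplitude 1 gives about 1
    return freqs, mags

fs = 4000   # sampling rate in Hz (assumed)
n = 400     # 0.1 s of signal, so bins are 10 Hz apart

# Test signal: 200 Hz at amplitude 1.0 plus 400 Hz at amplitude 0.5
signal = [math.sin(2 * math.pi * 200 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
          for i in range(n)]

freqs, mags = magnitude_spectrum(signal, fs)
peaks = [(f, round(m, 2)) for f, m in zip(freqs, mags) if m > 0.1]
print(peaks)
```

Time no longer appears in the result: only frequency and amplitude remain.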

Spectrograms: enable us to visualise the distribution of the energy
(amplitude) across two dimensions: time (s) and frequency (Hz).

Time is on the x axis, frequency on the y axis, and the energy is
represented by different shades of grey.
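A spectrogram can be thought of as a sequence of short-time spectra, one per slice of the waveform. The sketch below is a simplification under stated assumptions: frames do not overlap and no window function is applied, whereas real spectrogram software normally does both. All names and values are illustrative.

```python
import cmath
import math

# Sketch: a bare-bones spectrogram as a list of short-time magnitude spectra.

def frame_magnitudes(frame):
    """Magnitude of each DFT bin (0 Hz to fs/2) for one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_len):
    """One magnitude spectrum per non-overlapping frame."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [frame_magnitudes(samples[s:s + frame_len]) for s in starts]

fs = 8000        # sampling rate in Hz (assumed)
frame_len = 80   # 10 ms frames, so frequency bins are 100 Hz apart

# Test signal: 500 Hz for the first 50 ms, then 1000 Hz for the next 50 ms
signal = [math.sin(2 * math.pi * (500 if i < 400 else 1000) * i / fs)
          for i in range(800)]

spec = spectrogram(signal, frame_len)

# Frequency of the strongest bin in each frame (the darkest point of each column)
dominant = [max(range(len(col)), key=col.__getitem__) * fs / frame_len
            for col in spec]
print(dominant)
```

Each inner list corresponds to one column of the picture; mapping its magnitudes to shades of grey gives the usual display.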


[Figure: example plots; axes labelled Time (s) and Amplitude (dB)]




Complex sounds

Simple sound-waves: pure tones

Animals, humans and most musical instruments usually generate periodic
sounds which have energy at more than one frequency.
These sounds are called complex sounds.

Pure tones are single frequency tones with no harmonic content (no
overtones). This corresponds to a sine wave.
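A pure tone is easy to generate numerically. The sketch below synthesises a sine wave and estimates its frequency from the average spacing of its upward zero crossings; the 1500 Hz frequency, the sampling rate and the function names are assumptions chosen for illustration.

```python
import math

# Sketch: synthesise a pure tone (sine wave) and measure its frequency.

def pure_tone(freq_hz, duration_s, fs, amplitude=1.0):
    """Samples of a sine wave of the given frequency and duration."""
    n = int(round(duration_s * fs))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / fs)
            for i in range(n)]

def upward_crossings(samples):
    """Indices where the waveform passes from negative to non-negative."""
    return [i + 1 for i, (a, b) in enumerate(zip(samples, samples[1:]))
            if a < 0 <= b]

fs = 48000                        # sampling rate in Hz (assumed)
tone = pure_tone(1500, 0.1, fs)   # 0.1 s of a 1500 Hz tone

marks = upward_crossings(tone)
mean_period_s = (marks[-1] - marks[0]) / (len(marks) - 1) / fs
estimated_hz = 1.0 / mean_period_s
print(round(estimated_hz), "Hz (estimated from zero crossings)")
```

Because there is only one sinusoidal component, every cycle has the same duration and the estimate stays very close to the frequency that was requested.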

[Figure: example spectrograms; axes labelled Time (s) and Frequency (kHz); one label reads 1.5 kHz]

These sounds are composed of more than one pure tone (more than one
sinusoidal wave).
Examples: red deer roar, herring gull call.

Examples of pure tones: whistles, scops owl hoots, most electronic beeps.

Fundamental frequency and harmonics in complex periodic sounds
Typical vocal sounds are composed of several sinusoidal waves which appear
on spectrograms as evenly spaced, parallel, narrow frequency components.

The lowest of these parallel frequency components is called the
fundamental frequency (F0).

The harmonics are integer multiples of the fundamental frequency:
H1 = 2F0, H2 = 3F0, etc.
The fundamental frequency determines the pitch of the tone (how high
or low it is perceived to be).


The variation of F0 with time determines the fundamental frequency
contour. In speech it affects the intonation.
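The link between F0 and its harmonics can be illustrated by building a complex tone from sine waves. In the sketch below the 1/k amplitude roll-off, the sampling rate and the function names are assumptions made for illustration; real voices have their own amplitude pattern.

```python
import math

# Sketch: a complex periodic tone built from F0 and its integer multiples.
# ASSUMPTION: component k has amplitude 1/k (arbitrary choice).

def harmonic_frequencies(f0_hz, n_components):
    """F0 followed by its integer multiples (2*F0, 3*F0, ...)."""
    return [k * f0_hz for k in range(1, n_components + 1)]

def complex_tone(f0_hz, n_components, duration_s, fs):
    """Sum of sine waves at F0 and its harmonics."""
    freqs = harmonic_frequencies(f0_hz, n_components)
    n = int(round(duration_s * fs))
    return [sum(math.sin(2 * math.pi * f * i / fs) / (k + 1)
                for k, f in enumerate(freqs))
            for i in range(n)]

fs = 8000    # sampling rate in Hz (assumed)
f0 = 125     # fundamental frequency in Hz

print(harmonic_frequencies(f0, 4))

tone = complex_tone(f0, 5, 0.05, fs)
period_samples = fs // f0    # one F0 period = 64 samples

# The waveform repeats exactly once per F0 period
repeats = all(abs(tone[i] - tone[i + period_samples]) < 1e-9
              for i in range(len(tone) - period_samples))
print("repeats every", period_samples, "samples:", repeats)
```

Adding harmonics changes the shape of each cycle but not its duration, which is why the perceived pitch still follows F0.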


The pitch
What is the Pitch of a voice?
The pitch is the perceived height of a voice (Titze)
It is mainly determined by the fundamental frequency of the sound.



very low
(early morning)


White noise:

Pitch goes up to 1.4 kHz (whistle register - female singers only).

The distribution of the energy is not uniform across frequencies.

The peaks and valleys reflect the resonances that take place in the
cavities of the vocal tract.
These resonances are called formants (from the Latin formare = to
shape) because they shape the spectral structure of the speech signal.

Formants are central to human speech as they provide the acoustic
variation at the basis of vowels and consonants (see next lectures!).

- spectral envelope
formants ?




How vertebrates make sounds

Anurans use their larynx; they often use two sets of folds (AM).

Birds use their syrinx, located at the base of the trachea.

A vocal sound can be described in terms of:
- intensity contour
- periodicity
pure tone?
F0 + harmonics?
F0 contour?
noise?


The spectral envelope: resonance frequencies (formants)

Noise is sound that is made of an aperiodic series of waves,
corresponding to irregular and disordered vibrations that include all
possible frequencies (e.g. waves breaking on shore, wind).
It can play a role in speech: e.g. whisper - see next lecture.
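The difference between a periodic sound and noise can be shown numerically by comparing each signal with a copy of itself shifted by one period. In the sketch below, Gaussian random samples stand in for white noise; this choice, the sampling rate and the function names are assumptions made for illustration.

```python
import math
import random

# Sketch: periodic tone versus aperiodic noise, compared by autocorrelation.
# ASSUMPTION: white noise is approximated by independent Gaussian samples.

def white_noise(n_samples, seed=0):
    """Independent random samples (fixed seed so the run is repeatable)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def normalised_autocorrelation(samples, lag):
    """Similarity between a signal and itself shifted by `lag` samples."""
    num = sum(a * b for a, b in zip(samples, samples[lag:]))
    den = sum(a * a for a in samples)
    return num / den

fs = 8000          # sampling rate in Hz (assumed)
lag = fs // 125    # 64 samples = one period of a 125 Hz tone

tone = [math.sin(2 * math.pi * 125 * i / fs) for i in range(fs)]
noise = white_noise(fs)

tone_similarity = normalised_autocorrelation(tone, lag)
noise_similarity = normalised_autocorrelation(noise, lag)
print("tone :", round(tone_similarity, 2))
print("noise:", round(noise_similarity, 2))
```

The tone matches its shifted copy almost perfectly, whereas the noise shows almost no similarity at that lag (or at any other non-zero lag).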






Mammals use their larynx, at the top of the trachea.


From Fitch & Hauser 2002

What is the vocal apparatus?

The lungs (generate the power)
The trachea
The larynx
The supralaryngeal vocal tract:
the pharynx
the mouth
the nasal cavity

The two functional components: the source and the filter
Speech (and most mammal) sounds result from a two-stage process:

- a periodic wave (called the glottal wave) is generated
in the larynx (= the source). Its fundamental frequency
determines the pitch of the voice.
- this wave is then filtered in the supralaryngeal
cavities of the vocal tract (= the filter), creating broad
bands of energy called vocal tract resonances or formants.
Defined by Fant, G. (1960). Acoustic Theory of Speech Production.
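The two-stage process can be imitated with a very crude numerical model. In the sketch below the source is a train of unit pulses (one per F0 period) and the filter is a single two-pole resonator standing in for one vocal tract resonance. Both are simplifying assumptions made for illustration, as are the 500 Hz centre frequency, the 100 Hz bandwidth and all function names; this is not the formulation given by Fant.

```python
import cmath
import math

# Sketch: source-filter idea with a pulse-train source and one resonance.

def glottal_source(f0_hz, n_samples, fs):
    """Crude stand-in for the glottal wave: one unit pulse per F0 period."""
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(samples, centre_hz, bandwidth_hz, fs):
    """Two-pole resonance filter standing in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2 * r * math.cos(2 * math.pi * centre_hz / fs)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in samples:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def magnitude_at(samples, freq_hz, fs):
    """Magnitude of the signal's component at freq_hz."""
    n = len(samples)
    return abs(sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
                   for i, x in enumerate(samples))) / n

fs = 8000    # sampling rate in Hz (assumed)
f0 = 100     # fundamental frequency of the source in Hz

source = glottal_source(f0, 800, fs)
voiced = resonator(source, centre_hz=500, bandwidth_hz=100, fs=fs)

harmonics = [k * f0 for k in range(1, 11)]
source_levels = [magnitude_at(source, h, fs) for h in harmonics]
voiced_levels = [magnitude_at(voiced, h, fs) for h in harmonics]

strongest = harmonics[voiced_levels.index(max(voiced_levels))]
print("strongest harmonic after filtering:", strongest, "Hz")
```

Before filtering, all harmonics of this source have the same level. After filtering, the harmonics are still spaced by F0, but those near the resonance are boosted: the filter changes the spectral envelope without changing the pitch.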

From Titze, 1994

Illustration of the source filter theory

Glottal wave

Speaker with an anesthetized vocal tract:
glottal wave filtered by a uniform tube

Speaker with a normal vocal tract:
glottal wave filtered by a non-uniform, changing vocal tract

