
HCS 7367
Speech Perception Lab
Dr. Peter Assmann
Fall 2013

Class web page
http://www.utdallas.edu/~assmann/hcs7367/
Course information
Lab details
Speech demos
Matlab programs used for class assignments
Additional resources

Matlab Background
Kermit Sigmon, MATLAB Primer, 2nd Edition:
http://www.fi.uib.no/Fysisk/Teori/KURS/WRK/mat/singlemat.html
Getting started with Matlab (The MathWorks):
http://www.mathworks.com/help/techdoc/learn_matlab/bqr_2pl.html
UTD IR Matlab and Simulink: Resources for Getting Started:
http://www.utdallas.edu/ir/how-to/ml_help/index.html

Praat: doing phonetics by computer
Download Praat:
http://www.fon.hum.uva.nl/praat/
Praat tutorial:
http://www.fon.hum.uva.nl/praat/manual/Intro.html

Wavesurfer
Download Wavesurfer:
www.speech.kth.se/wavesurfer
Wavesurfer User Manual:
www.speech.kth.se/wavesurfer/man.html

Starting with Matlab
Interactive MATLAB Tutorial:
http://www.mathworks.com/help/techdoc/learn_matlab/f011759.html
http://www.mathworks.com/academia/student_center/tutorials/ml_onramp/player.html?slide=1
Start Matlab
doc Matlab
Click on Getting Started
This launches a video in your browser

Dates for lab assignments
Lab assignment 1: Sept 19
Lab assignment 2: Oct 10
Lab assignment 3: Oct 31
Lab assignment 4: Nov 21
3-page reports (with figures) on lab projects

Term project: important dates
Sept 5: Submit project topics
Sept 26: Turn in project outline
Oct 3: Preliminary project presentations
Nov 14/21: Oral presentations
Dec 12: Final project paper due

Examples of topics
Acoustic analysis and intelligibility of children's speech
Neural network models of vowel recognition
Simulating distortions introduced by hearing loss
Noise reduction algorithms for hearing aid processors
Production and perception of foreign accents
Contribution of prosody to connected speech intelligibility
Effects of noise, reverberation on speech communication
Monaural vs. binaural speech understanding in noise
Development of speech perception in infants
Models of speech coding in the auditory cortex

Initial stages
Identify a topic area and read the relevant papers
Refine your topic; choose a manageable problem
Set specific goals and define evaluation metric
Identify the approach to solve the problem
Start right away.

Finding papers
PubMed search engine:
http://www.ncbi.nlm.nih.gov/entrez/

Find more papers
Find free full-text articles

Finding papers
Journal of the Acoustical Society of America:
http://scitation.aip.org/jasa/

Fundamental frequency (F0)
Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.

Harmonicity and Periodicity
Period: regularly repeating pattern in the waveform
Period duration T0 = 6 ms
F0 = 1 / T0 = 1000 / 6 ≈ 167 Hz
Harmonics are integer multiples of F0 and are evenly spaced in frequency
[Figure: waveform and amplitude spectrum; Amplitude (dB) vs. Frequency (kHz)]

Audio demo: the source signal
Source signal for an adult male voice
Source signal for an adult female voice
Source signal for a 10-year-old child
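
The relation between period and fundamental frequency can be checked at the Matlab prompt. The following is a minimal sketch; the variable names (T0_ms, nHarm) and the choice of five harmonics are illustrative assumptions, not part of the course materials.

```matlab
% Convert a period duration to F0 and list the first few harmonics.
T0_ms = 6;                    % period duration in ms (value from the slide)
F0 = 1000 / T0_ms;            % fundamental frequency in Hz (about 166.7)
nHarm = 5;                    % number of harmonics to list (arbitrary choice)
harmonics = F0 * (1:nHarm);   % integer multiples of F0
spacing = diff(harmonics);    % spacing between adjacent harmonics equals F0
fprintf('F0 = %.1f Hz\n', F0);
fprintf('%.1f ', harmonics);
fprintf('\n');
```

Because the harmonics are integer multiples of F0, every entry of spacing equals F0, which is what "evenly spaced in frequency" means.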

Source properties
In voiced sounds the glottal source spectrum contains a series of lines called harmonics.
The lowest one is called the fundamental frequency (F0).
[Figure: amplitude spectrum of the source with F0 marked; Relative Amplitude (dB) vs. Frequency (Hz)]

Filter properties
The vocal tract resonances (called formants) produce peaks in the spectrum envelope.
Formants are labeled F1, F2, F3, ... in order of increasing frequency.
[Figure: amplitude spectrum with superimposed LPC spectral envelope, peaks labeled F1 to F4; Amplitude in dB vs. Frequency (kHz)]

Demo: harmonic synthesis
Additive harmonic synthesis: vowel /i/ (.wav)
Cumulative sum of harmonics: vowel /i/ (.wav)
Additive synthesis: "wheel" (.wav)
Cumulative sum of partials: (.wav)
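
The idea behind the harmonic synthesis demo can be sketched in a few lines of Matlab: a periodic signal is built by adding cosines at integer multiples of F0, and the cumulative sums show how the waveform changes as each harmonic is added. All values below (sample rate, F0, number of harmonics, and the 1/k amplitudes) are assumptions chosen for illustration; for a real vowel such as /i/ the harmonic amplitudes would follow the spectrum envelope instead.

```matlab
% Additive harmonic synthesis: sum cosines at integer multiples of F0.
fs = 10000;                       % sample rate in Hz (assumed)
F0 = 100;                         % fundamental frequency in Hz (assumed)
dur = 0.2;                        % duration in seconds (assumed)
nHarm = 20;                       % number of harmonics (assumed)
t = (0:round(dur*fs)-1)' / fs;    % time axis as a column vector
amps = 1 ./ (1:nHarm);            % hypothetical amplitudes, falling off as 1/k
y = zeros(size(t));
partial = zeros(length(t), nHarm);   % column k = sum of harmonics 1..k
for k = 1:nHarm
    y = y + amps(k) * cos(2*pi*k*F0*t);
    partial(:, k) = y;
end
y = y / max(abs(y));              % normalize to the range -1..1
% sound(y, fs)                    % uncomment to listen
```

The result repeats every fs/F0 = 100 samples, so the pitch is set by F0 no matter how many harmonics are included.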

Vocal tract properties
Resonating tube model
approximation for neutral vowel (schwa), [ə]
closed at one end (glottis); open at the other (lips)
uniform cross-sectional area
curvature is relatively unimportant

Uniform tube model (schwa)
[Figure: uniform tube for /ə/, from glottis to lips; length, L]

American English vowel space
Axes: height (F1), from high through mid to low; advancement (F2), from front through center to back
front (high to low): i "heed", ɪ "hid", e "hayed", ɛ "head", æ "had"
center: ə (schwa), ʌ "hut"
back (high to low): u "who'd", ʊ "hood", o "hoed", ɔ "hawed", ɑ "hod"

Acoustic vowel space
[Figure: second formant, F2 frequency (Hz) vs. first formant, F1 frequency (Hz), with i "heed", ɑ "hod" and u "who'd" marked]

Vocal tract model
Quarter-wave resonator:
Fn = (2n - 1) c / (4L)
Fn is the frequency of formant n in Hz
c is the velocity of sound in air (about 35000 cm/sec)
L is the length of the vocal tract (17.5 cm for adult male)

Vocal tract model
Quarter-wave resonator:
Fn = (2n - 1) c / (4L)
F1 = (2(1) - 1)*35000/(4*17.5) = 500 Hz
F2 = (2(2) - 1)*35000/(4*17.5) = 1500 Hz
F3 = (2(3) - 1)*35000/(4*17.5) = 2500 Hz

Helium speech
The speed of sound in a helium/oxygen mixture at 20 °C is about 93000 cm/s, compared to 35000 cm/s in air. This increases the resonance frequencies but has relatively little effect on F0.
In helium speech, the formants are shifted up but the pitch stays the same.

Note that the vowel /ə/ (schwa) has formants at odd multiples of F1.

Helium speech
Using Matlab as a calculator, find the frequencies of F1, F2 and F3 for a 17.5 cm vocal tract producing the vowel /ə/ in a helium/air mixture (velocity c ≈ 93000 cm/s)
Fn = (2n - 1) c / (4L)

F1 = (2*(1) - 1)*93000/(4*17.5) = 1329 Hz

F2 = (2*(2) - 1)*93000/(4*17.5) = 3986 Hz

F3 = (2*(3) - 1)*93000/(4*17.5) = 6643 Hz
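
The quarter-wave calculation can also be written once in vectorized form, so that air and helium are compared side by side. This is a minimal sketch; the variable names (c_air, c_he, F_air, F_he, ratio) are illustrative, and the constants are the ones quoted on the slides.

```matlab
% Quarter-wave resonator: Fn = (2n - 1) * c / (4 * L)
L = 17.5;                            % vocal tract length in cm (adult male)
n = 1:3;                             % formant numbers
c_air = 35000;                       % speed of sound in air, cm/s
c_he  = 93000;                       % speed of sound in the helium mixture, cm/s
F_air = (2*n - 1) * c_air / (4*L);   % formants in air, Hz
F_he  = (2*n - 1) * c_he  / (4*L);   % formants in helium, Hz
ratio = F_he ./ F_air;               % each formant scales by c_he / c_air
disp(round(F_air));
disp(round(F_he));
```

Every formant is multiplied by the same factor, c_he / c_air (about 2.66), which is why the whole formant pattern shifts upward while F0, set by the vocal folds, is left largely unchanged.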

Helium speech
Audio demos
Speech in air
Speech in helium
Pitch in air
Pitch in helium
[Figures: speech in air and speech in helium; Frequency (kHz) vs. Time (ms)]
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

Perturbation Theory
The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /ɑ/.
[Figure: delta F1 as a function of constriction position, glottis to lips]

Perturbation Theory
The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.
[Figure: delta F2 as a function of constriction position, glottis to lips]

Perturbation Theory
The third formant (F3) is lowered by a constriction at the lips or at the back of the mouth or in the upper pharynx. This occurs in /r/ and /r/-colored vowels like American English /ɝ/ (as in "herd").
F3 is raised when the constriction is behind the lips and teeth or near the upper pharynx.
[Figure: delta F3 as a function of constriction position, glottis to lips]

Perturbation Theory
All formants tend to drop in frequency when the vocal tract length is increased or when a constriction is formed at the lips.
[Figure: vocal tract, glottis to lips]

Perturbation Theory
F1 frequency is correlated with jaw opening (and inversely related to tongue height).
[Figure: amplitude spectrum; Amplitude in dB vs. Frequency (kHz)]

Perturbation Theory
F2 frequency is correlated with tongue advancement (front-back dimension).
[Figure: amplitude spectrum; Amplitude in dB vs. Frequency (kHz)]

Spectral analysis
Amplitude spectrum: sound pressure levels associated with different frequency components of a signal
Power or intensity
Amplitude or magnitude
Log units and decibels (dB)
Phase spectrum: relative phases associated with different frequency components
Degrees or radians

Spectral analysis of speech
Why perform a frequency analysis of speech?
Ear+brain carry out a form of frequency analysis
Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform

Spectral analysis of speech
But: the ear is not a spectrum analyzer.
Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.

Short-term amplitude spectrum
F1 = 281 Hz
F2 = 2196 Hz
F3 = 2755 Hz
[Figure: short-term amplitude spectrum; Amplitude (dB) vs. Frequency (kHz)]

Speech spectrograms
What is a speech spectrogram?
Display of amplitude spectrum at successive instants in time ("running spectra")
How can 3 dimensions be represented on a two-dimensional display?
Gray-scale spectrogram
Waterfall plots
Animation

Speech spectrograms
Why are speech spectrograms useful?
Shows dynamic properties of speech
Incorporates frequency analysis
Related to speech production
Helps to visually identify speech cues
[Figure: waveform and spectrogram of "The watchdog", with F1, F2 and F3 labeled; Frequency (kHz)]

Digital representations of signals
Sampling frequency (e.g. 44.1 kHz)
Quantization rate (16 bits)
[Figure: amplitude vs. time, illustrating sampling and quantization]

Digital representations of signals
Nyquist frequency
Effects of discrete-time sampling on bandwidth
16 bits = 2^16 quantization steps
Effects of discrete-level quantization on dynamic range

In-class assignment
Use the wavesurfer program on your laptop as a sound recording device.
Left-click red button to record the vowels "ee", "ah" and "oo", in quick sequence.
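
Returning to the sampling and quantization values quoted on the digital representation slides (44.1 kHz, 16 bits), the corresponding bandwidth and dynamic range can be worked out directly. This is a small sketch with illustrative variable names; the dynamic range uses the common 20*log10 (number of steps) approximation.

```matlab
fs = 44100;                          % sampling frequency in Hz
nbits = 16;                          % bits per sample
nyquist = fs / 2;                    % highest representable frequency, Hz
nsteps = 2^nbits;                    % number of quantization steps
dyn_range_dB = 20 * log10(nsteps);   % approximate dynamic range, about 96 dB
fprintf('Nyquist = %d Hz, steps = %d, range = %.1f dB\n', ...
        nyquist, nsteps, dyn_range_dB);
```

Each extra bit doubles the number of steps and so adds about 6 dB of dynamic range.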

In-class assignment
Use wavesurfer to make a spectrogram of the vowels. Right-click on waveform plot to add spectrogram + formant tracks.

In-class assignment
Left-click and drag mouse to select the desired region in the signal. Then right-click and select Statistics.

In-class assignment
This will display the formant frequencies (mean and standard deviation across n=13 frames, in this example).

In-class assignment
For this vowel ("ee") the estimated formant frequencies are F1=174 Hz, F2=1849 Hz, F3=2492 Hz, F4=3392 Hz.

In-class assignment
Now measure the formants in your productions of the vowels "ee", "ah", "oo".
Make a table of F1, F2 and F3 frequencies.

             F1    F2    F3
i "heed"
ɑ "hod"
u "who'd"

In-class assignment
Save the waveform as vowels.wav
Load into Praat and Matlab and repeat the assignment (instructions to follow).

Vector representation of speech

In Matlab speech signals are represented as row or column vectors (e.g., N rows x 1 column, where N is the number of samples in the waveform).
>> [y,fs]=wavread('wheel.wav'); % load waveform
>> size( y )
ans =
3200 1

The variable y has 3200 rows x 1 column (column vector).

The variable fs has 1 row x 1 column (scalar).
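
The difference between row and column vectors can be explored without a sound file. The sketch below uses a synthetic 3200-sample sine wave as a stand-in for the waveform; the sample rate of 8000 Hz and the 440 Hz tone are assumptions for illustration. (In current MATLAB releases wavread has been replaced by audioread, which returns the same kind of column vector for a mono file.)

```matlab
fs = 8000;                           % sample rate in Hz (assumed)
y = sin(2*pi*440*(0:3199)'/fs);      % 3200-sample column vector (synthetic)
[nrows, ncols] = size(y);            % number of rows and columns of y
yrow = y.';                          % transpose gives a 1 x 3200 row vector
ycol = yrow(:);                      % (:) always returns a column vector
dur_ms = 1000 * length(y) / fs;      % duration in ms
```

The (:) idiom is the one used inside fp.m to make sure the input is a column vector regardless of how it was passed in.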


Vector representation of speech
Load the waveform and plot it:
>> [y,fs]=wavread('wheel.wav'); % load waveform
>> t=(1:length(y) ) ./ (fs/1000); % set up time axis (ms)
>> plot( t, y ); % use plot command
>> axis( [ 0 400 -1 1 ] ); % set axis limits
>> xlabel('Time (ms)'); % x-axis label
>> ylabel('Amplitude'); % y-axis label
>> title('Waveform plot'); % axis title

Spectral analysis in Matlab
FFT Discrete Fourier transform.
FFT(X) is the discrete Fourier transform (DFT) of vector X. If the length of X is a power of two, a fast radix-2 fast-Fourier transform algorithm is used. If the length of X is not a power of two, a slower non-power-of-two algorithm is employed. For matrices, the FFT operation is applied to each column.
FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.

Spectral analysis in Matlab
Fourier spectrum of a vector:
>> X= fft (y);
>> help fft

Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> X= fft (y);
>> m = 20 * log10 ( abs ( X ) );
>> help abs
ABS Absolute value.
ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.

Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> plot(20*log10(abs(fft(y))))
[Figure: log magnitude spectrum produced by the plot command]

Plotting amplitude spectra
help fp
FP: function to compute & plot amplitude spectrum
Usage: [a,f]=fp(wave,rate,window);
wave: input waveform
rate: sample rate in Hz (default 10000 Hz)
window options: 'hann', 'hamm', 'kais', or 'rect' (default=hamming)
[a,f]: log magnitude (dB re:1), frequency (Hz)

Plotting amplitude spectra
[a,f]=fp(wave,rate,window);
[a,f]=fp(y,fs,'hann');
[Figure: amplitude spectrum; Amplitude (dB) vs. Frequency (kHz)]
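
The steps behind a log magnitude spectrum (window, FFT, magnitude, dB, keep the first half) can be tried on a synthetic signal whose spectrum is known in advance. The sketch below is not the course's fp.m; it assumes an 8000 Hz sample rate, a two-component test signal, and a Hamming window written out as a formula so that no toolbox is needed.

```matlab
fs = 8000;                           % sample rate in Hz (assumed)
n = 512;                             % analysis length (a power of 2)
t = (0:n-1)' / fs;                   % time axis, column vector
y = cos(2*pi*1000*t) + 0.1*cos(2*pi*2500*t);   % synthetic test signal
w = 0.54 - 0.46*cos(2*pi*(0:n-1)'/(n-1));      % Hamming window, written out
X = fft(y .* w);                     % complex spectrum of windowed signal
m = 20*log10(abs(X) / n);            % log magnitude in dB
f = (0:n/2-1)' * fs / n;             % frequencies of the first n/2 bins, Hz
m = m(1:n/2);                        % keep the non-redundant half
[peak_dB, ipeak] = max(m);           % largest spectral component
peak_Hz = f(ipeak);                  % its frequency
% plot(f/1000, m); xlabel('Frequency (kHz)'); ylabel('Amplitude (dB)');
```

With these settings the frequency resolution is fs/n = 15.625 Hz, and the strongest component is found at the 1000 Hz bin.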


Assignment 1
Part 1: (Matlab code, plots, brief summary)
Make a set of digital recordings (WAV files) of the 12 vowels of American English:
/i/ "heed"     /ɪ/ "hid"      /e/ "hayed"
/ɛ/ "head"     /æ/ "had"      /ʌ/ "hud"
/ɑ/ "hod"      /ɔ/ "hawed"    /o/ "hoed"
/ʊ/ "hood"     /u/ "who'd"    /ɝ/ "herd"

Assignment 1
Load waveforms into Matlab; make 12 subplots of the amplitude spectra of the vowels, sampled near the midpoint.
[ y, fs ] = wavread ('heed.wav');
subplot (4,3,1);
start = ( length (y) / 2 ) - 255; % 512 samples centered on the midpoint
stop = ( length (y) / 2 ) + 256;
fp ( y ( start : stop ) , fs , 'hamm' ); % arguments: waveform segment, sample rate, window

Assignment 1
Plot the amplitude spectra of the vowels. Place all 12 plots in a single figure window using the subplot command:
>> subplot ( 3, 4, 1);
>> plot ( x, y );
>> subplot ( 3, 4, 2);
>> plot ( x, y );

Assignment 1
Step 1: Make a list of the filenames as a character array:
>> filenames = char ( 'heed', 'hid', 'hayed', 'head', 'had', 'hud', 'herd', 'hod', 'hawed', 'hoed', 'hood', 'whod' ) ;
>> deblank ( filenames ( 3, : ) )
ans =
hayed

Assignment 1
Step 2: Load the waveform of each vowel from the disk:
>> for i=1:12,
>> [ y, rate ] = wavread ( deblank ( filenames ( i , : ) ) );
>> y = y * 2^15; % scale signal to 16-bit range (2^15)
>> % insert plot commands here
>> end;

Assignment 1
Step 2: extract the middle part from the waveform
>> % extract samples that lie between start and stop:
>> y = y( start : stop ); % but how do we select start and stop?
[Figure: waveform with the start and stop points marked]


Exercise1
Find out various properties of the waveform:
length ( y ) % vector length
min ( y ) % minimum value
max ( y ) % maximum value
mean ( y ) % mean value
plot ( y ) % inspect waveform
sound ( y, rate ) % listen to waveform

Exercise1
Step 3: Find vowel midpoint; define a range of sample points to extract from the waveform.
nfft = 512;
start = ( length (y) / 2 ) - (nfft/2 - 1);
stop = ( length (y) / 2 ) + nfft/2;
% y ( start : stop )
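
The midpoint extraction in Step 3 can be checked on any vector. The sketch below uses random noise as a stand-in for a loaded waveform and adds one assumption that is not on the slide: the midpoint is rounded down with floor, so the indices stay integers even when the waveform has an odd number of samples.

```matlab
y = randn(3201, 1);                  % stand-in waveform with an odd length
nfft = 512;                          % number of samples to extract
mid = floor(length(y) / 2);          % integer midpoint, also for odd lengths
start = mid - (nfft/2 - 1);          % first sample of the segment
stop  = mid + nfft/2;                % last sample of the segment
seg = y(start:stop);                 % exactly nfft samples
```

The segment always contains nfft samples because stop - start + 1 = nfft. The waveform must be at least nfft samples long for the indices to stay in range.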

Exercise1
Step 4: Use the function fp.m to compute and plot the amplitude spectrum of the vowel segment:
fp ( y( start : stop ) , fs , 'Hamm' );
Input arguments: input vector (waveform segment), sample rate, type of window function

Function M-file: fp.m
There are two types of M-files: scripts and functions. To display the contents of an M-file, type the following:
type fp.m
Function M-files start with a function statement (see next page) and a series of comment lines. The comment lines are included to provide online help and are optional (but very useful!). The next five slides illustrate and explain the contents of the function fp.m

Function M-file: fp.m
Function statement:
function [ a, f ] = fp ( x, rate, window ) ;
Comment lines:
% FP: function to compute & plot amplitude spectrum
% Usage: [a,f]=fp(wave,rate,window);
% wave: input waveform
% rate: sample rate in Hz (default 10000 Hz)
% window options: 'hann', 'hamm', 'kais', 'rect' (default=hamming)
% [a,f]: log magnitude, frequency
Optional output arguments:
a=log magnitude spectrum
f=corresponding frequencies

Function M-file: fp.m
Set defaults:
% set reasonable defaults for optional variables
if ~exist ( 'rate' , 'var' ) ,
rate=10000;
end;
if ~exist ( 'window' , 'var' ),
window = 'hamm' ;
end;


Function M-file: fp.m
x=x(:); % convert x to column vector
n = length ( x ) ; % length of data vector
Variables defined inside a function are local. In other words, they are not accessible on the command line, outside the function itself.

Function M-file: fp.m
% illustration of if-else statements:
window=lower(window); % window must be lower case
if window=='rect', % rectangular window = [1 1 1 1 1]
x=x.*ones(n,1); % multiplying x by 1 does nothing!
elseif window=='hamm',
x=x.*hamming(n); % multiply x by Hamming window
elseif window=='hann',
x=x.*hanning(n); % multiply x by Hanning window
else,
x=x.*hamming(n); % default case: Hamming window
end;
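
One caveat about the if-else chain: comparing character arrays with == works element by element, so it only runs when both strings have the same length (four letters here); a call such as fp(y, fs, 'Hamming') would stop with a dimension error. A possible alternative, sketched below, compares on the first four letters with switch. This is not the course's fp.m; the window formulas are written out so that no toolbox is needed, and the Hann formula is the symmetric form with zero endpoints, which differs slightly from Matlab's hanning(n).

```matlab
window = 'Hamming';                  % user input; any length, any case
n = 8;                               % window length (small, for illustration)
k = (0:n-1)';                        % sample index, column vector
switch lower(window(1:min(4, end)))  % compare on the first four letters
    case 'rect'
        w = ones(n, 1);                      % rectangular window
    case 'hann'
        w = 0.5 - 0.5*cos(2*pi*k/(n-1));     % Hann window (zero endpoints)
    otherwise                        % 'hamm' and anything unrecognized
        w = 0.54 - 0.46*cos(2*pi*k/(n-1));   % Hamming window
end
```

The switch statement uses string comparison internally, so names of any length are handled without error, and the otherwise branch preserves the original default of a Hamming window.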

Function M-file: fp.m
m=fft(x,n); % Fast Fourier Transform (fastest if n = power of 2)
no2=round(n/2); % n/2 samples: FFT is symmetrical
a=20*log10( abs ( m ) / n); % convert linear magnitude to dB
f=rate/n*(0:no2)/1000; % frequency scale in kHz: DC = 0 to fs/2
freq = f (1:no2); % retain only the first n/2 samples
amp = a (1:no2); % retain only the first n/2 samples

Function M-file: fp.m
% plot amplitude spectrum: frequency vs. amplitude
plot ( freq , amp ) ; % frequency = x-axis, amplitude=y-axis
axis( [ 0 rate/2000 -Inf Inf ] ) ; % axis range: [ xl xh yl yh ]

Exercise1
Annotate graph:
>> xlabel ( 'Frequency (kHz)' ); % x-axis label
>> ylabel ( 'Amplitude (dB)' ); % y-axis label
>> title ( filenames ( i , : ) ); % graph title
Turn off the axis labels by inserting an empty string:
>> ylabel ( ' ' ); % null axis label

Exercise: modify fp.m
% modify fp.m to compute phase spectrum
phase = unwrap ( angle (m) ) ;
p = phase * 180 / pi; % convert from radians to degrees
% plot phase spectrum: frequency vs. phase
plot ( freq , p (1:no2) ) ; % frequency = x-axis, phase=y-axis
axis( [ 0 rate/2000 -180 180 ] ) ;
% ****** End of function fp.m ******


Annotations
>> xlabel ( 'Frequency (kHz)' ); % x-axis label
>> ylabel ( 'Amplitude (dB)' ); % y-axis label
>> title ( filenames ( i , : ) ); % graph title

Modifying axes properties
Modify default axes properties:
>> gca % get current axes = axes handle
>> set ( gca, 'XLim', [ 0 4 ] ); % x-axis range
>> set ( gca, 'YLim', [ -20 40 ] ); % y-axis range
>> set ( gca, 'TickDir', 'Out' ); % tick mark dir
[Figures: amplitude spectrum, Amplitude (dB) vs. Frequency (kHz); phase spectrum, Phase (deg) vs. Frequency (kHz)]

Speech spectrograms in Matlab
help specgram
SPECGRAM Calculate spectrogram from signal.
B = SPECGRAM(A,NFFT,Fs,WINDOW,NOVERLAP) calculates the spectrogram for the signal in vector A. SPECGRAM splits the signal into overlapping segments, windows each with the WINDOW vector and forms the columns of B with their zero-padded, length NFFT discrete Fourier transforms.

Speech spectrograms in Matlab
help sp
sp: create gray-scale spectrogram
Usage: h=sp(wave,rate,nfft,nsampf,nhop,pre,drng);
wave: input waveform
rate: sample rate in Hz (default 8000 Hz)
nfft: FFT window length (default: 256 samples)
nsampf: number of samples per frame (default: 60)
nhop: number of samples to hop to next frame (default: 5 samples)
pre: preemphasis factor (0-1) (default: 1)
drng: dynamic range in dB (default: 80)
title: title for graph (default: none)
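
The "running spectra" idea behind both functions can be sketched with base Matlab only: slide a window along the signal, take the FFT of each windowed frame, and store the log magnitudes as the columns of a matrix. The code below is neither specgram nor the course's sp.m; the test signal (a rising chirp), the hop size, and the written-out Hamming window are all assumptions for illustration.

```matlab
fs = 8000;                           % sample rate in Hz (assumed)
t = (0:fs/2-1)' / fs;                % 0.5 s time axis, column vector
y = cos(2*pi*(500*t + 1500*t.^2));   % synthetic chirp rising from 500 Hz
nfft = 256;                          % frame and FFT length in samples
nhop = 64;                           % hop between frames in samples (assumed)
w = 0.54 - 0.46*cos(2*pi*(0:nfft-1)'/(nfft-1));   % Hamming window
nframes = floor((length(y) - nfft) / nhop) + 1;
B = zeros(nfft/2, nframes);          % rows = frequency bins, columns = frames
for k = 1:nframes
    seg = y((k-1)*nhop + (1:nfft)'); % k-th frame of the signal
    X = fft(seg .* w);               % spectrum of the windowed frame
    B(:, k) = 20*log10(abs(X(1:nfft/2)) + eps);   % dB; eps avoids log of 0
end
% imagesc(B); axis xy; colormap(gray);   % uncomment to display
```

Each column of B is one short-term amplitude spectrum. Because the chirp rises in frequency, the row index of the spectral peak increases from the first column to the last.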


Making spectrograms
>> load wheel % Load pre-recorded waveform
>> sp (wheel, 8000); % Use defaults for other variables
>> colormap(hot); % determines image color scheme
>> axis tight; % extends plot to axis limits

Making spectrograms
[Figure: spectrogram of "hod"; Frequency (kHz) vs. Time (ms)]

TrackDraw: a graphical speech synthesizer

TrackDraw program
Provides a graphical interface for controlling a speech synthesizer (cascade formant synthesis, Klatt, 1980)
Allows for successive iterations of hand-tracking, synthesizing and listening to the results
Assmann, P., Ballard, W., Bornstein, L., and Paschall, D. (1994). Track-Draw: A graphical interface for controlling the parameters of a speech synthesizer. Behavior Research Methods, Instruments and Computers 26, 431-436.

Using TrackDraw
load wheel
y=wheel;
specsynth;

The Spectral Slice Display


Fundamental Frequency (F0) window

TrackDraw: finished tracks

Amplitude of voicing (AV) window

Saving, printing and re-loading tracks


>> specsynth;
% when finished tracking click on exit button
>> savetr
% save tracks in file; enter name xxheedtr
% savetr will append the .mat extension
>> load xxheedtr.mat
% To re-load track files and run statistics
>> plottracks
>> print -Pljhd
