
Topics covered in this chapter

– Three basic problems in pattern comparison


• How to detect the speech signal in a recording
interval (i.e. separate speech from background)
• How to locally compare spectra from two speech
utterances (local spectral distortion measure),
and
• How to globally align and normalize the distance
between two speech patterns (sequences of
spectral vectors) which may or may not
represent the same linguistic sequence of
sounds (word, phrase, sentence, etc.)

Distortion Measures
• Distortion measures are the mathematical tools used to quantify the
dissimilarity between two feature vectors.
• Let x and y be two vectors defined on a
vector space X.
• A metric or distance function d on the vector
space X is a real-valued function on the
Cartesian product X × X that satisfies the following conditions:

Distortion Measures

a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ iff $x = y$
(positive definiteness property)
b) $d(x, y) = d(y, x)$ for $x, y \in X$ (symmetry)
c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$ (triangle
inequality)
In addition, the distance function is called invariant if
d) $d(x + z, y + z) = d(x, y)$
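
Properties (a) to (d) can be checked numerically on a small sample of points. The sketch below is only an illustration: the helper names (`euclidean`, `squared_euclidean`, `check_properties`) and the sample points are assumptions, not part of the slides. It shows that the Euclidean distance passes all four checks, while the squared Euclidean distance fails the triangle inequality and is therefore a distortion measure rather than a metric.

```python
import itertools
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(x - y))

def squared_euclidean(x, y):
    """Squared Euclidean distance (not a metric)."""
    return float(np.sum((x - y) ** 2))

def check_properties(d, points, shift, tol=1e-9):
    """Test properties (a)-(d) of d on a finite set of sample points."""
    ok = {"positive_definite": True, "symmetric": True,
          "triangle": True, "invariant": True}
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        identical = np.array_equal(x, y)
        # (a) d >= 0, and d == 0 exactly when x == y
        if dxy < 0 or (dxy <= tol) != identical:
            ok["positive_definite"] = False
        # (b) symmetry
        if abs(dxy - d(y, x)) > tol:
            ok["symmetric"] = False
        # (d) invariance under a common shift
        if abs(d(x + shift, y + shift) - dxy) > tol:
            ok["invariant"] = False
        # (c) triangle inequality for every third point z
        for z in points:
            if dxy > d(x, z) + d(y, z) + tol:
                ok["triangle"] = False
    return ok

# Three collinear sample points (illustrative values only).
points = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
shift = np.array([0.5])

euclid_result = check_properties(euclidean, points, shift)
squared_result = check_properties(squared_euclidean, points, shift)
print(euclid_result)
print(squared_result)
```

A check on a finite sample can only disprove a property, never prove it in general; here it is used purely to make the definitions concrete.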

Distortion Measures
• If a measure of distance d satisfies only
the positive definiteness property, it is
called a distortion measure when the vectors
represent speech spectra.
• In speech recognition, distance means a measure
of dissimilarity.
• For speech processing, an important
consideration in choosing a measure of
distance is its subjective meaningfulness.
• To be useful in speech processing, a
mathematical measure of distance should reflect
linguistic characteristics.

Distortion Measures

For example, a large difference in
waveform error does not always imply a
large subjective difference.

Distortion Measures
• Perceptual considerations: the key to choosing an
appropriate measure of spectral dissimilarity
is the concept of subjective judgment of
sound difference, or phonetic relevance.
• Spectral changes that keep the sound
perceptually the same should be associated with
small distances.
• Spectral changes that make the sound
perceptually different should be associated
with large distances.

Distortion Measures

• Consider comparing two spectral
representations, S(ω) and S'(ω), using a
distance measure d(S, S').
• If the spectral content of the two signals is
phonetically the same (same sound), then the
distance measure d is ideally very small.

Distortion Measures
• Spectral changes that correspond to a large phonetic
distance include
– Significant differences in formant
locations, i.e., the spectral resonances of S(ω)
and S'(ω) occur at very different
frequencies.
– Significant differences in formant
bandwidths, i.e., the frequency widths of the
spectral resonances of S(ω) and S'(ω) are
very different.
• In each of these cases the sounds are
different, so the spectral distance measure
d(S, S') is ideally very large.

Distortion Measures

To relate a physical measure of difference to a
subjectively perceived measure of difference, it
is important to understand auditory
sensitivity to changes in the frequencies and
bandwidths of the speech spectrum, signal
intensity, and fundamental frequency.

Distortion Measures
• This sensitivity is expressed in the form
of a just-discriminable change: the
change in a physical parameter that
the auditory system can reliably
detect, as measured in a
standard listening test.

Spectral-distortion measures

• Measuring the difference between two
speech patterns in terms of average spectral
distortion is a reasonable approach, both in terms of
its mathematical tractability and its
computational efficiency.
• Perceived sound differences can be
interpreted in terms of differences in
spectral features.

Log spectral distance
• Consider two spectra S(ω) and S'(ω). The
difference between the two spectra on a log
magnitude versus frequency scale is defined
by

$$V(\omega) = \log S(\omega) - \log S'(\omega) \qquad (1)$$

• A distance or distortion measure between S
and S' can be defined by

$$d(S, S')^p = (d_p)^p = \int_{-\pi}^{\pi} |V(\omega)|^p \, \frac{d\omega}{2\pi} \qquad (2)$$

• This is related to how humans perceive
sound differences.

Log spectral distance
• For p = 1, the above equation defines the mean
absolute log spectral distortion.
• For p = 2, the equation defines the rms log spectral
distortion, which is used in many speech
processing systems.
• As p tends to infinity, the equation reduces to
the peak log spectral distortion.
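
The three cases can be sketched numerically. This is a minimal illustration, assuming both power spectra are sampled on the same uniform frequency grid over [-π, π), so that the integral with respect to dω/2π becomes a plain average. The function name `log_spectral_distance` and the test spectra are hypothetical choices made for this example.

```python
import numpy as np

def log_spectral_distance(S, S_prime, p=2.0):
    """L_p log spectral distance between two sampled power spectra.

    Assumes S and S_prime are positive and sampled on the same uniform
    grid over one full period, so the integral over dw/2pi is a mean.
    """
    V = np.log(S) - np.log(S_prime)      # log spectral difference V(w)
    abs_v = np.abs(V)
    if np.isinf(p):
        return float(abs_v.max())        # peak log spectral distortion
    return float(np.mean(abs_v ** p) ** (1.0 / p))

# Illustrative spectra chosen so that V(w) = cos(w).
omega = np.linspace(-np.pi, np.pi, 1024, endpoint=False)
S = np.exp(np.cos(omega))
S_prime = np.ones_like(omega)

d1 = log_spectral_distance(S, S_prime, p=1)         # mean absolute
d2 = log_spectral_distance(S, S_prime, p=2)         # rms
dinf = log_spectral_distance(S, S_prime, p=np.inf)  # peak
print(d1, d2, dinf)
```

For this choice of V(ω) the three values are close to 2/π, 1/√2 and 1, which illustrates that the distance grows with p and is bounded by the peak deviation.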

Cepstral distances
• For the cepstral coefficients we use the
rms log spectral distance.

$s(t) = h(t) * x(t)$ (vocal tract component and excitation)
$S(\omega) = H(\omega) X(\omega)$ in the frequency domain
Taking the log of $S(\omega)$:
$\log S(\omega) = \log H(\omega) + \log X(\omega)$
The log power spectrum has the Fourier series expansion

$$\log |S(\omega)|^2 = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$$

where the $c_n$ are the cepstral coefficients.

Cepstral distances

The cepstral coefficients can be obtained from the
LPC coefficients.

$$d_{\text{ceps}}^2(S, S') = \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2$$

where $c_n$ and $c_n'$ are the cepstral
coefficients of $S(\omega)$ and $S'(\omega)$, respectively.
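
Both statements above can be illustrated with a short sketch. It assumes unit-gain all-pole spectra 1/|A|² with A(z) = 1 + a_1 z⁻¹ + ... + a_p z⁻ᵖ, uses the standard LPC-to-cepstrum recursion, and compares the cepstral sum with a numerically integrated squared log spectral difference. The function names and the two coefficient sets are illustrative assumptions. Because the cepstrum of a real log power spectrum is even (c_{-n} = c_n) and c_0 = 0 for unit gain, the two-sided sum equals twice the one-sided sum.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of 1/A(z).

    a holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    Recursion: c_n = -a_n - sum_{k} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)             # c[0] = 0 for unit gain
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def log_power_spectrum(a, n_freq=4096):
    """log(1/|A(e^{jw})|^2) sampled on a uniform frequency grid."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n_freq)
    return -np.log(np.abs(A) ** 2)

# Two stable second-order models (illustrative coefficients only).
a = np.array([-0.9, 0.64])
a_prime = np.array([-1.0, 0.5])

c = lpc_to_cepstrum(a, 200)
c_prime = lpc_to_cepstrum(a_prime, 200)

# Left-hand side: squared rms log spectral distance by numerical integration.
V = log_power_spectrum(a) - log_power_spectrum(a_prime)
d2_spectral = float(np.mean(V ** 2))

# Right-hand side: two-sided cepstral sum, using c_{-n} = c_n and c_0 = 0.
d2_cepstral = float(2.0 * np.sum((c - c_prime) ** 2))
print(d2_spectral, d2_cepstral)
```

In practice the sum is truncated after a small number of terms, since the cepstral coefficients of a stable all-pole model decay quickly with n.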

Weighted cepstral distances and
liftering
• Liftering makes the system more robust to
noise.
• Liftering is done to equalize the variance of the cepstral terms.
• Liftering significantly improves
recognition performance.
• If we incorporate an $n^2$ factor into the cepstral
distance to normalize the contribution from
each cepstral term, the distance becomes

$$d_{2W}^2 = \sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2 = \sum_{n=-\infty}^{\infty} (n c_n - n c_n')^2$$

Weighted cepstral distances and
liftering
• The original sharp spectral peaks are highly
sensitive to the LPC analysis conditions, and the
resulting peakiness creates unnecessary
sensitivity in spectral comparison.
• The liftering process tends to reduce this
sensitivity without altering the fundamental
“formant” structure.
• That is, the undesirable (noise-like) components of
the LPC spectrum are reduced or removed,
while the essential characteristics of the
“formant” structure are retained.

Weighted cepstral distances and
liftering
• A useful form of weighted cepstral distance
is

$$d_{cW}^2 = \sum_{n=1}^{L} \left( w(n) c_n - w(n) c_n' \right)^2$$

• where w(n) is any lifter function.
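
As a minimal sketch of the weighted cepstral distance, the code below accepts any lifter w(n) and shows two example choices. The raised-sine lifter w(n) = 1 + (L/2) sin(πn/L) is one commonly used form and is an assumption here, since the slides do not name a specific lifter. The function names and cepstral vectors are illustrative.

```python
import numpy as np

def raised_sine_lifter(L):
    """Example lifter w(n) = 1 + (L/2) sin(pi n / L), n = 1..L (an assumption)."""
    n = np.arange(1, L + 1)
    return 1.0 + (L / 2.0) * np.sin(np.pi * n / L)

def index_lifter(L):
    """Lifter w(n) = n, which gives the n^2-weighted cepstral distance."""
    return np.arange(1, L + 1, dtype=float)

def weighted_cepstral_distance(c, c_prime, w):
    """Squared weighted cepstral distance: sum_n (w(n) c_n - w(n) c'_n)^2.

    c and c_prime hold c_1..c_L; w holds w(1)..w(L).
    """
    c = np.asarray(c, dtype=float)
    c_prime = np.asarray(c_prime, dtype=float)
    return float(np.sum((w * c - w * c_prime) ** 2))

# Illustrative cepstral vectors with L = 12 coefficients.
L = 12
rng = np.random.default_rng(0)
c = rng.normal(size=L) / np.arange(1, L + 1)
c_prime = rng.normal(size=L) / np.arange(1, L + 1)

d_plain = weighted_cepstral_distance(c, c_prime, np.ones(L))
d_index = weighted_cepstral_distance(c, c_prime, index_lifter(L))
d_sine = weighted_cepstral_distance(c, c_prime, raised_sine_lifter(L))
print(d_plain, d_index, d_sine)
```

With w(n) = 1 the function reduces to the truncated cepstral distance, so the lifter can be swapped without changing the rest of a recognizer.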

Itakura and Saito
• The log spectral difference V(ω), defined
by V(ω) = log S(ω) - log S'(ω), is the basis of
many distortion measures.

• The distortion measure proposed by
Itakura and Saito in their formulation of
linear prediction as an approximate
maximum likelihood estimation is

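Itakura and Saito

$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[ e^{V(\omega)} - V(\omega) - 1 \right] \frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log \frac{\sigma_\infty^2}{{\sigma'_\infty}^2} - 1$$

where $\sigma_\infty^2$ and ${\sigma'_\infty}^2$ are the prediction errors of $S(\omega)$
and $S'(\omega)$, respectively, with

$$\sigma_\infty^2 = \exp \left\{ \int_{-\pi}^{\pi} \log S(\omega) \, \frac{d\omega}{2\pi} \right\}$$
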
Itakura and Saito
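• The Itakura-Saito distortion measure can be
used to illustrate the spectral matching
properties by replacing S'(ω) with the pth-order
all-pole spectrum $\sigma^2 / |A(e^{j\omega})|^2$:

$$d_{IS}\left( S, \frac{\sigma^2}{|A|^2} \right) = \frac{1}{\sigma^2} \int_{-\pi}^{\pi} S(\omega) \left| A(e^{j\omega}) \right|^2 \frac{d\omega}{2\pi} + \log \sigma^2 - \log \sigma_\infty^2 - 1$$

where $\sigma^2$ is the gain and

$$\alpha = \int_{-\pi}^{\pi} S(\omega) \left| A(e^{j\omega}) \right|^2 \frac{d\omega}{2\pi}$$

is the residual energy.
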
Itakura

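Let us consider $\sigma^2 = \alpha$;
then the Itakura distortion measure is

$$d_I\left( \frac{1}{|A_p|^2}, \frac{1}{|A|^2} \right) = \log \left[ \int_{-\pi}^{\pi} \frac{\left| A(e^{j\omega}) \right|^2}{\left| A_p(e^{j\omega}) \right|^2} \frac{d\omega}{2\pi} \right]$$
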
Likelihood Distortions
• The role of the gain terms is not explicit in
the Itakura distortion, because the signal
level makes essentially no difference to the
human understanding of speech so long as
it is unambiguously heard.
• A gain-independent distortion measure called the
likelihood ratio distortion can be derived
directly from the IS distortion measure:

$$d_I\left( \frac{1}{|A_p|^2}, \frac{1}{|A|^2} \right) \approx d_{LR}\left( \frac{1}{|A_p|^2}, \frac{1}{|A|^2} \right)$$

Likelihood Distortions

When the distortion is very small, the Itakura
distortion measure is not very different from
the likelihood ratio distortion measure.
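
The closeness of the two measures for small distortions can be sketched numerically. The code assumes monic inverse filters A(z) = 1 + a_1 z⁻¹ + ... + a_p z⁻ᵖ, evaluates the integral of |A|²/|A_p|² as a mean over a uniform FFT grid, and uses the definitions from the summary slides: the likelihood ratio distortion is that integral minus one, and the Itakura distortion is its logarithm. It follows that d_I = log(1 + d_LR). The function names and coefficient values are illustrative assumptions.

```python
import numpy as np

def mean_filter_ratio(a_p, a, n_freq=4096):
    """Mean of |A|^2 / |A_p|^2 over a uniform frequency grid."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n_freq)
    A_p = np.fft.fft(np.concatenate(([1.0], a_p)), n_freq)
    return float(np.mean(np.abs(A) ** 2 / np.abs(A_p) ** 2))

def likelihood_ratio_distortion(a_p, a):
    """d_LR: integral of |A|^2 / |A_p|^2 dw/2pi, minus 1."""
    return mean_filter_ratio(a_p, a) - 1.0

def itakura_distortion(a_p, a):
    """d_I: log of the integral of |A|^2 / |A_p|^2 dw/2pi."""
    return float(np.log(mean_filter_ratio(a_p, a)))

# Two nearby stable second-order models (illustrative coefficients only).
a_p = np.array([-0.90, 0.64])
a = np.array([-0.88, 0.62])

d_lr = likelihood_ratio_distortion(a_p, a)
d_i = itakura_distortion(a_p, a)
print(d_lr, d_i, d_lr - d_i)
```

Since log(1 + x) is close to x for small x, the gap between the two values shrinks quadratically as the models approach each other.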

Variations of likelihood
distortions
• Compared to the cepstral distance, likelihood
distortions are asymmetric.
• There are two methods to symmetrize the
distortion measure:
– COSH distortion
– Weighted likelihood distortion

COSH distortion
• COSH distortion is given by

$$d_{COSH} = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$

• For small distortions, $\cosh V(\omega) - 1 \approx V(\omega)^2 / 2$, so the COSH
distortion is approximately half the squared rms log spectral
distance.
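
A short sketch can make the symmetry of the COSH distortion concrete. It assumes spectra sampled on a uniform grid, so that the integral is a mean, and uses the identity (e^V - V - 1) + (e^{-V} + V - 1) = 2 (cosh V - 1), which means the COSH distortion is the average of the two Itakura-Saito distortions taken in opposite directions. The function names and test spectra are illustrative assumptions.

```python
import numpy as np

def itakura_saito(S, S_prime):
    """d_IS as the mean of exp(V) - V - 1, with V = log S - log S'."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.exp(V) - V - 1.0))

def cosh_distortion(S, S_prime):
    """d_COSH as the mean of cosh(V) - 1, with V = log S - log S'."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.cosh(V) - 1.0))

omega = np.linspace(-np.pi, np.pi, 2048, endpoint=False)

# Clearly different spectra: check symmetry and the link to d_IS.
S = 2.0 + np.cos(omega)
S_prime = 1.5 + np.cos(2.0 * omega)
d_cosh = cosh_distortion(S, S_prime)
d_sym = 0.5 * (itakura_saito(S, S_prime) + itakura_saito(S_prime, S))

# Nearly identical spectra: compare with half the mean squared log difference.
S_close = np.exp(0.01 * np.cos(omega))
flat = np.ones_like(omega)
d_small = cosh_distortion(S_close, flat)
half_d2_squared = 0.5 * float(np.mean(np.log(S_close) ** 2))
print(d_cosh, d_sym, d_small, half_d2_squared)
```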

Weighted likelihood ratio
distortion
The purpose of weighting is to take the
spectral shape into account as a weighting
function, so that different spectral
components along the frequency axis can be
emphasized or de-emphasized to reflect some
of the observed perceptual effects.

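Weighted likelihood ratio
distortion

$$d_{WLR} = \sum_{n} \left( \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right) (c_n - c_n')$$

where $c_n$ and $c_n'$ are the cepstral coefficients of
$\log \frac{1}{|A|^2}$ and $\log \frac{1}{|A'|^2}$,
and $\hat{r}(n)$ and $\hat{r}'(n)$ are the autocorrelation sequences
for $\frac{\sigma^2}{|A|^2}$ and $\frac{\sigma'^2}{|A'|^2}$, respectively.
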
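
The weighted likelihood ratio distortion can be sketched as follows. This is an illustration under stated assumptions: both models have unit gain (so the gain-normalized autocorrelation is simply the autocorrelation of 1/|A|²), the autocorrelation and cepstral sequences are obtained by inverse FFT of the sampled spectrum and log spectrum, and the sum is truncated to n = 1..L as in the summary slide. The function names and coefficient values are hypothetical.

```python
import numpy as np

def model_sequences(a, n_terms, n_freq=4096):
    """Autocorrelation r(1..n_terms) and cepstrum c(1..n_terms) of 1/|A|^2.

    a holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (unit gain).
    """
    A = np.fft.fft(np.concatenate(([1.0], a)), n_freq)
    power = 1.0 / np.abs(A) ** 2
    r = np.fft.ifft(power).real           # autocorrelation sequence
    c = np.fft.ifft(np.log(power)).real   # cepstral coefficients
    return r[1:n_terms + 1], c[1:n_terms + 1]

def wlr_distortion(a, a_prime, L=12):
    """Truncated WLR: sum_{n=1..L} (r(n) - r'(n)) (c_n - c'_n)."""
    r, c = model_sequences(a, L)
    r_prime, c_prime = model_sequences(a_prime, L)
    return float(np.sum((r - r_prime) * (c - c_prime)))

# Two stable second-order models (illustrative coefficients only).
a = np.array([-0.9, 0.64])
a_prime = np.array([-1.0, 0.5])

d_forward = wlr_distortion(a, a_prime)
d_backward = wlr_distortion(a_prime, a)
print(d_forward, d_backward)
```

Swapping the two models changes the sign of both factors in every term, so the result is the same in either direction, which is the symmetry this measure is meant to provide.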
Comparison of $d_{WLR}$ and $d_2^2$

$d_2^2$ uses the log spectral deviation

$$\log \frac{1}{|A|^2} - \log \frac{1}{|A'|^2}$$

In $d_{WLR}$ this is replaced by the linear deviation

$$\frac{1}{|A|^2} - \frac{1}{|A'|^2}$$

which places heavier emphasis on spectral peak areas than the compressed (log) deviation.
This property is useful in applications where
extra emphasis on spectral peaks is necessary,
such as speech recognition in noisy environments.

Weighted slope metric distortion
measure
In a series of experiments designed to
measure the subjective “phonetic” distance
between pairs of synthetic vowels and
fricatives, several acoustic parameters and
spectral distortions were varied in a controlled way, including formant
frequency, formant amplitude, spectral tilt,
and highpass, lowpass, and notch filtering. Only
formant frequency deviation was found to be phonetically
relevant.

Weighted slope metric distortion
measure
WSM attaches a weight to the spectral slope
difference near spectral peaks, rather than to
the spectral amplitude difference, and takes
the overall energy difference explicitly into
consideration:

$$d_{WSM}(S, S') = u_E \left| E_S - E_{S'} \right| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$

where $u_E$ is the weighting constant for the absolute energy
difference $|E_S - E_{S'}|$ between S and S', $u(i)$ is the weighting
coefficient for the critical-band spectral slope difference
$\Delta_S(i) - \Delta_{S'}(i)$ between S and S', and K is the total number of
critical bands considered.

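
A rough sketch of the formula follows. Several choices in it are assumptions made only for illustration and are not taken from the slides: the inputs are critical-band levels in dB, the slope Δ(i) is taken as the difference between adjacent band levels, the energy E is the total power in dB, and the weights u(i) default to one (in the actual metric they depend on the proximity to spectral peaks). The function name and band values are hypothetical.

```python
import numpy as np

def wsm_distortion(band_db, band_db_prime, u=None, u_E=1.0):
    """Weighted slope metric sketch on critical-band levels (in dB).

    band_db, band_db_prime: K + 1 band levels for S and S'.
    u: K slope weights (assumed uniform if not given).
    u_E: weight on the absolute energy difference.
    """
    band_db = np.asarray(band_db, dtype=float)
    band_db_prime = np.asarray(band_db_prime, dtype=float)

    # Spectral slope per band: difference between adjacent band levels.
    slope = np.diff(band_db)
    slope_prime = np.diff(band_db_prime)
    if u is None:
        u = np.ones_like(slope)

    # Overall energy in dB, from the sum of the band powers.
    energy = 10.0 * np.log10(np.sum(10.0 ** (band_db / 10.0)))
    energy_prime = 10.0 * np.log10(np.sum(10.0 ** (band_db_prime / 10.0)))

    slope_term = np.sum(u * (slope - slope_prime) ** 2)
    return float(u_E * abs(energy - energy_prime) + slope_term)

# Illustrative band levels with one spectral peak.
bands = np.array([40.0, 48.0, 60.0, 52.0, 45.0, 41.0])
louder = bands + 6.0                    # same shape, 6 dB more energy
shifted_peak = np.roll(bands, 1)        # peak moved by one band

d_gain = wsm_distortion(bands, louder, u_E=0.5)
d_peak = wsm_distortion(bands, shifted_peak, u_E=0.5)
print(d_gain, d_peak)
```

A pure gain change leaves every slope unchanged and only contributes through the energy term, whereas moving the peak changes the slopes and produces a much larger distance.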
Summary
• The spectral distortion measures are
designed to measure the dissimilarity or distance
between two (power) spectra of speech.
• Many of these dissimilarity measures are not
metrics because they do not satisfy the
symmetry property.
• If an objective speech distortion measure
needs to reflect the subjective reality of
human perception of sound differences, or
even phonetic disparity, the asymmetry seems
to be actually desirable.

Summary
• All of the distortion measures are important,
because certain distortion measures may be
better in a less noisy environment, while
others may be more robust when the background is
noisier.

Summary
• Log spectral: the L_p metric requires a large
amount of calculation, because we need two
FFTs to obtain S(ω) and S'(ω), logarithms
of all values of S and S', and an integral:

$$d_p = \left[ \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^p \frac{d\omega}{2\pi} \right]^{1/p}$$

Summary
• Truncated and weighted cepstral: these require
only L operations, where L is of the order of
12-16; hence fewer calculations are required
than for the L_p metric.

$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2$$

$$d_{cW}^2 = \sum_{n=1}^{L} w(n)^2 (c_n - c_n')^2$$

Summary
• The likelihood ratio, Itakura-Saito, Itakura, and
COSH measures: all require on the
order of p operations, where p is the LPC order of the all-pole
polynomial (8-12). Hence the computation is
comparable to that of the cepstral measures.

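Summary

$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log \frac{\sigma_\infty^2}{{\sigma'_\infty}^2} - 1$$

$$d_I = \log \left[ \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} \right]$$

$$d_{LR} = \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - 1$$

$$d_{COSH} = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$
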
Summary
• Weighted likelihood ratio distortion: requires
L operations, similar to the cepstral
measures.

$$d_{WLR} = \sum_{n=1}^{L} \left( \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right) (c_n - c_n')$$

Summary
• Weighted slope metric (WSM): requires K
operations, where K is the number of
frequency bands used in the computation
(32-64).

$$d_{WSM}(S, S') = u_E \left| E_S - E_{S'} \right| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$

Summary
• From all these points we can say that all the
measures, except for the L_p metrics, are both
physically reasonable and computationally
tractable for speech recognition.
• Hence, in practice, we will use all of these
measures to study the speech recognition
system.
