Automatic Speech Recognition Using Cepstral and Itakura-Saito Distances For Vocal Command
All content following this page was uploaded by Abdennaceur Kachouri on 30 May 2014.
Zied SAKKA, Abdennaceur KACHOURI, Ahmed BEN AISSA and Mounir SAMET.
Abstract
Speech recognition systems continue to evolve and to deliver significant performance. Nevertheless, the computational load remains large and complex, particularly in the classification phase. In this paper we study the robustness of the LPCC, MFCC and PLPCC parameterization techniques, and classification by a simple distance (distortion) measure, namely the cepstral distance and the Itakura-Saito distance.

Keywords: Speech recognition, cepstral distance, Itakura-Saito distance.

1. Introduction

A vocal command is an order that a system obeys on the basis of the sound and the meaning of the voice.
The performance of an automatic word recognition system depends on the effectiveness of the adopted parameterization technique and on the robustness of the classification technique used. Given the complexity of the speech signal, its redundancy, and its inter- and intra-speaker variability, automatic word recognition remains a difficult problem. For vocal command to succeed, every step of the word recognition system must be handled carefully, in particular the parameterization techniques. The more robust, discriminative and relevant the parameters are, the better the system performs. Our study therefore focuses on parameterization techniques in order to improve the quality of acoustic modelling. To that end we present three cepstral parameterization techniques: LPCC (Linear Predictive Cepstral Coefficients), MFCC (Mel Frequency Cepstral Coefficients) and PLPCC (Perceptual Linear Predictive Cepstral Coefficients).
As the classification method we adopted a distortion measure computed as a distance, namely the cepstral distance and the Itakura-Saito distance [1][3][7].
The acoustic-phonetic database used is TI46.

2. Speaker recognition system

The basic elements of a speaker recognition system are shown in Fig.1. An input utterance from an unknown speaker is analyzed to extract speaker-characteristic features. The measured features are compared with prototype features obtained from known speaker models.
In the identification mode, a speech sample from an unknown speaker is analyzed and compared with models of known speakers. The unknown speaker is identified as the speaker whose model best matches the input speech sample. In the "closed set" identification mode, the number of decision alternatives is equal to the size of the population. In the "open set" identification mode, a reference model for the unknown speaker may not exist. In this case, an additional alternative, "the unknown does not match any of the models", is required.
In the verification mode, the unknown speaker's speech sample is compared with the model for the speaker whose identity is claimed. If the match is good enough, as indicated by passing a threshold test, the identity claim is verified.
Crucial to the operation of a speaker recognition system is the establishment and maintenance of speaker models. One or more enrolment sessions are required in which training utterances are obtained from known speakers. Features are extracted from the training utterances and compiled into models. Many speaker recognition systems include an updating
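
The identification and verification decisions described in this section can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `identify` and `verify`, the score dictionary, and the convention that a higher score means a better match (suggested by the "maximum selection" block of Fig.1) are all assumptions.

```python
def identify(scores, threshold=None):
    """Pick the best-matching known speaker.

    scores: dict mapping speaker name -> similarity score (assumed: higher is better).
    threshold: None for closed-set identification; a number for open-set
    identification, where a best score below it means "no model matches".
    """
    best = max(scores, key=scores.get)  # maximum selection over all models
    if threshold is not None and scores[best] < threshold:
        return None  # open set: the unknown does not match any of the models
    return best


def verify(scores, claimed, threshold):
    """Accept the identity claim only if the claimed model passes the threshold test."""
    return scores[claimed] >= threshold


# Hypothetical similarity scores for three enrolled speakers.
example_scores = {"speaker_A": 0.2, "speaker_B": 0.9, "speaker_C": 0.5}
closed_set_result = identify(example_scores)
open_set_result = identify(example_scores, threshold=0.95)
claim_accepted = verify(example_scores, "speaker_B", threshold=0.8)
```

In the closed-set case the number of alternatives equals the number of enrolled speakers; the open-set case adds the single extra outcome `None`.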
Fig.1: Structure of speaker recognition system. (Block labels: feature extraction; reference templates 1 to N; maximum selection; identification result.)
Fig.2: Algorithm of the LPCC technique. (Recovered block labels: pre-emphasis; Hamming window; FFT; |.|^2; Log(.); IFFT; Durbin recursion; cepstral recursion.)
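
The LPCC chain named in Fig.2 (pre-emphasis, Hamming window, autocorrelation, Durbin recursion, cepstral recursion) can be sketched as below. This is a hedged illustration under stated assumptions, not the authors' code: the pre-emphasis factor 0.97, the model order, the number of cepstral coefficients and all function names are assumed, and the autocorrelation is computed directly in the time domain rather than through the FFT path of the figure.

```python
import math


def pre_emphasize(x, alpha=0.97):
    # High-pass filter y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n_samples):
    # Standard Hamming window of length n_samples.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorrelation(x, max_lag):
    # r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (time-domain computation).
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    # Solves for predictor coefficients a[1..order] with x[n] ~ sum_k a[k] x[n-k].
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)  # prediction error after order i
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    # Cepstral recursion: c[m] = a[m] + sum_{k=1}^{m-1} (k/m) c[k] a[m-k].
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


def lpcc(frame, order=10, n_ceps=12):
    # Full chain for one frame of samples.
    emphasized = pre_emphasize(frame)
    window = hamming(len(emphasized))
    windowed = [s * w for s, w in zip(emphasized, window)]
    r = autocorrelation(windowed, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

Each call to `lpcc` processes a single frame; a full utterance would be cut into frames first, a step that is not shown here.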
3. Techniques of parameterization

In this part we describe the three principal parameterization techniques that have attracted considerable interest in automatic word recognition systems: LPCC (Linear Predictive Cepstral Coefficients), MFCC (Mel Frequency Cepstral Coefficients) and PLPCC (Perceptual Linear Predictive Cepstral Coefficients) [4][5][7].

3.1. Parameterization by the LPCC coefficients

The LPCC coefficients are computed with the algorithm of Fig.2. First the signal is pre-emphasized by a high-pass filter, then a Hamming window is applied to it. Next we determine the autocorrelation coefficients by means of the inverse Fourier transform of the log of the energy. Finally, once the autocorrelation coefficients are obtained, we calculate the coefficients of the autoregressive model by the Durbin recursion and then the cepstral coefficients by the cepstral recursion, as indicated in Fig.2.

Fig.3: Algorithm of the MFCC technique.

3.3. Parameterization by the PLPCC coefficients

PLP analysis is an improvement on LPC analysis. It takes the following three aspects into account:
- Integration of critical bands: the spectral density is mapped onto the Mel scale, then convolved with a function representing a critical-band filter.
- Pre-emphasis by an equal-loudness curve: the perceived intensity of a pure tone of constant acoustic intensity varies with the frequency of that tone. To simulate this phenomenon in PLP analysis, we multiply the spectral density resulting from the preceding step by an equalization function.
- Stevens' law: the two preceding treatments are insufficient to establish the correspondence between measured intensity and perceived intensity (loudness). Stevens' law gives a relation between loudness and intensity:

$$ \text{Loudness} = (\text{intensity})^{0.33} \qquad (1) $$

Fig.4: Algorithm of the PLPCC technique. (Recovered block labels: pre-emphasis; Hamming window; FFT; |.|^2; Mel-scale; (.)^0.33; IFFT; Durbin recursion; cepstral recursion.)

4. Classification and decision

Once the parameters of the word are obtained, they must be compared with those stored in the reference dictionary. This comparison is made by measuring their distortion with a simple distance calculation. We chose two types of distance, namely the cepstral distance and the Itakura-Saito distance. Depending on whether this distance is above or below an empirically estimated threshold, the word is judged to be the correct one or not. This threshold varies with the parameterization technique and the type of distance chosen.

4.1. Cepstral distance

The cepstral distance is given by the following equation, where c and c' are cepstral coefficients:

$$ d_{cep}^{2} = \sum_{l=1}^{\infty} \left[ c(l) - c'(l) \right]^{2} \qquad (2) $$

Fig.5 shows the cepstral distance function.

Fig.5: The cepstral distance function. (Plot of distance against coefficients.)

The Itakura-Saito distance between two spectral densities f and f' is given by:

$$ d_{IS}(f, f') = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \frac{f}{f'} - \ln\frac{f}{f'} - 1 \right] d\theta \qquad (3) $$

Fig.6 below illustrates the Itakura-Saito function.

Fig.6: The Itakura-Saito function. (Plot of distance against coefficients.)
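
The two distances of this section and the threshold decision can be sketched as follows. This is a minimal sketch under assumptions, not the authors' implementation: the infinite sum of equation (2) is truncated to the available coefficients, the integral of equation (3) is approximated by an average over uniformly sampled spectral values, and the function names and the threshold value are hypothetical.

```python
import math


def cepstral_distance_sq(c, c_ref):
    # Squared cepstral distance of equation (2), truncated to the coefficients given.
    return sum((a - b) ** 2 for a, b in zip(c, c_ref))


def itakura_saito(f, f_ref):
    # Equation (3) with the integral over theta replaced by the mean over
    # uniformly spaced samples of the two (strictly positive) spectral densities.
    total = 0.0
    for p, q in zip(f, f_ref):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(f)


def is_accepted(distance, threshold):
    # The word is judged correct when its distance falls below the
    # empirically estimated threshold (the threshold value is an assumption).
    return distance < threshold


# Hypothetical cepstral vectors for a test word and a reference word.
test_word = [0.50, 0.12, 0.04]
reference_word = [0.48, 0.10, 0.05]
d2 = cepstral_distance_sq(test_word, reference_word)
decision = is_accepted(d2, threshold=0.01)
```

Both measures are zero when the two inputs are identical. The Itakura-Saito measure is not symmetric in its arguments, so the order of test and reference spectra matters.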