Estimation of LPC Parameters of Speech Signals in Noisy Environment

Akshya K. Swain and Waleed Abdulla
Department of Electrical and Computer Engineering,
University of Auckland, Private Bag 92019,
Auckland, New Zealand
email: a.swain@auckland.ac.nz, w.abdulla@auckland.ac.nz

Abstract— The performance of LPC based algorithms deteriorates significantly in the presence of background noise. The present study proposes a new approach based on the orthogonal least squares (OLS) algorithm with structure selection to obtain unbiased LPC parameters from noisy speech samples. Instead of fitting a fixed order model to all segments of speech, the algorithm selects the best possible model order for a given speech segment using an error reduction ratio (ERR) test. A noise model is appended to the conventional LPC model to make the LPC parameters unbiased. The proposed algorithm gives superior performance compared to the commonly used LPC based algorithms under high levels of noise.

I. INTRODUCTION

Linear prediction based speech analysis has received considerable attention in the past four decades. In linear prediction, the speech waveform is represented by a set of parameters of an all-pole model, called the linear predictive coefficients (LPC), which are closely related to the speech production transfer function. LPC analysis essentially attempts to find an optimal fit to the envelope of the speech spectrum from a given sequence of speech samples. The LPC feature computed by the autocorrelation or covariance method (Makhoul, 1975) minimizes the sum of squares of the prediction residuals and therefore resembles a least squares fit.

The performance of the LPC technique, which is equivalent to autoregressive (AR) modeling of the speech signal, however degrades significantly in the presence of background noise. The additive noise changes the speech signal process from an AR to an autoregressive moving average (ARMA) process. The least squares estimates of the LPC parameters from a noise corrupted sequence using an all-pole speech model therefore become biased, the bias being proportional to the inverse of the signal-to-noise ratio (Soderstrom and Stoica, 1989). Several techniques based around robust statistics, instrumental variables and higher order Yule-Walker equations have been suggested in the past to obtain improved estimates of linear predictive coefficients from noisy speech samples (Lee, 1988; Ramachandran et al, 1995; Gong, 1995; Hernando and Nadeu, 1997; Shimamura, 2001).

The present study proposes an alternative approach to estimate the LPC parameters using the orthogonal least squares algorithm with structure selection (Billings et al, 1989; Swain and Billings, 1998). An error reduction ratio (ERR) test, which is a byproduct of the OLS algorithm, is used to find the best possible order for a given speech segment and to include only those terms which are significant into the model. Estimation of the model coefficients becomes straightforward once the terms to be included are known. The model terms are quantified according to the contribution they make to the variance of the speech segment. The best model is selected as the model which explains the total signal variance and for which the number of terms is a minimum. This implies that the model terms are selected according to their significance to the signal variance. The biasing effects of noise on the signal related parameters (the LPC parameters) can be reduced significantly by appending a noise model, where the estimation of the signal and noise model parameters is decoupled. The performance of the algorithm has been demonstrated on examples of spoken words with high levels of noise and has been found to be superior compared to LPC based algorithms.

The organization of the paper is as follows. Section II gives the details of the problem formulation when the speech signal is corrupted by noise. A brief review of the orthogonal least squares algorithm and some model validation methods are presented in Section III. The effectiveness of the proposed algorithm is illustrated in Section IV, with conclusions in Section V.

II. PROBLEM FORMULATION

The block diagram of an LPC based speech production model in noise is shown in Fig. 1, where the speech signal is modeled as the output of an all-pole filter which is excited by a sequence of pseudo periodic pulses for voiced speech or pseudo random noise for unvoiced speech. Thus, within a certain window length of speech, the output speech sequence is modeled (without noise and preemphasis filter) as

    s_w(n) = \sum_{k=1}^{p} a_k s_w(n-k) + G u_w(n)    (1)
where p is the order of the LPC model, w(n) is the window function, s_w(n) = s(n)w(n), n = 0, 1, ..., N-1 is the windowed speech signal, u_w(n) = u(n)w(n), n = 0, 1, ..., N-1 is the windowed version of the input excitation u(n), G is the gain of the filter, v(n) is the additive noise, the a_k's are the LPC coefficients characterizing the filter and e(n) is the residual error.

Fig. 1. Block diagram of Linear Prediction Based Speech Production Model

Since it is customary to pass the speech signal through a preemphasis filter of the form 1 - \mu z^{-1} to boost the signal spectrum by 6 dB/octave, the model of eqn.(1) with the preemphasis filter becomes

    s_w(n) = \sum_{k=1}^{p} a_k s_w(n-k) + G u_w(n) - \mu G u_w(n-1)    (2)

The model of eqn.(2) resembles an autoregressive with eXogenous input (ARX) model of the form

    y(n) = \sum_{i=1}^{n_a} a_i y(n-i) + \sum_{i=0}^{n_b} b_i u(n-i) + e(n)    (3)
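To make the production model of Fig. 1 and eqns.(1)-(2) concrete, the following Python sketch (not from the paper; the pole locations, gain, pitch period and preemphasis coefficient are assumed illustrative values) synthesises a voiced-like frame by exciting an all-pole filter with a pseudo periodic pulse train and then applies the first order preemphasis filter.

```python
import numpy as np

# Assumed illustrative settings for the sketch.
N, G, mu, pitch = 512, 0.8, 0.95, 100

# A stable all-pole filter built from two hypothetical formant-like resonances.
poles = np.array([0.95 * np.exp(1j * 0.20 * np.pi),
                  0.90 * np.exp(1j * 0.45 * np.pi)])
poles = np.concatenate([poles, poles.conj()])
a = -np.poly(poles).real[1:]          # a_k in s(n) = sum_k a_k s(n-k) + G u(n)

# Pseudo periodic pulse excitation u(n) for a voiced segment.
u = np.zeros(N)
u[::pitch] = 1.0

# All-pole synthesis, mirroring eqn.(1).
s = np.zeros(N)
for n in range(N):
    for k, ak in enumerate(a, start=1):
        if n >= k:
            s[n] += ak * s[n - k]
    s[n] += G * u[n]

# First order preemphasis 1 - mu*z^{-1}, boosting the spectrum by roughly
# 6 dB/octave, as assumed in eqn.(2).
s_pre = np.append(s[0], s[1:] - mu * s[:-1])
```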
Note that the model of eqn.(2) is often referred to as an ARMA model in the speech processing literature (Morikawa, 1990). Considering the usual assumption that the input signal is not known, the eXogenous variables of eqn.(2) are usually treated as residuals in LPC analysis.

In the classical stochastic formulation of linear prediction, the parameters a_1, ..., a_p of eqn.(1) are estimated by minimizing the cost function

    J = \sum_{n} ( s_w(n) - \hat{s}_w(n) )^2    (4)

where

    \hat{s}_w(n) = \sum_{k=1}^{p} a_k s_w(n-k)    (5)

Minimization of eqn.(4) with respect to the a_k's yields a set of linear equations of the form

    \sum_{k=1}^{p} a_k R(i,k) = R(i,0),   i = 1, 2, ..., p    (6)

where R(i,k) = \sum_{n} s_w(n-i) s_w(n-k) is the signal autocorrelation at lag |i-k|. Since the speech processing is carried out by segmenting the speech into overlapping frames and windowing each frame, the number of data points used for computing the correlation functions is finite and the problem therefore loses its probabilistic nature.
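The classical solution of eqns.(4)-(6) can be written in a few lines. The sketch below is an assumed, minimal implementation (not taken from the paper): it forms the autocorrelation values, builds the Toeplitz system of eqn.(6) and solves it directly; the Levinson-Durbin recursion would exploit the Toeplitz structure to obtain the same solution more efficiently.

```python
import numpy as np

def lpc_autocorrelation(s_w: np.ndarray, p: int) -> np.ndarray:
    """Estimate the LPC coefficients a_1..a_p of eqn.(1) from one windowed
    frame s_w by solving the normal equations of eqn.(6)."""
    N = len(s_w)
    # Autocorrelation R(j) = sum_n s_w(n) s_w(n - j), j = 0..p
    r = np.array([np.dot(s_w[j:], s_w[:N - j]) for j in range(p + 1)])
    # Toeplitz system: sum_k a_k R(|i - k|) = R(i), i = 1..p
    R = np.array([[r[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:p + 1])

# Example: a 12th order fit to a Hamming windowed frame (synthetic data here).
frame = np.hamming(512) * np.random.randn(512)
a_hat = lpc_autocorrelation(frame, p=12)
```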

When the speech samples are corrupted by an additive white noise v(n), the model structure of the speech production process takes the form of an autoregressive moving average with eXogenous input (ARMAX) process

    y(n) = \sum_{k=1}^{p} a_k y(n-k) + G u_w(n) - \mu G u_w(n-1) + v(n) - \sum_{k=1}^{p} a_k v(n-k)    (7)

where y(n) = s_w(n) + v(n) is the noise corrupted speech. If the effects of the unknown input are included in the residual, the ARMAX structure of eqn.(7) reduces to an ARMA model:

    y(n) = \sum_{k=1}^{p} a_k y(n-k) + \xi(n) + \sum_{k=1}^{p+1} c_k \xi(n-k)    (8)

where \xi(n) denotes the combined residual which absorbs the unknown excitation and the additive noise terms. Thus, while the structure of the speech model under the clean environment resembles an AR(p) process, contamination by noise changes the structure to an ARMA(p, p+1) process. Although all the spectral information of the original AR(p) process is preserved in the AR parameters of the ARMA model of the noisy process, the presence of the noise related parameters c_1, ..., c_{p+1} makes the estimation of the AR(p) parameters difficult. Application of the Levinson-Durbin algorithm, or of any least squares based algorithm which uses an AR(p) model structure and ignores the MA part, results in biased estimates.
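The bias described above is easy to reproduce numerically. The sketch below is illustrative only (the AR(2) coefficients, noise level and record length are assumed values): it fits an AR(2) model by ordinary least squares to a clean realisation of a known AR(2) process and to the same realisation corrupted by additive white noise; the estimates from the noisy record are pulled towards zero, in line with the bias being proportional to the inverse of the signal-to-noise ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true = np.array([1.5, -0.7])               # assumed AR(2) coefficients
N = 4000

# Clean AR(2) realisation s(n) and noisy observation y(n) = s(n) + v(n).
e = rng.standard_normal(N)
s = np.zeros(N)
for n in range(2, N):
    s[n] = a_true[0] * s[n - 1] + a_true[1] * s[n - 2] + e[n]
y = s + rng.standard_normal(N)               # additive white noise v(n)

def ar2_least_squares(x: np.ndarray) -> np.ndarray:
    """Least squares AR(2) fit: regress x(n) on x(n-1) and x(n-2)."""
    X = np.column_stack([x[1:-1], x[:-2]])
    return np.linalg.lstsq(X, x[2:], rcond=None)[0]

print("clean estimate:", ar2_least_squares(s))   # close to [1.5, -0.7]
print("noisy estimate:", ar2_least_squares(y))   # noticeably biased
```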
Several techniques based on the principle of instrumental variables and the Yule-Walker method have been developed in the past (Stoica et al, 1988) to yield asymptotically unbiased estimates of the AR(p) parameters of general ARMA processes. The present study uses an orthogonal least squares based algorithm (Billings et al, 1989) to estimate the parameters of the AR(p) process. The algorithm is briefly reviewed in the following section. Unfortunately, eqn.(8) is unsuitable for estimating the parameters directly, since the noise terms are not usually known. However, these terms can be replaced by the one step ahead prediction residuals or by the background noise measured separately.

III. ORTHOGONAL LEAST SQUARES ALGORITHM WITH STRUCTURE SELECTION

The orthogonal least squares algorithm (Billings et al, 1989; Swain and Billings, 1998) estimates the parameters of a model of the form

    y(n) = \Psi^{T}(n-1) \Theta + e(n)    (9)

where \Psi(n-1) is a vector which contains both signal and residual terms up to and including time (n-1), \Theta is the parameter vector which contains both the signal and the noise model parameters, and e(n) are the identification residuals. Note that the speech signal model of eqn.(8) can be put into the form of eqn.(9). The parameter vector \Theta can be estimated using orthogonal techniques which effectively overcome numerical ill-conditioning. The parameter estimation is performed by first transforming the model of eqn.(9) into

    y(n) = \sum_{i=1}^{M} g_i w_i(n) + e(n)    (10)

where the terms w_i(n), i = 1, 2, ..., M are orthogonal over the data records. The parameters of the auxiliary model in eqn.(10) can be estimated by (Billings et al, 1989)

    \hat{g}_i = \frac{ \sum_{n} y(n) w_i(n) }{ \sum_{n} w_i^2(n) }    (11)

Finally, the parameters \Theta of the model of eqn.(9) can be computed from the \hat{g}_i.
A criterion for selecting the most important terms in the model can be devised as a byproduct of the orthogonal parameter estimation procedure. Note from eqn.(9) that the maximum mean squared prediction error (MPSE) is achieved when no terms are included in the model. In this case the MPSE equals \overline{y^2(n)}, where the over bar indicates time averaging. The reduction in the MPSE due to the inclusion of the i-th term, w_i(n), in the auxiliary model of eqn.(10) is \hat{g}_i^2 \overline{w_i^2(n)}. Expressing this reduction as a fraction of the total MPSE yields the error reduction ratio (ERR)

    ERR_i = \frac{ \hat{g}_i^2 \, \overline{w_i^2(n)} }{ \overline{y^2(n)} }    (12)

Only those terms with large values of ERR are selected to form the model. An advantage of the OLS estimator is that significant parameters can be determined recursively and quite independently of the other terms. Furthermore, the estimation of the signal and noise model parameters can be decoupled. This is particularly useful for the identification of the model of eqn.(9). A parsimonious signal model is first determined. This model is not affected by whatever noise model is produced later. The initial prediction errors are computed based on this signal model and a noise model can then be selected. A revised prediction error sequence is then calculated and an improved noise model is determined.
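The selection mechanism of eqns.(9)-(12) can be summarised by the following sketch. It is a simplified, assumed implementation (classical Gram-Schmidt, single pass, without the noise-model refinement loop) rather than the authors' code: each remaining candidate regressor is orthogonalised against the terms already chosen, the candidate with the largest ERR is added, selection stops once the unexplained fraction of the signal energy drops below a tolerance, and the parameters \Theta of eqn.(9) are recovered from the auxiliary coefficients by back substitution.

```python
import numpy as np

def ols_err_selection(y, candidates, tol=0.01):
    """Forward OLS with ERR based term selection (sketch of eqns.(9)-(12)).

    y          : (N,) target signal
    candidates : (N, M) matrix whose columns are candidate regressors
                 (e.g. lagged speech samples and lagged residual terms)
    tol        : stop once 1 - sum(ERR) < tol
    Returns (selected column indices, estimated parameters, ERR values).
    """
    N, M = candidates.shape
    energy = float(y @ y)
    selected, W, g_list, err_list = [], [], [], []

    while len(selected) < M:
        best = (None, 0.0, None, None)
        for j in range(M):
            if j in selected:
                continue
            w = candidates[:, j].astype(float).copy()
            for wk in W:                       # orthogonalise against chosen terms
                w -= (wk @ candidates[:, j]) / (wk @ wk) * wk
            d = float(w @ w)
            if d < 1e-12:
                continue
            g = float(y @ w) / d               # eqn.(11)
            err = g * g * d / energy           # eqn.(12)
            if err > best[1]:
                best = (j, err, w, g)
        if best[0] is None:
            break
        selected.append(best[0])
        err_list.append(best[1])
        W.append(best[2])
        g_list.append(best[3])
        if 1.0 - sum(err_list) < tol:          # model explains enough energy
            break

    # Recover Theta of eqn.(9) from the auxiliary coefficients (back substitution).
    P = candidates[:, selected].astype(float)
    A = np.eye(len(selected))
    for i in range(len(selected)):
        for k in range(i):
            A[k, i] = (W[k] @ P[:, i]) / (W[k] @ W[k])
    theta = np.linalg.solve(A, np.array(g_list))
    return selected, theta, err_list
```

In the speech setting the candidate matrix would hold the lagged speech samples y(n-1), ..., y(n-p) in a first pass and, once initial residuals are available, lagged residual terms standing in for the unknown noise terms of eqn.(8).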
A. Model Validation

A critical step in any successful application of parametric estimation is to check the adequacy of the fitted model, i.e. to check whether the model is a true representation of the signal characteristics or just a curve fit to one data set. Several techniques have been proposed in the past to validate a model (Soderstrom and Stoica, 1989). In the present study the normalized mean square error and the Itakura-Saito distance have been used to validate the models. The normalized mean square error (NMSE) is defined as

    NMSE = \frac{ \sum_{n} ( y(n) - \hat{y}(n) )^2 }{ \sum_{n} y^2(n) }    (13)

where \hat{y}(n) is the one step ahead prediction of the signal y(n).
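Eqn.(13) translates directly into a short helper; the function below is an assumed convenience, not code from the paper.

```python
import numpy as np

def nmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Normalized mean square error of eqn.(13): one step ahead prediction
    error energy normalised by the energy of the frame."""
    return float(np.sum((y - y_hat) ** 2) / np.sum(y ** 2))
```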
The distortion level between a processed and the original signal can be regarded as a good objective measure to assess the quality of the processed signal (Quackenbush et al, 1988). The Itakura-Saito (IS) distance, or log likelihood ratio, is based on the dissimilarity between all-pole models of the clean reference waveform s_c (say) and the distorted speech waveform s_d (Itakura, 1975). The IS method compares the LPC vectors a_c and a_d of the two windowed speech waveforms s_c and s_d and can be defined as

    d_{IS}(a_c, a_d) = \frac{\sigma_c^2}{\sigma_d^2} \cdot \frac{ a_d^T R_c a_d }{ a_c^T R_c a_c } + \log\frac{\sigma_d^2}{\sigma_c^2} - 1    (14)

where \sigma_d^2 and \sigma_c^2 represent the all-pole gains for the processed frame and its corresponding clean speech frame respectively, and R_c is the autocorrelation matrix of the clean signal. The present study compares the model fitted to the clean speech with the model fitted to the corresponding noisy speech frame.
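A per frame evaluation of eqn.(14) is sketched below (an assumed helper, using the 1, -a_1, ..., -a_p convention for the LPC vectors and building R_c from the clean frame autocorrelation values).

```python
import numpy as np

def itakura_saito(a_c, a_d, r_c, gain_c, gain_d):
    """Itakura-Saito distance of eqn.(14) between the clean model (a_c, gain_c)
    and the processed/noisy model (a_d, gain_d); r_c = [R(0), ..., R(p)] are
    the autocorrelation values of the clean frame."""
    p = len(a_c) - 1
    R = np.array([[r_c[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])
    ratio = float(a_d @ R @ a_d) / float(a_c @ R @ a_c)
    return (gain_c / gain_d) * ratio + np.log(gain_d / gain_c) - 1.0
```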
Fig. 2. Normalized mean square error: Curve-I: NMSE of clean speech; Curve-II: NMSE of noisy speech at SNR = 10 dB with the OLS algorithm; Curve-III: NMSE of noisy speech without the noise model.

IV. RESULTS AND DISCUSSION

The performance of the algorithm has been investigated by collecting speech samples from a variety of situations and with different levels of noise. However, only the results of estimating the LPC parameters of one spoken word, "SEVEN", are discussed here. The signal was sampled at 22050 Hz. A window length of 512 samples with 200 samples of overlap was used in the analysis. The normalized mean square error of all frames was computed for both the clean and the noisy signals, with and without the noise model, and is shown in Fig. 2. From Fig. 2 it is obvious that the OLS algorithm, which appends a noise model to the conventional LPC model, outperforms the all-pole LPC model under noisy conditions.
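The frame segmentation used in these experiments (22050 Hz sampling, 512 sample frames, 200 samples of overlap between consecutive frames) can be reproduced as below; the Hamming window is an assumption, since the paper does not state which analysis window was applied.

```python
import numpy as np

def analysis_frames(x: np.ndarray, length: int = 512, overlap: int = 200) -> np.ndarray:
    """Split a speech signal into overlapping, windowed analysis frames."""
    hop = length - overlap                      # 312 sample frame advance
    win = np.hamming(length)
    n_frames = 1 + (len(x) - length) // hop
    return np.stack([win * x[i * hop: i * hop + length] for i in range(n_frames)])
```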
To further demonstrate the effectiveness of the algorithm, the Itakura-Saito (IS) distances were compared and the results are shown in Fig. 3. The results further prove that the OLS algorithm gives better performance compared to the standard LPC based models. The adequacy of the OLS based model was studied further by comparing the spectra of different segments of speech computed from the LPC models and from the OLS models. Fig. 4 shows the spectra of segment 25 of the word "SEVEN". The results further establish that the models fitted with the OLS algorithm are superior to the LPC based models.

Fig. 3. Likelihood distance: Curve-I: IS distance between models of clean and noisy speech with the noise model; Curve-II: IS distance between models of clean and noisy speech without the noise model.

Fig. 4. Spectral density for segment no. 25: Curve-I: spectrum of clean speech; Curve-II: spectrum of noisy speech at SNR = 10 dB with the OLS algorithm; Curve-III: spectrum of noisy speech without the noise model.

V. CONCLUSIONS

The performance of the orthogonal least squares algorithm with structure selection has been illustrated using examples of speech signals corrupted with high levels of noise. The algorithm is simple to implement, can provide a parsimonious model structure and has the feature of adding a noise model to the standard LPC model structure. The inclusion of the noise model significantly reduces the bias of the LPC parameters under noisy conditions. Results obtained from spoken words demonstrate that the OLS algorithm performs better than the LPC based algorithm.

ACKNOWLEDGMENTS

Akshya Swain is thankful to the Board of Governors, National Institute of Technology, Rourkela, India for granting extraordinary leave. Waleed Abdulla gratefully acknowledges the financial support from the University of Auckland, New Zealand under NSRF grant ref. 3602239/9273.

REFERENCES

[1] Billings, S.A., Chen, S. and Korenberg, M.J. (1989), "Identification of MIMO nonlinear systems using a forward regression orthogonal estimator", Int. J. Control, vol. 49, no. 6, pp. 2157-2189.
[2] Gong, Y. (1995), "Speech recognition in noisy environments: A survey", Speech Communication, vol. 16, pp. 261-291.
[3] Hernando, J. and Nadeu, C. (1997), "Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition", IEEE Trans. Speech and Audio Processing, vol. 5, no. 1, pp. 80-84.
[4] Itakura, F. (1975), "Minimum prediction residual principle applied to speech recognition", IEEE Trans. ASSP, vol. 23, no. 1, pp. 67-72.
[5] Lee, C.H. (1988), "On robust linear prediction of speech", IEEE Trans. ASSP, vol. 36, pp. 642-650.
[6] Makhoul, J. (1975), "Linear prediction: A tutorial review", Proc. IEEE, vol. 63, pp. 561-580.
[7] Morikawa, H. (1990), "Adaptive estimation of time-varying model order in the ARMA speech analysis", IEEE Trans. ASSP, vol. 38, no. 7, pp. 1073-1083.
[8] Ramachandran, R.P., Zilovic, M.S. and Mammone, R.J. (1995), "A comparative study of robust linear predictive analysis methods with applications to speaker identification", IEEE Trans. Speech and Audio Processing, vol. 3, no. 2, pp. 117-125.
[9] Quackenbush, S.R., Barnwell III, T.P. and Clements, M.A. (1988), "Objective Measures of Speech Quality", Prentice Hall, NJ.
[10] Shimamura, T. (2001), "A performance comparison of robust speech analysis methods in noisy environments", Proc. Int. Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong.
[11] Soderstrom, T. and Stoica, P. (1989), "System Identification", Prentice Hall, London.
[12] Stoica, P., Friedlander, B. and Soderstrom, T. (1988), "A high-order Yule-Walker method for estimation of the AR parameters of an ARMA model", Systems and Control Letters, vol. 11, pp. 99-105.
[13] Swain, A.K. and Billings, S.A. (1998), "Weighted complex orthogonal estimator for identifying linear and nonlinear continuous time models from generalized frequency response functions", Mech. Systems and Signal Processing, vol. 12, no. 2, pp. 269-292.
