
Estimation of LPC Parameters of Speech Signals in Noisy Environment

Akshya K. Swain, Waleed Abdulla
Department of Electrical and Computer Engineering, University of Auckland, Private Bag 92019, Auckland, New Zealand
email: a.swain@auckland.ac.nz, w.abdulla@auckland.ac.nz
Abstract: The performance of LPC-based algorithms deteriorates significantly in the presence of background noise. The present study proposes a new approach based on the orthogonal least squares (OLS) algorithm with structure selection to obtain unbiased LPC parameters from noisy speech samples. Instead of fitting a fixed-order model to all segments of speech, the algorithm selects the best possible model order for a given speech segment using an error reduction ratio (ERR) test. A noise model is appended to the conventional LPC model to make the LPC parameters unbiased. Under high levels of noise, the proposed algorithm performs better than the commonly used LPC-based algorithm.

I. INTRODUCTION

Linear prediction based speech analysis has received considerable attention over the past four decades. In linear prediction, the speech waveform is represented by the parameters of an all-pole model, called the linear predictive coefficients (LPC), which are closely related to the speech production transfer function. LPC analysis essentially attempts to find an optimal fit to the envelope of the speech spectrum from a given sequence of speech samples. The LPC feature computed by the autocorrelation or covariance method (Makhoul, 1975) minimizes the sum of squares of the prediction residuals and is therefore a least squares fit. However, the performance of the LPC technique, which is equivalent to autoregressive (AR) modeling of the speech signal, degrades significantly in the presence of background noise. Additive noise changes the speech signal from an AR process to an autoregressive moving average (ARMA) process. Least squares estimates of the LPC parameters obtained from a noise-corrupted sequence with an all-pole speech model are therefore biased, the bias being proportional to the inverse of the signal-to-noise ratio (Soderstrom and Stoica, 1989). Several techniques based on robust statistics, instrumental variables and higher-order Yule-Walker equations have been proposed to obtain improved estimates of the linear predictive coefficients from noisy speech samples (Lee, 1988; Ramachandran et al., 1995; Gong, 1995; Hernando and Nadeu, 1997; Shimamura, 2001).

The present study proposes an alternative approach that estimates the LPC parameters using the orthogonal least squares (OLS) algorithm with structure selection (Billings et al., 1989; Swain and Billings, 1998). An error reduction ratio (ERR) test, which is a byproduct of the OLS algorithm, is used to find the best possible order for a given speech segment and to include only the significant terms in the model. Once the terms to be included are known, estimating the model coefficients is straightforward. Each model term is quantified by its contribution to the variance of the speech segment, and the best model is the one that explains the total signal variance with the minimum number of terms; in other words, terms are selected according to their significance to the signal variance. The biasing effect of noise on the signal-related (LPC) parameters can be reduced significantly by appending a noise model and decoupling the estimation of the signal and noise model parameters. The algorithm is demonstrated on spoken words corrupted by high levels of noise, where it outperforms LPC-based algorithms.

The paper is organized as follows. Section II formulates the problem for a speech signal corrupted by noise. Section III briefly reviews the orthogonal least squares algorithm and some model validation methods. Section IV illustrates the effectiveness of the proposed algorithm, and Section V concludes the paper.

II. PROBLEM FORMULATION

The block diagram of an LPC-based speech production model in noise is shown in Fig. 1. The speech signal is modeled as the output of an all-pole filter excited by a sequence of pseudo-periodic pulses for voiced speech or by pseudo-random noise for unvoiced speech. Thus, within a given window of speech, the output speech sequence is modeled (without noise and preemphasis filter) as

$$ s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,\tilde{u}(n), \qquad \tilde{u}(n) = w(n)\,u(n) \tag{1} $$

where $p$ is the order of the LPC model, $w(n)$ is the window function, $\tilde{u}(n)$ is the windowed version of the input excitation $u(n)$, $G$ is the gain of the filter, $v(n)$ is the additive noise, $s(n)$ is the windowed speech signal, the $a_i$ are the LPC coefficients characterizing the filter and $e(n)$ is the residual error. Since it is customary to pass the speech signal through a preemphasis filter of the form $H(z) = 1 - \mu z^{-1}$ to boost the signal spectrum by 6 dB/octave, the model of eqn (1) with the preemphasis filter becomes

$$ s_p(n) = \sum_{i=1}^{p} a_i\, s_p(n-i) + G\,\tilde{u}(n) - \mu G\,\tilde{u}(n-1), \qquad s_p(n) = s(n) - \mu\, s(n-1) \tag{2} $$

Fig. 1. Block diagram of linear prediction based speech production model (blocks: all-pole filter, preemphasis).

The model of eqn (2) resembles an autoregressive with exogenous input (ARX) model of the form

$$ A(q^{-1})\, y(n) = B(q^{-1})\, u(n) + e(n) \tag{3} $$

where $q^{-1}$ is the backward shift operator and $A(q^{-1}) = 1 - \sum_{i=1}^{p} a_i q^{-i}$; for eqn (2), $y(n) = s_p(n)$, $u(n) = \tilde{u}(n)$ and $B(q^{-1}) = G(1 - \mu q^{-1})$. Note that the model of eqn (2) is often referred to as an ARMA model in the speech processing literature (Morikawa, 1990). Under the usual assumption that the input signal is not known, the exogenous variables of eqn (2) are treated as residuals in LPC analysis. In the classical stochastic formulation of linear prediction, the parameters of eqn (1) are estimated by minimizing the cost function

$$ J = E\left[e^{2}(n)\right] \tag{4} $$

where

$$ e(n) = s(n) - \sum_{i=1}^{p} a_i\, s(n-i) \tag{5} $$

Minimizing eqn (4) with respect to the $a_i$ yields a set of linear equations of the form

$$ \sum_{i=1}^{p} a_i\, R(|i-k|) = R(k), \qquad k = 1, 2, \ldots, p \tag{6} $$

where $R(k)$ is the signal autocorrelation at lag $k$. Because speech is processed by segmenting it into overlapping frames and windowing each frame, the correlation functions are computed from a finite number of data points, and the problem therefore loses its probabilistic nature.

When the speech samples are corrupted by additive white noise $v(n)$, the model structure of the speech production process takes the form of an autoregressive moving average with exogenous input (ARMAX) process

$$ A(q^{-1})\, y(n) = B(q^{-1})\, \tilde{u}(n) + C(q^{-1})\, v(n) \tag{7} $$

where $y(n) = x(n) - \mu\, x(n-1)$ is the preemphasized noisy observation of $x(n) = s(n) + v(n)$, $A(q^{-1})$ and $B(q^{-1}) = G(1 - \mu q^{-1})$ are as in eqn (3), and $C(q^{-1}) = A(q^{-1})(1 - \mu q^{-1}) = 1 + \sum_{j=1}^{p+1} c_j q^{-j}$. If the effects of the unknown input are included in the residual, the ARMAX structure of eqn (7) reduces to an ARMA model:

$$ y(n) = \sum_{i=1}^{p} a_i\, y(n-i) + \sum_{j=1}^{p+1} c_j\, v(n-j) + \xi(n) \tag{8} $$

where $\xi(n) = v(n) + B(q^{-1})\,\tilde{u}(n)$. Thus, while the speech model in a clean environment is an AR($p$) process, contamination by noise changes the structure to an ARMA($p$, $p+1$) process. Although all the spectral information of the original AR($p$) process is preserved in the AR parameters of the ARMA model of the noisy process, the presence of the noise-related parameters makes the estimation of the AR($p$) parameters difficult. Applying the Levinson-Durbin algorithm, or any least squares based algorithm that uses an AR($p$) model structure and ignores the MA part, yields biased estimates. Several techniques based on instrumental variables and the Yule-Walker method have been developed (Stoica et al., 1988) to obtain asymptotically unbiased estimates of the AR($p$) parameters of general ARMA processes. The present study uses an orthogonal least squares based algorithm (Billings et al., 1989) to estimate the parameters of the AR($p$) process; the algorithm is briefly reviewed in the following section. Eqn (8) cannot be used directly for estimation, because the noise terms $v(n-j)$ are not usually known. However, these terms can be replaced by the one-step-ahead prediction residuals or by background noise measured separately.

III. ORTHOGONAL LEAST SQUARES ALGORITHM WITH STRUCTURE SELECTION

The orthogonal least squares algorithm (Billings et al., 1989; Swain and Billings, 1998) estimates the parameters of models of the form

$$ y(n) = \boldsymbol{\varphi}^{T}(n)\,\boldsymbol{\theta} + \varepsilon(n) = \sum_{i=1}^{M} \theta_i\, \varphi_i(n) + \varepsilon(n) \tag{9} $$

where $\boldsymbol{\varphi}(n)$ is a vector of $M$ candidate terms containing both signal and residual terms up to and including time $n-1$, $\boldsymbol{\theta}$ is the parameter vector and $\varepsilon(n)$ are the identification residuals. The speech signal model of eqn (8) can be put into the form of eqn (9), with the unknown noise terms replaced by prediction residuals: $\boldsymbol{\varphi}(n) = [y(n-1), \ldots, y(n-p), \varepsilon(n-1), \ldots, \varepsilon(n-p-1)]^{T}$ and $\boldsymbol{\theta} = [a_1, \ldots, a_p, c_1, \ldots, c_{p+1}]^{T}$. The parameter vector, which contains both the signal and noise model parameters, can be estimated using orthogonal techniques, which effectively overcome numerical ill-conditioning. Parameter estimation is performed by first transforming the model of eqn (9) into the auxiliary model

$$ y(n) = \sum_{i=1}^{M} g_i\, w_i(n) + \varepsilon(n) \tag{10} $$
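The orthogonal transformation of the regression model into an auxiliary orthogonal model, together with ERR-based term selection, can be sketched in plain Python as below. This is a minimal illustration assuming classical Gram-Schmidt orthogonalization in the spirit of Billings et al. (1989), not the authors' implementation. The function names `dot` and `ols_err`, the stopping threshold `err_min`, and the AR(2) toy signal are hypothetical choices made for illustration. Only lagged signal terms are offered as candidates here. In the noisy case, lagged prediction residuals would be appended as further candidate columns.

```python
import random


def dot(a, b):
    """Inner product sum_n a(n) b(n) over the data record."""
    return sum(x * y for x, y in zip(a, b))


def ols_err(y, candidates, err_min=0.002):
    """Forward-regression OLS with ERR-based term selection (sketch).

    y          -- target samples y(n)
    candidates -- dict {term name: regressor samples phi_i(n)}, aligned with y
    err_min    -- hypothetical stopping rule: stop once the best remaining
                  term would explain less than this fraction of sum y^2
    Returns (selected names in order, their ERR fractions, {name: theta}).
    """
    yy = dot(y, y)
    names, ws, gs, errs = [], [], [], []
    alpha = {}                      # alpha[(j, k)]: coefficient of w_j in term k
    remaining = dict(candidates)
    while remaining:
        best = None
        for name, p in remaining.items():
            # Gram-Schmidt: strip the components along the already selected w_j
            coeffs = [dot(wj, p) / dot(wj, wj) for wj in ws]
            w = list(p)
            for c, wj in zip(coeffs, ws):
                w = [a - c * b for a, b in zip(w, wj)]
            ww = dot(w, w)
            if ww <= 1e-12 * dot(p, p):
                continue            # (numerically) dependent on selected terms
            g = dot(y, w) / ww      # orthogonal-model coefficient g_i
            err = g * g * ww / yy   # error reduction ratio as a fraction
            if best is None or err > best[1]:
                best = (name, err, w, g, coeffs)
        if best is None or best[1] < err_min:
            break
        name, err, w, g, coeffs = best
        k = len(names)
        for j, c in enumerate(coeffs):
            alpha[(j, k)] = c
        names.append(name)
        ws.append(w)
        gs.append(g)
        errs.append(err)
        del remaining[name]
    # Back-substitution through the unit upper-triangular alpha matrix:
    # theta_k = g_k - sum_{j > k} alpha[k, j] * theta_j
    theta = [0.0] * len(names)
    for k in reversed(range(len(names))):
        theta[k] = gs[k] - sum(alpha[(k, j)] * theta[j] for j in range(k + 1, len(names)))
    return names, errs, dict(zip(names, theta))


# Toy check on a clean AR(2) signal y(n) = 1.3 y(n-1) - 0.6 y(n-2) + e(n)
random.seed(1)
s = [0.0, 0.0]
for _ in range(2200):
    s.append(1.3 * s[-1] - 0.6 * s[-2] + random.gauss(0.0, 1.0))
s = s[200:]                         # discard the start-up transient
P = 6                               # candidate lags 1..P
y = s[P:]
cands = {f"y(n-{i})": s[P - i:len(s) - i] for i in range(1, P + 1)}
names, errs, theta = ols_err(y, cands)
print(names, [round(e, 3) for e in errs], theta)
```

In this toy case the two true lags should be picked in order of decreasing ERR. The remaining lags fall below `err_min`, and the back-substitution should recover parameters close to 1.3 and -0.6. The threshold is an illustrative stand-in for whatever selection criterion is applied in practice.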

where the terms $w_i(n)$ are orthogonal over the data records. The parameters of the auxiliary model in eqn (10) can be estimated by (Billings et al., 1989)

$$ g_i = \frac{\overline{y(n)\, w_i(n)}}{\overline{w_i^{2}(n)}}, \qquad i = 1, \ldots, M \tag{11} $$

where the overbar indicates time averaging. Finally, the parameters $\theta_i$ of the model of eqn (9) can be computed from the $g_i$. A criterion for selecting the most important terms in the model can be derived as a byproduct of the orthogonal parameter estimation procedure. Note from eqn (9) that the maximum mean squared prediction error (MSPE) is obtained when no terms are included in the model; in this case the MSPE equals $\overline{y^{2}(n)}$. The reduction in the MSPE due to the inclusion of the $i$-th term $g_i w_i(n)$ in the auxiliary model of eqn (10) is $g_i^{2}\,\overline{w_i^{2}(n)}$. Expressing this reduction as a fraction of the total MSPE yields the error reduction ratio (ERR)

$$ \mathrm{ERR}_i = \frac{g_i^{2}\,\overline{w_i^{2}(n)}}{\overline{y^{2}(n)}} \times 100\% \tag{12} $$

Only terms with large values of ERR are selected to form the model. An advantage of the OLS estimator is that significant parameters can be determined recursively and independently of the other terms. Furthermore, the estimation of the signal and noise model parameters can be decoupled, which is particularly useful for identifying the model of eqn (9). A parsimonious signal model is determined first; this model is not affected by whatever noise model is produced later. The initial prediction errors are computed from this signal model, and a noise model is then selected. A revised prediction error sequence is then calculated and an improved noise model is determined.

A. Model Validation

A critical step in any successful application of parametric estimation is to check the adequacy of the fitted model, i.e. to check whether the model is a true representation of the signal characteristics or just a curve fit to one data set. Several techniques have been proposed to validate a model (Soderstrom and Stoica, 1989). In the present study, the normalized mean square error and the Itakura-Saito distance are used to validate the models. The normalized mean square error (NMSE) is defined as

$$ \mathrm{NMSE} = \frac{\sum_{n}\left[y(n) - \hat{y}(n)\right]^{2}}{\sum_{n} y^{2}(n)} \tag{13} $$

where $\hat{y}(n)$ is the one-step-ahead prediction of the signal $y(n)$. The distortion level between a processed signal and the original signal can be regarded as a good objective measure of the quality of the processed signal (Quackenbush et al., 1988). The Itakura-Saito (IS) distance, or log-likelihood ratio, is based on the dissimilarity between the all-pole models of the clean reference $s(n)$ and the distorted speech waveform $\hat{s}(n)$ (Itakura, 1975). The IS measure compares the LPC vectors $\mathbf{a}$ and $\hat{\mathbf{a}}$ of the two windowed speech waveforms $s$ and $\hat{s}$, written as prediction-error filters $\mathbf{a} = [1, -a_1, \ldots, -a_p]^{T}$, and can be defined as

$$ d_{IS}(\mathbf{a}, \hat{\mathbf{a}}) = \frac{\sigma^{2}}{\hat{\sigma}^{2}}\,\frac{\hat{\mathbf{a}}^{T}\mathbf{R}\,\hat{\mathbf{a}}}{\mathbf{a}^{T}\mathbf{R}\,\mathbf{a}} + \log\frac{\hat{\sigma}^{2}}{\sigma^{2}} - 1 \tag{14} $$

where $\hat{\sigma}^{2}$ and $\sigma^{2}$ represent the all-pole gains of the processed frame and of its corresponding clean speech frame respectively, and $\mathbf{R}$ is the autocorrelation matrix of the clean signal. The present study considers … and ….

IV. RESULTS AND DISCUSSION

The performance of the algorithm has been investigated using speech samples collected in a variety of situations and with different levels of noise. However, only the results of estimating the LPC parameters of one spoken word, SEVEN, are discussed here. The signal was sampled at 22050 Hz. A window length of 512 samples with an overlap of 200 samples was used in the analysis. The normalized mean square error of all frames was computed for both clean and noisy signals, with and without the noise model, and is shown in Fig. 2. Fig. 2 shows that the OLS algorithm, which appends a noise model to the conventional LPC model, outperforms the all-pole LPC model under noisy conditions. To further demonstrate the effectiveness of the algorithm, the Itakura-Saito (IS) distances were compared; the results, shown in Fig. 3, further confirm that the OLS algorithm performs better than the standard LPC-based models. The adequacy of the OLS-based model was also studied by comparing the spectra of different speech segments computed from the LPC and OLS models. Fig. 4 shows the spectra of segment 25 of the word SEVEN. These results further establish that the models fitted with the OLS algorithm are superior to the LPC-based models.

Fig. 2. Normalized mean square error versus frame number. Curve I: NMSE of clean speech; Curve II: NMSE of noisy speech at SNR = 10 dB with the OLS algorithm; Curve III: NMSE of noisy speech without the noise model.
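The two validation measures, the normalized mean square error and the Itakura-Saito distance, can be illustrated with the short Python sketch below. It assumes autocorrelation-method LPC computed by the Levinson-Durbin recursion, takes each frame's final prediction-error energy as its all-pole gain, and writes each LPC vector as the prediction-error filter [1, -a_1, ..., -a_p]. The helper names (`autocorr`, `levinson`, `quad`, `itakura_saito`, `nmse`, `predict`), the order p = 10, the synthetic AR(2) frame and the noise level are all hypothetical. The sketch does not reproduce the experiments reported above.

```python
import math
import random


def autocorr(x, maxlag):
    """Unnormalised autocorrelation r(k) = sum_n x(n) x(n-k), k = 0..maxlag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(maxlag + 1)]


def levinson(r, p):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, e): a = [1, -a_1, ..., -a_p] is the prediction-error filter for
    the convention s(n) = sum_i a_i s(n-i) + ..., and e is the final
    prediction-error energy, used here as the all-pole gain sigma^2.
    """
    a, e = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / e      # reflection coefficient
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        e *= 1.0 - k * k
    return a, e


def quad(a, r):
    """a^T R a, with R the Toeplitz autocorrelation matrix built from r."""
    m = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(m) for j in range(m))


def itakura_saito(a, s2, a_hat, s2_hat, r):
    """IS distance: clean model (a, s2), processed model (a_hat, s2_hat), clean r."""
    return (s2 / s2_hat) * quad(a_hat, r) / quad(a, r) + math.log(s2_hat / s2) - 1.0


def nmse(y, y_hat):
    """Normalised mean square error: sum (y - y_hat)^2 / sum y^2."""
    return sum((u - v) ** 2 for u, v in zip(y, y_hat)) / sum(u * u for u in y)


def predict(x, a):
    """One-step-ahead prediction y_hat(n) = -sum_{j>=1} a[j] x(n-j), zero initial state."""
    return [-sum(a[j] * x[n - j] for j in range(1, len(a)) if n >= j) for n in range(len(x))]


# Hypothetical frame: 512 samples of an AR(2) process, plus white noise
random.seed(0)
p = 10
s = [0.0, 0.0]
for _ in range(600):
    s.append(1.3 * s[-1] - 0.6 * s[-2] + random.gauss(0.0, 1.0))
clean = s[90:]
noisy = [v + random.gauss(0.0, 1.5) for v in clean]
r_c = autocorr(clean, p)
a_c, e_c = levinson(r_c, p)
a_n, e_n = levinson(autocorr(noisy, p), p)
print("NMSE (clean, own LPC):", nmse(clean, predict(clean, a_c)))
print("IS distance clean vs noisy:", itakura_saito(a_c, e_c, a_n, e_n, r_c))
```

For identical models the distance evaluates to zero, and it grows as the model fitted to the noisy frame departs from the clean one. With the autocorrelation method, the clean-frame gain equals the quadratic form of the clean LPC vector, which is why the first ratio reduces to a comparison of prediction-error energies.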

Fig. 3. Log-likelihood distance versus frame number. Curve I: IS distance between models of clean and noisy speech with the noise model; Curve II: IS distance between models of clean and noisy speech without the noise model.

Fig. 4. Spectral density (spectral amplitude) for segment no. 25. Curve I: spectrum of clean speech; Curve II: spectrum of noisy speech at SNR = 10 dB with the OLS algorithm; Curve III: spectrum of noisy speech without the noise model.

V. CONCLUSIONS

The performance of the orthogonal least squares algorithm with structure selection has been illustrated using examples of speech signals corrupted by high levels of noise. The algorithm is simple to implement, can provide a parsimonious model structure, and allows a noise model to be appended to the standard LPC model structure. Including the noise model significantly reduces the bias of the LPC parameters under noisy conditions. Results from spoken words demonstrate that the OLS algorithm performs better than the LPC-based algorithm.

ACKNOWLEDGMENTS

Akshya Swain is thankful to the Board of Governors, National Institute of Technology, Rourkela, India, for granting extraordinary leave. Waleed Abdulla gratefully acknowledges the financial support from the University of Auckland, New Zealand, under NSRF grant ref. 3602239/9273.

REFERENCES

[1] Billings, S.A., Chen, S. and Korenberg, M.J. (1989), Identification of MIMO nonlinear systems using a forward regression orthogonal estimator, Int. J. Control, vol. 49, no. 6, pp. 2157-2189.
[2] Gong, Y. (1995), Speech recognition in noisy environments: A survey, Speech Communication, vol. 16, pp. 261-291.
[3] Hernando, J. and Nadeu, C. (1997), Linear prediction of one-sided autocorrelation sequence for noisy speech recognition, IEEE Trans. Speech and Audio Processing, vol. 5, no. 1, pp. 80-84.
[4] Itakura, F. (1975), Minimum prediction residual principle applied to speech recognition, IEEE Trans. ASSP, vol. 23, no. 1, pp. 67-72.
[5] Lee, C.H. (1988), On robust linear prediction of speech, IEEE Trans. ASSP, vol. 36, pp. 642-650.
[6] Makhoul, J. (1975), Linear prediction: A tutorial review, Proc. IEEE, vol. 63, pp. 561-580.
[7] Morikawa, H. (1990), Adaptive estimation of time-varying model order in the ARMA speech analysis, IEEE Trans. ASSP, vol. 38, no. 7, pp. 1073-1083.
[8] Ramachandran, R.P., Zilovic, M.S. and Mammone, R.J. (1995), A comparative study of robust linear predictive analysis methods with applications to speaker identification, IEEE Trans. Speech and Audio Processing, vol. 3, no. 2, pp. 117-125.
[9] Quackenbush, S.R., Barnwell III, T.P. and Clements, M.A. (1988), Objective Measures of Speech Quality, NJ: Prentice Hall.
[10] Shimamura, T. (2001), A performance comparison of robust speech analysis methods in noisy environments, Proc. Int. Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong.
[11] Soderstrom, T. and Stoica, P. (1989), System Identification, Prentice Hall, London.
[12] Stoica, P., Friedlander, B. and Soderstrom, T. (1988), A high-order Yule-Walker method for estimation of the AR parameters of an ARMA model, Systems and Control Letters, vol. 11, pp. 99-105.
[13] Swain, A.K. and Billings, S.A. (1998), Weighted complex orthogonal estimator for identifying linear and nonlinear continuous time models from generalized frequency response functions, Mech. Systems and Signal Processing, vol. 12, no. 2, pp. 269-292.
