
H∞ FILTERING FOR SPEECH ENHANCEMENT

Xuemin Shen, Li Deng and Anisa Yasmin


Dept. of Elec. & Comp. Eng., Univ. of Waterloo, Waterloo, Ontario, Canada N2L 3G1

ABSTRACT

In this paper, a new approach based on H∞ filtering is presented for speech enhancement. This approach differs from the traditional modified Wiener/Kalman filtering approach in the following two aspects: 1) no a priori knowledge of the noise statistics is required; instead the noise signals are only assumed to have finite energy; 2) the estimation criterion for the filter design is to minimize the worst possible amplification of the estimation error signal in terms of the modeling errors and additive noises. Since most additive noises in speech are not Gaussian, this approach is highly robust and is more appropriate in practical speech enhancement. The global signal-to-noise ratio (SNR), time domain speech representation and listening evaluations are used to verify the performance of the H∞ filtering algorithm. Experimental results show that the filtering performance is better than other speech enhancement approaches in the literature under similar experimental conditions.

1. INTRODUCTION

Noise contaminated speech results in various degrees of reduction of speech discrimination. For example, background acoustic noise degrades speech signal quality of mobile telephone systems; airplane engine noise affects the conversation between a pilot and an air traffic controller. With the objective of enhancing the quality and intelligibility of speech, speech enhancement involves manipulation of the contaminated speech signal to mitigate noise effects. There have been numerous studies involving this subject [1]-[4]. Based on stochastic speech models, the previous studies have focused on the minimization of the variance of the estimation error of the speech, i.e., the celebrated Wiener and/or Kalman filtering approach. This type of estimation assumes that both the dynamics of the signal generating processes and the statistical properties of the noise sources are known in advance. However, this unrealistic assumption naturally limits the application of the estimators, since in many situations only approximate signal models are available and/or the statistics of the noise sources are not fully known or are unavailable. Furthermore, both Wiener and Kalman estimators may not be sufficiently robust to the signal model errors.

In this paper, a new approach based on the recently developed optimal filtering [5]-[10], H∞ filtering, is presented for speech enhancement. This approach differs from the traditional modified Wiener/Kalman filtering approach in the following two aspects: 1) No a priori knowledge of the noise statistics is required. The only assumption is that the noise signals have finite energy; and 2) The estimation criterion in the H∞ filter design is to minimize the worst possible effects of the modeling errors and additive noise on the signal estimation errors. Since the noise added to speech is not Gaussian in general, this filtering approach appears highly robust and more appropriate in practical speech enhancement. Furthermore, the H∞ filtering algorithm is straightforward to implement. Our experimental results have demonstrated that the filtering performance of the H∞ estimation is noticeably superior to that of the Kalman estimation.

The remainder of this paper is organized as follows. In Section 2, the speech source model (vocal tract) is characterized by an all pole filter. This speech source model and the observation model are then combined into a canonical state space formulation for the speech enhancement problem. Based on this formulation, Section 3 presents an H∞ filtering algorithm. In Section 4, we investigate the performance of the H∞ filter for speech enhancement, and compare it with that of the Kalman filter. Conclusions and discussions follow.

2. PROBLEM FORMULATION

Short segments of speech can be represented by the response of an all pole filter which models the vocal tract [1]. The filter is excited by a pulse train separated by the pitch period for voiced sounds, or pseudorandom noise for unvoiced sounds. Thus the speech $x_k$ (clean speech) is given by

$$x_k = \sum_{i=1}^{n} a_i x_{k-i} + w_k \qquad (1)$$

where $n$ is the number of modeled poles, the $a_i$'s are the tap-gain parameters characterizing the filter and $w_k$ is an excitation. If the speech signal is corrupted with a noise signal $v_k$, the observed noisy speech signal $s_k$ is described as follows

$$s_k = x_k + v_k \qquad (2)$$

The speech generating mechanism (1) and observation process (2) are illustrated in Figure 1.

Equations (1)-(2) can be represented by the following state-space

model [2]-[3]

$$X_k = A X_{k-1} + B w_k \qquad (3)$$

$$s_k = C X_k + v_k \qquad (4)$$

where $X_k = [x_{k-n+1}, \ldots, x_{k-1}, x_k]^T$ is the state vector and

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_n & a_{n-1} & a_{n-2} & \cdots & a_1 \end{bmatrix}, \qquad B^T = C = [\, 0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \,].$$

Figure 1: The noisy speech generating and filtering mechanism. (Block diagram: the excitation $w_k$ drives the speech generator, whose output $x_k$ enters the measurement system, where the noise $v_k$ is added to give $s_k$; $s_k$ is the input to the filter.)

The objective of speech enhancement is to estimate $x_k$ given $\{s_i\}$ ($0 \le i \le k$). Previous studies have assumed that both $w_k$ and $v_k$ are white, uncorrelated Gaussian processes in order to develop enhancement algorithms, i.e., to estimate $X_k$ [1]-[4]. However, neither the speech nor the noise may be Gaussian. This is because $w_k$ could be a pulse train for voiced speech, random noise for unvoiced speech or the modeling error, and $v_k$ could be any kind of noise. The Gaussian assumptions may provide an estimate which is highly vulnerable to statistical outliers, i.e., a small number of large measurement errors would have a large influence on the resulting estimate, so that the viability of the algorithms was checked by experiment [3]. In the following, we present a new approach based on the H∞ filtering algorithm for speech enhancement, where both $w_k$ and $v_k$ may not be white or colored Gaussian processes. For comparison, the Kalman filtering algorithm is briefly reviewed.

3. KALMAN AND H∞ FILTERING ALGORITHMS

3.1. Kalman Filtering Algorithm

In the Kalman filtering, the clean speech signal $\{x_k\}$ is considered to be a random process. Both the exciting term $w_k$ and the observation noise $v_k$ (additive noise) are assumed to be uncorrelated white Gaussian processes with zero mean and variances $Q$ and $R$:

$$E\{w_k w_k^T\} = Q, \qquad E\{v_k v_k^T\} = R, \qquad E\{w_k v_k^T\} = 0,$$

where $E$ means expectation. The design objective of the Kalman filter is to determine the optimal estimate $\hat{X}_k$ based on the $\{s_i\}$ ($0 \le i \le k$) such that

$$P_k = E\{e_k e_k^T\} \qquad (5)$$

is minimum. The estimation error $e_k$ is defined by the equation

$$e_k = X_k - \hat{X}_k.$$

For the state-space model (3)-(4), the Kalman filtering algorithm is given by

$$\hat{X}_k = \hat{X}_{k|k-1} + K_k [\, s_k - C \hat{X}_{k|k-1} \,] \qquad (6)$$

$$\hat{X}_{k|k-1} = A \hat{X}_{k-1} \qquad (7)$$

with the initial condition $\hat{X}_0 = [0]_{n \times 1}$. The filter gain and error variance equations are

$$K_k = P_{k|k-1} C^T [\, R + C P_{k|k-1} C^T \,]^{-1} \qquad (8)$$

$$P_{k|k-1} = A P_{k-1} A^T + B Q B^T \qquad (9)$$

$$P_k = [\, I - K_k C \,] P_{k|k-1} \qquad (10)$$

where $K_k$ is the Kalman gain vector, $P_{k|k-1} = E[(X_k - \hat{X}_{k|k-1})(X_k - \hat{X}_{k|k-1})^T]$ is the a priori error covariance matrix, and $P_k = E[(X_k - \hat{X}_k)(X_k - \hat{X}_k)^T]$ is the a posteriori error covariance matrix. The initial condition is $P_0 = [0]_{n \times n}$, and $I$ is an $n \times n$ identity matrix. The estimated speech sample $\hat{x}_k$ can be obtained by

$$\hat{x}_k = C \hat{X}_k \qquad (11)$$

If the additive noise $\{v_k\}$ is a colored Gaussian process, the Kalman filter algorithm for such speech estimation is given in [3].

3.2. H∞ Filtering Algorithm

Consider the state space model (3)-(4). We make no assumptions on the nature of the unknown quantities $w_k$ and $v_k$, and are interested not necessarily in the estimation of $X_k$ but of $z_k$ using the observations $\{s_i\}$ ($0 \le i \le k$). Let

$$z_k = C X_k \qquad (12)$$

Unlike the Kalman filter, which minimizes the variance of the estimation error, the design criterion for the H∞ filter is to provide a uniformly small estimation error, $e_k = z_k - \hat{z}_k$, for any $w_k, v_k \in l_2$ and $X_0 \in R^n$. Note that $z_k$ is equal to $x_k$. The measure of performance is then given by

$$J = \frac{\sum_{k=0}^{N} \|z_k - \hat{z}_k\|_Q^2}{\|X_0 - \hat{X}_0\|_{p_0^{-1}}^2 + \sum_{k=0}^{N} \left( \|w_k\|_{W^{-1}}^2 + \|v_k\|_{V^{-1}}^2 \right)} \qquad (13)$$

where $((X_0 - \hat{X}_0), w_k, v_k) \neq 0$, $\hat{X}_0$ is an a priori estimate of $X_0$, and $Q \ge 0$, $p_0^{-1} > 0$, $W > 0$ and $V > 0$ are the weighting matrices, which are left to the choice of the designer and depend on performance requirements. The notation $\|z_k\|_Q^2$ is defined as the square of the weighted (by $Q$) $l_2$ norm of $z_k$, i.e., $\|z_k\|_Q^2 = z_k^T Q z_k$. The H∞ filter will search $\hat{z}_k$ such that the optimal estimate of $z_k$ among all possible $\hat{z}_k$ (i.e., the worst-case performance measure) should satisfy

$$\sup J < \gamma^2 \qquad (14)$$

where "sup" stands for supremum and $\gamma > 0$ is a prescribed level of noise attenuation. The H∞ filtering can be interpreted as a minimax problem where the estimator strategy $\hat{z}_k$ plays against the exogenous inputs $w_k$, $v_k$ and the initial state $X_0$. The performance criterion becomes

$$\min_{\hat{z}_k} \max_{(v_k, w_k, X_0)} J = -\frac{1}{2}\gamma^2 \|X_0 - \hat{X}_0\|_{p_0^{-1}}^2 + \frac{1}{2} \sum_{k=0}^{N} \left[ \|z_k - \hat{z}_k\|_Q^2 - \gamma^2 \left( \|w_k\|_{W^{-1}}^2 + \|v_k\|_{V^{-1}}^2 \right) \right] \qquad (15)$$
In (15), "min" stands for minimization and "max" for maximization. Note that unlike the traditional minimum variance filtering approach (Wiener and/or Kalman filtering), the H∞ filtering deals with deterministic disturbances and no a priori knowledge of the noise statistics is required. Since the observation $s_k$ is given, $v_k$ can be uniquely determined by (2) once the optimal values of $w_k$ and $X_0$ are found, and since $z_k = C X_k$ and $\hat{z}_k = C \hat{X}_k$, we can rewrite the performance criterion (15) as

$$\min_{\hat{X}_k} \max_{(w_k, X_0)} J = -\frac{1}{2}\gamma^2 \|X_0 - \hat{X}_0\|_{p_0^{-1}}^2 + \frac{1}{2} \sum_{k=0}^{N} \left[ \|X_k - \hat{X}_k\|_{\bar{Q}}^2 - \gamma^2 \left( \|w_k\|_{W^{-1}}^2 + \|s_k - C X_k\|_{V^{-1}}^2 \right) \right] \qquad (16)$$

where $\bar{Q} = C^T Q C$. In [9]-[10], it has been proved that the following theorem presents a complete solution to the H∞ filtering problem for the state-space model (3)-(4) with the performance criterion (16).

Theorem: Let $\gamma > 0$ be a prescribed level of noise attenuation. Then, there exists an H∞ filter for $z_k$ if and only if there exists a stabilizing symmetric solution $P_k > 0$ to the following discrete-time Riccati equation

$$P_{k+1} = A P_k A^T - A P_k C^T (V + C P_k C^T)^{-1} C P_k A^T + B W B^T - \gamma^{-2} P_{k+1} L^T (Q^{-1} + \gamma^{-2} L P_{k+1} L^T)^{-1} L P_{k+1} \qquad (17)$$

$$P_0 = (p_0^{-1} + \gamma^{-2} L^T Q L)^{-1}.$$

Here $L = C$, since $z_k = C X_k$ by (12). If this is the case, then an H∞ filter can be given by

$$\hat{z}_k = C \hat{X}_k, \qquad k = 1, 2, \ldots, N \qquad (18)$$

where

$$\hat{X}_k = A \hat{X}_{k-1} + H_k (s_k - C \hat{X}_{k-1}), \qquad \hat{X}_0 = 0 \qquad (19)$$

$H_k$ is the gain of the H∞ filter and is given by

$$H_k = A P_k C^T (V + C P_k C^T)^{-1} \qquad (20)$$

Solving the Riccati equation (17) for the solution $P_k$ is not trivial due to its nonlinearity. Applying the following matrix inversion lemma (MIL)

$$A - A B (C + B^T A B)^{-1} B^T A = (A^{-1} + B C^{-1} B^T)^{-1}, \qquad (21)$$

(17) can be written as

$$P_{k+1}^{-1} = [\, A (P_k^{-1} + C^T V^{-1} C)^{-1} A^T + B W B^T \,]^{-1} - \gamma^{-2} C^T Q C \qquad (22)$$

so that we can obtain the solution of $P_k$ from (22) recursively.

Comparing the Kalman filtering algorithm (7)-(10) and the H∞ filtering algorithm (17)-(20), we can observe

1) The Kalman filtering algorithm gives the minimum mean-square-error estimate of $x_k$ based on the $\{s_i\}$ ($0 \le i \le k$);

2) The H∞ filtering algorithm gives the optimal estimate of $z_k$ based on the $\{s_i\}$ ($0 \le i \le k$) such that the effect of the worst disturbance (noises) on the estimation error is minimized.

4. EXPERIMENTAL RESULTS

Both the Kalman and H∞ filtering algorithms described in Section 3 are applied to speech enhancement as follows.

Noisy speech is divided into equal-length segments. Within each segment, we first estimate the tap-gain parameters $a_j$, $j = 1, \ldots, n$, then filter the noisy speech with the parameters $\{a_j\}$. The Kalman and H∞ filtering algorithms are initialized only for the first segment. In our experiment, we choose the state vector $\hat{X}_0 = 0$ and the weight matrix $p_0 \gg 0$. In the subsequent segments, $\hat{X}_0$ and $P_0$ are initialized using the corresponding last values from the previous segment. There exists a trade-off in the choice of the length of the segments. Long segments improve the accuracy of the prediction parameters for stationary sounds (e.g. vowels) and short segments improve the accuracy for nonstationary sounds. In our experiment, the segment length used for calculating the parameters $\{a_j, j = 1, 2, \ldots, n\}$ is 128 samples, which corresponds to 16 ms (with a sampling frequency of 8 kHz). The order of the all pole filter is 10, which is a commonly used value in linear predictive analysis of speech signals, and the input SNR is 5 dB. Two types of noise are used: white noise (stationary) and helicopter noise (nonstationary). The performance of both filtering algorithms is measured in terms of SNR, time domain speech representation, and listening evaluation. In the testing, the order of the state space model is set equal to the order of the all pole filter.

Experimental results obtained for the sentence "Woe betide the interviewee if he answered vaguely" are shown in the following table. Table 1 shows the output SNR results for the testing where the weightings are $W = 1$, $V = 3$, the attenuation parameter is $\gamma = 1.02$, and $p_0 = 1000$. The SNR values throughout this paper are the global signal to noise ratios calculated by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{k=1}^{N} x_k^2}{\sum_{k=1}^{N} (x_k - \hat{x}_k)^2}$$

where $N$ is the total number of samples of each sentence, $x_k$ is the clean (noise-free) sequence and $\hat{x}_k$ is the enhanced speech. Table 1 gives the performance comparison between the Kalman and H∞ filtering algorithms for the situations in which the speech signal is contaminated by white Gaussian noise and by helicopter noise respectively. In both cases, the H∞ filtering algorithm has an advantage of over 1.0 dB in the output SNR values over the Kalman filtering algorithm [3]. Figure 2 shows the clean speech signal. Figures 3 and 4 show the original noisy speech signals (contaminated by white Gaussian noise and by helicopter noise respectively) as well as the corresponding noise suppressed signals in the time domain obtained with the H∞ filtering algorithm. Informal subjective tests confirm the improvement of SNR values, intelligibility and voice quality. The figures clearly show the effectiveness of the H∞ filtering for speech enhancement.

Table 1: Performance comparison (output SNR in dB) with input SNR = 5 dB

Algorithm    White Noise    Helicopter Noise
Kalman       8.7276         8.9119
H∞           9.8781         10.0693
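The global SNR figures reported in Table 1 can be reproduced for any pair of clean and enhanced signals with a short helper. The sketch below assumes the usual definition of global SNR, the ratio of total clean-signal energy to total error energy over the whole sentence, expressed in dB. The function names are hypothetical, and `scale_noise_to_snr` is only one plausible way of fixing the 5 dB input SNR; the paper does not say how the noise level was set.

```python
import math

def global_snr_db(clean, estimate):
    """Global SNR in dB: 10*log10( sum x_k^2 / sum (x_k - x_hat_k)^2 )."""
    if len(clean) != len(estimate):
        raise ValueError("clean and estimate must have the same length")
    signal_energy = sum(x * x for x in clean)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)

def scale_noise_to_snr(clean, noise, snr_db):
    """Scale a noise record so that clean + noise has the requested global SNR.

    Assumption: the input SNR is set by a single gain applied to the whole
    noise record.
    """
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum(v * v for v in noise)
    gain = math.sqrt(signal_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [gain * v for v in noise]
```

For example, with `v = scale_noise_to_snr(x, noise, 5.0)` and `s = [a + b for a, b in zip(x, v)]`, the value `global_snr_db(x, s)` gives the input SNR, and replacing `s` by a filter output gives the corresponding output SNR.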

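The segment-wise parameter estimation described at the start of Section 4 can be sketched as follows. The paper does not state which estimator was used for the tap gains, so the autocorrelation method with the Levinson-Durbin recursion shown here is an assumption, chosen because it is a common choice in linear predictive analysis. The function names are hypothetical; the defaults of order 10 and 128-sample segments follow the values quoted in the text.

```python
def autocorrelation(x, max_lag):
    """r[l] = sum_k x[k] * x[k - l] for l = 0, ..., max_lag."""
    return [sum(x[k] * x[k - l] for k in range(l, len(x)))
            for l in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Tap gains a_1..a_order of the predictor x_k ~ sum_i a_i x_{k-i}."""
    a = [0.0] * (order + 1)          # a[0] is unused; a[i] holds a_i
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or degenerate segment: stop early
            break
        # Reflection coefficient from the order-(i-1) predictor
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def segment_tap_gains(s, order=10, seg_len=128):
    """One set of tap gains per equal-length segment of s.

    A trailing partial segment is skipped in this sketch.
    """
    gains = []
    for start in range(0, len(s) - seg_len + 1, seg_len):
        segment = s[start:start + seg_len]
        gains.append(levinson_durbin(autocorrelation(segment, order), order))
    return gains
```

Each entry of `segment_tap_gains(s)` would then supply the last row of the matrix A in (3) for the corresponding segment, with the filter state and P carried over from the previous segment as described above.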
Figure 2: The clean speech signal.

5. CONCLUSIONS

A new speech enhancement method based on the H∞ filtering has been developed. This method exploits a speech production model without requiring the knowledge of noise statistics. The effectiveness of the method has been demonstrated based on measurement data and computer simulations. Since the design criterion is based on the worst case disturbance, the H∞ filtering approach is less sensitive to uncertainty in the exogenous signal statistics and system model dynamics.

6. REFERENCES
1. J.S. Lim and A.V. Oppenheim, "All-Pole Modeling of Degraded Speech", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-26, 197-210, 1978.
2. K.K. Paliwal and A. Basu, "A Speech Enhancement Method Based on Kalman Filtering", Proc. IEEE ICASSP, 177-180, 1987.
3. J.D. Gibson, B. Koo and S.D. Gray, "Filtering of Colored Noise for Speech Enhancement and Coding", IEEE Trans. Signal Processing, Vol. SP-39, 1732-1742, 1991.
4. Y. Ephraim, "A Minimum Mean Square Error Approach for Speech Enhancement", Proc. IEEE ICASSP, 829-832, 1990.
5. R.N. Banavar and J.L. Speyer, "A Linear Quadratic Game Theory Approach to Estimation and Smoothing", Proc. IEEE ACC, 2818-2822, 1991.
6. M.J. Grimble and A. Elsayed, "Solution of the H∞ Optimal Linear Filtering Problem for Discrete-Time Systems", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-38, 1092-1104, 1990.
7. U. Shaked and Y. Theodor, "H∞-Optimal Estimation: A Tutorial", Proc. 31st IEEE CDC, 2278-2286, 1992.
8. K.M. Nagpal and P.P. Khargonekar, "Filtering and Smoothing in an H∞ Setting", IEEE Trans. Auto. Control, Vol. AC-36, 152-166, 1991.
9. X. Shen and L. Deng, "Discrete H∞ Filter Design with Application to Speech Enhancement", Proc. IEEE ICASSP'95, Detroit, 1504-1507, 1995.
10. B. Hassibi and T. Kailath, "H∞ Adaptive Filtering", Proc. IEEE ICASSP'95, Detroit, 949-952, 1995.

Figure 3: The white Gaussian noise contaminated speech signal and the estimated speech signal using the H∞ filtering.

Figure 4: The helicopter noise contaminated speech signal and the estimated speech signal using the H∞ filtering.
