7. Linear Prediction
J. Benesty, J. Chen, Y. Huang

Linear prediction plays a fundamental role in all aspects of speech. Its use seems natural and obvious in this context since for a speech signal the value of its current sample can be well modeled as a linear combination of its past values. In this chapter, we attempt to present the most important ideas on linear prediction. We derive the principal results, widely recognized by speech experts, in a very intuitive way without sacrificing mathematical rigor.

7.1 Fundamentals
7.2 Forward Linear Prediction
7.3 Backward Linear Prediction
7.4 Levinson–Durbin Algorithm
7.5 Lattice Predictor
7.6 Spectral Representation
7.7 Linear Interpolation
7.8 Line Spectrum Pair Representation
7.9 Multichannel Linear Prediction
7.10 Conclusions
References

7.1 Fundamentals
Linear prediction (LP) is a fundamental tool in many diverse areas such as adaptive filtering, system identification, economics, geophysics, spectral estimation, and speech. Recently, a nice history of LP in the context of speech coding was written by Atal [7.1]. Readers are invited to consult this reference for more information about this topic and how it has evolved.

Linear prediction is widely used in speech applications (recognition, compression, modeling, etc.) [7.2, 3]. This is due to the fact that the speech production process is well modeled with LP. Indeed, it is well recognized that a speech signal can be written in the following form [7.4, 5],

x(k) = \sum_{l=1}^{L} a_l x(k - l) + G u(k) ,   (7.1)

where k is the time index, L represents the number of coefficients in the model (the order of the predictor), a_l, l = 1, \cdots, L, are defined as the linear prediction coefficients, G is the gain of the system, and u(k) is the excitation signal, which can be either a quasiperiodic train of impulses or a random noise source (also a combination of both signals for voiced fricatives such as 'v', 'z', and 'zh'). The periodic source produces voiced sounds such as vowels and nasals, and the noise source produces unvoiced or fricated sounds such as the fricatives. The parameters, a_l, determine the spectral characteristics of the particular sound for each of the two types of excitation and are widely used directly in many speech coding schemes and automatic speech recognition systems [7.4].

Equation (7.1) can be rewritten in the frequency domain by using the z-transform. If H(z) is the transfer function of the system, we have:

H(z) = \frac{G}{1 - \sum_{l=1}^{L} a_l z^{-l}} = \frac{G}{A(z)} ,   (7.2)

which is an all-pole transfer function. This filter [H(z)] is a good model of the human vocal tract [7.2]. Our main concern is to determine the predictor coefficients, a_l, l = 1, 2, \cdots, L, and to study the properties of the filter A(z).

The applications of LP are numerous. Before addressing the estimation of LP coefficients, we give some examples to show the importance of LP. In many aspects of speech processing (noise reduction, speech separation, speech dereverberation, speech coding, etc.), it is of great interest to compare the closeness of the spectral envelope of two speech signals (the desired and the processed ones) [7.6, 7]. One way of doing this is through comparing their LP coefficients. Consider the two speech signals x(k) (desired) and x̂(k) (processed). Without entering too much into the details, one possible measure to evaluate the closeness of these two signals is the Itakura distance:

ID_{x x̂} = \ln \frac{E_x}{E_{x̂}} ,   (7.3)

where E_x and E_{x̂} are the prediction-error powers of the signals x(k) and x̂(k), respectively (see the following sections for more details). Note that the Itakura distance is not symmetric, i.e.,

ID_{x x̂} ≠ ID_{x̂ x} ;   (7.4)

therefore, it is not a distance metric. However, asymmetry is usually not a problem for applications such as speech quality evaluation.

A more-powerful distance was proposed by Itakura and Saito in their formulation of linear prediction as an approximate maximum-likelihood estimation [7.8]. This distance between the two signals x(k) and x̂(k) is defined as,

ISD_{x x̂} = \frac{E_{x̂}}{E_x} - \ln \frac{E_{x̂}}{E_x} - 1 .   (7.5)

Like the Itakura distance, this measure is not symmetric either; therefore, it is not a true metric.

The Itakura–Saito distance has many interesting properties. It has been shown that this measure is highly correlated with subjective quality judgements [7.6]. For example, a recent report on speech codec evaluation reveals that, if the Itakura–Saito measure between two speech signals is less than 0.5, the difference in their mean opinion score would be less than 1.6 [7.9]. Many other reported experiments also confirmed that when the Itakura–Saito distance between two speech signals is below 0.1, they would be perceived nearly identically by human ears. As a result, the Itakura–Saito distance, which is based on LP, is often used as an objective measure of speech quality. It is probably the most widely used measure of similarity between speech signals.

The two previous examples of the vocal-tract filter and the speech quality measure clearly show the importance of LP in speech applications.

In this chapter, we study the theory of linear prediction and derive the most important LP techniques that are often encountered in many speech applications. We assume here that all signals of interest are real, stationary, and zero mean.
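As a rough numerical companion to (7.3) and (7.5), the sketch below (our own, not from the chapter) estimates the prediction-error powers E_x and E_{x̂} with a plain autocorrelation-based LP analysis (anticipating Sect. 7.2) and then evaluates both measures; the synthetic signals, the order L = 10, and the helper name prediction_error_power are arbitrary illustrative choices.

```python
# Minimal sketch, assuming numpy/scipy and synthetic test signals.
import numpy as np
from scipy.linalg import toeplitz

def prediction_error_power(x, L):
    """E = r(0) - r_{f,L}^T R_L^{-1} r_{f,L} for a predictor of order L (see Sect. 7.2)."""
    x = np.asarray(x, dtype=float)
    r = np.array([x[:len(x) - l] @ x[l:] for l in range(L + 1)]) / len(x)
    a = np.linalg.solve(toeplitz(r[:L]), r[1:])   # optimal forward predictor
    return r[0] - r[1:] @ a

def itakura(E_x, E_xhat):
    return np.log(E_x / E_xhat)                   # (7.3)

def itakura_saito(E_x, E_xhat):
    q = E_xhat / E_x
    return q - np.log(q) - 1.0                    # (7.5)

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)                     # "desired" signal
x_hat = x + 0.1 * rng.standard_normal(4000)       # "processed" signal
E_x, E_xhat = prediction_error_power(x, 10), prediction_error_power(x_hat, 10)
print(itakura(E_x, E_xhat), itakura_saito(E_x, E_xhat))
```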

7.2 Forward Linear Prediction


Consider a stationary random signal x(k). The objective of the forward linear prediction is to predict the value of the sample x(k) from its past values, i.e., x(k - 1), x(k - 2), etc. We define the forward prediction error as [7.10, 11],

e_{f,L}(k) = x(k) - x̂(k) = x(k) - \sum_{l=1}^{L} a_{L,l} x(k - l) = x(k) - a_L^T x(k - 1) ,   (7.6)

where the superscript T denotes transposition, x̂(k) is the predicted sample,

a_L = [a_{L,1} \; a_{L,2} \; \cdots \; a_{L,L}]^T

is the forward predictor of length L, and

x(k - 1) = [x(k - 1) \; x(k - 2) \; \cdots \; x(k - L)]^T

is a vector containing the L most recent samples starting with and including x(k - 1).

We would like to find the optimal Wiener predictor. For that, we seek to minimize the mean-square error (MSE):

J_f(a_L) = E\{e_{f,L}^2(k)\} ,   (7.7)

where E{·} denotes mathematical expectation. Taking the gradient of J_f(a_L) with respect to a_L and equating it to 0_{L×1} (a vector of length L containing only zeroes), we easily find the Wiener–Hopf equations:

R_L a_{o,L} = r_{f,L} ,   (7.8)

where the subscript 'o' in a_{o,L} stands for optimal,

R_L = E\{x(k - 1) x^T(k - 1)\} = E\{x(k) x^T(k)\} = \begin{pmatrix} r(0) & r(1) & \cdots & r(L-1) \\ r(1) & r(0) & \cdots & r(L-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(L-1) & r(L-2) & \cdots & r(0) \end{pmatrix}   (7.9)

is the correlation matrix, and

r_{f,L} = E\{x(k - 1) x(k)\} = [r(1) \; r(2) \; \cdots \; r(L)]^T   (7.10)

is the correlation vector. The matrix R_L has a Toeplitz structure (i.e., all the entries along the diagonals are the same); assuming that it is nonsingular, we deduce the optimal forward predictor:

a_{o,L} = R_L^{-1} r_{f,L} .   (7.11)

Expanding e_{f,L}^2(k) in (7.7) and using (7.8) shows that the minimum mean-square error (MMSE),

J_{f,min} = J_f(a_{o,L}) = r(0) - r_{f,L}^T a_{o,L} = E_{f,L} .   (7.12)

This is also called the forward prediction-error power.

Define the augmented correlation matrix:

R_{L+1} = \begin{pmatrix} r(0) & r_{f,L}^T \\ r_{f,L} & R_L \end{pmatrix} .   (7.13)

Equations (7.8) and (7.12) may be combined in a convenient way:

R_{L+1} \begin{pmatrix} 1 \\ -a_{o,L} \end{pmatrix} = \begin{pmatrix} E_{f,L} \\ 0_{L×1} \end{pmatrix} .   (7.14)

We refer to (7.14) as the augmented Wiener–Hopf equations of a forward predictor of order L. From (7.13) we derive that,

det(R_{L+1}) = E_{f,L} det(R_L) ,   (7.15)

where 'det' stands for determinant.

Let us now write the forward prediction errors for the optimal predictors of orders L and L - i:

e_{f,o,L}(k) = x(k) - \sum_{l=1}^{L} a_{o,L,l} x(k - l) ,   (7.16)

e_{f,o,L-i}(k) = x(k) - \sum_{l=1}^{L-i} a_{o,L-i,l} x(k - l) .   (7.17)

From the principle of orthogonality [7.11], we know that:

E\{e_{f,o,L}(k) x(k - 1)\} = 0_{L×1} .   (7.18)

For 1 ≤ i ≤ L, we can verify, by using (7.18), that:

E\{e_{f,o,L}(k) e_{f,o,L-i}(k - i)\} = 0 .   (7.19)

As a result,

\lim_{L→∞} E\{e_{f,o,L}(k) e_{f,o,L-i}(k - i)\} = E\{e_{f,o}(k) e_{f,o}(k - i)\} = 0 .   (7.20)

This indicates that the signal e_{f,o}(k) is a white noise. So the optimal forward predictor has this important property of being able to whiten a stationary random process, provided that the order of the predictor is high enough.
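The sketch below (ours, under the assumption of an AR-generated test signal and the scipy Toeplitz solver) illustrates (7.11), (7.12) and the whitening property (7.18)–(7.20): the residual of a sufficiently high-order optimal predictor has an almost impulse-like autocorrelation.

```python
# Minimal numerical check of the forward predictor and of the whitening property.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

rng = np.random.default_rng(1)
a_true = np.array([1.6, -1.2, 0.5, -0.1])                 # assumed AR(4) model
x = lfilter([1.0], np.r_[1.0, -a_true], rng.standard_normal(20000))

L = 10
r = np.array([x[:len(x) - l] @ x[l:] for l in range(L + 1)]) / len(x)
a_o = solve_toeplitz(r[:L], r[1:])        # a_{o,L} = R_L^{-1} r_{f,L}   (7.11)
E_f = r[0] - r[1:] @ a_o                  # forward prediction-error power (7.12)

e_f = lfilter(np.r_[1.0, -a_o], [1.0], x) # e_{f,L}(k) = x(k) - a_L^T x(k-1)
rho = np.array([e_f[:len(e_f) - l] @ e_f[l:] for l in range(6)]) / len(e_f)
print("E_f =", E_f)
print("residual autocorrelation:", rho / rho[0])   # close to [1, 0, 0, ...]
```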

7.3 Backward Linear Prediction


The aim of the backward linear prediction is to predict the value of the sample x(k - L) from its future values, i.e., x(k), x(k - 1), \cdots, x(k - L + 1). We define the backward prediction error as,

e_{b,L}(k) = x(k - L) - x̂(k - L) = x(k - L) - \sum_{l=1}^{L} b_{L,l} x(k - l + 1) = x(k - L) - b_L^T x(k) ,   (7.21)

where x̂(k - L) is the predicted sample,

b_L = [b_{L,1} \; b_{L,2} \; \cdots \; b_{L,L}]^T

is the backward predictor of order L, and

x(k) = [x(k) \; x(k - 1) \; \cdots \; x(k - L + 1)]^T .

The minimization of the MSE,

J_b(b_L) = E\{e_{b,L}^2(k)\} ,   (7.22)

leads to the Wiener–Hopf equations:

R_L b_{o,L} = r_{b,L} ,   (7.23)

where

r_{b,L} = E\{x(k) x(k - L)\} = [r(L) \; r(L-1) \; \cdots \; r(1)]^T .   (7.24)

Therefore, the optimal backward predictor is:

b_{o,L} = R_L^{-1} r_{b,L} .   (7.25)

The MMSE for backward prediction,

J_{b,min} = J_b(b_{o,L}) = r(0) - r_{b,L}^T b_{o,L} = E_{b,L} ,   (7.26)

is also called the backward prediction-error power.

Define the augmented correlation matrix:

R_{L+1} = \begin{pmatrix} R_L & r_{b,L} \\ r_{b,L}^T & r(0) \end{pmatrix} .   (7.27)

Equations (7.23) and (7.26) may be combined in a convenient way:

R_{L+1} \begin{pmatrix} -b_{o,L} \\ 1 \end{pmatrix} = \begin{pmatrix} 0_{L×1} \\ E_{b,L} \end{pmatrix} .   (7.28)

We refer to this expression as the augmented Wiener–Hopf equations of a backward predictor of order L.

One important property of backward prediction is that the error signals of different orders with the optimal predictors are uncorrelated, i.e., E\{e_{b,o,i}(k) e_{b,o,l}(k)\} = 0, i ≠ l, i, l = 0, 1, \cdots, L - 1. To prove this, let us rewrite the error signal in vector form:

e_{b,o}(k) = L x(k) ,   (7.29)

where

e_{b,o}(k) = [e_{b,o,0}(k) \; e_{b,o,1}(k) \; \cdots \; e_{b,o,L-1}(k)]^T   (7.30)

and

L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -b_{o,1}^T & 1 & 0 & \cdots & 0 \\ -b_{o,2}^T & & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -b_{o,L-1}^T & & & & 1 \end{pmatrix}   (7.31)

is a lower triangular matrix with 1s along its main diagonal. The covariance matrix corresponding to the vector signal e_{b,o}(k) is:

E\{e_{b,o}(k) e_{b,o}^T(k)\} = L R_L L^T .   (7.32)

By definition, the previous matrix is symmetric. The matrix product R_L L^T is a lower triangular matrix because of (7.28) and the main diagonal contains the backward prediction-error powers E_{b,l} (0 ≤ l ≤ L - 1). Since L is also a lower triangular matrix, the product between the two matrices L and R_L L^T should have the same structure and, since it has to be symmetric, the only possibility is that this resulting matrix is diagonal:

E\{e_{b,o}(k) e_{b,o}^T(k)\} = diag[E_{b,0}, E_{b,1}, \cdots, E_{b,L-1}] ,   (7.33)

and hence the prediction errors are uncorrelated.

Furthermore,

L R_L L^T = diag[E_{b,0}, E_{b,1}, \cdots, E_{b,L-1}] ;   (7.34)

taking the inverse of the previous equation,

L^{-T} R_L^{-1} L^{-1} = diag[E_{b,0}^{-1}, E_{b,1}^{-1}, \cdots, E_{b,L-1}^{-1}] ,   (7.35)

we finally get:

R_L^{-1} = L^T diag[E_{b,0}^{-1}, E_{b,1}^{-1}, \cdots, E_{b,L-1}^{-1}] L .   (7.36)

Expression (7.36) defines the Cholesky factorization of the inverse matrix R_L^{-1} [7.10, 12].
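A quick numerical illustration (our own sketch, with an arbitrary assumed autocorrelation sequence) of (7.29)–(7.36): stacking the optimal backward predictors of orders 0 to L - 1 into the lower triangular matrix L diagonalizes R_L, and the inverse correlation matrix factors as in (7.36).

```python
# Sketch: verify L R_L L^T = diag(E_{b,0}, ..., E_{b,L-1}) and (7.36) numerically.
import numpy as np
from scipy.linalg import toeplitz

r = np.array([1.0, 0.8, 0.5, 0.3, 0.1])      # assumed autocorrelation r(0)..r(4)
L_ord = 4
R = toeplitz(r[:L_ord])                       # R_L

Lmat = np.eye(L_ord)
for l in range(1, L_ord):
    b_o = np.linalg.solve(toeplitz(r[:l]), r[1:l + 1][::-1])  # backward predictor of order l
    Lmat[l, :l] = -b_o                                        # row l+1 of L in (7.31)

D = Lmat @ R @ Lmat.T                         # diagonal matrix of (7.34)
print(np.round(D, 6))
print(np.allclose(np.linalg.inv(R), Lmat.T @ np.linalg.inv(D) @ Lmat))  # (7.36)
```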

7.4 Levinson–Durbin Algorithm


The Levinson–Durbin algorithm is an efficient way to solve the Wiener–Hopf equations for the forward and backward prediction coefficients. This efficient method can be derived thanks to the Toeplitz structure of the correlation matrix R_L. This algorithm was first invented by Levinson [7.13] and independently reformulated at a later date by Durbin [7.14, 15]. Burg gave a more-elegant presentation [7.16]. Before describing this algorithm, we first need to show some important relations between the forward and backward predictors.

We define the co-identity matrix as:

J_L = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix} .

We can easily check that:

R_L J_L = J_L R_L .   (7.37)

The matrix R_L is said to be persymmetric. We also have r_{f,L} = J_L r_{b,L}. If we left-multiply both sides of the Wiener–Hopf equations (7.23) by J_L, we get:

R_L J_L b_{o,L} = J_L R_L b_{o,L} = J_L r_{b,L} = r_{f,L} = R_L a_{o,L} ,   (7.38)

and, assuming that R_L is nonsingular, we see that:

a_{o,L} = J_L b_{o,L} .   (7.39)

Furthermore,

E_{b,L} = r(0) - r_{b,L}^T b_{o,L} = r(0) - r_{b,L}^T J_L J_L b_{o,L} = r(0) - r_{f,L}^T a_{o,L} = E_{f,L} = E_L .   (7.40)

Therefore, for a stationary process, the forward and backward prediction-error powers are equal and the coefficients of the optimal forward predictor are the same as those of the optimal backward predictor, but in a reverse order.

The Levinson–Durbin algorithm is based on recursions of the orders of the prediction equations. Consider the following expression,

\begin{pmatrix} R_L & r_{b,L} \\ r_{b,L}^T & r(0) \end{pmatrix} \begin{pmatrix} 1 \\ -a_{o,L-1} \\ 0 \end{pmatrix} = \begin{pmatrix} E_{L-1} \\ 0_{(L-1)×1} \\ K_L \end{pmatrix} ,   (7.41)

where

K_L = r(L) - a_{o,L-1}^T r_{b,L-1} = r(L) - a_{o,L-1}^T J_{L-1} r_{f,L-1} .   (7.42)

We define the reflection coefficient as,

κ_L = \frac{K_L}{E_{L-1}} .   (7.43)

From backward linear prediction, we have:

\begin{pmatrix} r(0) & r_{f,L}^T \\ r_{f,L} & R_L \end{pmatrix} \begin{pmatrix} 0 \\ -b_{o,L-1} \\ 1 \end{pmatrix} = \begin{pmatrix} K_L \\ 0_{(L-1)×1} \\ E_{L-1} \end{pmatrix} .   (7.44)

Multiplying both sides of the previous equation by κ_L, we get,

R_{L+1} \begin{pmatrix} 0 \\ -κ_L b_{o,L-1} \\ κ_L \end{pmatrix} = \begin{pmatrix} κ_L^2 E_{L-1} \\ 0_{(L-1)×1} \\ K_L \end{pmatrix} .   (7.45)

If we now subtract (7.45) from (7.41), we obtain,

R_{L+1} \begin{pmatrix} 1 \\ κ_L b_{o,L-1} - a_{o,L-1} \\ -κ_L \end{pmatrix} = \begin{pmatrix} E_{L-1}(1 - κ_L^2) \\ 0_{L×1} \end{pmatrix} .   (7.46)

Assuming that R_{L+1} is nonsingular and identifying (7.46) with (7.14), we can deduce the recursive equations:

a_{o,L} = \begin{pmatrix} a_{o,L-1} \\ 0 \end{pmatrix} - κ_L \begin{pmatrix} b_{o,L-1} \\ -1 \end{pmatrix} ,   (7.47)

E_L = E_{L-1}(1 - κ_L^2) ,   (7.48)

a_{o,L,L} = κ_L .   (7.49)

Iterating on the prediction-error power given in (7.48), we find that,

E_L = r(0) \prod_{l=1}^{L} (1 - κ_l^2) ,   (7.50)

and since E_L ≥ 0, this implies that,

|κ_l| ≤ 1 , ∀l ≥ 1 .   (7.51)

Also, from (7.48) we see that we have,

0 ≤ E_l ≤ E_{l-1} , ∀l ≥ 1 ,   (7.52)

so, as the order of the predictors increases, the prediction-error power decreases.

Table 7.1 summarizes the Levinson–Durbin algorithm, whose arithmetic complexity is proportional to L^2. This algorithm is much more efficient than standard methods such as the Gauss elimination technique, whose complexity is on the order of L^3. The saving in the number of operations to find the optimal Wiener predictor can be very important, especially when L is large. The other advantage of the Levinson–Durbin algorithm is that it gives the predictors of all orders and that it can be stopped as soon as the prediction-error power falls below a threshold, which can be very useful in practice when the choice of the predictor order is not easy to make in advance. A slightly more-efficient approach, called the split Levinson algorithm, can be found in [7.17]. This algorithm requires roughly half the number of multiplications and the same number of additions as the classical Levinson–Durbin algorithm. Even more-efficient algorithms have been proposed (see, for example, [7.18]) but they are numerically unstable, which is not acceptable in most speech applications.

Table 7.1 Levinson–Durbin algorithm

Initialization: E_0 = r(0)
For 1 ≤ l ≤ L:
    κ_l = \frac{1}{E_{l-1}} \left[ r(l) - a_{o,l-1}^T J_{l-1} r_{f,l-1} \right]
    a_{o,l} = \begin{pmatrix} a_{o,l-1} \\ 0 \end{pmatrix} - κ_l J_l \begin{pmatrix} -1 \\ a_{o,l-1} \end{pmatrix}
    E_l = E_{l-1} (1 - κ_l^2)
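The following Python transcription of Table 7.1 is our own sketch; it returns the final predictor, the reflection coefficients, and the prediction-error powers of all orders, assuming a valid (positive definite) autocorrelation sequence r(0), ..., r(L).

```python
# Direct implementation of the Levinson-Durbin recursion of Table 7.1.
import numpy as np

def levinson_durbin(r, L):
    """r: autocorrelation r(0)..r(L); returns (a_{o,L}, reflection coeffs, error powers)."""
    a = np.zeros(0)                    # a_{o,0} is empty
    kappas, E = [], [r[0]]             # E_0 = r(0)
    for l in range(1, L + 1):
        K = r[l] - a @ r[1:l][::-1]    # K_l = r(l) - a_{o,l-1}^T J_{l-1} r_{f,l-1}  (7.42)
        kappa = K / E[-1]              # reflection coefficient (7.43)
        a = np.r_[a, 0.0] - kappa * np.r_[a[::-1], -1.0]   # order update (7.47)
        E.append(E[-1] * (1.0 - kappa ** 2))               # (7.48)
        kappas.append(kappa)
    return a, np.array(kappas), np.array(E)

r = np.array([1.0, 0.8, 0.5, 0.3, 0.1])     # assumed autocorrelation sequence
a_o, kappas, E = levinson_durbin(r, 4)
print(a_o, kappas, E[-1])
```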

7.5 Lattice Predictor


In this section, we will show that the order-recursive structure of the forward and backward prediction errors has the form of a ladder, which is called a lattice predictor.

Inserting (7.47) into the forward prediction error for the optimal predictor of order L,

e_{f,o,L}(k) = x(k) - a_{o,L}^T x(k - 1) ,   (7.53)

we obtain,

e_{f,o,L}(k) = e_{f,o,L-1}(k) - κ_L \begin{pmatrix} -b_{o,L-1}^T & 1 \end{pmatrix} x(k - 1) .   (7.54)

The second term (without the reflection coefficient) on the right-hand side of (7.54) is the backward prediction error, at time k - 1, for the optimal predictor of order L - 1. Therefore, (7.54) can be rewritten

e_{f,o,L}(k) = e_{f,o,L-1}(k) - κ_L e_{b,o,L-1}(k - 1) .   (7.55)

If we insert (7.47) again into the backward prediction error for the optimal predictor of order L,

e_{b,o,L}(k) = x(k - L) - b_{o,L}^T x(k) = x(k - L) - a_{o,L}^T J_L x(k) ,   (7.56)

we get,

e_{b,o,L}(k) = e_{b,o,L-1}(k - 1) - κ_L e_{f,o,L-1}(k) .   (7.57)

If we put (7.55) and (7.57) into matrix form, we have,

\begin{pmatrix} e_{f,o,L}(k) \\ e_{b,o,L}(k) \end{pmatrix} = \begin{pmatrix} 1 & -κ_L \\ -κ_L & 1 \end{pmatrix} \begin{pmatrix} e_{f,o,L-1}(k) \\ e_{b,o,L-1}(k - 1) \end{pmatrix} = \prod_{l=1}^{L} \begin{pmatrix} 1 & -κ_l \\ -κ_l & 1 \end{pmatrix} \begin{pmatrix} x(k) \\ x(k - 1) \end{pmatrix} ,   (7.58)

where we have taken for initial conditions (order 0), e_{f,o,0}(k) = x(k) and e_{b,o,0}(k - 1) = x(k - 1). Figure 7.1 depicts the l-th stage of a lattice predictor. For the whole lattice predictor, L of these stages are needed and are connected in cascade, one to each other, starting from order 0 to order L.

Fig. 7.1 Stage l of a lattice predictor

Now let us compute the variance of e_{b,o,L}(k) from (7.57),

E\{e_{b,o,L}^2(k)\} = E_L = E_{L-1} + κ_L^2 E_{L-1} - 2κ_L E\{e_{f,o,L-1}(k) e_{b,o,L-1}(k - 1)\} = E_{L-1}(1 - κ_L^2) .   (7.59)

Developing the previous expression, we obtain,

κ_L = \frac{E\{e_{f,o,L-1}(k) e_{b,o,L-1}(k - 1)\}}{E_{L-1}} = \frac{E\{e_{f,o,L-1}(k) e_{b,o,L-1}(k - 1)\}}{\sqrt{E\{e_{f,o,L-1}^2(k)\} E\{e_{b,o,L-1}^2(k - 1)\}}} .   (7.60)

We see from (7.60) that the reflection coefficients are also the normalized cross-correlation coefficients between the forward and backward prediction errors, which is why they are also often called partial correlation (PARCOR) coefficients [7.3, 19, 20]. These coefficients are linked to the zeroes of the forward prediction-error FIR filter of order L, whose transfer function is

A_{o,L}(z) = 1 - \sum_{l=1}^{L} a_{o,L,l} z^{-l} = \prod_{l=1}^{L} (1 - z_{o,l} z^{-1}) ,   (7.61)

where z_{o,l} are the roots of A_{o,L}(z). Since κ_L = a_{o,L,L}, we have,

κ_L = (-1)^{L+1} \prod_{l=1}^{L} z_{o,l} .   (7.62)

The filter A_{o,L}(z) can be shown to be minimum phase, i.e., |z_{o,l}| ≤ 1, ∀l. As a result [because of the relation (7.39)], the filter B_{o,L}(z) corresponding to the backward predictor is maximum phase. We will now show this very important property that the forward predictor is minimum phase. As far as we know, this simple and elegant proof was first shown by M. Mohan Sondhi but has never been published. A similar proof can be found in [7.21] and [7.22].

To avoid cumbersome notation, redefine the coefficients w_l = -a_{o,L,l}, with w_0 = 1, so that the polynomial becomes,

A_{o,L}(z) = \sum_{l=0}^{L} w_l z^{-l} .   (7.63)

Also, define the vector,

w = [w_0 \; w_1 \; \cdots \; w_L]^T .

We know that,

R_{L+1} w = \begin{pmatrix} E_L \\ 0_{L×1} \end{pmatrix} .   (7.64)

If λ is a root of the polynomial, it follows that,

A_{o,L}(z) = (1 - λ z^{-1}) \sum_{l=0}^{L-1} g_l z^{-l} , with g_0 = 1 .   (7.65)

(Note that since λ can be complex, the coefficients g_l are, in general, complex.) Thus the vector w can be written

w = g - λ g̃ ,   (7.66)

where

g = [1 \; g_1 \; g_2 \; \cdots \; g_{L-1} \; 0]^T = [ḡ^T \; 0]^T , \quad g̃ = [0 \; 1 \; g_1 \; g_2 \; \cdots \; g_{L-1}]^T = [0 \; ḡ^T]^T ,

and ḡ = [1 \; g_1 \; \cdots \; g_{L-1}]^T. Substituting (7.66) in (7.64), we obtain,

R_{L+1} g = λ R_{L+1} g̃ + \begin{pmatrix} E_L \\ 0_{L×1} \end{pmatrix} .   (7.67)

Now, premultiplying by g̃^H (where the superscript H denotes conjugate transpose) and noting that the first component of g̃ is zero gives,

g̃^H R_{L+1} g = λ g̃^H R_{L+1} g̃ .   (7.68)

Thus,

|g̃^H R_{L+1} g|^2 = |λ|^2 (g̃^H R_{L+1} g̃)^2 .   (7.69)

Using the Schwartz inequality,

|g̃^H R_{L+1} g|^2 ≤ (g̃^H R_{L+1} g̃)(g^H R_{L+1} g) .   (7.70)

However,

g̃^H R_{L+1} g̃ = \begin{pmatrix} 0 & ḡ^H \end{pmatrix} \begin{pmatrix} r(0) & r_{f,L}^T \\ r_{f,L} & R_L \end{pmatrix} \begin{pmatrix} 0 \\ ḡ \end{pmatrix} = ḡ^H R_L ḡ .   (7.71)

Similarly,

g^H R_{L+1} g = \begin{pmatrix} ḡ^H & 0 \end{pmatrix} \begin{pmatrix} R_L & r_{b,L} \\ r_{b,L}^T & r(0) \end{pmatrix} \begin{pmatrix} ḡ \\ 0 \end{pmatrix} = ḡ^H R_L ḡ .   (7.72)

Therefore, g̃^H R_{L+1} g̃ = g^H R_{L+1} g, and the Schwartz inequality becomes,

|g̃^H R_{L+1} g|^2 ≤ (g̃^H R_{L+1} g̃)^2 .   (7.73)

From (7.69) we see that |λ|^2 ≤ 1. This completes the proof.

This property makes it easy to ensure that the all-pole system in (7.2) is stable (when the correlation matrix is positive definite) by simply imposing the constraint that the PARCOR coefficients are less than 1 in magnitude. As a result, in speech communication, transmitting PARCOR coefficients is more advantageous than directly transmitting linear prediction coefficients.
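To make the lattice recursion concrete, here is a small sketch (ours) that runs (7.55) and (7.57) stage by stage for a set of assumed PARCOR coefficients and checks that the resulting forward error equals the output of the direct-form FIR filter A_{o,L}(z) built from the same coefficients via (7.47).

```python
# Sketch of the analysis lattice; assumes numpy/scipy and |kappa_l| < 1.
import numpy as np
from scipy.signal import lfilter

def parcor_to_direct(kappas):
    """Reflection coefficients -> direct-form predictor a_{o,L}, via (7.47)."""
    a = np.zeros(0)
    for kappa in kappas:
        a = np.r_[a, 0.0] - kappa * np.r_[a[::-1], -1.0]
    return a

def lattice_forward_error(x, kappas):
    ef = x.copy()                              # e_{f,o,0}(k) = x(k)
    eb = x.copy()                              # e_{b,o,0}(k) = x(k)
    for kappa in kappas:
        eb_delayed = np.r_[0.0, eb[:-1]]       # e_{b,o,l-1}(k-1)
        ef, eb = ef - kappa * eb_delayed, eb_delayed - kappa * ef   # (7.55), (7.57)
    return ef

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
kappas = [0.7, -0.4, 0.2]                      # assumed PARCOR coefficients
a_o = parcor_to_direct(kappas)
e_direct = lfilter(np.r_[1.0, -a_o], [1.0], x)
print(np.allclose(e_direct, lattice_forward_error(x, kappas)))   # True
```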

7.6 Spectral Representation


It is important to understand the link between the spectrum of a speech signal and its prediction coefficients. Let us again take the speech model given in Sect. 7.1,

x(k) = \sum_{l=1}^{L} a_l x(k - l) + G u(k) ,   (7.74)

where we now assume that u(k) is a white random signal with variance σ_u^2 = 1. Since x(k) is the output of the filter H(z) (see Sect. 7.1), whose input is u(k), its spectrum is [7.11],

S_x(ω) = |H(e^{iω})|^2 S_u(ω) ,   (7.75)

where ω is the angular frequency, H(e^{iω}) is the frequency response of the filter H(z), and S_u(ω) is the spectrum of u. We have S_u(ω) = 1 (u is white). Using (7.2), we deduce the spectrum of x,

S_x(ω) = \frac{G^2}{|A(e^{iω})|^2} = \frac{G^2}{|1 - \sum_{l=1}^{L} a_l e^{-ilω}|^2} .   (7.76)

Therefore, the spectrum of a speech signal can be modeled by the frequency response of an all-pole filter, whose elements are the prediction coefficients [7.23–25].

Consider the prediction error signal,

e_L(k) = x(k) - a_L^T x(k - 1) .   (7.77)

Taking the z-transform of (7.77) and setting z = e^{iω}, we obtain:

S_{x,L}(ω) = \frac{|E_L(e^{iω})|^2}{|A_L(e^{iω})|^2} ,   (7.78)

where |E_L(e^{iω})|^2 is the spectrum of e_L(k). From Sect. 7.2, we know that, for a large order L, linear prediction tends to whiten the signal, so the power spectrum |E_L(e^{iω})|^2 of the error signal, e_L(k), will tend to be flat. Hence,

\lim_{L→∞} |E_L(e^{iω})|^2 = G^2 .   (7.79)

As a result,

\lim_{L→∞} S_{x,L}(ω) = \frac{G^2}{|1 - \sum_{l=1}^{∞} a_l e^{-ilω}|^2} .   (7.80)

This confirms that (7.76) can be a very good approximation of the spectrum of a speech signal, as long as the order of the predictor is large enough.
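As a brief numerical illustration of (7.76) and (7.79) (our own sketch, with an arbitrary AR(3) test signal and model order L = 8), the LP envelope G^2/|A_L(e^{iω})|^2, with G^2 taken as the prediction-error power E_{f,L}, closely follows the true all-pole spectrum:

```python
# Sketch: LP spectral envelope versus the true all-pole spectrum of the test signal.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter, freqz

rng = np.random.default_rng(3)
a_true = np.array([1.3, -0.8, 0.3])                       # assumed AR(3) model
x = lfilter([1.0], np.r_[1.0, -a_true], rng.standard_normal(8000))

L = 8
r = np.array([x[:len(x) - l] @ x[l:] for l in range(L + 1)]) / len(x)
a_o = solve_toeplitz(r[:L], r[1:])
G2 = r[0] - r[1:] @ a_o                                   # E_{f,L}, playing the role of G^2

w, A_est = freqz(np.r_[1.0, -a_o], worN=512)              # A_L(e^{i omega})
_, A_true_f = freqz(np.r_[1.0, -a_true], worN=512)
S_lp = G2 / np.abs(A_est) ** 2                            # (7.76) with estimated coefficients
S_true = 1.0 / np.abs(A_true_f) ** 2                      # true spectrum (sigma_u^2 = 1)
print(np.max(np.abs(10 * np.log10(S_lp / S_true))))       # maximum deviation in dB (small)
```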

7.7 Linear Interpolation


Linear interpolation can be seen as a straightforward generalization of forward and backward linear prediction. Indeed, in linear interpolation, we try to predict the value of the sample x(k - i) from its past and future values [7.26, 27]. We define the interpolation error as

e_i(k) = x(k - i) - x̂(k - i) = x(k - i) - \sum_{l=0, l≠i}^{L} c_{i,l} x(k - l) = c_i^T x_{L+1}(k) , i = 0, 1, \cdots, L ,   (7.81)

where x̂(k - i) is the interpolated sample,

c_i = [-c_{i,0} \; -c_{i,1} \; \cdots \; c_{i,i} \; \cdots \; -c_{i,L}]^T

is a vector of length L + 1 containing the interpolation coefficients, with c_{i,i} = 1, and

x_{L+1}(k) = [x(k) \; x(k - 1) \; \cdots \; x(k - L)]^T .

The special cases i = 0 and i = L are the forward and backward prediction errors, respectively.

To find the optimal Wiener interpolator, we need to minimize the cost function,

J_i(c_i) = E\{e_i^2(k)\} = c_i^T R_{L+1} c_i ,   (7.82)

subject to the constraint

c_i^T v_i = c_{i,i} = 1 ,   (7.83)

where

v_i = [0 \; 0 \; \cdots \; 0 \; 1 \; 0 \; \cdots \; 0]^T

is a vector of length L + 1 with its i-th component equal to one and all others equal to zero. By using a Lagrange multiplier, it is easy to see that the solution to this optimization problem is

R_{L+1} c_{o,i} = E_i v_i ,   (7.84)

where

E_i = c_{o,i}^T R_{L+1} c_{o,i} = \frac{1}{v_i^T R_{L+1}^{-1} v_i}   (7.85)

is the interpolation-error power.

From (7.84) we find,

\frac{c_{o,i}}{E_i} = R_{L+1}^{-1} v_i ,   (7.86)

hence the i-th column of R_{L+1}^{-1} is c_{o,i}/E_i. We can now see that R_{L+1}^{-1} can be factorized as follows [7.28]:

R_{L+1}^{-1} = \begin{pmatrix} 1 & -c_{o,1,0} & \cdots & -c_{o,L,0} \\ -c_{o,0,1} & 1 & \cdots & -c_{o,L,1} \\ \vdots & \vdots & \ddots & \vdots \\ -c_{o,0,L} & -c_{o,1,L} & \cdots & 1 \end{pmatrix} \begin{pmatrix} 1/E_0 & 0 & \cdots & 0 \\ 0 & 1/E_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/E_L \end{pmatrix} = C_o^T D_e^{-1} .   (7.87)

Furthermore, since R_{L+1}^{-1} is a symmetric matrix, (7.87) can be written as,

R_{L+1}^{-1} = \begin{pmatrix} 1/E_0 & 0 & \cdots & 0 \\ 0 & 1/E_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/E_L \end{pmatrix} \begin{pmatrix} 1 & -c_{o,0,1} & \cdots & -c_{o,0,L} \\ -c_{o,1,0} & 1 & \cdots & -c_{o,1,L} \\ \vdots & \vdots & \ddots & \vdots \\ -c_{o,L,0} & -c_{o,L,1} & \cdots & 1 \end{pmatrix} = D_e^{-1} C_o .   (7.88)

Therefore, we deduce that,

\frac{c_{o,i,l}}{E_i} = \frac{c_{o,l,i}}{E_l} , i, l = 0, 1, \cdots, L .   (7.89)

The first and last columns of R_{L+1}^{-1} contain, respectively, the normalized forward and backward predictors, and all the columns in between contain the normalized interpolators.

We are now going to show how the condition number of the correlation matrix depends on the interpolators. The condition number of the matrix R_{L+1} is defined as [7.29]:

χ(R_{L+1}) = \|R_{L+1}\| \|R_{L+1}^{-1}\| ,   (7.90)

where ‖·‖ can be any matrix norm. Note that χ(R) depends on the underlying norm. Let us compute χ(R_{L+1}) using the Frobenius norm:

\|R_{L+1}\|_F = [tr(R_{L+1}^T R_{L+1})]^{1/2} = [tr(R_{L+1}^2)]^{1/2}   (7.91)

and

\|R_{L+1}^{-1}\|_F = [tr(R_{L+1}^{-2})]^{1/2} .   (7.92)

From (7.86), we have,

\frac{c_{o,i}^T c_{o,i}}{E_i^2} = v_i^T R_{L+1}^{-2} v_i ,   (7.93)

which implies that,

\sum_{i=0}^{L} \frac{c_{o,i}^T c_{o,i}}{E_i^2} = \sum_{i=0}^{L} v_i^T R_{L+1}^{-2} v_i = tr(R_{L+1}^{-2}) .   (7.94)

Also, we can easily check that,

tr(R_{L+1}^2) = (L + 1) r^2(0) + 2 \sum_{l=1}^{L} (L + 1 - l) r^2(l) .   (7.95)

Therefore, the square of the condition number of the correlation matrix associated with the Frobenius norm is

χ_F^2(R_{L+1}) = \left[ (L + 1) r^2(0) + 2 \sum_{l=1}^{L} (L + 1 - l) r^2(l) \right] × \sum_{i=0}^{L} \frac{c_{o,i}^T c_{o,i}}{E_i^2} .   (7.96)

Some other interesting relations between the forward predictors and the condition number can be found in [7.30].

To conclude this section, we would like to let readers know that several algorithms exist to compute the optimal interpolators efficiently; see, for example, [7.31] and [7.32]. All these algorithms are based on Levinson–Durbin recursions.
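The short sketch below (our own, with an assumed autocorrelation sequence) illustrates (7.86)–(7.96): the columns of R_{L+1}^{-1} give the normalized optimal interpolators, and (7.96) reproduces the Frobenius condition number obtained directly from the norms.

```python
# Sketch: interpolators from the columns of R_{L+1}^{-1} and the condition number (7.96).
import numpy as np
from scipy.linalg import toeplitz

r = np.array([1.0, 0.8, 0.5, 0.3, 0.1])        # assumed autocorrelation r(0)..r(L)
L = len(r) - 1
R = toeplitz(r)                                 # R_{L+1}
Rinv = np.linalg.inv(R)

E = 1.0 / np.diag(Rinv)                         # interpolation-error powers (7.85)
C_o = (Rinv * E).T                              # row i of C_o is c_{o,i}^T, cf. (7.87)

chi2 = ((L + 1) * r[0] ** 2 + 2 * sum((L + 1 - l) * r[l] ** 2 for l in range(1, L + 1))) \
       * sum(C_o[i] @ C_o[i] / E[i] ** 2 for i in range(L + 1))      # (7.96)
chi_direct = np.linalg.norm(R, "fro") * np.linalg.norm(Rinv, "fro")  # (7.90)-(7.92)
print(np.sqrt(chi2), chi_direct)                # the two values agree
```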

7.8 Line Spectrum Pair Representation


Line spectrum pair (LSP) representation, first introduced by Itakura [7.33], is a more-robust way to represent the coefficients of linear predictive models. The LSP polynomials have some very interesting properties shown in [7.34].

A polynomial P(z) of order L is said to be symmetric if

P(z) = z^{-L} P(z^{-1})   (7.97)

and a polynomial Q(z) is antisymmetric if

Q(z) = -z^{-L} Q(z^{-1}) .   (7.98)

Let

A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_L z^{-L}   (7.99)

be the optimal polynomial predictor of order L. It is well known that in speech compression the coefficients of this polynomial are inappropriate for quantization because of their relatively large dynamic range and also because, as stated earlier, quantization can change a stable LPC filter into an unstable one [7.35]. From (7.99), we can construct two artificial (L + 1)-th-order (symmetric and antisymmetric) polynomials by setting the (L + 1)-th reflection coefficient, κ_{L+1}, to +1 or -1. These two cases correspond, respectively, to an entirely closed or to an entirely open end at the last section of an acoustic tube of L + 1 piecewise-uniform sections [7.35]:

P(z) = A(z) + z^{-(L+1)} A(z^{-1}) ,   (7.100)

Q(z) = A(z) - z^{-(L+1)} A(z^{-1}) .   (7.101)

The polynomial A(z) can be easily reconstructed from P(z) and Q(z) by

A(z) = \frac{1}{2} [P(z) + Q(z)] .   (7.102)

It was proved in [7.36] and [7.34] that the LSP polynomials, P(z) and Q(z), have the following important properties:

• all zeros of LSP polynomials are on the unit circle,
• the zeros of P(z) and Q(z) are interlaced, and
• the minimum-phase property of A(z) can be easily preserved if the first two properties are intact after quantization.

Now, define the two prediction error signals:

e^+(k) = x(k) - \frac{1}{2} [x(k - 1) + x(k)]^T a^+ ,   (7.103)

e^-(k) = x(k) - \frac{1}{2} [x(k - 1) - x(k)]^T a^- ,   (7.104)

where x(k - 1) and x(k) denote the length-L vectors defined in Sect. 7.2. It is shown in [7.37] and [7.38] that the LSP polynomials, whose trivial zeroes have been removed, are equivalent to the two optimal Wiener predictors a_o^+ and a_o^-. This is easy to see if we rewrite e^+(k), for example, as

e^+(k) = \begin{pmatrix} 1 & -\frac{1}{2} a^{+T} \end{pmatrix} x_{L+1}(k) + \begin{pmatrix} -\frac{1}{2} a^{+T} & 0 \end{pmatrix} x_{L+1}(k) = g^T I_d^T x_{L+1}(k) ,   (7.105)

where

I_d = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}   (7.106)

and

g = \begin{pmatrix} 1 \\ -\frac{1}{2} a^+ \end{pmatrix} .   (7.107)

By minimizing the MSE, E\{e^{+2}(k)\}, with respect to g, with the constraint g^T v_1 = 1, one can find the most important results. For readers who are interested in more details on the properties of LSP polynomials, we recommend the paper by Bäckström and Magi [7.39].
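The following sketch (ours) builds P(z) and Q(z) from an assumed minimum-phase A(z) following (7.100)–(7.102) and checks numerically that their zeros lie on the unit circle, that their angles interlace, and that A(z) is recovered from their average.

```python
# Sketch of the LSP construction; the predictor coefficients below are arbitrary.
import numpy as np

a = np.array([0.7, -0.3, 0.1])                  # assumed a_1..a_L (minimum-phase A(z))
A = np.r_[1.0, -a]                              # A(z) = 1 - a_1 z^{-1} - ... - a_L z^{-L}

P = np.r_[A, 0.0] + np.r_[0.0, A[::-1]]         # P(z) = A(z) + z^{-(L+1)} A(z^{-1})
Q = np.r_[A, 0.0] - np.r_[0.0, A[::-1]]         # Q(z) = A(z) - z^{-(L+1)} A(z^{-1})

zP, zQ = np.roots(P), np.roots(Q)
print(np.allclose(np.abs(zP), 1.0), np.allclose(np.abs(zQ), 1.0))   # zeros on the unit circle
print(np.sort(np.angle(zP)), np.sort(np.angle(zQ)))                 # interlaced angles
print(np.allclose(np.r_[A, 0.0], (P + Q) / 2.0))                    # A(z) = [P(z)+Q(z)]/2 (7.102)
```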

7.9 Multichannel Linear Prediction


Multichannel linear prediction can be very useful in stereo or multichannel speech compression. In an increasing number of speech or audio applications, we have at least two channels available, which are often highly correlated with each other. Therefore, it makes sense to take this interchannel correlation into account in order to obtain more-efficient compression schemes. Multichannel linear prediction is the best way to do this.

Let

χ(k) = [x_1(k) \; x_2(k) \; \cdots \; x_M(k)]^T

be a real, zero-mean, stationary M-channel time series. We define the multichannel forward prediction error vector as,

e_{f,L}(k) = χ(k) - χ̂(k) = χ(k) - \sum_{l=1}^{L} A_{L,l} χ(k - l) = χ(k) - A_L^T x(k - 1) ,   (7.108)

where

A_L = [A_{L,1} \; A_{L,2} \; \cdots \; A_{L,L}]^T

is the forward predictor matrix of size ML × M, each one of the square matrices A_{L,l} is of size M × M, and

x(k - 1) = [χ^T(k - 1) \; χ^T(k - 2) \; \cdots \; χ^T(k - L)]^T

is a vector of length ML. (For convenience, some of the notation used in this section is the same as that in the previous sections.)

To derive the optimal Wiener forward predictors, we need to minimize the MSE,

J_f(A_L) = E\{e_{f,L}^T(k) e_{f,L}(k)\} .   (7.109)

We find the multichannel Wiener–Hopf equations:

R_L A_{o,L} = R_f(1:L) ,   (7.110)

where

R_L = E\{x(k - 1) x^T(k - 1)\} = E\{x(k) x^T(k)\}   (7.111)
    = \begin{pmatrix} R(0) & R(1) & \cdots & R(L-1) \\ R^T(1) & R(0) & \cdots & R(L-2) \\ \vdots & \vdots & \ddots & \vdots \\ R^T(L-1) & R^T(L-2) & \cdots & R(0) \end{pmatrix}   (7.112)

is the block-Toeplitz covariance matrix of size ML × ML,

R(l) = E\{χ(k) χ^T(k - l)\} , l = 0, 1, \cdots, L - 1 ,
R(-l) = E\{χ(k - l) χ^T(k)\} = R^T(l) ,

and

R_f(1:L) = [R(1) \; R(2) \; \cdots \; R(L)]^T = E\{x(k - 1) χ^T(k)\}

is the intercorrelation matrix of size ML × M.

Using the augmented block-Toeplitz covariance matrix of size (ML + M) × (ML + M):

R_{L+1} = \begin{pmatrix} R(0) & R_f^T(1:L) \\ R_f(1:L) & R_L \end{pmatrix} ,   (7.113)

we deduce the augmented multichannel Wiener–Hopf equations:

R_{L+1} \begin{pmatrix} I_{M×M} \\ -A_{o,L} \end{pmatrix} = \begin{pmatrix} E_{f,L} \\ 0_{ML×M} \end{pmatrix} ,   (7.114)

where I_{M×M} is the identity matrix of size M × M and

E_{f,L} = E\{e_{f,o,L}(k) e_{f,o,L}^T(k)\} = R(0) - R_f^T(1:L) A_{o,L}   (7.115)

is the forward error covariance matrix of size M × M, with

e_{f,o,L}(k) = χ(k) - A_{o,L}^T x(k - 1) .   (7.116)

We will proceed with the same philosophy to derive important equations for the multichannel backward prediction. We define the multichannel backward prediction error vector as

e_{b,L}(k) = χ(k - L) - χ̂(k - L) = χ(k - L) - \sum_{l=1}^{L} B_{L,l} χ(k - l + 1) = χ(k - L) - B_L^T x(k) ,   (7.117)

where

B_L = [B_{L,1} \; B_{L,2} \; \cdots \; B_{L,L}]^T

is the backward predictor matrix of size ML × M with each one of the square submatrices B_{L,l} being of size M × M.

The minimization of the MSE,

J_b(B_L) = E\{e_{b,L}^T(k) e_{b,L}(k)\} ,   (7.118)

leads to the multichannel Wiener–Hopf equations for the backward prediction:

R_L B_{o,L} = R_b(1:L) ,   (7.119)

where

R_b(1:L) = E\{x(k) χ^T(k - L)\} = [R^T(L) \; R^T(L-1) \; \cdots \; R^T(1)]^T .   (7.120)

By using the augmented block-Toeplitz covariance matrix:

R_{L+1} = \begin{pmatrix} R_L & R_b(1:L) \\ R_b^T(1:L) & R(0) \end{pmatrix} ,   (7.121)

we find the augmented multichannel Wiener–Hopf equations:

R_{L+1} \begin{pmatrix} -B_{o,L} \\ I_{M×M} \end{pmatrix} = \begin{pmatrix} 0_{ML×M} \\ E_{b,L} \end{pmatrix} ,   (7.122)

where

E_{b,L} = E\{e_{b,o,L}(k) e_{b,o,L}^T(k)\} = R(0) - R_b^T(1:L) B_{o,L}   (7.123)

is the backward error covariance matrix of size M × M, with

e_{b,o,L}(k) = χ(k - L) - B_{o,L}^T x(k) .   (7.124)

To solve the multichannel Wiener–Hopf equations efficiently, we need to derive some important relations [7.40]. Consider the following system,

\begin{pmatrix} R_L & R_b(1:L) \\ R_b^T(1:L) & R(0) \end{pmatrix} \begin{pmatrix} I_{M×M} \\ -A_{o,L-1} \\ 0_{M×M} \end{pmatrix} = \begin{pmatrix} E_{f,L-1} \\ 0_{(ML-M)×M} \\ K_{f,L} \end{pmatrix} ,   (7.125)

where

K_{f,L} = R^T(L) - R_b^T(1:L-1) A_{o,L-1} .   (7.126)

Consider the other system,

\begin{pmatrix} R(0) & R_f^T(1:L) \\ R_f(1:L) & R_L \end{pmatrix} \begin{pmatrix} 0_{M×M} \\ -B_{o,L-1} \\ I_{M×M} \end{pmatrix} = \begin{pmatrix} K_{b,L} \\ 0_{(ML-M)×M} \\ E_{b,L-1} \end{pmatrix} ,   (7.127)

where

K_{b,L} = R(L) - R_f^T(1:L-1) B_{o,L-1} .   (7.128)

If we post-multiply both sides of (7.127) by E_{b,L-1}^{-1} K_{f,L}, we get:

R_{L+1} \begin{pmatrix} 0_{M×M} \\ -B_{o,L-1} \\ I_{M×M} \end{pmatrix} E_{b,L-1}^{-1} K_{f,L} = \begin{pmatrix} K_{b,L} E_{b,L-1}^{-1} K_{f,L} \\ 0_{(ML-M)×M} \\ K_{f,L} \end{pmatrix} .   (7.129)

Subtracting (7.129) from (7.125) and identifying the resulting system with the augmented multichannel Wiener–Hopf equations for forward prediction (7.114), we deduce the two recursions:

E_{f,L} = E_{f,L-1} - K_{b,L} E_{b,L-1}^{-1} K_{f,L} ,   (7.130)

A_{o,L} = \begin{pmatrix} A_{o,L-1} \\ 0_{M×M} \end{pmatrix} - \begin{pmatrix} B_{o,L-1} \\ -I_{M×M} \end{pmatrix} E_{b,L-1}^{-1} K_{f,L} .   (7.131)

Similarly, if we post-multiply both sides of (7.125) by E_{f,L-1}^{-1} K_{b,L}, we obtain:

R_{L+1} \begin{pmatrix} I_{M×M} \\ -A_{o,L-1} \\ 0_{M×M} \end{pmatrix} E_{f,L-1}^{-1} K_{b,L} = \begin{pmatrix} K_{b,L} \\ 0_{(ML-M)×M} \\ K_{f,L} E_{f,L-1}^{-1} K_{b,L} \end{pmatrix} .   (7.132)

Subtracting (7.132) from (7.127) and identifying the resulting system with the augmented multichannel Wiener–Hopf equations for backward prediction (7.122), we deduce the two recursions:

E_{b,L} = E_{b,L-1} - K_{f,L} E_{f,L-1}^{-1} K_{b,L} ,   (7.133)

B_{o,L} = \begin{pmatrix} 0_{M×M} \\ B_{o,L-1} \end{pmatrix} - \begin{pmatrix} -I_{M×M} \\ A_{o,L-1} \end{pmatrix} E_{f,L-1}^{-1} K_{b,L} .   (7.134)

Relations (7.130), (7.131), (7.133), and (7.134) were independently discovered by Whittle [7.41] and Wiggins and Robinson [7.42].

Another important relation needs to be found. Indeed, using (7.116) and (7.124), we can easily verify,

E\{e_{f,o,L-1}(k) e_{b,o,L-1}^T(k - 1)\} = K_{b,L} ,   (7.135)

E\{e_{b,o,L-1}(k - 1) e_{f,o,L-1}^T(k)\} = K_{f,L} ,   (7.136)

which implies that,

K_{b,L} = K_{f,L}^T .   (7.137)

Table 7.2 summarizes the Levinson–Wiggins–Robinson algorithm [7.42–44], which is a generalization of the Levinson–Durbin algorithm to the multichannel case.

Table 7.2 Levinson–Wiggins–Robinson algorithm

Initialization: E_{f,0} = E_{b,0} = R(0)
For 1 ≤ l ≤ L:
    K_{b,l} = R(l) - R_f^T(1:l-1) B_{o,l-1}
    A_{o,l} = \begin{pmatrix} A_{o,l-1} \\ 0_{M×M} \end{pmatrix} - \begin{pmatrix} B_{o,l-1} \\ -I_{M×M} \end{pmatrix} E_{b,l-1}^{-1} K_{b,l}^T
    B_{o,l} = \begin{pmatrix} 0_{M×M} \\ B_{o,l-1} \end{pmatrix} - \begin{pmatrix} -I_{M×M} \\ A_{o,l-1} \end{pmatrix} E_{f,l-1}^{-1} K_{b,l}
    E_{f,l} = E_{f,l-1} - K_{b,l} E_{b,l-1}^{-1} K_{b,l}^T
    E_{b,l} = E_{b,l-1} - K_{b,l}^T E_{f,l-1}^{-1} K_{b,l}
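To close the section, here is a compact Python transcription (our own sketch, not the authors' code) of the Levinson–Wiggins–Robinson recursion of Table 7.2, using K_{f,l} = K_{b,l}^T from (7.137); the two-channel test signal and the block-correlation estimates are arbitrary illustrative choices.

```python
# Sketch of the LWR algorithm for M-channel linear prediction (Table 7.2).
import numpy as np

def lwr(R_blocks, L):
    """R_blocks[l] is the M x M matrix R(l) = E{chi(k) chi^T(k-l)}, l = 0..L."""
    M = R_blocks[0].shape[0]
    A = np.zeros((0, M))                       # A_{o,0} (empty block column)
    B = np.zeros((0, M))                       # B_{o,0}
    Ef, Eb = R_blocks[0].copy(), R_blocks[0].copy()   # E_{f,0} = E_{b,0} = R(0)
    for l in range(1, L + 1):
        Kb = R_blocks[l].copy()                # K_{b,l} = R(l) - R_f^T(1:l-1) B_{o,l-1}
        for m in range(1, l):
            Kb -= R_blocks[m] @ B[(m - 1) * M:m * M, :]
        A_new = np.vstack([A, np.zeros((M, M))]) \
              - np.vstack([B, -np.eye(M)]) @ np.linalg.solve(Eb, Kb.T)
        B_new = np.vstack([np.zeros((M, M)), B]) \
              - np.vstack([-np.eye(M), A]) @ np.linalg.solve(Ef, Kb)
        Ef, Eb = Ef - Kb @ np.linalg.solve(Eb, Kb.T), Eb - Kb.T @ np.linalg.solve(Ef, Kb)
        A, B = A_new, B_new
    return A, B, Ef, Eb

# Usage: estimate R(l) from a synthetic 2-channel signal and compare A_{o,L}
# with the direct solution of the block Wiener-Hopf equations (7.110).
rng = np.random.default_rng(4)
M, L, N = 2, 3, 20000
u = rng.standard_normal((N, M))
x = np.zeros((N, M))
for k in range(1, N):
    x[k] = 0.5 * x[k - 1] + u[k] + 0.3 * u[k - 1][::-1]   # correlated channels
R_blocks = [x[l:].T @ x[:N - l] / N for l in range(L + 1)]

A_lwr, _, _, _ = lwr(R_blocks, L)
RL = np.block([[R_blocks[j - i] if j >= i else R_blocks[i - j].T
                for j in range(L)] for i in range(L)])
Rf = np.vstack([R_blocks[m].T for m in range(1, L + 1)])  # R_f(1:L)
print(np.allclose(A_lwr, np.linalg.solve(RL, Rf)))        # True
```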

7.10 Conclusions
In this chapter, we have tried to present the most important results in linear prediction for speech. We have explained the principle of forward linear prediction and have shown that the optimal prediction error signal tends to be a white signal. We have extended the principle of forward linear prediction to backward linear prediction and derived the Cholesky factorization of the inverse correlation matrix. We have developed the classical Levinson–Durbin algorithm, which is a very efficient way to solve the Wiener–Hopf equations for the forward and backward prediction coefficients. We have explained the idea behind the lattice predictor. We have shown how the spectrum of a speech signal can easily be estimated thanks to the prediction coefficients. We have given some notions of linear interpolation and have demonstrated how the condition number of the correlation matrix is related to the optimal interpolators. We have also presented some notions of line spectrum pair polynomials. Finally, in the last section, we have generalized some of these ideas to the multichannel case.

References

7.1 B.S. Atal: The history of linear prediction, IEEE Signal Process. Mag. 23, 154–161 (2006)
7.2 L.R. Rabiner, R.W. Schafer: Digital Processing of Speech Signals (Prentice-Hall, Englewood Cliffs 1976)
7.3 J.D. Markel, A.H. Gray Jr.: Linear Prediction of Speech (Springer, New York 1976)
7.4 J. Makhoul: Linear prediction: a tutorial review, Proc. IEEE 63, 561–580 (1975)
7.5 J.W. Picone: Signal modeling techniques in speech recognition, Proc. IEEE 81, 1215–1247 (1993)
7.6 S.R. Quackenbush, T.P. Barnwell, M.A. Clements: Objective Measures of Speech Quality (Prentice-Hall, Englewood Cliffs 1988)
7.7 Y. Huang, J. Benesty, J. Chen: Acoustic MIMO Signal Processing (Springer, New York 2006)
7.8 F. Itakura, S. Saito: A statistical method for estimation of speech spectral density and formant frequencies, Electron. Commun. Jpn. 53(A), 36–43 (1970)
7.9 G. Chen, S.N. Koh, I.Y. Soon: Enhanced Itakura measure incorporating masking properties of human auditory system, Signal Process. 83, 1445–1456 (2003)
7.10 M.G. Bellanger: Adaptive Digital Filters and Signal Analysis (Marcel Dekker, New York 1987)
7.11 S. Haykin: Adaptive Filter Theory, 4th edn. (Prentice-Hall, Upper Saddle River 2002)
7.12 C.W. Therrien: On the relation between triangular matrix decomposition and linear prediction, Proc. IEEE 71, 1459–1460 (1983)
7.13 N. Levinson: The Wiener RMS (root mean square) error criterion in filter design and prediction, J. Math. Phys. 25(4), 261–278 (1947); also Appendix B in N. Wiener: Extrapolation, Interpolation and Smoothing of Stationary Time Series (MIT, Cambridge 1949)
7.14 J. Durbin: Efficient estimation of parameters in moving-average models, Parts 1 and 2, Biometrika 46, 306–316 (1959)
7.15 J. Durbin: The fitting of time-series models, Rev. Inst. Int. Stat. 28(3), 233–243 (1960)
7.16 J.P. Burg: Maximum Entropy Spectral Analysis, Ph.D. Dissertation (Stanford Univ., Stanford 1975)
7.17 P. Delsarte, Y.V. Genin: The split Levinson algorithm, IEEE Trans. Acoust. Speech ASSP-34, 470–478 (1986)
7.18 R. Kumar: A fast algorithm for solving a Toeplitz system of equations, IEEE Trans. Acoust. Speech ASSP-33, 254–265 (1985)
7.19 J. Makhoul: Stable and efficient lattice methods for linear prediction, IEEE Trans. Acoust. Speech ASSP-25, 423–428 (1977)
7.20 F. Itakura, S. Saito: Digital filtering techniques for speech analysis and synthesis, Proc. 7th Int. Conf. Acoust. 25(C-1), 261–264 (1971)
7.21 S.W. Lang, J.H. McClellan: A simple proof of stability for all-pole linear prediction models, Proc. IEEE 67, 860–861 (1979)
7.22 M.H. Hayes: Statistical Digital Signal Processing and Modeling (Wiley, New York 1996)
7.23 J. Makhoul: Spectral analysis of speech by linear prediction, IEEE Trans. Audio Electroacoust. AU-21(3), 140–148 (1973)
7.24 S.M. Kay, S.L. Marple Jr.: Spectrum analysis – a modern perspective, Proc. IEEE 69, 1380–1419 (1981)
7.25 J.M. Cadzow: Spectral estimation: an overdetermined rational model equation approach, Proc. IEEE 70, 907–939 (1982)
7.26 S. Kay: Some results in linear interpolation theory, IEEE Trans. Acoust. Speech ASSP-31, 746–749 (1983)
7.27 B. Picinbono, J.-M. Kerilis: Some properties of prediction and interpolation errors, IEEE Trans. Acoust. Speech ASSP-36, 525–531 (1988)
7.28 J. Benesty, T. Gaensler: New insights into the RLS algorithm, EURASIP J. Appl. Signal Process. 2004, 331–339 (2004)
7.29 G.H. Golub, C.F. Van Loan: Matrix Computations (Johns Hopkins Univ. Press, Baltimore 1996)
7.30 J. Benesty, T. Gaensler: Computation of the condition number of a non-singular symmetric Toeplitz matrix with the Levinson–Durbin algorithm, IEEE Trans. Signal Process. 54, 2362–2364 (2006)
7.31 C.K. Coursey, J.A. Stuller: Linear interpolation lattice, IEEE Trans. Signal Process. 39, 965–967 (1991)
7.32 M.R.K. Khansari, A. Leon-Garcia: A fast algorithm for optimal linear interpolation, IEEE Trans. Signal Process. 41, 2934–2937 (1993)
7.33 F. Itakura: Line spectrum representation of linear predictive coefficients of speech signal, J. Acoust. Soc. Am. 57(1), 35 (1975)
7.34 F.K. Soong, B.-H. Juang: Line spectrum pair (LSP) and speech data compression, Proc. ICASSP (1984) pp. 1.10.1–1.10.4
7.35 F.K. Soong, B.-H. Juang: Optimal quantization of LSP parameters, IEEE Trans. Speech Audio Process. 1, 15–24 (1993)
7.36 S. Sagayama: Stability condition of LSP speech synthesis digital filter, Proc. Acoust. Soc. Jpn. (1982) pp. 153–154, in Japanese
7.37 B. Kleijn, T. Bäckström, P. Alku: On line spectral frequencies, IEEE Signal Process. Lett. 10, 75–77 (2003)
7.38 T. Bäckström, P. Alku, T. Paatero, B. Kleijn: A time-domain interpretation for the LSP decomposition, IEEE Trans. Speech Audio Process. 12, 554–560 (2004)
7.39 T. Bäckström, C. Magi: Properties of line spectrum pair polynomials – a review, Signal Process. 86, 3286–3298 (2006)
7.40 T. Kailath: A view of three decades of linear filtering theory, IEEE Trans. Inf. Theory IT-20, 146–181 (1974)
7.41 P. Whittle: On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix, Biometrika 50, 129–134 (1963)
7.42 R.A. Wiggins, E.A. Robinson: Recursive solution to the multichannel filtering problem, J. Geophys. Res. 70, 1885–1891 (1965)
7.43 O.N. Strand: Multichannel complex maximum entropy (autoregressive) spectral analysis, IEEE Trans. Automat. Contr. AC-22, 634–640 (1977)
7.44 P. Delsarte, Y.V. Genin: Multichannel singular predictor polynomials, IEEE Trans. Circuits Syst. 35, 190–200 (1988)
