
Lecture 2

During this lecture you will learn about

• The Least Mean Squares algorithm (LMS)
• Convergence analysis of the LMS
• Channel equalizer

Background

The method of Steepest descent that was studied in the last lecture is a recursive algorithm for calculation of the Wiener filter when the statistics of the signals are known (knowledge of R and p).

The problem is that this information is often unknown!

LMS is a method that is based on the same principles as the method of Steepest descent, but where the statistics are estimated continuously.

Since the statistics are estimated continuously, the LMS algorithm can adapt to changes in the signal statistics; the LMS algorithm is thus an adaptive filter.

Because of the estimated statistics, the gradient becomes noisy. The LMS algorithm belongs to a group of methods referred to as stochastic gradient methods, while the method of Steepest descent belongs to the group of deterministic gradient methods.

The Least Mean Square (LMS) algorithm

We want to create an algorithm that minimizes E{|e(n)|^2}, just like the SD, but based on unknown statistics.

A strategy that can then be used is to use estimates of the autocorrelation matrix R and the cross-correlation vector p. If the instantaneous estimates

    R̂(n) = u(n)u^H(n)
    p̂(n) = u(n)d*(n)

are chosen, the resulting method is the Least Mean Squares algorithm.

For the SD, the update of the filter weights is given by

    w(n+1) = w(n) + (1/2)µ[−∇J(n)]

where ∇J(n) = −2p + 2Rw(n).

In the LMS we use the estimates R̂ and p̂ to calculate the gradient estimate ∇̂J(n). Thus, also the updated filter vector becomes an estimate. It is therefore denoted ŵ(n):

    ∇̂J(n) = −2p̂(n) + 2R̂(n)ŵ(n)
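Filling in the algebra between these two slides (a short derivation, not written out in the original): substituting the instantaneous estimates into the gradient estimate gives

\[
\hat{\nabla}J(n) = -2\hat{p}(n) + 2\hat{R}(n)\hat{w}(n)
               = -2u(n)\bigl[d^*(n) - u^H(n)\hat{w}(n)\bigr]
               = -2u(n)e^*(n),
\]

which, inserted into the SD recursion with the same stepsize convention, yields the LMS update stated next.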
For the LMS algorithm, the update equation of the filter weights becomes

    ŵ(n+1) = ŵ(n) + µu(n)e*(n)

where e*(n) = d*(n) − u^H(n)ŵ(n).

Compare with the corresponding equation for the SD:

    w(n+1) = w(n) + µE{u(n)e*(n)}

where e*(n) = d*(n) − u^H(n)w(n).

The difference is thus that E{u(n)e*(n)} is estimated as u(n)e*(n). This leads to gradient noise.

Because of the noise in the gradient, the LMS never reaches Jmin, which the SD does.

[Figure: Illustration of gradient noise, with R and p from the example in Lecture 1 and µ = 0.1. Two contour plots of the cost function over the weights (w0, w1): left panel SD, right panel LMS.]
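A minimal numerical sketch of this difference (synthetic, real-valued data; the R, p and w_o below are assumed for illustration, not the Lecture 1 example): one SD step uses the true E{u(n)e(n)} = p − Rw(n), while one LMS step replaces it with a single noisy sample u(n)e(n).

import numpy as np

rng = np.random.default_rng(0)
mu = 0.1
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])                  # assumed autocorrelation matrix
w_o = np.array([1.0, -0.5])                 # assumed Wiener solution
p = R @ w_o                                 # cross-correlation consistent with w_o

w_sd = np.zeros(2)
w_lms = np.zeros(2)

# Deterministic gradient step (SD): w(n+1) = w(n) + mu*(p - R w(n)) = w(n) + mu*E{u(n)e(n)}
w_sd = w_sd + mu * (p - R @ w_sd)

# Stochastic gradient step (LMS): draw one snapshot u(n), d(n) and use u(n)e(n)
u = rng.multivariate_normal(np.zeros(2), R)
d = w_o @ u + 0.1 * rng.standard_normal()   # desired signal with measurement noise
e = d - w_lms @ u
w_lms = w_lms + mu * u * e                  # instantaneous estimate of E{u(n)e(n)}

print("SD step :", w_sd)
print("LMS step:", w_lms)                   # scatters around the SD step (gradient noise)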

Convergence analysis of the LMS, overview

Consider the update equation

    ŵ(n+1) = ŵ(n) + µu(n)e*(n).

We want to know how fast and towards what the algorithm converges, in terms of J(n) and ŵ(n).

Strategy:

1. Introduce the filter weight error vector ǫ(n) = ŵ(n) − wo.
2. Express the update equation in ǫ(n).
3. Express J(n) in ǫ(n). This leads to K(n) = E{ǫ(n)ǫ^H(n)}.
4. Calculate the difference equation in K(n), which controls the convergence.
5. Make a change of variables from K(n) to X(n), which converges equally fast.

1. The filter weight error vector

The LMS contains feedback, and just like for the SD there is a risk that the algorithm diverges if the stepsize µ is not chosen appropriately.

In order to investigate the stability, the filter weight error vector ǫ(n) is introduced:

    ǫ(n) = ŵ(n) − wo        (wo = R⁻¹p, the Wiener–Hopf solution)

Note that ǫ(n) corresponds to c(n) = w(n) − wo for the Steepest descent, but that ǫ(n) is stochastic because of ŵ(n).
2. Update equation in ǫ(n)

The update equation for ǫ(n) at time n+1 can be expressed recursively as

    ǫ(n+1) = [I − µR̂(n)]ǫ(n) + µ[p̂(n) − R̂(n)wo]
           = [I − µu(n)u^H(n)]ǫ(n) + µu(n)eo*(n)

where eo(n) = d(n) − wo^H u(n) is the error of the Wiener filter.

Compare to that of the SD:

    c(n+1) = (I − µR)c(n)

3. Express J(n) in ǫ(n)

The convergence analysis of the LMS is considerably more complicated than that for the SD. Therefore, two tools (approximations) are needed. These are

• Independence theory
• The direct-averaging method
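The step from the LMS update to the recursion in ǫ(n) is not spelled out on the slide; a short derivation, using the definitions above:

\[
\epsilon(n+1) = \hat{w}(n+1) - w_o
            = \epsilon(n) + \mu u(n)\bigl[d^*(n) - u^H(n)\hat{w}(n)\bigr]
            = \epsilon(n) + \mu u(n)\bigl[e_o^*(n) - u^H(n)\epsilon(n)\bigr]
            = \bigl[I - \mu u(n)u^H(n)\bigr]\epsilon(n) + \mu u(n)e_o^*(n),
\]

where d^*(n) − u^H(n)ŵ(n) = e_o^*(n) − u^H(n)ǫ(n) was used. The first form on the slide follows from p̂(n) − R̂(n)wo = u(n)[d^*(n) − u^H(n)wo] = u(n)e_o^*(n).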

3. Express J(n) in ǫ(n), cont.

Independence theory

1. The input vectors at different time instants n, u(1), u(2), . . . , u(n), are independent of each other (and thereby uncorrelated).
2. The input signal vector at time n is independent of the desired signal at all earlier time instants: u(n) is independent of d(1), d(2), . . . , d(n−1).
3. The desired signal at time n, d(n), is independent of all earlier desired signals, d(1), d(2), . . . , d(n−1), but dependent on the input signal vector at time n, u(n).
4. The signals d(n) and u(n) at the same time instant n are mutually normally distributed.

Direct averaging

Direct averaging means that in the update equation for ǫ,

    ǫ(n+1) = [I − µu(n)u^H(n)]ǫ(n) + µu(n)eo*(n),

the instantaneous estimate R̂(n) = u(n)u^H(n) is substituted with the expectation over the ensemble, R = E{u(n)u^H(n)}, such that

    ǫ(n+1) = (I − µR)ǫ(n) + µu(n)eo*(n)

results.
3. Express J(n) in ǫ(n), cont.

For the SD we could show that

    J(n) = Jmin + [w(n) − wo]^H R [w(n) − wo]

and that the eigenvalues of R control the convergence speed.

For the LMS, with ǫ(n) = ŵ(n) − wo,

    J(n) = E{|e(n)|^2} = E{|d(n) − ŵ^H(n)u(n)|^2}
         = E{|eo(n) − ǫ^H(n)u(n)|^2}
         = E{|eo(n)|^2} + E{ǫ^H(n) u(n)u^H(n) ǫ(n)}        (i)
         =     Jmin     + E{ǫ^H(n) R̂(n) ǫ(n)}

Here, (i) indicates that the independence assumption is used. Since both ǫ and R̂ are estimates, and in addition not independent, the expectation operator cannot simply be moved in over R̂ (to replace it by R).

The behavior of

    Jex(n) = J(n) − Jmin = E{ǫ^H(n)R̂(n)ǫ(n)}

determines the convergence. Since the following holds for the trace of a matrix,

1. tr(scalar) = scalar
2. tr(AB) = tr(BA)
3. E{tr(A)} = tr(E{A})

Jex(n) can be written as

    Jex(n) = E{tr[R̂(n)ǫ(n)ǫ^H(n)]}         (using 1 and 2)
           = tr[E{R̂(n)ǫ(n)ǫ^H(n)}]         (using 3)
           = tr[E{R̂(n)}E{ǫ(n)ǫ^H(n)}]      (using (i), the independence assumption)
           = tr[RK(n)]

The convergence thus depends on K(n) = E{ǫ(n)ǫ^H(n)}.
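The step marked (i) hides a cross term; a short argument for why it vanishes (not written out on the slides, and relying on the independence assumptions above):

\[
E\{|e_o(n) - \epsilon^H(n)u(n)|^2\}
  = E\{|e_o(n)|^2\} + E\{\epsilon^H(n)u(n)u^H(n)\epsilon(n)\}
    - 2\,\mathrm{Re}\,E\{e_o(n)\,u^H(n)\epsilon(n)\}.
\]

Under the independence theory, ǫ(n) depends only on data up to time n−1 and is therefore independent of the pair (u(n), d(n)), so the last expectation factors as E{e_o(n)u^H(n)}E{ǫ(n)}; by the principle of orthogonality E{u(n)e_o^*(n)} = 0, and the cross term disappears.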

4. Difference equation for K(n)

The update equation for the filter weight error is, as shown previously,

    ǫ(n+1) = [I − µu(n)u^H(n)]ǫ(n) + µu(n)eo*(n)

The solution to this stochastic difference equation is, for small µ, close to the solution of

    ǫ(n+1) = (I − µR)ǫ(n) + µu(n)eo*(n),

which results from direct averaging.

This equation is still difficult to solve, but now the behavior of K(n) can be described by the following difference equation:

    K(n+1) = (I − µR)K(n)(I − µR) + µ^2 Jmin R

5. Change of variables from K(n) to X(n)

The variable changes

    Q^H R Q = Λ
    Q^H K(n) Q = X(n)

where Λ is the diagonal eigenvalue matrix of R, and where X(n) normally is not diagonal, yield

    Jex(n) = tr(RK(n)) = tr(ΛX(n))

The convergence can be described by

    X(n+1) = (I − µΛ)X(n)(I − µΛ) + µ^2 Jmin Λ
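A minimal numerical sketch of these two recursions (R, Jmin, µ and K(0) below are assumed for illustration): iterating K(n) and X(n) = Q^H K(n) Q in parallel shows that both give the same Jex(n).

import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])                  # assumed autocorrelation matrix
Jmin, mu = 0.1, 0.05
I = np.eye(2)

lam, Q = np.linalg.eigh(R)                  # R = Q Lambda Q^H
Lam = np.diag(lam)

K = 0.5 * np.eye(2)                         # assumed K(0) = E{eps(0) eps^H(0)}
X = Q.T @ K @ Q                             # X(0) = Q^H K(0) Q (real-valued example)

for n in range(500):
    K = (I - mu * R) @ K @ (I - mu * R) + mu**2 * Jmin * R
    X = (I - mu * Lam) @ X @ (I - mu * Lam) + mu**2 * Jmin * Lam

print(np.trace(R @ K), np.trace(Lam @ X))   # identical values, approaching Jex(inf)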
5. Change of variables from K(n) to X(n), cont.

Jex(n) depends only on the diagonal elements of X(n) (because of the trace tr[ΛX(n)]). We can therefore write

    Jex(n) = Σ_{i=1}^{M} λi xi(n).

The update of the diagonal elements is controlled by

    xi(n+1) = (1 − µλi)^2 xi(n) + µ^2 Jmin λi

This is an inhomogeneous difference equation. This means that the solution will contain a transient part, and a stationary part that depends on Jmin and µ. The cost function can therefore be written

    J(n) = Jmin + Jex(∞) + Jtrans(n)

where Jmin + Jex(∞) and Jtrans(n) are the stationary and transient solutions, respectively.

At best, the LMS can thus reach Jmin + Jex(∞).

Properties of the LMS

• Convergence for the same interval as the SD:

      0 < µ < 2/λmax

• The excess mean-square error in steady state:

      Jex(∞) = Jmin Σ_{i=1}^{M} µλi / (2 − µλi)

• The misadjustment M = Jex(∞)/Jmin is a measure of how close to the optimal solution the LMS comes (in the mean-square sense).
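The expression for Jex(∞) follows from the stationary point of the scalar recursion above (a short check, not shown on the slides). Setting xi(n+1) = xi(n) = xi(∞) gives

\[
x_i(\infty)\bigl[1-(1-\mu\lambda_i)^2\bigr] = \mu^2 J_{\min}\lambda_i
\;\Rightarrow\;
x_i(\infty) = \frac{\mu J_{\min}}{2-\mu\lambda_i},
\qquad
J_{ex}(\infty) = \sum_{i=1}^{M}\lambda_i x_i(\infty)
             = J_{\min}\sum_{i=1}^{M}\frac{\mu\lambda_i}{2-\mu\lambda_i}.
\]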

Rules of thumb for the LMS

The eigenvalues of R are rarely known. Therefore, a set of rules of thumb is commonly used that is based on the tap-input power,

    Σ_{k=0}^{M−1} E{|u(n−k)|^2} = M r(0) = tr(R).

• The stepsize:

      0 < µ < 2 / Σ_{k=0}^{M−1} E{|u(n−k)|^2}

• The misadjustment:

      M ≈ (µ/2) Σ_{k=0}^{M−1} E{|u(n−k)|^2}

• The average time constant:

      τmse,av ≈ 1/(2µλav),   where λav = (1/M) Σ_{i=1}^{M} λi.

When the λi are unknown, the following approximate relationship between M and τmse,av is useful:

      M ≈ M/(4 τmse,av)

with the filter length M in the numerator. Here, τmse,av denotes the time it takes for J(n) to decay by a factor e^−1.

Summary of the LMS

Summary of the iterations:

1. Set start values for the filter coefficients, ŵ(0).
2. Calculate the error e(n) = d(n) − ŵ^H(n)u(n).
3. Update the filter coefficients:
       ŵ(n+1) = ŵ(n) + µu(n)[d*(n) − u^H(n)ŵ(n)]
4. Repeat steps 2 and 3.
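A minimal sketch of these steps for real-valued signals (the identification setup in the usage example, with the assumed system [0.5, −0.4, 0.2], is for illustration only and not taken from the lecture):

import numpy as np

def lms(u, d, M, mu):
    """LMS with M filter taps and stepsize mu over input u and desired signal d."""
    w = np.zeros(M)                          # step 1: start values w_hat(0)
    e = np.zeros(len(u))
    for n in range(M - 1, len(u)):
        u_n = u[n - M + 1:n + 1][::-1]       # tap-input vector [u(n), ..., u(n-M+1)]
        e[n] = d[n] - w @ u_n                # step 2: e(n) = d(n) - w_hat^H(n) u(n)
        w = w + mu * u_n * e[n]              # step 3: w_hat(n+1) = w_hat(n) + mu u(n) e*(n)
    return w, e                              # step 4: the loop repeats steps 2 and 3

# Usage example: identify an assumed FIR system.
rng = np.random.default_rng(0)
u = rng.standard_normal(5000)
d = np.convolve(u, [0.5, -0.4, 0.2])[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_hat, e = lms(u, d, M=3, mu=0.05)           # mu well below 2 / (M * E{|u|^2}) = 2/3
print(w_hat)                                 # approaches the assumed system [0.5, -0.4, 0.2]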
Example: Channel equalizer in training phase

[Figure: Block diagram of the training phase. A pn-sequence p(n) = . . . , 1, 1, −1, 1, −1, . . . is sent through the channel h(n); white Gaussian noise v(n) with σv^2 = 0.001 is added, giving u(n), which is fed to an adaptive FIR filter (LMS, M = 11) with output d̂(n). The desired signal d(n) is the pn-sequence delayed by z^−7, and the error is e(n) = d(n) − d̂(n). The channel is h(n) = {0, 0.2798, 1.000, 0.2798, 0}.]

Sent information in the form of a pn-sequence is affected by a channel (h(n)). An adaptive inverse filter is used to compensate for the channel, such that the total effect can be approximated by a delay. White Gaussian noise (WGN) is added during the transmission.

Example, cont.: Learning curve

[Figure: Learning curves, i.e. the average of the MSE J(n) over 200 realizations, plotted against the iteration n (0–1500) for µ = 0.0075, µ = 0.025 and µ = 0.075.]

The decay is related to τmse,av. A large µ gives fast convergence, but also a large Jex(∞). The curves illustrate the learning curves for the different values of µ.
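A sketch of the training-phase experiment (channel, noise level, delay, filter length and stepsizes as on the slides; the random ±1 training sequence and the averaging details are assumptions, intended to mimic Haykin chap. 5.7 rather than reproduce it exactly):

import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.0, 0.2798, 1.0, 0.2798, 0.0])   # channel impulse response h(n)
M, delay, sigma_v = 11, 7, np.sqrt(0.001)       # equalizer length, z^-7 delay, WGN std
N, n_real = 1500, 200                           # iterations and number of realizations

def one_realization(mu):
    p_n = rng.choice([-1.0, 1.0], size=N + 20)  # assumed random +/-1 training sequence
    u = np.convolve(p_n, h)[:len(p_n)] + sigma_v * rng.standard_normal(len(p_n))
    w = np.zeros(M)
    J = np.zeros(N)
    for n in range(M, M + N):
        u_n = u[n - M + 1:n + 1][::-1]          # tap-input vector
        d_n = p_n[n - delay]                    # desired signal = delayed training symbol
        e = d_n - w @ u_n
        w = w + mu * u_n * e                    # LMS update
        J[n - M] = e**2
    return J

for mu in (0.0075, 0.025, 0.075):
    J_avg = np.mean([one_realization(mu) for _ in range(n_real)], axis=0)
    print(mu, J_avg[-100:].mean())              # larger mu: faster decay, larger Jex(inf)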

What to read?

• Least Mean Squares: Haykin chap. 5.1–5.3, 5.4 (see lecture), 5.5–5.7

Exercises: 4.1, 4.2, 1.2, 4.3, 4.5, (4.4)

Computer exercise, theme: Implement the LMS algorithm in Matlab, and generate the results in Haykin chap. 5.7.
