UNIT IV: Adaptive filtering

Introduction

Most natural signals are nonstationary, which makes the techniques considered earlier
inappropriate. One way around this limitation is to process a nonstationary signal in blocks,
over short time intervals for which the process may be assumed to be approximately
stationary (quasi-stationary). However, the efficiency of this approach is limited for several
reasons.
1. For rapidly varying processes, the quasi-stationarity interval may be too short to provide
the desired resolution in parameter estimation.
2. It is not easy to accommodate step changes within the analysis intervals.
3. This solution imposes an incorrect (i.e., piecewise stationary) model on the non-
stationary data.

FIR adaptive filters

Considering the Wiener filtering problem within the context of nonstationary processes, let wn be
the unit pulse response of the FIR Wiener filter producing the MMS estimate of the desired
process dn.

If dn and xn are nonstationary, the filter coefficients minimizing the error

ξn = E{|en|²} = E{|dn − d̂n|²}

will depend on n, and the filter will be time-varying, i.e.

d̂n = Σ_{k=0}^{p} wn,k xn−k

where wn,k is the value of the kth coefficient at time n.


The last equation can be rewritten using vector notation:

d̂n = wnT xn

where wn = [wn,0, wn,1, …, wn,p]T is the vector of filter coefficients at time n and
xn = [xn, xn−1, …, xn−p]T is the input data vector.

The design of a time-varying (adaptive) filter is much more difficult than the design of a
traditional (time invariant) Wiener filter since it is necessary to find a set of optimum coefficients
wn,k for k = 0,1,…,p and for each value of n.
However, the problem may be simplified considerably if we do not require that wn minimize the
MS error at each time n and consider, instead, a coefficient update equation of the form

wn+1 = wn + ∆wn

where ∆wn is a correction that is applied to the filter coefficients wn at time n to form a new set
of coefficients wn+1 at time n+1.
The design of an adaptive filter involves defining how this correction should be formed. This
approach may be preferred even for stationary cases. For instance, if the order p is large, it might
be difficult to solve Wiener-Hopf equations directly.
Also, if Rx is ill-conditioned (nearly singular), the solution to the Wiener-Hopf equations will be
numerically sensitive to round-off errors and finite precision effects. Finally, since
autocorrelation and cross-correlations are unknown, there is a need to estimate them. Since the
process may be changing, these estimates need to be updated continuously.

The key concept of an adaptive filter is the set of rules defining how the correction ∆wn is
formed. Regardless of the particular algorithm, the adaptive filter should have the following properties:
1. In a stationary situation, the filter should produce a sequence of corrections ∆wn such that
wn converges to the solution of the Wiener-Hopf equations:
lim n→∞ wn = Rx^{-1} rdx
2. It should not be necessary to know the signal statistics rx(k) and rdx(k) in order to compute
∆wn. The estimation of these statistics should be a part of the adaptive filter.
3. In nonstationary situations, the filter should be able to adapt to the changing signal
statistics and “track” the solution as it evolves in time.
It is also important that the error signal en be available to the filter, since this error signal
allows the filter to measure its performance and determine how to modify the filter coefficients (adapt).

FIR adaptive filters are quite popular for the following reasons:
1. Stability can be easily controlled by ensuring that the filter coefficients are bounded.
2. There are simple and efficient algorithms for adjusting the filter coefficients
3. These algorithms are well understood in terms of their convergence and stability.
4. FIR adaptive filters usually perform well enough to satisfy the design criteria.

An FIR adaptive filter for estimating a desired signal dn from a related signal xn as

d̂n = Σ_{k=0}^{p} wn,k xn−k

is shown below.

Assuming that xn and dn are nonstationary random processes, the goal is to find the coefficient
vector wn at time n that minimizes the MS error ξn = E{|en|²}.
As in the Wiener filter case, the coefficients minimizing the MS error can be found by
setting the derivative of ξn with respect to wn,k* equal to zero for k = 0, 1, …, p. Therefore

E{en x*n−k} = 0 ;   k = 0, 1, …, p

which, after rearranging the terms, becomes

Σ_{l=0}^{p} wn,l E{xn−l x*n−k} = E{dn x*n−k} ;   k = 0, 1, …, p

a set of p+1 linear equations in the p+1 unknowns wn,l, the solution to which depends on n.
Expressed in vector form, this becomes

Rx(n) wn = rdx(n)

where Rx(n) = E{xn* xnT} and rdx(n) = E{dn xn*}. In the case of jointly wss processes, these
equations reduce to the Wiener-Hopf equations and the solution wn becomes independent of time.

FIR adaptive filters: the steepest descent adaptive filter

The vector wn minimizing the quadratic error function can be found by setting its
derivative with respect to filter coefficients wn to zero. An alternative approach is to search for
the solution using the iterative method of steepest descent.
Let wn be an estimate of the vector that minimizes the MS error ξn at time n. At time n+1, a new
estimate is formed by adding a correction to wn that is designed to bring wn closer to the desired
solution. The correction involves taking a step of size µ in the direction of maximum descent
down the quadratic error surface. For instance, consider a quadratic function of two real-valued
coefficients w0 and w1 such as ξn = 6 − 6w0 − 4w1 + 6w0² + w1² + 6w0w1. The contours of
constant error, when projected onto the (w0, w1) plane, form a set of concentric ellipses. The
direction of steepest descent at any point in the plane is the direction that a marble would take
if it were placed inside this quadratic bowl.
Mathematically, this direction is given by the gradient: the vector of partial derivatives of ξn with
respect to the coefficients wk. For this quadratic function the gradient vector is

∇ξn = [∂ξn/∂w0, ∂ξn/∂w1]T = [−6 + 12w0 + 6w1, −4 + 6w0 + 2w1]T
The gradient is orthogonal to the line that is tangent to the contour of constant error at w.
However, since the gradient vector points in the direction of the steepest ascent, the direction of
steepest descent points in the negative gradient direction. Hence the update equation takes the
form

wn+1 = wn − µ∇ξn

The step size µ affects the rate at which the weight vector moves down the quadratic surface and
must be a positive number. For very small values of µ, the correction to wn is small and the
movement down the quadratic surface is slow and, as µ increases, the rate of descent increases.
However, an upper limit exists on how large the step size could be. For values of µ exceeding
this limit, the trajectory of wn becomes unstable and unbounded.

The steepest descent algorithm may be summarized as follows:

1. Initialize the steepest descent algorithm with an initial estimate w0 of the optimum weight
vector w.
2. Evaluate the gradient of ξn at the current estimate of wn.
3. Update the estimate at time n by adding a correction in the negative gradient direction as
follows: wn+1 = wn − µ∇ξn
4. Go back to step 2 and repeat the process.
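As an illustration, here is a minimal Python sketch of the procedure above, assuming real-valued
signals and made-up values for Rx and rdx. It uses the gradient convention of the text,
∇ξ = −E{en xn} = Rx w − rdx, and checks the result against the Wiener solution Rx^{-1} rdx.

import numpy as np

# Steepest descent on a quadratic MS error surface (illustrative values only).
Rx = np.array([[1.0, 0.5],
               [0.5, 1.0]])              # assumed (p+1) x (p+1) autocorrelation matrix
rdx = np.array([0.7, 0.3])               # assumed cross-correlation vector

w_opt = np.linalg.solve(Rx, rdx)         # Wiener solution Rx^{-1} rdx
lam_max = np.max(np.linalg.eigvalsh(Rx))
mu = 0.1                                 # step size; must satisfy 0 < mu < 2/lam_max
assert 0 < mu < 2.0 / lam_max, "step size too large: the trajectory would diverge"

w = np.zeros(2)                          # step 1: initial estimate w0
for n in range(200):
    grad = Rx @ w - rdx                  # step 2: gradient of the MS error at wn
    w = w - mu * grad                    # step 3: correction in the negative gradient direction

print("steepest descent:", w)            # should closely match w_opt
print("Wiener solution :", w_opt)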
FIR adaptive filters: the LMS algorithm

For the MS error, the gradient may be written as ∇ξn = −E{en xn*}. Incorporating an L-point
sample-mean estimate of this correlation into the steepest descent method yields

∇̂ξn = −(1/L) Σ_{l=0}^{L−1} en−l x*n−l

In the special case where a one-point sample mean (L = 1) is used,

∇̂ξn = −en xn*

and the filter-update equation becomes

wn+1 = wn + µ en xn*

which is known as the LMS algorithm.


The simplicity of the algorithm comes from the fact that the update of the kth filter coefficient
only requires one multiplication and one addition:

wn+1,k = wn,k + µ en x*n−k
An LMS adaptive filter with p+1 coefficients requires p+1 multiplications and p+1 additions
to update the filter coefficients. One addition is needed to form the error en and one
multiplication is required to form the product µen. Finally, p+1 multiplications and p
additions are needed to calculate the output yn. Therefore, a total of 2p+3 multiplications and
2p+2 additions per output sample are required.
A summary of the LMS algorithm is given below.
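The following is a minimal Python sketch of the LMS recursion summarized above, assuming
real-valued signals. The toy system-identification setup (the impulse response h, the filter order,
and the step size) is made up purely for illustration.

import numpy as np

def lms(x, d, p, mu):
    """LMS adaptive filter: returns the final weights and the error signal e_n."""
    w = np.zeros(p + 1)                     # w_n, initialized to zero
    e = np.zeros(len(x))
    for n in range(p, len(x)):
        x_vec = x[n - p:n + 1][::-1]        # x_n = [x_n, x_{n-1}, ..., x_{n-p}]^T
        y = w @ x_vec                       # output y_n: p+1 multiplications, p additions
        e[n] = d[n] - y                     # error e_n: one addition
        w = w + mu * e[n] * x_vec           # coefficient update: p+1 multiplications, p+1 additions
    return w, e

# Toy identification example: d_n is x_n passed through a short FIR system.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([1.0, -0.5, 0.25])
d = np.convolve(x, h)[:len(x)]
w, e = lms(x, d, p=2, mu=0.01)
print(w)                                    # should approach h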

Normalized LMS (NLMS)

One of the difficulties in the design of adaptive LMS filters is the selection of the step size µ.
For stationary processes, the LMS algorithm converges in the mean if 0 < µ < 2/λmax, and
converges in the mean-square if 0 < µ < 2/tr(Rx). However, since Rx is generally unknown,
either λmax or tr(Rx) must be estimated.
One way is to use the fact that, for stationary processes,

tr(Rx) = (p+1) E{|xn|²}

Therefore, the condition for mean-square convergence may be replaced by

0 < µ < 2 / [(p+1) E{|xn|²}]

where E{|xn|²} is the power in the process xn, which may be estimated as

Ê{|xn|²} = (1/(p+1)) Σ_{k=0}^{p} |xn−k|² = ||xn||² / (p+1)

This leads to the following bound on the step size for mean-square convergence:

0 < µ < 2 / ||xn||²

A convenient way to incorporate this bound into the LMS adaptive filter is to use a time-varying
step size of the form

µn = β / ||xn||²

where β is a normalized step size with 0 < β < 2. Replacing µ in the LMS weight vector
update equation with µn leads to the Normalized LMS (NLMS) algorithm:

wn+1 = wn + (β / ||xn||²) en xn*

The effect of normalization by ||xn||² is to change the magnitude, but not the direction, of the
estimated gradient vector. Therefore, with the appropriate set of statistical assumptions, it
may be shown that the NLMS algorithm converges in the mean-square if 0 < β < 2.
In the LMS algorithm, the correction applied to wn is proportional to the input vector xn.
Therefore, when xn is large, the LMS algorithm has a problem of gradient noise
amplification. In the NLMS algorithm, however, the normalization diminishes this problem.
On the other hand, the NLMS algorithm has a similar problem when ||xn|| becomes too small.
An alternative, therefore, is to use the following modification to the NLMS algorithm:

wn+1 = wn + (β / (ε + ||xn||²)) en xn*

where ε is some small positive number.


Lastly, the normalization term ||xn||² can be computed recursively as

||xn+1||² = ||xn||² + |xn+1|² − |xn−p|²
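A minimal Python sketch of the NLMS update follows, assuming real-valued signals; it combines
the normalized step size, the ε safeguard, and the recursive computation of ||xn||² described above.

import numpy as np

def nlms(x, d, p, beta, eps=1e-8):
    """NLMS adaptive filter with step size mu_n = beta / (eps + ||x_n||^2), 0 < beta < 2."""
    w = np.zeros(p + 1)
    e = np.zeros(len(x))
    norm_sq = 0.0                                  # ||x_n||^2, maintained recursively
    for n in range(len(x)):
        norm_sq += x[n] ** 2                       # add the newest sample
        if n - p - 1 >= 0:
            norm_sq -= x[n - p - 1] ** 2           # drop the sample leaving the data vector
        if n < p:
            continue                               # wait until a full data vector is available
        x_vec = x[n - p:n + 1][::-1]
        e[n] = d[n] - w @ x_vec
        w = w + (beta / (eps + norm_sq)) * e[n] * x_vec
    return w, e

# Usage mirrors the LMS example above, e.g. w, e = nlms(x, d, p=2, beta=0.5).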

Adaptive Channel Equalizers

The schematic of an adaptive channel equalizer is shown in the figure below.
The input binary message s(n) is passed through the nonlinear channel, whose output is added
to white additive noise q(n) to produce the received signal x(n). The communication
channel is thus modeled as a nonlinear system followed by additive white noise. The
received signal x(n) is a distorted version of the transmitted signal s(n), affected by ISI and
the additive noise. The purpose of the equalizer is to reconstruct the transmitted message
faithfully at the receiver. The received signal is passed through a tapped delay line to produce
the input vector values x(n), x(n−1), x(n−2), …, x(n−M+1) for an M-tap equalizer. The
equalizer produces an output y(n), which is then compared with a delayed version of the
transmitted signal s(n). The resulting difference is known as the error e(n). The training
signal is generated by delaying the message by a number of clock cycles equal to half the
order of the equalizer. The knowledge of the error signal and of the received signal vector at
the equalizer input is used to adjust the equalizer weights with the help of some training
algorithm. The training of the equalizer weights continues as received signal samples are
applied. The training is stopped when the mean square error (MSE) falls below a prescribed
level. The design of the equalizer is complete when the MSE settles to a steady state and the
steady-state weights remain constant.
Channel equalizers are important for reliable communication of digital data over non-ideal
channels. Let {d(n)} be the digital signal to be transmitted over the channel. This signal
takes values of plus or minus 1. It is input to a pulse generator, which produces a
pulse of amplitude A at time n if d(n) = 1 and −A otherwise. These pulses are modulated and
transmitted over the channel. The receiver demodulates and samples the received waveform
which produces the signal {x(n)}. The demodulated signal is distorted by the channel. The
pulse shapes are distorted causing neighboring pulses to interfere with each other, which is
known as Intersymbol Interference (ISI).

A model for {x(n)} is

x(n) = Σ_k h(k) d(n−k) + v(n)

This model is motivated by physical reasons: the signal is subject to multipath fading, which
means that the received signal is a sum of delayed and scaled versions of the transmitted signal.
Here v(n) is additive noise and h(n) is the unit sample response of the channel.

The decision on the transmitted bit is made by a simple threshold device:

d̂(n) = sgn(y(n)), i.e. d̂(n) = 1 if the signal y(n) at the input of the device is positive and d̂(n) = −1 otherwise.
To improve the chances of correct decisions, an equalizer is used to minimise the channel
distortion.
To compensate for the signal distortion, the adaptive channel equalization system
completes the following two modes:
 Training mode – This mode takes place during an initial training phase, which
occurs when the transmitter and receiver first establish a connection. During this phase,
the transmitter sends a sequence of pseudorandom digits that is known to the receiver.
With knowledge of d(n), the error sequence is easily determined and the tap weights of
the equalizer may be initialized.
 Decision-directed mode – Once the training period has ended and data is being
exchanged between the transmitter and receiver, the receiver has no prior knowledge of
what is being sent. However, there is a clever scheme that may be used to extract the
error sequence from the output of the threshold device. If no errors are made at the
output of the threshold device and d̂(n) = d(n), the error sequence may be formed by
taking the difference between the equalizer output, y(n), and the output of the threshold
device, e(n) = y(n) − d̂(n).
This approach is said to be decision-directed since it is based on the decisions made by
the receiver. Although it relies on the threshold device making correct decisions, this
approach will work even in the presence of errors, provided that they are infrequent
enough. Once the error rate exceeds a certain level, the inaccuracies in the error signal
will cause the equalizer to diverge away from the correct solution, thereby causing an
increase in the error rate and eventually a loss of reliable communication. However, the
receiver may request that the training sequence be retransmitted in order to re-initialize
the equalizer.
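The following Python sketch illustrates the two modes, reusing the LMS update from earlier. The
channel model, noise level, equalizer length, training delay, and training duration are all made-up
values chosen only to make the example run; a real design would derive them from the channel.

import numpy as np

rng = np.random.default_rng(1)
M, mu, delay = 11, 0.01, 5                 # M-tap equalizer, step size, training delay (about M/2)

d_bits = rng.choice([-1.0, 1.0], size=20000)          # transmitted symbols d(n)
h = np.array([0.3, 1.0, 0.3])                          # dispersive channel (causes ISI)
x = np.convolve(d_bits, h)[:len(d_bits)] + 0.05 * rng.standard_normal(len(d_bits))

w = np.zeros(M)
errors = 0
for n in range(M, len(x)):
    x_vec = x[n - M + 1:n + 1][::-1]       # received vector [x(n), x(n-1), ..., x(n-M+1)]
    y = w @ x_vec                          # equalizer output y(n)
    d_hat = 1.0 if y > 0 else -1.0         # threshold device
    if n < 4000:                           # training mode: reference is the known, delayed d(n)
        e = d_bits[n - delay] - y
    else:                                  # decision-directed mode: reference is the decision d_hat
        e = d_hat - y                      # (reference minus output, the sign convention of the LMS update)
        errors += int(d_hat != d_bits[n - delay])
    w = w + mu * e * x_vec                 # LMS weight update

print("decision errors after training:", errors)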
Adaptive Noise cancellation

The problem of noise cancellation implies estimation of the process dn from a corrupted
observation

xn = dn + v1,n

It is impossible to separate dn and v1,n without any information about these processes.
However, given a reference signal v2,n that is correlated with v1,n, this reference signal can be
used to estimate the noise v1,n, and this estimate may be subtracted from xn to form an
estimate of dn:

d̂n = xn − v̂1,n
For example, if dn, v1,n, and v2,n are jointly wss processes, and if the autocorrelation rv2(k) and
the cross-correlation rv1v2(k) are known, a Wiener filter may be designed to find the MMS
estimate of v1,n as shown.

In practice, however, a stationarity assumption is not generally appropriate and the statistics
of v1,n and v2,n are generally unknown. Therefore, as an alternative to the Wiener filter, we
consider the adaptive noise canceller shown below.

If the reference signal v2,n is uncorrelated with dn, then minimizing the MS error E{|en|²} is
equivalent to minimizing E{|v1,n − v̂1,n|²}.
In other words, the output of the adaptive filter is the MMS estimate of v1,n since there is no
information about the desired signal dn in the reference v2,n. Therefore, en is the MMS
estimate of dn.
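A minimal Python sketch of the noise canceller follows, with synthetic signals made up for
illustration: the reference v2,n is white noise, v1,n is a filtered (and hence correlated) version of it,
and the adaptive filter uses the NLMS update from the previous section.

import numpy as np

rng = np.random.default_rng(2)
N, p, beta, eps = 20000, 8, 0.5, 1e-8

d = np.sin(2 * np.pi * 0.01 * np.arange(N))            # desired signal d_n
v2 = rng.standard_normal(N)                            # reference noise v2_n
v1 = np.convolve(v2, [0.8, -0.4, 0.2])[:N]             # v1_n: correlated (filtered) version of v2_n
x = d + v1                                             # corrupted observation x_n = d_n + v1_n

w = np.zeros(p + 1)
e = np.zeros(N)                                        # e_n serves as the estimate of d_n
for n in range(p, N):
    v2_vec = v2[n - p:n + 1][::-1]
    v1_hat = w @ v2_vec                                # estimate of the noise v1_n
    e[n] = x[n] - v1_hat                               # d_hat_n = x_n - v1_hat_n
    w = w + (beta / (eps + v2_vec @ v2_vec)) * e[n] * v2_vec   # NLMS update

print("residual noise power:", np.mean((e[p:] - d[p:]) ** 2))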
Adaptive Echo Cancellation
A hands-free unit includes a microphone and a loudspeaker. The voice of the far speaker comes
out of the loudspeaker, is reflected in the room (the echo), and is sent back through the
microphone to the far speaker. Echoes prevent a normal conversation and must be cancelled.
Mathematically, let
u(n) be the loudspeaker output (the speech signal of the far speaker),
s(n) be the signal of the near speaker, and
Hroom[u](n) be the echo.
Calling the signal into the microphone d(n),

d(n) = s(n) + Hroom[u](n)
Suppose that a channel used to transmit a signal d(n) introduces an echo, so that the received
signal is

x(n) = d(n) + α d(n − N)

where |α| < 1 and N is the delay associated with the echo. If both α and N are known, then the
ideal echo canceller for recovering d(n) from x(n) is an IIR filter with the system function

H(z) = 1 / (1 + α z^{−N})

However, since α and N are generally unknown and possibly time-varying, it is more
appropriate for the echo canceller to be an adaptive recursive filter. A nonrecursive adaptive
filter may be considered instead, but the order of the filter required for a sufficiently accurate
estimate of d(n) may be too large. To see this, the inverse filter H(z) is expanded in a geometric
series as follows:

H(z) = 1 / (1 + α z^{−N}) = 1 − α z^{−N} + α² z^{−2N} − α³ z^{−3N} + …

If p is large enough so that |α|^{p} ≪ 1, then a finite-order approximation to H(z) is formed as
follows:

Ĥ(z) = Σ_{k=0}^{p} (−α)^{k} z^{−kN}

and the corresponding adaptive nonrecursive echo canceller has the form

d̂(n) = Σ_{k=0}^{Np} wn,k x(n − k)

However, if α ≈ 1, which forces p to be large, or if N ≫ 1, then the order of the adaptive filter,
Np, required to produce a sufficiently accurate approximation to the inverse filter may be too
large for this to be a viable solution.
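To make the order argument concrete, here is a short non-adaptive Python illustration (with
made-up values of α, N, and p): the truncated geometric-series inverse has nonzero taps only at
lags 0, N, 2N, …, pN, so even this idealized canceller spans Np + 1 samples, and an adaptive
FIR replacing it would need a comparable number of coefficients.

import numpy as np

rng = np.random.default_rng(3)
alpha, N, p = 0.6, 50, 6

d = rng.standard_normal(10000)
x = d.copy()
x[N:] += alpha * d[:-N]                       # received signal x(n) = d(n) + alpha*d(n-N)

h_inv = np.zeros(N * p + 1)                   # truncated expansion of 1 / (1 + alpha*z^-N)
h_inv[::N] = (-alpha) ** np.arange(p + 1)     # taps (-alpha)^k at lags k*N, k = 0, ..., p
d_hat = np.convolve(x, h_inv)[:len(x)]

# Residual echo is roughly alpha^(2(p+1)) times the signal power.
print("residual echo power:", np.mean((d_hat - d) ** 2))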

Recursive least squares (RLS)

In each of the adaptive filtering methods discussed so far, gradient descent algorithms were
considered for minimizing the MS error

ξn = E{|en|²}

The problem with these methods is that they all require knowledge of the autocorrelation of
the input process, E{xn x*n−k}, and the cross-correlation between the input and the desired
process, E{dn x*n−k}. When this statistical information is unknown, these statistics must be
estimated from the data.

Although this approach may be adequate in some applications, in others this gradient
estimate may not provide a sufficiently rapid convergence or a sufficiently small excess MS
error. An alternative, therefore, is to consider error measures that do not include expectations
and may be computed directly from the data.

For example, a least squares (LS) error

εn = Σ_{i=0}^{n} |ei|² ;   ei = di − wT xi

does not require statistical information about xn and dn and may be evaluated directly from the
data. Note that minimizing the MS error produces the same set of filter coefficients for all
sequences having the same statistics: the filter coefficients do not depend on the incoming
data.

Minimizing the LS error that depends explicitly on the specific values of xn and dn will
produce different filters for different signals even if the signals have the same statistics. In
other words, different realizations of xn and dn will lead to different filters.

Exponentially weighted RLS

Let us reconsider the design of an FIR adaptive Wiener filter and find the coefficients
wn = [wn,0, wn,1, …, wn,p]T that minimize, at time n, the weighted LS error

εn = Σ_{i=0}^{n} λ^{n−i} |ei|²

where 0 < λ ≤ 1 is an exponential weighting (forgetting) factor and

ei = di − Σ_{k=0}^{p} wn,k xi−k = di − wnT xi

Note that ei is the difference between the desired signal di and the filtered output at time i,
using the latest set of filter coefficients wn,k. Thus, in minimizing εn it is assumed that the
weights wn are constant over the entire observation interval [0, n].

To find the coefficients minimizing the LS error, we set the derivative of the error with respect
to wn,k* to zero for k = 0, 1, …, p:

Σ_{i=0}^{n} λ^{n−i} ei x*i−k = 0 ;   k = 0, 1, …, p

which, in matrix form, becomes

Rx(n) wn = rdx(n)

where Rx(n) is a (p+1) x (p+1) exponentially weighted deterministic autocorrelation matrix for xn,

Rx(n) = Σ_{i=0}^{n} λ^{n−i} xi* xiT

with xi the data vector

xi = [xi, xi−1, …, xi−p]T

and rdx(n) is the deterministic cross-correlation between dn and xn,

rdx(n) = Σ_{i=0}^{n} λ^{n−i} di xi*

These are called the deterministic normal equations.

For the set of optimum coefficients, the error may be written as

εn = Σ_{i=0}^{n} λ^{n−i} ei di* − Σ_{l=0}^{p} w*n,l Σ_{i=0}^{n} λ^{n−i} ei x*i−l

If wn,l are the coefficients minimizing the squared error, the second term is zero (by the normal
equations) and the minimum error is

εn,min = Σ_{i=0}^{n} λ^{n−i} ei di*

Alternatively, in vector form, the minimum least squares error is

εn,min = ||dn||λ² − wnT rdx*(n)

where ||dn||λ² = Σ_{i=0}^{n} λ^{n−i} |di|² is the weighted norm of the vector dn = [dn, dn−1, …, d0]T.

Since both Rx(n) and rdx(n) depend on n, instead of solving the deterministic normal
equations directly for each value of n, we derive a recursive solution of the form

wn = wn−1 + ∆wn−1

where ∆wn−1 is a correction that is applied to the solution at time n − 1. Observe that the
cross-correlation may be updated recursively as

rdx(n) = λ rdx(n − 1) + dn xn*

Similarly, the autocorrelation matrix may also be updated recursively as

Rx(n) = λ Rx(n − 1) + xn* xnT

However, since we are interested in the inverse of Rx(n), we apply Woodbury's identity (the
matrix inversion lemma) to this update. Simplifying the notation, we denote the inverse of the
autocorrelation matrix at time n as

P(n) = Rx^{-1}(n)

and define the gain vector as

g(n) = λ^{-1} P(n − 1) xn* / [1 + λ^{-1} xnT P(n − 1) xn*]

so that the inverse can be updated as

P(n) = λ^{-1} [P(n − 1) − g(n) xnT P(n − 1)]
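These recursions assemble into the standard exponentially weighted RLS algorithm, in which the
correction takes the form ∆wn−1 = g(n) en with the a priori error en = dn − wn−1T xn. The Python
sketch below assumes real-valued signals; the initialization P = δI with a large δ is a common
practical choice rather than something derived in the text.

import numpy as np

def rls(x, d, p, lam=0.99, delta=100.0):
    """Exponentially weighted RLS (real-valued signals assumed)."""
    w = np.zeros(p + 1)
    P = delta * np.eye(p + 1)                    # P(n) = Rx^{-1}(n), initialized to delta*I
    e = np.zeros(len(x))
    for n in range(p, len(x)):
        x_vec = x[n - p:n + 1][::-1]             # data vector x_n
        Px = P @ x_vec
        g = Px / (lam + x_vec @ Px)              # gain vector g(n)
        e[n] = d[n] - w @ x_vec                  # a priori error e_n
        w = w + g * e[n]                         # w_n = w_{n-1} + g(n) e_n
        P = (P - np.outer(g, Px)) / lam          # matrix inversion lemma update of P(n)
    return w, e

# Usage mirrors the LMS example, e.g. w, e = rls(x, d, p=2, lam=0.99).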


Sliding window RLS (WRLS)

The RLS algorithm minimizes the exponentially weighted least squares error εn. With the
growing-window RLS (λ = 1), each of the squared errors |ei|² from i = 0 to i = n is equally weighted,
whereas with the exponentially weighted RLS, the squared errors |ei|² become less important for
values of i that are small compared to n. In both cases, however, the RLS algorithm has infinite
memory in the sense that all the data from n = 0 will affect the values of the coefficients wn. In
certain applications (for instance, for nonstationary processes) this may be undesirable.

An alternative is to minimize the sum of the squares of ei over a finite window:

εL,n = Σ_{i=n−L}^{n} |ei|²

The finite window RLS algorithm tracks nonstationary processes more easily and is able to
“forget” any data outliers after a finite number of iterations.

The filter coefficients can be found by solving the equations recursively with a computational
complexity on the order of p² operations.
The sliding window RLS algorithm consists of the following steps:
1. Given the solution wn−1 at time n−1 and the new data value xn, an intermediate weight vector
is found that minimizes the error over the augmented window,

ε̃n = Σ_{i=n−L−1}^{n} |ei|²

2. The weight vector wn that minimizes εL,n is then determined by discarding the oldest data
point xn−L−1.

The growing-window RLS algorithm is used in the first step, with the correlations updated as

R̃x(n) = Rx(n − 1) + xn* xnT ;   r̃dx(n) = rdx(n − 1) + dn xn*

In the second step of the recursion, the oldest data point xn−L−1 is discarded to restore the
(L+1)-point window. Therefore, we begin with a matrix update

Rx(n) = R̃x(n) − x*n−L−1 xTn−L−1

Finally, applying the matrix inversion lemma and following the steps used to derive the RLS
algorithm, we obtain the corresponding update and downdate equations for P(n) and wn.

Compared to the exponentially weighted RLS, the sliding window RLS requires about twice the
number of multiplications and additions. It also requires that p+L values of xn be stored. This
storage requirement may potentially be a problem for long windows.
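As a sanity check on the sliding-window criterion, the short Python sketch below simply
re-solves the deterministic normal equations over the most recent L+1 points at each step. It is a
direct (non-recursive) reference implementation, not the O(p²) up/downdating recursion described
above, and the signal names follow the earlier examples.

import numpy as np

def sliding_window_ls(x, d, p, L):
    """Direct solution of the sliding-window least squares problem at each time n."""
    w_hist = []
    for n in range(p + L, len(x)):
        # Rows are the data vectors x_i = [x_i, ..., x_{i-p}] for i = n-L, ..., n.
        X = np.array([x[i - p:i + 1][::-1] for i in range(n - L, n + 1)])
        dn = d[n - L:n + 1]
        w, *_ = np.linalg.lstsq(X, dn, rcond=None)   # minimizes sum_{i=n-L}^{n} |d_i - w^T x_i|^2
        w_hist.append(w)
    return np.array(w_hist)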
