Advanced Digital Signal Processing: Linear Prediction and Optimum Linear Filters


Advanced Digital Signal Processing

Lecture 6
Linear Prediction and Optimum Linear Filters

Dr. Mohammed Najm Abdullah


Stochastic Processes

• The term “stochastic process” is broadly used to describe a random process that generates sequential signals such as speech or noise.
• In signal processing terminology, a stochastic process is a probability model of a class of random signals, e.g. Gaussian process, Markov process, Poisson process, etc.
Definitions
• Stochastic process
• The value of a variable changes in an uncertain way
• Discrete time vs. continuous time
• When can the variable change?
• What values can the variable take?
• Markov property
• Only the current value of the variable is relevant for future predictions
• No information is carried by the past values or the path taken
Stationary and Non-Stationary Random Processes

• While the amplitude of a signal x(m) fluctuates with time m, the characteristics of the process that generates the signal may be time-invariant (stationary) or time-varying (non-stationary).
• A process is stationary if the parameters of the probability model of the process are time-invariant; otherwise it is non-stationary.
Strict-Sense Stationary Processes

A random process X(m) is stationary in a strict sense if all its distributions and statistical parameters, such as the mean, the variance, the power spectral composition and the higher-order moments of the process, are time-invariant:

• E[x(m)] = μx ; mean
• E[x(m)x(m + k)] = rxx(k) ; autocorrelation
• E[|X(f,m)|²] = E[|X(f)|²] = Pxx(f) ; power spectrum
Wide-Sense Stationary Processes

A process is said to be wide-sense stationary if the mean and the autocorrelation function of the process are time-invariant:

• E[x(m)] = μx
• E[x(m)x(m + k)] = rxx(k)
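As a quick illustration (not part of the original lecture), a minimal Python/NumPy sketch that estimates the sample mean and a biased estimate of the autocorrelation sequence rxx(k) from data — the quantities that must be time-invariant for wide-sense stationarity. The white-noise test signal and the function name are assumptions for the example.

```python
import numpy as np

def sample_mean_and_autocorr(x, max_lag):
    """Estimate the mean and the biased autocorrelation r_xx(k), k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    mean = x.mean()
    # Biased estimate: r[k] = (1/N) * sum_m x[m] * x[m+k]
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(max_lag + 1)])
    return mean, r

# Example: white Gaussian noise is wide-sense stationary
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
mu, r = sample_mean_and_autocorr(x, max_lag=5)
print(mu)   # close to 0
print(r)    # r[0] close to 1, the other lags close to 0
```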
Non-Stationary Processes
• A random process is non-stationary if its distributions or statistics
vary with time.
• Most stochastic processes such as video signals, audio signals,
financial data, meteorological data, biomedical signals, etc., are
nonstationary, because they are generated by systems whose
environments and parameters vary over time.
Adaptive Filters
Adaptive filters work on the principle of minimizing the mean squared
difference (or error) between the filter output and a target (or desired)
signal.
Adaptive filters are used in applications that involve a combination of
three broad signal processing problems:
1. De-noising and channel equalization – filtering a time-varying noisy
signal to remove the effect of noise and channel distortions;
2. Trajectory estimation – tracking and prediction of the trajectory of a
nonstationary signal or parameter observed in noise;
3. System identification – adaptive estimation of the parameters of a
time-varying system from a related observation.
Adaptive linear filters work on the principle that the desired signal or
parameters can be extracted from the input through a filtering or
estimation operation. The adaptation of the filter parameters is based on
minimizing the mean squared error between the filter output and a target
(or desired) signal. The use of the least square error (LSE) criterion is equivalent to the principle of orthogonality, in which at any discrete time m the estimator is expected to use all the available information such that any estimation error at time m is orthogonal to all the information available up to time m.
An adaptive filter can be a combination of the following types of filters:
• single-input or multi-input filters;
• linear or nonlinear filters;
• finite impulse response (FIR) or infinite impulse response (IIR) filters.
Applications of adaptive filters include
• Multichannel noise reduction,
• Radar/sonar signal processing,
• Channel equalization for cellular mobile phones,
• Echo cancellation and low-delay speech coding.
Adaptive Filters

• An adaptive filter is in reality a nonlinear device, in the sense that it does not obey the principle of superposition.
• Adaptive filters are commonly classified as:
• Linear
An adaptive filter is said to be linear if the estimate of the quantity of interest is computed adaptively (at the output of the filter) as a linear combination of the available set of observations applied to the filter input.
• Nonlinear
Neural Networks
Linear Filter Structures

• The operation of a linear adaptive filtering algorithm involves two basic processes:
• a filtering process designed to produce an output in response to a
sequence of input data
• an adaptive process, the purpose of which is to provide a mechanism for
the adaptive control of an adjustable set of parameters used in the
filtering process.
• These two processes work interactively with each other.
• There are three types of filter structures with finite memory:
• Transversal Filter,
• Lattice Predictor,
• Systolic Array.
Linear Filter Structures

• For stationary inputs, the resulting solution is commonly known as the Wiener
filter, which is said to be optimum in the mean-square sense.
• A plot of the mean-square value of the error signal vs. the adjustable parameters
of a linear filter is referred to as the error-performance surface.
• The minimum point of this surface represents the Wiener solution.
• The Wiener filter is inadequate for dealing with situations in which non-stationarity of the signal and/or noise is intrinsic to the problem.
• A highly successful solution to this more difficult problem is found in the
Kalman filter, a powerful device with a wide variety of engineering
applications.
Transversal Filter
Lattice Predictor

It has the advantage of simplifying the computation.

Systolic Array

The use of systolic arrays has made it possible to achieve a high throughput, which is required for many advanced signal-processing algorithms to operate in real time.
Linear Adaptive Filtering Algorithms
• Stochastic Gradient Approach
• Least-Mean-Square (LMS) algorithm
• Gradient Adaptive Lattice (GAL) algorithm

• Least-Squares Estimation
• Recursive least-squares (RLS) estimation
• Standard RLS algorithm
• Square-root RLS algorithms
• Fast RLS algorithms
Wiener Filters

• The design of a Wiener filter requires a priori information about the statistics of the data to be processed.
• The filter is optimum only when the statistical characteristics
of the input data match the a priori information on which the
design of the filter is based.
• When this information is not known completely, however, it
may not be possible to design the Wiener filter or else the
design may no longer be optimum.
Wiener Filters: Least Square Error Estimation

• The filter input–output relation is given by
  $\hat{x}(m) = \sum_{k=0}^{P-1} w_k\, y(m-k) = \mathbf{w}^{\mathrm T}\mathbf{y}(m)$
  where P is the filter length (number of coefficients).
• The Wiener filter error signal e(m) is defined as the difference between the desired signal x(m) and the filter output signal x̂(m):
  $e(m) = x(m) - \hat{x}(m) = x(m) - \mathbf{w}^{\mathrm T}\mathbf{y}(m)$
• Over N samples of the signals x(m) and y(m), the error signal can be collected into the vector form e = x − Yw, where the rows of Y are the input vectors yT(m).
• The Wiener filter coefficients are obtained by minimising the average squared error function with respect to the filter coefficient vector w:
  $E[e^2(m)] = E[x^2(m)] - 2\,\mathbf{w}^{\mathrm T}\mathbf{r}_{xy} + \mathbf{w}^{\mathrm T}\mathbf{R}_{yy}\,\mathbf{w}$
• Ryy = E[y(m)yT(m)] is the autocorrelation matrix of the input signal
• rxy = E[x(m)y(m)] is the cross-correlation vector of the input and the desired signals
For example, for a filter with only two coefficients (w0, w1),
the mean square error function is a bowl-shaped surface, with
a single minimum point
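To make the bowl-shaped error surface concrete, the following Python/NumPy sketch (an illustration, not part of the lecture) evaluates J(w) = E[x²] − 2wᵀrxy + wᵀRyyw on a grid of two weights; the numerical values of Ryy, rxy and E[x²] are assumed example statistics.

```python
import numpy as np

# Assumed example statistics for a 2-tap filter (illustrative values only)
Ryy = np.array([[1.0, 0.5],
                [0.5, 1.0]])          # input autocorrelation matrix E[y y^T]
rxy = np.array([0.6, 0.2])            # cross-correlation vector E[x y]
Ex2 = 1.0                             # desired-signal power E[x^2]

# Evaluate J(w) = E[x^2] - 2 w^T rxy + w^T Ryy w over a grid of (w0, w1)
w0, w1 = np.meshgrid(np.linspace(-2, 2, 201), np.linspace(-2, 2, 201))
W = np.stack([w0, w1], axis=-1)
J = Ex2 - 2 * (W @ rxy) + np.einsum('...i,ij,...j->...', W, Ryy, W)

w_opt = np.linalg.solve(Ryy, rxy)     # Wiener solution: the bottom of the bowl
print(w_opt)                          # approximately [0.667, -0.133]
print(J.min())                        # minimum of the surface, attained near w_opt
```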
• The gradient of the mean square error function with respect to the filter coefficient vector is given by
  $\dfrac{\partial E[e^2(m)]}{\partial \mathbf{w}} = -2\,\mathbf{r}_{xy} + 2\,\mathbf{R}_{yy}\mathbf{w}$
• The minimum mean square error Wiener filter is obtained by setting this gradient to zero.
• The calculation of the Wiener filter coefficients
requires the autocorrelation matrix of the input signal
and the cross-correlation vector of the input and the
desired signals.

• The optimum value of w is $\mathbf{w}_o = \mathbf{R}_{yy}^{-1}\,\mathbf{r}_{xy}$
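A minimal sketch of this solution in Python/NumPy, with Ryy and rxy estimated from sample data. The function name wiener_coefficients and the 3-tap test channel are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def wiener_coefficients(x, y, P):
    """Estimate the P-tap Wiener filter w_o = Ryy^{-1} rxy from data.

    x : desired signal, y : filter input (1-D arrays of equal length).
    """
    N = len(y)
    # Rows are the input vectors y(m) = [y(m), y(m-1), ..., y(m-P+1)]
    Y = np.array([y[m - P + 1:m + 1][::-1] for m in range(P - 1, N)])
    d = x[P - 1:N]
    Ryy = Y.T @ Y / len(d)      # sample autocorrelation matrix E[y(m) y^T(m)]
    rxy = Y.T @ d / len(d)      # sample cross-correlation vector E[x(m) y(m)]
    return np.linalg.solve(Ryy, rxy)

# Example: recover a known 3-tap channel from noisy observations
rng = np.random.default_rng(1)
y = rng.standard_normal(5_000)
w_true = np.array([0.8, -0.3, 0.1])
x = np.convolve(y, w_true, mode="full")[:len(y)] + 0.01 * rng.standard_normal(len(y))
print(wiener_coefficients(x, y, P=3))   # approximately [0.8, -0.3, 0.1]
```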


The LMS Filter

• A computationally simpler version of the gradient search method is the least mean square (LMS) filter, in which the gradient of the mean square error is substituted with the gradient of the instantaneous squared error function.

• Note that the feedback equation for the time update of the filter
coefficients is essentially a recursive (infinite impulse response)
system with input μ[y(m)e(m)] and its poles at α.
The LMS Filter

• The LMS adaptation method is defined as
  $\mathbf{w}(m+1) = \mathbf{w}(m) - \mu\,\nabla_{\mathbf{w}}\, e^2(m)$
• The instantaneous gradient of the squared error can be expressed as
  $\nabla_{\mathbf{w}}\, e^2(m) = -2\,\mathbf{y}(m)\,e(m)$
• Substituting this equation into the recursive update of the filter parameters (with the factor of 2 absorbed into the step size μ) yields the LMS adaptation equation
  $\mathbf{w}(m+1) = \mathbf{w}(m) + \mu\,\mathbf{y}(m)\,e(m)$
The LMS Filter

• The main advantage of the LMS algorithm is its simplicity, both in terms of the memory requirement and the computational complexity, which is O(P), where P is the filter length.
• Leaky LMS Algorithm
• The stability and the adaptability of the recursive LMS adaptation can be improved by introducing a so-called leakage factor α as

w(m+1) = α·w(m) + μ·[y(m)·e(m)]

• When the parameter α < 1, the effect is to introduce more stability and to accelerate the filter adaptation to changes in the input signal characteristics.
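A minimal Python/NumPy sketch of the (leaky) LMS update w(m+1) = α·w(m) + μ·y(m)·e(m), where α = 1 gives the standard LMS. The function name and the system-identification test scenario are illustrative assumptions, not from the lecture.

```python
import numpy as np

def lms(x, y, P, mu, alpha=1.0):
    """(Leaky) LMS: adapt a P-tap filter w so that w^T y(m) tracks x(m).

    Update: w(m+1) = alpha * w(m) + mu * y(m) * e(m); alpha = 1 is standard LMS.
    Returns the final weights and the error signal.
    """
    w = np.zeros(P)
    e = np.zeros(len(y))
    for m in range(P - 1, len(y)):
        y_m = y[m - P + 1:m + 1][::-1]     # input vector [y(m), ..., y(m-P+1)]
        e[m] = x[m] - w @ y_m              # error between desired signal and filter output
        w = alpha * w + mu * y_m * e[m]    # coefficient update
    return w, e

# Example: identify an unknown 3-tap system from its input and noise-free output
rng = np.random.default_rng(2)
y = rng.standard_normal(20_000)
x = np.convolve(y, [0.8, -0.3, 0.1], mode="full")[:len(y)]
w, e = lms(x, y, P=3, mu=0.01)
print(w)   # approximately [0.8, -0.3, 0.1]
```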
Stability of LMS
• The LMS algorithm is convergent in the mean square if and only if the step-size parameter satisfies
  $0 < \mu < \dfrac{2}{\lambda_{\max}}$
• Here λmax is the largest eigenvalue of the correlation matrix of the input data.
• A more practical test for stability is
  $0 < \mu < \dfrac{2}{\text{input signal power}}$

• Larger values of the step size:
• increase the adaptation rate (faster adaptation);
• increase the residual mean-squared error.
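A small sketch of this practical check, assuming the input signal is available as an array; the helper name is hypothetical. Dividing additionally by the filter length P (the total tap-input power) gives the more conservative bound often used in practice.

```python
import numpy as np

def lms_step_size_bound(y, P):
    """Return (2 / signal power, 2 / (P * signal power)) as step-size bounds."""
    power = np.mean(np.asarray(y, dtype=float) ** 2)   # estimated input signal power
    return 2.0 / power, 2.0 / (P * power)

y = np.random.default_rng(3).standard_normal(10_000)
mu_max, mu_max_conservative = lms_step_size_bound(y, P=8)
print(mu_max, mu_max_conservative)    # roughly 2.0 and 0.25 for unit-power input
```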
Applications of Adaptive Filters: Identification
• Used to provide a linear model of an unknown plant

• Applications:
• System identification
• Layered earth modeling
Applications of Adaptive Filters: Inverse Modeling
• Used to provide an inverse model of an unknown plant

• Applications:
• Predictive deconvolution
• Adaptive equalization
• Blind equalization
Applications of Adaptive Filters: Prediction
• Used to provide a prediction of the present value of a random signal

• Applications:
• Linear predictive coding
• Adaptive differential pulse-code modulation
• Autoregressive spectrum analysis
• Signal detection
Applications of Adaptive Filters: Interference Cancellation

• Used to cancel unknown interference from a primary signal

• Applications:
• Adaptive noise cancelling (echo/noise cancellation: hands-free car phone, aircraft headphones, etc.)
• Echo cancellation
• Adaptive beamforming
Example:
Acoustic Echo Cancellation

Linear prediction
A linear prediction (LP) model predicts the future value of a signal from a linear combination of its past values.
LP models are used in diverse applications, such as:
• Data forecasting,
• Speech coding,
• Video coding,
• Speech recognition,
• Model-based spectral analysis,
• Model-based signal interpolation,
• Signal restoration,
• Noise reduction,
• Impulse detection and change detection.
In terms of usefulness and application as a signal processing tool, linear
prediction models rank almost alongside the Fourier transform. Indeed,
modern digital mobile phones employ voice coders based on linear
prediction modelling of speech for efficient coding.

There are two main motivations for the use of predictors in signal
processing applications:
1. To predict the trajectory of a signal;
2. To remove the predictable part of a signal in order to avoid
retransmitting parts of a signal that can be predicted at the receiver
and thereby save storage, bandwidth, time and power.
The accuracy with which a signal can be predicted from its past samples
depends on:
1. The autocorrelation function, or
2. Equivalently the bandwidth and the power spectrum, of the signal.
Linear prediction models are extensively used in speech processing applications such as
in low bit-rate speech coders, speech enhancement and speech recognition. Speech is
generated by inhaling air and then exhaling it through the glottis and the vocal tract. The noise-like airflow from the lungs is modulated and shaped by the vibrations of the glottal cords and the resonance of the vocal tract. Figure 2 illustrates a source–filter model of speech. The source models the lungs and emits a random excitation signal that is filtered by a pitch filter. The pitch filter models the vibrations of the glottal cords and generates a sequence of quasi-periodic excitation pulses for voiced sounds.
A Prediction Problem

• Problem: given the samples {x[n], x[n−1], x[n−2], ..., x[n−M]} of a stationary process, predict the value of the process some time into the future as
  $x[n+m] = f\big(x[n], x[n-1], x[n-2], \ldots, x[n-M]\big)$
• The function f may be linear or non-linear. We concentrate only on linear prediction functions.
A Prediction Problem
• Linear prediction dates back to Gauss in the 18th century.
• It is extensively used in DSP theory and applications (spectrum analysis, speech processing, radar, sonar, seismology, mobile telephony, financial systems, etc.).
• The difference between the predicted and the actual value at a specific point in time is called the prediction error.
A Prediction Problem

• The objective of prediction is: given the data, to select a linear function that minimizes the prediction error.
• The Wiener approach examined earlier may be cast into a predictive form in which the desired signal to follow is the next sample of the given process.
Forward & Backward Prediction

• If the prediction is written as
  $\hat{x}[n] = f\big(x[n-1], x[n-2], \ldots, x[n-M]\big)$
  then we have a one-step forward prediction.
• If the prediction is written as
  $\hat{x}[n-M] = f\big(x[n], x[n-1], x[n-2], \ldots, x[n-M+1]\big)$
  then we have a one-step backward prediction.
• The forward prediction error is then
  $e_f[n] = x[n] - \hat{x}[n]$
• Write the prediction equation as
  $\hat{x}[n] = \sum_{k=1}^{M} w[k]\, x[n-k]$

• As in the Wiener case, we minimize the second-order norm of the prediction error. Thus the solution accrues from
  $J = \min_{\mathbf{w}} E\{(e_f[n])^2\} = \min_{\mathbf{w}} E\{(x[n] - \hat{x}[n])^2\}$
• Expanding, we have
  $J = \min_{\mathbf{w}}\Big[\, E\{x[n]^2\} - 2E\{x[n]\hat{x}[n]\} + E\{\hat{x}[n]^2\} \,\Big]$

• Differentiating with respect to the weight vector, we obtain
  $\dfrac{\partial J}{\partial w_i} = -2\,E\Big\{x[n]\,\dfrac{\partial \hat{x}[n]}{\partial w_i}\Big\} + 2\,E\Big\{\hat{x}[n]\,\dfrac{\partial \hat{x}[n]}{\partial w_i}\Big\}$
• However,
  $\dfrac{\partial \hat{x}[n]}{\partial w_i} = x[n-i]$
• and hence
  $\dfrac{\partial J}{\partial w_i} = -2\,E\{x[n]\,x[n-i]\} + 2\,E\{\hat{x}[n]\,x[n-i]\}$
• or
  $\dfrac{\partial J}{\partial w_i} = -2\,E\{x[n]\,x[n-i]\} + 2\sum_{k=1}^{M} w[k]\,E\{x[n-k]\,x[n-i]\}$
• On substituting the corresponding correlation sequences, we have
  $\dfrac{\partial J}{\partial w_i} = -2\,r_{xx}[i] + 2\sum_{k=1}^{M} w[k]\,r_{xx}[i-k]$

• Setting this expression to zero for minimisation yields
  $\sum_{k=1}^{M} w[k]\,r_{xx}[i-k] = r_{xx}[i], \qquad i = 1, 2, 3, \ldots, M$
• These are the Normal Equations, or Wiener–Hopf, or Yule–Walker equations, structured for the one-step forward predictor.

• In this specific case it is clear that we need only know the autocorrelation properties of the given process to determine the predictor coefficients.
• Set
  $a_M[m] = \begin{cases} 1 & m = 0 \\ -w[m] & m = 1, \ldots, M \\ 0 & m > M \end{cases}$
• and rewrite the earlier expression as
  $\sum_{m=0}^{M} a_M[m]\,r_{xx}[m-k] = \begin{cases} P_M & k = 0 \\ 0 & k = 1, 2, \ldots, M \end{cases}$
  where $P_M$ denotes the minimum forward prediction error power.

• These equations are sometimes known as the augmented forward prediction normal equations.
• The prediction error is then given by
  $e_f[n] = \sum_{k=0}^{M} a_M[k]\,x[n-k]$
• This is an FIR filter, known as the prediction-error filter:
  $A_f(z) = 1 + a_M[1]\,z^{-1} + a_M[2]\,z^{-2} + \ldots + a_M[M]\,z^{-M}$
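A Python/NumPy sketch (illustrative, not from the lecture) that solves the forward-prediction normal equations for w, forms the prediction-error coefficients aM[k], and checks them on an assumed AR(2) test signal.

```python
import numpy as np

def autocorr(x, max_lag):
    """Biased autocorrelation estimate r_xx[k], k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[:N - k], x[k:]) / N for k in range(max_lag + 1)])

def forward_predictor(x, M):
    """Solve sum_k w[k] r_xx[i-k] = r_xx[i], i = 1..M, and return (w, a).

    a = [1, -w[1], ..., -w[M]] are the prediction-error filter coefficients,
    so e_f[n] = sum_k a[k] x[n-k].
    """
    r = autocorr(x, M)
    R = np.array([[r[abs(i - k)] for k in range(M)] for i in range(M)])  # Toeplitz matrix of r_xx[i-k]
    w = np.linalg.solve(R, r[1:M + 1])
    return w, np.concatenate(([1.0], -w))

# Example: an AR(2) process x[n] = 1.5 x[n-1] - 0.7 x[n-2] + v[n]
rng = np.random.default_rng(4)
v = rng.standard_normal(20_000)
x = np.zeros_like(v)
for n in range(2, len(v)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + v[n]

w, a = forward_predictor(x, M=2)
print(w)   # approximately [1.5, -0.7]
print(a)   # approximately [1.0, -1.5, 0.7]  (coefficients of A_f(z))
```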
Backward Prediction Problem
• In a similar manner, for the backward prediction case we write
  $e_b[n] = x[n-M] - \hat{x}[n-M]$
• and
  $\hat{x}[n-M] = \sum_{k=1}^{M} \tilde{w}[k]\,x[n-k+1]$
• where we assume that the backward predictor filter weights are different from the forward case.
• Thus, on comparing the forward and backward formulations with the Wiener least squares conditions, we see that the desired signal is now
  $x[n-M]$
• Hence the normal equations for the backward case can be written as
  $\sum_{m=1}^{M} \tilde{w}[m]\,r_{xx}[m-k] = r_{xx}[M-k+1], \qquad k = 1, 2, 3, \ldots, M$
m1
• This can be slightly adjusted as
  $\sum_{m=1}^{M} \tilde{w}[M-m+1]\,r_{xx}[k-m] = r_{xx}[k], \qquad k = 1, 2, 3, \ldots, M$

• On comparing this equation with the corresponding forward case, it is seen that the two have the same mathematical form and
  $w[m] = \tilde{w}[M-m+1], \qquad m = 1, 2, \ldots, M$
• or equivalently
  $\tilde{w}[m] = w[M-m+1], \qquad m = 1, 2, \ldots, M$
• The backward prediction filter has the same weights as the forward case, but reversed:
  $A_b(z) = a_M[M] + a_M[M-1]\,z^{-1} + a_M[M-2]\,z^{-2} + \ldots + z^{-M}$

• This result is significant; many properties of efficient predictors ensue from it.
• Observe that $A_b(z) = z^{-M} A_f(z^{-1})$, so the ratio of the backward prediction-error filter to the forward prediction-error filter is allpass.
• This yields the lattice predictor structures.
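A small numerical check of this property (an illustration, with assumed forward prediction-error coefficients): the backward filter is simply the reversed coefficient vector, and the ratio Ab(z)/Af(z) has unit magnitude on the unit circle.

```python
import numpy as np

def A(coeffs, z):
    """Evaluate A(z) = sum_k coeffs[k] * z^{-k}."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))

# Forward prediction-error filter [1, a_M[1], ..., a_M[M]] (illustrative, minimum-phase values)
a_f = np.array([1.0, -1.5, 0.7])
a_b = a_f[::-1]                                 # backward filter: same weights, reversed

z = np.exp(1j * np.linspace(0, np.pi, 512))     # points on the unit circle
ratio = A(a_b, z) / A(a_f, z)
print(np.allclose(np.abs(ratio), 1.0))          # True: A_b(z)/A_f(z) is allpass
```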
Differences between estimation and approximation?
What is the difference between prediction and estimation intervals in regression?
Differences between prediction, expectation, projection, estimation, extrapolation?
