
ADAPTIVE SIGNAL PROCESSING

or: INTRODUCTION TO STATISTICAL


SIGNAL PROCESSING

S.V. Hanly

July 29, 2010

1 Introduction

1.1 Overview

The problem is to extract a signal of interest from some noisy observations. Applications
include mobile cellular phones, modems, TV, DVD, cameras, radar, medical imaging...

[Block diagram: signal X(t) → noisy channel → noisy observation Y(t) → signal processing → estimate X̂(t)]

The signal can be analog (continuous values) e.g. radar signal, sensor signal, medical
image, audio, or it can be a digital signal (i.e. takes on one of several possible discrete
values) e.g. coded digital data. Many devices convert an analog signal into a digital signal
through source coding e.g. telephone, camera, medical imaging device, ... basically most
measurement or sensor devices today will do this.
The channel can be virtually anything: wireless, satellite, twisted pair, optical fibre, electrical circuit, computer memory, biological system, ... It might be time-invariant (e.g. an optical fibre) or time-varying (e.g. a mobile cellphone channel). It might be linear or nonlinear. We will use statistical models, since many channels are best treated as random phenomena. Noise is also modeled using random processes (collections of random variables).
Signal processing attempts to recover X(t) via:
detection: in the case that X(t) is digital, we usually want to pick the most likely value (but not always; sometimes “soft decisions” are better, as in modern iterative decoding algorithms). For example, if X(t) represents a bit of information - 0 or 1 - then we might want to minimize the error probability. Maximum a posteriori (MAP) estimation and maximum likelihood estimation (MLE) are possible approaches.

estimation: in the case that X(t) is analog, we want to find an estimate X̂(t) that is close
to X(t) in some sense. One approach is to minimize the mean square error (MSE).
Adaptive signal processing is an advanced topic that deals with the case in which the statistics of the signal, the channel, and possibly the noise are not known in advance. The signal processor must make a series of measurements and adapt its behaviour so that it converges to what it would be doing if the statistics were known. This is very useful in practice, especially when
the statistics are changing over time and the signal processor needs to track the changes
e.g. radar - tracking a moving target, or adaptive equalization - adapting the equalizer to a
time-varying cell-phone radio channel. Topics such as Kalman filtering and recursive least
squares (RLS) arise in this area.
We will treat this module as a first course in statistical signal processing, and we will
only meet adaptive techniques in the latter part of the semester. Students should have
some background in probability and random processes, but we will review the necessary
probability theory in the first few weeks. I will give a test in the first class to determine the
pace that we will take in this review of probability. You don’t need to record your name
or student number on your test paper, I’m only interested in the overall background of the
class.

1.2 Skills to be acquired

You are probably doing this module because either you are a PhD student who needs to pick up some signal processing techniques for your research, or you are a masters student (possibly in industry) who needs to understand some of the myriad possible uses of statistical signal processing in the real world. In either case, the main stumbling block is probably the mathematics. Once you have that, you can learn a whole range of applications yourself. Therefore, the key skills I’m going to teach you are related to the basic math and canonical models needed to understand statistical signal processing. Once you have those, you can translate most real-world problems into the basic mathematical problems that we will address. I will be giving out homework problems, which will reinforce the techniques covered in lectures. There will also be an assignment/project that will help you learn how to translate a real-world problem into one of the canonical models studied in this module.
So the skills to be developed are:

1. basic mathematics, models and techniques required to do statistical signal processing


2. critical thinking and problem solving skills
3. research skills: how to find an application and translate the basic models learnt in
class into real-life scenarios

1.3 Assessment

We will use a mixture of continuous assessment and a test/exam. We need continuous assessment to make sure you are keeping up with the lectures. We need an exam to test your knowledge under exam conditions, where collaboration is not allowed.

1. Homework problems worth 10%

2. Assignment/project worth 10%

3. Midsemester test worth 10%

4. Final exam worth 70%

The midsemester test will take place during class time, a few weeks into the semester. It will occur after we have completed the probability review and will probably be based on that material (unless we cover it very quickly). I will give you good notice of the test date: I will tell you at the lecture in week 2, after the initial test has been analyzed, and you will have at least a month’s notice.
The project will be to write a report of up to 10 pages on an application of estimation to
a real-life engineering or scientific problem. You need to show how to construct a mathematical model of the form covered in lectures (or similar) and how to apply the results
from class to the application that you are considering. This report needs to be written as
a scientific article, with introduction, conclusion and bibliography. You must properly cite
your references and you cannot plagiarize: it must be in your own words. Possible areas to
consider include equalization, beamforming using antenna arrays, echo cancellation, noise
cancellation, interference cancellation, predictive coding, or any other application of your
choice.

1.4 A mobile cellphone example

Throughout the semester, we will use a particular example of a mobile cellphone network
as a way to illustrate the various techniques that we will develop. Let’s introduce this model now.

In mobile communications, we have some mobiles (say, K of them) trying to send signals to
a base station (BS). Assume that each mobile has some data to send, that each samples its data at the same rate (in Hz), and that they do so synchronously, so that we can assume each sample
is created at the same time (just for simplicity). So we have a discrete time model in which
unit time corresponds to the sampling period. At a given time, mobile k has a data symbol
Xk that it wants to convey to the BS. Typically, Xk will be complex valued (amplitude
and phase) and from a discrete set of possible values (digital) but in the simplest case it is
real-valued and binary i.e. Xk = ±1.
In a traditional cellphone system, each mobile would modulate a different carrier frequency
than the others, or use a different time than the others, to avoid interference between the
mobiles. But in some of today’s wideband systems, a different approach is used called “code
division multiple access” (CDMA). In CDMA, all mobiles send at the same time, using
“spreading codes”, otherwise known as “signature sequences”, to distinguish the signals at
the BS. Signal processing is required to disentangle the received signals.
As suggested by the term “spreading code”, the mobiles actually transmit using a much
wider bandwidth. Think about it: from a system perspective, this would also be the case with, say, frequency division multiple access (FDMA). With FDMA, the individual mobile signals would not be spread, but the total system bandwidth would be proportional to K, since each mobile would occupy a different chunk of spectrum. The same is true with CDMA, except that all mobiles’ signals share the same band, so they have to spread their signals by a bandwidth expansion factor, N, that is proportional to K.

[Figure: power spectral densities before and after spreading. The baseband data occupies bandwidth W; after spreading by the signature sequence, the signal occupies bandwidth N·W.]

The mobile samples at N times the data rate, producing a sequence of bits called “chips”. Figure 1 illustrates a possible signature sequence for the case N = 16. The mobile modulates this sequence with its data, i.e., it sends the sequence when the data bit is 1 and the inverse of the sequence when the data bit is 0.
Mathematically, we can represent the signature sequence of user k by a vector, sk, of dimension N, e.g., for the sequence depicted above:
sk = (+1, −1, −1, +1, +1, −1, +1, −1, −1, +1, −1, +1, −1, −1, +1, +1)t
Here we write the signature sequence as a row vector, but the “t” at the end means “transpose”, so it is actually a column vector. The receiver at the BS also samples at the chip rate, so to demodulate one symbol from each mobile, it puts N samples into a received vector, Y, also of dimension N:
Y = X1 s1 + X2 s2 + . . . + XK sK + W (1)

[Figure 1: A signature sequence of chips, taking values ±1 as a function of time; N chips = 1 data symbol (here N = 16).]

Here, W = (W1, W2, . . . , WN)t represents the receiver noise at the BS, with a different noise sample for each chip.
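As a concrete illustration, here is a minimal simulation of the model in (1). The parameter values, the random choice of ±1 signature sequences, and the Gaussian noise level are all illustrative assumptions, not prescriptions from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4    # chips per symbol and number of mobiles (illustrative)
sigma = 0.5     # noise standard deviation (illustrative)

# Random +/-1 signature sequences; column k of S is the vector s_k
S = rng.choice([-1.0, 1.0], size=(N, K))

# One +/-1 data symbol per mobile
X = rng.choice([-1.0, 1.0], size=K)

# Received vector at the BS, as in equation (1): Y = X1*s1 + ... + XK*sK + W
W = sigma * rng.standard_normal(N)
Y = S @ X + W
```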
What should the base station do with Y to recover X1, X2, . . . , XK?
This is a statistical signal processing problem because X1, X2, . . . , XK are random variables (each taking values in ±1), and W is a random vector of noise samples, usually best modeled as independent Gaussian random variables. In most of the lectures we will treat s1, s2, . . . , sK as known, deterministic signature sequence vectors, but when we treat adaptive signal processing, they will become unknown, random quantities that also have to be estimated.
Let’s start by considering the relatively simple problem in which there is just one mobile in the cell (K = 1) and transmission is binary (X1 = ±1). In this case, (1) becomes

Y = X1 s1 + W . (2)

Suppose we observe a particular received signal Y = y at the BS and we have to decide if X1 = 1 or X1 = −1. The Bayesian approach is to assume a prior probability distribution for X1, and it’s natural to assume 1 and −1 are equally likely (in fact, Shannon theory
tells us they should be if the communication is to be optimal). One then computes the
so-called a posteriori probability distribution on X1 given the observation that Y = y and
it might be that X1 = 1 becomes more likely than X1 = −1, after observing Y = y, so we
estimate X̂1 = 1 in this case. The a posteriori probability distribution is nothing more than
the conditional distribution of X1 , conditioned on the new information that Y = y. This
approach is called maximum a posteriori estimation (MAP) and it is appropriate when X1
takes on a set of discrete values, such as when X1 is binary.
One could also compute the likelihood of getting Y = y given that X1 = 1, versus the
likelihood of getting Y = y given that X1 = −1. If we don’t have any prior information
on X1 it would make sense to pick the estimate to be the one that makes the likelihood of
getting what we observed the highest. So we would pick the estimate that maximizes the
likelihood of Y = y. This approach is called maximum likelihood estimation (MLE). When
X1 = 1 and X1 = −1 are equally likely, MAP and MLE give the same result.
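To make this concrete for model (2): with equal priors and i.i.d. Gaussian noise samples, MAP and ML both amount to comparing the distance from y to s1 with the distance from y to −s1, and this reduces to the sign of the inner product of s1 and y. A minimal sketch; the function name and test values are mine, purely for illustration:

```python
import numpy as np

def detect_single_user(y, s1):
    # ML (= MAP under equal priors) decision for y = X1*s1 + w with
    # i.i.d. Gaussian noise: decide the sign of the correlation of s1 and y.
    return 1 if s1 @ y >= 0 else -1

# Quick check on simulated data with true X1 = +1
rng = np.random.default_rng(1)
s1 = rng.choice([-1.0, 1.0], size=16)
y = (+1) * s1 + 0.5 * rng.standard_normal(16)
print(detect_single_user(y, s1))   # expect 1
```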

We can take either approach when there are multiple users. Consider the Bayesian approach
when Xk = 1 and Xk = −1 are equally likely, for k = 1, 2, . . . , K. The MAP estimator picks
the most likely sequence of K bits from all the mobiles. But when K is large this becomes computationally difficult: there are 2^K possible bit sequences to compare, so an exhaustive search grows exponentially with K. In practice we cannot solve the problem exactly (except for small K), so we need to consider suboptimal approaches when K is large, such as when there are 20 or more mobiles per cell.
One approach is to treat both interference and noise as Gaussian. Consider the problem of
estimating X1 in (1). If we treat the interference and noise for mobile 1:

Z (1) = X2 s2 + X3 s3 + . . . + XK sK + W , (3)

as a Gaussian random vector (and this is not unreasonable) then (1) becomes:

Y (1) = X1 s1 + Z (1) (4)

and now we just have to choose whether X1 = 1 or X1 = −1. We could do the same,
in parallel, for all the mobiles, getting interference and noise vectors Z (1) , Z (2) , . . . , Z (K) ,
respectively, and for user k, we have

Y (k) = Xk sk + Z (k) . (5)

Understanding Gaussian random vectors, like Z (k) , is clearly important if we take this
approach.
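One simple detector in this spirit, though by no means the only one, is the per-user matched filter: correlate Y with each signature and take the sign, treating Z^(k) as if it were white Gaussian noise. A sketch (the function name is mine; note that this ignores the correlation structure of the interference, which is why better detectors exist):

```python
import numpy as np

def matched_filter_detect(Y, S):
    # Per-user matched-filter decisions: for each mobile k, correlate the
    # received vector Y with signature s_k (column k of S) and take the
    # sign, treating the interference-plus-noise Z^(k) as noise.
    return np.sign(S.T @ Y)   # vector of K tentative +/-1 decisions
```

For example, matched_filter_detect(Y, S) with the S and Y from the earlier simulation snippet returns one ±1 decision per mobile.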
In fact, what we could do is pretend that X1, X2, . . . , XK are Gaussian random variables, instead of ±1 random variables. All we have to do is replace the binary prior probability distribution with a Gaussian prior probability distribution. More precisely, we could assume that
X = (X1, X2, . . . , XK)t (6)
is a Gaussian random vector, and, further, that

(X1, X2, . . . , XK, W1, W2, . . . , WN)t

is a Gaussian random vector. Then (1) becomes:

Y = SX + W (7)

where S is an N × K matrix whose kth column is the signature sequence sk, and the prior distribution on X is that of a Gaussian vector. It should be clear by now that estimation of
linear models like (7), involving Gaussian random vectors, will be an important component
of this module.
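As a preview of where this leads (we have not derived anything yet, so treat this as a sketch, not a definition): under the illustrative assumptions that X and W are independent and zero mean, with covariances σx²I and σw²I respectively, the MMSE estimate of X given Y = y in (7) is linear in y, as the code comment below spells out.

```python
import numpy as np

def lmmse_estimate(y, S, sigma_x2, sigma_w2):
    # MMSE (conditional-mean) estimate of X in Y = S X + W, assuming
    # X ~ N(0, sigma_x2 * I) and W ~ N(0, sigma_w2 * I), independent.
    # For jointly Gaussian models the estimator is linear:
    #   x_hat = sigma_x2 * S^t (sigma_x2 * S S^t + sigma_w2 * I)^{-1} y
    N = S.shape[0]
    C_Y = sigma_x2 * (S @ S.T) + sigma_w2 * np.eye(N)  # covariance of Y
    return sigma_x2 * (S.T @ np.linalg.solve(C_Y, y))
```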
More generally, we can consider Gaussian estimation problems like

Y = AX + W (8)

where (X1, X2, . . . , XK, W1, W2, . . . , WN)t is a Gaussian random vector, and A is an arbitrary N × K matrix. We observe Y = y, and we want to estimate X, which we can write

as X̂(y). If we stand back, and think of many realizations of this experiment, then the
estimator is X̂(Y ), which is itself a random variable, since Y is a random variable.
The simplest case is the scalar case in which N = K = 1. Then

Y = aX + W (9)

where (X, W) has a two-dimensional Gaussian vector prior, and a is a constant. Note that X will be a Gaussian random variable and hence can take any real value, so MAP doesn’t make much sense: the prior gives every individual value zero probability, and the same is true of the posterior, after observing any possible outcome Y = y. So we need a different approach for continuous-valued random variables.
One very reasonable approach that is often used in the continuous case is to minimize the
so-called mean square error (MSE). This is a measure of the distance between two analog
signals from a probabilistic point of view. If X̂(Y ) is the estimator of X after observing the
random variable Y , then the mean square error (MSE) is E[|X − X̂(Y )|2 ], where E means
“expectation”. The reason for squaring the difference between the signal and estimate is
to get mathematical tractability: the quadratic function is differentiable, with a unique
minimum.
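For the scalar model (9) in the zero-mean case, writing σx² and σw² for the variances of X and W, the MMSE estimate turns out to be linear: X̂(y) = aσx²y/(a²σx² + σw²), with minimum MSE σx²σw²/(a²σx² + σw²). We will derive this later in the module; for now, here is a quick simulation check, with all parameter values chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma_x2, sigma_w2 = 2.0, 1.0, 0.25   # illustrative values

# Many realizations of Y = a*X + W with zero-mean Gaussian X and W
X = np.sqrt(sigma_x2) * rng.standard_normal(100_000)
W = np.sqrt(sigma_w2) * rng.standard_normal(100_000)
Y = a * X + W

# Linear MMSE estimator for the zero-mean Gaussian case
c = a * sigma_x2 / (a**2 * sigma_x2 + sigma_w2)
X_hat = c * Y

print(np.mean((X - X_hat)**2))                               # empirical MSE
print(sigma_x2 * sigma_w2 / (a**2 * sigma_x2 + sigma_w2))    # theoretical minimum
```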
In summary, to do statistical signal processing for the above systems, we will first need to study random variables, continuous random variables such as the Gaussian, expectation, conditional probability (since we need to condition on observations such as Y = y), Gaussian random vectors, and mean square estimation. To do all this, we will follow the excellent notes of Professor Abbas El Gamal, at Stanford University.
One thing about Gaussian estimation: it will turn out that finding the MSE estimate of
Xk in (7), under the Gaussian prior, for k = 1, 2, . . . , K, will allow us to then solve (5) for
the most likely binary value. So even when Gaussian priors are not the correct priors, the
Gaussian case is still very useful and important. We will study the Gaussian case first, and
then extend later to more general problems.
