Lecture “Fundamentals of Estimation and Detection”

Part I: Estimation Theory


Part II: Detection Theory

Note: Because detection or decision problems almost always require the estimate of some quantity
on which the detection step is based, we start with estimation theory in this course before
we turn to detection theory.
September 7, 2022 Wolfgang Rave Slide 1
Organisational Issues

Time and location (regular teaching in lecture hall; online, in case Corona gets worse)
• Lecture hall BAR 106: Wed, 16:40 – 18:10, Wolfgang Rave
• Homework Session: Thu, 16:40 – 18:10, R. Rojbi, I. Bizon Franco de Almeida,
A. Rezaei
• Questions during online lectures: collect and ask at the end of the lecture or when
I ask for questions (better ideas are welcome; in the lecture hall raise your hand)
Material on OPAL – web page
• lecture slides
• homework problems
• dates & links for GoToMeeting
• subscribe to learning group 2022/23 ⇒ access to lecture material

Credits for module ‚Vertiefung Mobile Nachrichtensysteme‘ …


• written exam for 2 of 3 lectures (Machine Learning, Fundamentals of Estimation
and Detection, Algorithmen für Mehrantennensysteme; EE students)
• written exam for ‚Fundamentals of Estimation and Detection‘ (NES students)
September 7, 2022 Wolfgang Rave Slide 2
Literature to the Course

The course is based on two standard textbooks by Steven M. Kay:


‘Fundamentals of Statistical Signal Processing: Estimation Theory’
‘Fundamentals of Statistical Signal Processing: Detection Theory’

September 7, 2022 Wolfgang Rave Slide 3


x-y Notation in Estimation Theory

Remarks on notation:
1) While S. Kay uses the θ-x notation, in this course we use the so-called x-y notation
which is customary in the literature. Here x denotes the parameter (or parameter
vector) to be estimated, while y represents the observation(s) or data that are available
to estimate the parameter(s) of interest. Initially, we will start by estimating a
scalar parameter from one or several observations. Then we will generalize our
considerations to estimating a vector parameter x.

2) We also use boldface notation for random variables such as the noise w and the
observations y. In contrast, constant or deterministic quantities are written in
normal font. Since we model the parameter x to be estimated as a constant, which we will
do most of the time during this course (so-called ‘classical’ estimation), it is therefore
written in normal font. Note that this is a modelling assumption. If x is assumed
to be random, the corresponding approach is called ‘Bayesian’ estimation (because
we assume some prior probability density function, PDF, for x).
3) Thus, in equations relating random variables to each other we use boldface. However,
given realizations, which are again deterministic, are written in normal font. This occurs
e.g. for density functions, where we write the argument in normal font, such as f_x(x).
September 7, 2022 Wolfgang Rave Slide 4
Lecture Overview (Content)

1) Part I (Estimation Theory): Minimum Variance Unbiased Estimators (MVUE)
and the Cramer-Rao Lower Bound (CRLB)
   - Unbiased estimators and minimum variance criterion
   - Theorem on the Cramer-Rao Lower Bound (smallest achievable MSE)
   - Transformation of Parameters
   - Signal Processing Example(s)
2) Linear Models
   - Practically important, achieves the CRLB
   - Example(s)
3) General Minimum Variance Unbiased Estimation
   - Sufficient statistics
   - Neyman-Fisher factorization theorem (to find the MVUE)
   - Rao-Blackwell-Lehmann-Scheffé theorem
4) Best Linear Unbiased Estimators (BLUE)
   - Definition and Determination of the BLUE
   - Example(s)
5) Maximum Likelihood Estimation (MLE)
   - Finding the MLE
   - Properties and numerical determination
   - Examples
(Topics 1) to 5) belong to ‘classical’ estimation: the parameter is treated as a deterministic variable.)
6) …
September 7, 2022 Wolfgang Rave Slide 5
Lecture Overview (Content)

6) The ‘Bayesian’ Approach: General Bayesian Estimators
   - Prior knowledge and estimation, choosing a prior PDF
   - Properties of the Gaussian PDF
   - Maximum a-posteriori estimators
7) Linear Bayesian Estimators
(Topics 6) and 7) belong to ‘Bayesian’ estimation: the parameter is treated as a random variable.)

Part II (Detection Theory):


8) Decision Theory for Binary Hypothesis Testing
   - Neyman-Pearson theorem
   - Receiver Operating Characteristics (ROC)
   - Minimum Probability of Error
9) Deterministic Signals
   - (Generalized) Matched Filter
10) Random Signals
   - Estimator-Correlator

September 7, 2022 Wolfgang Rave Slide 6


Important Background Material

Review on Basic Concepts from


Probability Theory

• Random Variables and Random Vectors


• Distribution and Density Functions

September 7, 2022 Wolfgang Rave Slide 7


Probability of an Event

Classical definition of Probability (Pierre-Simon Laplace, 18th century)


• The probability of an event A is defined a priori, without actual experimentation
  (provided all the outcomes are equally likely), as

      P(A) = \frac{\text{number of outcomes favourable to } A}{\text{total number of possible outcomes}}

• Relative frequency definition, given n experiments (trials), as illustrated by the short
  simulation sketch below:

      P(A) = \lim_{n \to \infty} \frac{n_A}{n}

• More rigorous abstract definitions based on set theory can be found in textbooks. As
a standard reference on probability and random variables we recommend the book:
A. Papoulis and S.U. Pillai, ‘Probability, random variables and stochastic processes’,
McGraw Hill (4th edition), 2002
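
As a minimal illustration of the relative frequency definition (a sketch of my own, not part of the
lecture material), one can simulate rolls of a fair die and check that the relative frequency n_A/n
of an event A approaches its classical probability:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    rolls = rng.integers(1, 7, size=n)       # n trials of a fair die (outcomes 1..6)

    # event A = "the face is even"; the classical definition gives P(A) = 3/6 = 0.5
    n_A = np.count_nonzero(rolls % 2 == 0)
    print(n_A / n)                           # relative frequency, close to 0.5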

September 7, 2022 Wolfgang Rave Slide 8


Concept of a Random Variable

Random variable: a function (‘rule’) that associates a number with every
event within an event space:

    event space Ω (e.g. face ω_i of a die)  →  random variable g(ω)  →  real number (e.g. g(ω_i) = 10·i)

    ω_1 → x_1 = g(ω_1),   ω_2 → x_2 = g(ω_2),   …,   ω_M → x_M = g(ω_M)

• Example 1 (rolling a die): for the set of events Ω = {1, 2, 3, 4, 5, 6} the random
  variable (‘rule’) g(ω_i) = 10·i maps the events ω_i to the real numbers x_i ∈ {10, …, 60}
September 7, 2022 Wolfgang Rave Slide 9
Distribution Function of a r.v.

The elements of the set Ω = {1, 2, 3, 4, 5, 6} contained in the event {x ≤ x} change as x
is varied. The probability Pr{x ≤ x} of the event {x ≤ x} is therefore a number that
depends on x.
This number is denoted as F_x(x) and called the (cumulative) distribution function (CDF)
of the (generic) random variable x:

    F_x(x) = Pr{x ≤ x}

Example 2: For a fair die with Pr{x = x_i} = 1/6, the discrete-valued distribution function
is given by the graph below.
[Figure: staircase CDF F_x(x) of the die example, increasing from 0 to 1 in steps of 1/6 at x = 10, 20, …, 60.]
September 7, 2022 Wolfgang Rave Slide 10
Distribution Function of a r.v.

In particular we find:
Fx(35) = 1/2
Fx(30.1) = 1/2
Fx(30) = 1/2
Fx(29.9) = 1/3
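
A minimal numerical check of these values (a sketch, not from the slides), using the fact that the
die CDF is a sum of steps of height 1/6 at x = 10, 20, …, 60:

    import numpy as np

    values = np.arange(10, 61, 10)                     # x_i = 10, 20, ..., 60
    F = lambda x: np.count_nonzero(values <= x) / 6    # F_x(x) = Pr{x <= x} for the fair die

    print(F(35), F(30.1), F(30), F(29.9))              # 0.5 0.5 0.5 0.333...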

Example 3: If a telephone call arrives uniformly at random within the interval [0, T],
the distribution function is continuous-valued and has the following form:

[Figure: F_x(x) increases linearly from 0 at x = 0 to 1 at x = T and remains at 1 for x > T.]
September 7, 2022 Wolfgang Rave Slide 11
Density Function of a r.v.

Interpretation of the distribution function:


    \Pr\{a < x \le b\} = F_x(b) - F_x(a), \qquad \text{with } F_x(-\infty) = 0 \text{ and } F_x(\infty) = 1

The derivative of the distribution function F_x(x) is called the probability density
function (PDF) f_x(x) of the r.v. x. Thus we have

    f_x(x) = \frac{d}{dx} F_x(x)

and the above probability can also be evaluated from

    \Pr\{a < x \le b\} = \int_a^b f_x(x)\,dx = F_x(b) - F_x(a).

Because F_x(x) is monotonically non-decreasing, the derivative has the property f_x(x) ≥ 0.
For a discrete-valued r.v. the density is a weighted sum of Dirac pulses

    f_x(x) = \sum_i p_i\, \delta(x - x_i)

and is usually called probability mass function (PMF).
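
To connect the two representations numerically, here is a small sketch (my own illustration,
assuming T = 10 for concreteness) for the uniform arrival-time example: the probability of an
interval can be obtained either by integrating the density or by differencing the CDF.

    from scipy.integrate import quad

    T = 10.0                                           # assumed value for illustration
    f = lambda x: 1.0 / T if 0.0 <= x <= T else 0.0    # PDF of Example 3 (uniform on [0, T])
    F = lambda x: min(max(x / T, 0.0), 1.0)            # corresponding CDF

    a, b = 2.0, 7.0
    prob_from_pdf, _ = quad(f, a, b)                   # integral of the density over [a, b]
    prob_from_cdf = F(b) - F(a)                        # difference of the CDF
    print(prob_from_pdf, prob_from_cdf)                # both 0.5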


September 7, 2022 Wolfgang Rave Slide 12
Density Function of a r.v.

Example 4: The densities of Examples 2 and 3 are

[Figure, left: f_x(x) of the die example, six Dirac pulses of weight 1/6 at x = 10, 20, …, 60.
 Figure, right: f_x(x) of the arrival-time example, constant at 1/T on [0, T] and zero elsewhere.]

Note that for discrete probability densities one also uses the term probability mass function (PMF).

Interpretation of probabilities as a definite integral of the density function (PDF) and
normalization constraint:

    \Pr\{a \le x \le b\} = \int_a^b f_x(x)\,dx \qquad \text{with} \qquad \int_{-\infty}^{\infty} f_x(x)\,dx = 1
So that the distribution function corresponds to the integral …
September 7, 2022 Wolfgang Rave Slide 13
Joint Densities

The dependency between two real-valued random variables {x, y} is characterized by
their joint probability density function. Let f_{x,y}(x, y) denote the joint (continuous) PDF of
x and y; this function allows us to evaluate probabilities of the form

    P(a \le x \le b,\; c \le y \le d) = \int_c^d \int_a^b f_{x,y}(x, y)\,dx\,dy,

namely, the probability that x and y jointly assume values within the intervals [a, b] and
[c, d], respectively. Let also f_{x|y}(x|y) denote the conditional PDF of x given y; this
function allows us to evaluate probabilities of events of the form

    P(a \le x \le b,\; y = y) = \int_a^b f_{x|y}(x|y)\,dx,

namely, the probability that x assumes values within the interval [a, b] given that y is
fixed at the value y. It is well known that the joint and conditional PDFs of two random
variables are related via Bayes’ rule, which states that

    f_{x,y}(x, y) = f_y(y)\, f_{x|y}(x|y) = f_x(x)\, f_{y|x}(y|x) \qquad \text{(‘conservation law for probability’)}

Note: this essentially allows one density to be expressed in terms of the others; e.g. if we know
f_{x|y}(x|y) and f_y(y), we can evaluate f_{x,y}(x, y), and so on, e.g.

    f_{x|y}(x|y) = \frac{f_x(x)\, f_{y|x}(y|x)}{f_y(y)}
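
A small numerical sketch (my own example with an arbitrary discrete joint distribution) makes this
‘conservation law’ concrete: both factorizations reproduce the same joint distribution.

    import numpy as np

    # joint PMF of two discrete r.v. x (rows) and y (columns); entries sum to 1
    P_xy = np.array([[0.10, 0.20, 0.05],
                     [0.30, 0.15, 0.20]])

    P_x = P_xy.sum(axis=1)             # marginal of x
    P_y = P_xy.sum(axis=0)             # marginal of y
    P_x_given_y = P_xy / P_y           # P(x|y), each column normalized by P(y)
    P_y_given_x = P_xy / P_x[:, None]  # P(y|x), each row normalized by P(x)

    # Bayes' rule: both factorizations reproduce the joint distribution
    print(np.allclose(P_x_given_y * P_y, P_xy))           # True
    print(np.allclose(P_y_given_x * P_x[:, None], P_xy))  # True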
September 7, 2022 Wolfgang Rave Slide 14
Dependent Random Variables / Correlation

The variables {x, y} are said to be independent iff

    f_{x|y}(x|y) = f_x(x) \qquad \text{and} \qquad f_{y|x}(y|x) = f_y(y),

in which case the PDFs of x and y are not modified by conditioning on y and x, respectively.
Otherwise, the variables are said to be dependent. In particular, when the variables are
independent, it follows that E[xy] = E[x] E[y]. It also follows that independent random
variables are uncorrelated, meaning that their cross-correlation is zero, as can be verified
from the definition of the cross-correlation (or cross-covariance) of the ‘centered’ r.v.:

    \sigma_x^2 = E[(x - \mu_x)(x - \mu_x)]

    \sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y = E[x]E[y] - \mu_x \mu_y = 0

    \text{corr. coeff.:} \quad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad -1 \le \rho_{xy} \le 1

The converse statement is not true: uncorrelated random variables can be dependent.
Consider the example x = sin φ, y = cos φ for φ uniformly distributed on [0, 2π]. Then we
have x² + y² = 1, so that x and y are dependent.
However, E[xy] = E[sin φ cos φ] = ½ E[sin 2φ] = 0; this means that x and y are uncorrelated.
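
The sin/cos example is easy to verify by simulation; a minimal sketch (my own) assuming a
uniformly distributed phase:

    import numpy as np

    rng = np.random.default_rng(1)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
    x, y = np.sin(phi), np.cos(phi)

    # cross-covariance is (numerically) zero => uncorrelated ...
    print(np.mean(x * y) - np.mean(x) * np.mean(y))   # ~ 0

    # ... yet the variables are clearly dependent: x**2 + y**2 == 1 exactly
    print(np.max(np.abs(x**2 + y**2 - 1.0)))          # ~ 0 (deterministic relation)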
September 7, 2022 Wolfgang Rave Slide 15
Normal (Gaussian) density for scalar random variables

A r.v. is normally or Gaussian distributed with parameters µ and σ² if its density function
is given by

    f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \qquad \text{(Gaussian density)}

(the standard normal case \mathcal{N}(0,1) corresponds to \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)).

This density will be our standard model for ‘noise’ (additive white Gaussian noise,
AWGN). The distribution function is given by

    F_x(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy = Q\!\left( \frac{\mu - x}{\sigma} \right),

where we used the so-called ‘normal integral’ or ‘Q-function’

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\!\left( -\frac{y^2}{2} \right) dy
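
As a quick numerical cross-check (a sketch of mine, using scipy's survival function of the standard
normal as Q, with illustrative values µ = 2 and σ = 3), the relation F_x(x) = Q((µ − x)/σ) can be
verified directly:

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 2.0, 3.0             # assumed illustrative values
    Q = norm.sf                      # Q(x) = 1 - Phi(x), right-tail of N(0, 1)

    x = np.array([-1.0, 2.0, 5.0])
    F_direct = norm.cdf(x, loc=mu, scale=sigma)   # F_x(x) of N(mu, sigma^2)
    F_via_Q = Q((mu - x) / sigma)                 # Q((mu - x)/sigma)
    print(np.allclose(F_direct, F_via_Q))         # True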
September 7, 2022 Wolfgang Rave Slide 16
Normal scalar random variable / Moments

The ‘normal integral’ follows from integrating a zero-mean Gaussian r.v. with unit variance.
It is tabulated in many mathematical textbooks. Often we use x ~ \mathcal{N}(\mu, \sigma^2) as the
notation for a Gaussian r.v. with mean µ and variance σ².

Moments of a random variable

The moment of order n of a random variable x with support (-∞, ∞) is defined as

    E[x^n] = \int_{-\infty}^{\infty} x^n f_x(x)\,dx

Special cases are

    E[x] = \int_{-\infty}^{\infty} x\, f_x(x)\,dx \qquad \text{(first moment = mean / expectation of x)}

    E[x^2] = \int_{-\infty}^{\infty} x^2 f_x(x)\,dx \qquad \text{(second moment)}

from which the variance of the random variable can be computed as

    \sigma^2 = E[x^2] - (E[x])^2

Note: The variance equals the 2nd central moment or the 2nd moment of the centered r.v. For the
generic Gaussian r.v. from above we find E[x] = µ and E[(x − µ)²] = σ².
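
A short numerical sketch (not part of the lecture, with illustrative values µ = 1.5 and σ = 0.8)
computes the first two moments of a Gaussian by integration and checks σ² = E[x²] − (E[x])²:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    mu, sigma = 1.5, 0.8                                       # assumed illustrative values
    pdf = lambda x: norm.pdf(x, loc=mu, scale=sigma)

    m1, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)        # first moment E[x]
    m2, _ = quad(lambda x: x**2 * pdf(x), -np.inf, np.inf)     # second moment E[x^2]

    print(m1, m2 - m1**2)                                      # ~ mu and ~ sigma**2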
September 7, 2022 Wolfgang Rave Slide 17
Characteristic and moment generating fct.

Although it might appear complicated at first glance, it is sometimes easier to evaluate


moments of a random variable by using the so-called ‘moment generating’ function.
The latter follows from the characteristic function Φ_x(ω) of a r.v., which is by definition
the Fourier transform (apart from the sign convention) of its density function:

    \Phi_x(j\omega) = \int_{-\infty}^{\infty} e^{j\omega x} f_x(x)\,dx = E\!\left[ e^{j\omega x} \right]

If we replace jω by s, the resulting integral is the moment generating function

    \Phi_x(s) = \int_{-\infty}^{\infty} e^{sx} f_x(x)\,dx = E\!\left[ e^{sx} \right].

Differentiating n times with respect to s, we obtain

    \frac{d^n}{ds^n} \Phi_x(s) = \Phi_x^{(n)}(s) = E\!\left[ x^n e^{sx} \right]

Hence

    \Phi_x^{(n)}(0) = E\!\left[ x^n \right].
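
For the Gaussian r.v. the moment generating function is known to be Φ_x(s) = exp(µs + σ²s²/2);
a short symbolic sketch (my own example, using sympy) recovers the first two moments by
differentiating at s = 0:

    import sympy as sp

    s, mu = sp.symbols('s mu', real=True)
    sigma = sp.symbols('sigma', positive=True)
    Phi = sp.exp(mu * s + sigma**2 * s**2 / 2)   # MGF of a Gaussian r.v.

    m1 = sp.diff(Phi, s, 1).subs(s, 0)           # E[x]   = mu
    m2 = sp.diff(Phi, s, 2).subs(s, 0)           # E[x^2] = mu**2 + sigma**2
    print(sp.simplify(m1), sp.simplify(m2))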

September 7, 2022 Wolfgang Rave Slide 18


Right-tail prob. of Gaussian density (Q-fct.)

To compute error probabilities in hypothesis testing (detection problems) involving
Gaussian r.v. we often need to evaluate the (complementary) error function erfc(x) or,
equivalently, the Q-function Q(x).
The area under the Gaussian density (try to identify these areas in the plot of the
Gaussian density on one of the previous slides) can be computed from the error function as

    \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \qquad\qquad \operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt

To understand the normalization of the Gaussian density it is useful to know that

    \int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi} \qquad \text{or} \qquad \int_0^{\infty} e^{-t^2}\,dt = \sqrt{\pi}/2

A useful alternative way of expressing the right-tail probability of a Gaussian r.v. is

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt
Examples of how to express a general Gaussian density in terms of Q(x) appear in the exercises and will
occur repeatedly in the lectures!
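
In numerical work the identity Q(x) = ½·erfc(x/√2), which follows from the substitution t = √2·u
in the integral above, is often the most convenient route; a minimal check of this relation (my own
sketch):

    import numpy as np
    from scipy.special import erfc
    from scipy.stats import norm

    x = np.linspace(-3.0, 5.0, 9)
    Q_from_erfc = 0.5 * erfc(x / np.sqrt(2.0))   # Q(x) = erfc(x / sqrt(2)) / 2
    Q_reference = norm.sf(x)                     # right-tail probability of N(0, 1)
    print(np.allclose(Q_from_erfc, Q_reference)) # True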
September 7, 2022 Wolfgang Rave Slide 19
Intuition on the Q-function

Expressing the right-tail probability of a Gaussian r.v.:

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt

• Probabilities and associated areas (panels 1a, 1b): for the standard normal density,
  Pr[X ≤ a] = 1 − Q(a) and Pr[X > a] = Q(a); by symmetry, Q(−a) = Pr[X > −a] = 1 − Q(a).
• Influence of mean and variance (panels 2a, 2b): scaling and shifting the density give
  Pr[X > a] = Q(a/σ) for X ~ \mathcal{N}(0, σ²) and Pr[X > a] = Q((a − µ)/σ) for X ~ \mathcal{N}(µ, σ²).
• Expressing specific areas (homework problem): e.g. an estimate distributed as \mathcal{N}(A, σ²/N),
  whose error (A minus the estimate) is then \mathcal{N}(0, σ²/N)-distributed, and an estimate
  Ă ~ \mathcal{N}(A/2, σ²/4N) with a correspondingly shifted error distribution.

September 7, 2022 Wolfgang Rave Slide 20


Probability density of a Gaussian random
vector
The PDF of a scalar Gaussian random variable is characterized by mean µ and variance σ²:

    f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2} (x-\mu)^T \sigma^{-2} (x-\mu) \right) \;\leftrightarrow\; \mathcal{N}(\mu, \sigma^2)

For the PDF of a Gaussian random vector, the size p of the random vector and the covariance
matrix R need to be taken into account. For the r.v. x with R_x = E[(x - \mu_x)(x - \mu_x)^T] we have

    f_x(x) = \frac{1}{\sqrt{(2\pi)^p \det R_x}} \exp\!\left( -\frac{1}{2} (x - \mu_x)^T R_x^{-1} (x - \mu_x) \right) \;\leftrightarrow\; \mathcal{N}(\mu_x, R_x)

Similarly to the scalar case, the density is completely characterized by specifying
mean and covariance of the vector, written in shorthand notation as x ~ \mathcal{N}(\mu_x, R_x).

Note how the vector PDF generalizes the PDF of the scalar case:
• (in the argument of the exponent) the inverse of the variance of the scalar r.v. is
  replaced by the inverse of the covariance matrix in the vector case;
• the normalization factor requires the root of the determinant of the covariance
  matrix instead of the root of the variance.
September 7, 2022 Wolfgang Rave Slide 21
Probability density of a Gaussian random
vector
For the PDF of a Gaussian random vector the size p of the random vector and the
covariance matrix R need to be taken into account
    f_x(x) = \frac{1}{\sqrt{(2\pi)^p \det R_x}} \exp\!\left( -\frac{1}{2} (x - \mu_x)^T R_x^{-1} (x - \mu_x) \right) \;\leftrightarrow\; \mathcal{N}(\mu_x, R_x)

Similarly to the scalar case, the density is completely characterized by specifying
mean and covariance of the vector, written in shorthand notation as x ~ \mathcal{N}(\mu_x, R_x).
Note how the vector PDF generalizes the PDF of the scalar case:
• the inverse of the variance of the scalar r.v. is replaced by the inverse of the covariance matrix
in the vector case.
• The normalization factor requires the root of the determinant of the covariance matrix instead
of the root of the variance
Important special case: the covariance matrix R_x reduces to a scaled identity matrix
R_x = σ²I when all components of the random vector are independent and have
the same variance. Then the PDF reduces to

    f_x(x) = \frac{1}{(2\pi\sigma^2)^{p/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{p} (x_i - \mu_i)^2 \right) \;\leftrightarrow\; \mathcal{N}(\mu, \sigma^2 I)
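
To make the generalization concrete, here is a small sketch (my own, with arbitrary illustrative
numbers for µ_x and R_x) that evaluates the vector PDF directly from the formula above and compares
it with scipy's implementation:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, -2.0])                       # illustrative mean vector
    R = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                       # illustrative covariance matrix R_x
    x = np.array([0.5, -1.0])                        # point at which the PDF is evaluated

    p = len(mu)
    d = x - mu
    pdf_formula = np.exp(-0.5 * d @ np.linalg.inv(R) @ d) \
                  / np.sqrt((2.0 * np.pi) ** p * np.linalg.det(R))
    pdf_scipy = multivariate_normal(mean=mu, cov=R).pdf(x)
    print(pdf_formula, pdf_scipy)                    # identical values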

September 7, 2022 Wolfgang Rave Slide 22
End of review

• Learning by doing helps to better understand the more abstract ideas
• Return to this material (or a textbook) when you have forgotten some
  specific detail that becomes relevant for solving a given
  problem (e.g. your homework problems)

September 7, 2022 Wolfgang Rave Slide 23


Part I: Estimation Theory

1) Introduction to Estimation Problems

based on S. M. Kay: ‘Fundamentals of Statistical Signal Processing: Estimation Theory’


(Ch. 1)
September 7, 2022 Wolfgang Rave Slide 24
Estimation in Signal Processing

1.1 Introductory Examples for Estimation Problems


Estimation problems occur in many signal and communication systems and play a
central role in their successful and efficient operation. Examples of such systems and
associated applications are
• RADAR (radio detection and ranging)
• SONAR (sound navigation and ranging)
• Speech recognition
• Image analysis
• Biomedical analysis
• Communication systems
• Control
• Seismology

where all applications share the common problem that one or several parameter values
need to be estimated from a set of observations.

September 7, 2022 Wolfgang Rave Slide 25


Motivational Example 1: RADAR System

[Figure: an airport surveillance RADAR illuminates a target and the radar processing system
records the transmitted and received waveforms; from the round-trip delay τ̂₀ between the
transmit pulse and the received echo, the range estimate follows as R̂ = v·τ̂₀/2, where v denotes
the propagation velocity of the pulse.]
September 7, 2022 Wolfgang Rave Slide 26
Motivational Example 2: SONAR System

[Figure: a passive SONAR with a towed array of sensors (spacing d) below the sea surface listens
to a noisy target; from the relative delay τ_m = (m·d·cos β)/v of the signal received at array
sensor m, the bearing angle estimate follows as cos β̂ = v·τ̂_m/(m·d), where v denotes the speed
of sound in water.]
September 7, 2022 Wolfgang Rave Slide 27
Mathematical Model of an Estimation Problem

1.2 Mathematical Model of an Estimation Problem


In determining a good estimator, the first step is to choose a mathematical model that
relates the observed data y[n], n = 0, 1, …, N−1, to the parameter(s) x to be estimated.
Because the data are inherently random (e.g. due to noise), we describe them by their
Probability Density Function (PDF) f(y[0], y[1], …, y[N−1]; x).

The PDF is parametrized by the unknown parameter x, i.e., we have a class of functions
where each one is different due to a different value of x. We will use a semicolon to
denote this dependence or parametrization.
As an example, if N = 1 and x denotes the mean, the PDF of the data might be

    f(y[0]; x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2\sigma^2} (y[0] - x)^2 \right), \qquad \text{or} \quad y \sim \mathcal{N}(x, \sigma^2),

while for N observations the Gaussian vector PDF occurs that is shown on the next slide.

⇒ The first step is to find/choose the PDF of the data as a function of x: f_{y;x}(y; x)

September 7, 2022 Wolfgang Rave Slide 28


Reference Estimation Problem

‘Running example’: it will be reconsidered under different assumptions such as minimum
variance unbiased estimation, maximum likelihood estimation, best linear unbiased estimation etc.
Data model:

    y[n] = A + w[n], \qquad n = 0, 1, \ldots, N-1; \qquad \text{where } w[n] \sim \mathcal{N}(0, \sigma^2)

The probability density function (PDF) of the observation vector y (and hence also of the
normalized sum ȳ = (1/N) Σ_n y[n], i.e. the sample mean) will be Gaussian, because we assume
the noise to be Gaussian:

    f_y(y; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (y[n] - A)^2 \right), \qquad \text{i.e.} \quad y \sim \mathcal{N}(A\mathbf{1}, \sigma^2 I)

It will turn out that the optimal estimator in the minimum mean-squared error (least mean
squares) sense is given by the sample mean:

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n], \qquad \text{which is linear in } y[n] \;\Rightarrow\; \text{BLUE} = \text{MVUE} = \text{ML}

• Note that the optimality criterion is crucial, as it determines what is optimal.
• Note as well that in the above case the optimal estimator is linear (which is a lucky coincidence).
⇒ The second step is to find the estimator according to some optimality criterion
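
A minimal simulation of the data model (my own sketch, with illustrative values A = 1, σ = 0.1 and
N = 100) generates one data record and computes the sample-mean estimate:

    import numpy as np

    rng = np.random.default_rng(2)
    A, sigma, N = 1.0, 0.1, 100                 # assumed illustrative values

    w = sigma * rng.standard_normal(N)          # w[n] ~ N(0, sigma^2)
    y = A + w                                   # data model y[n] = A + w[n]

    A_hat = np.mean(y)                          # sample mean estimator
    print(A_hat)                                # close to the true value A = 1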
September 7, 2022 Wolfgang Rave Slide 29
Assessing Estimator Performance

1.3 Performance of an Estimator


Considering a typical data set for our standard example (with A = 1 and N = 100), we may
intuitively choose two different possible estimators:

• the sample mean,

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n] = 0.993,

• and the first sample of our data set,

    \check{A} = y[0] = 0.995.

[Figure: one realization of the noisy data y[n], n = 0, …, 99, scattered around the true value
A = 1; vertical axis from 0.7 to 1.3, horizontal axis: sample index n.]

Although the sample mean estimator has better performance over repeated trials due to its
averaging over the noise, in this particular realization the single-sample estimator happens
to produce the more accurate estimate.
⇒ What matters is the performance on average, over many repeated applications of an estimator!
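
The ‘on average’ statement can be illustrated with a short Monte Carlo sketch (mine, not from the
book): over many independent trials the sample mean has an N-times smaller error variance than the
single-sample estimator.

    import numpy as np

    rng = np.random.default_rng(3)
    A, sigma, N, trials = 1.0, 0.1, 100, 10_000         # assumed illustrative values

    y = A + sigma * rng.standard_normal((trials, N))    # 10000 independent data records

    A_hat = y.mean(axis=1)        # sample mean estimator, one value per trial
    A_check = y[:, 0]             # single-sample estimator, one value per trial

    print(np.var(A_hat), sigma**2 / N)   # ~ sigma^2 / N
    print(np.var(A_check), sigma**2)     # ~ sigma^2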
September 7, 2022 Wolfgang Rave Slide 30
Assessing Estimator Performance

1.3 Performance of an Estimator …


sample mean:

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n] = 0.993

vs. use of a single observation:

    \check{A} = y[0] = 0.995

Important points to keep in mind:

• An estimator is a random variable. Its performance can only be described statistically,
  i.e. by its PDF.
• Computer simulations can never assess the performance of an estimator conclusively.
  At best, a certain degree of accuracy is achieved; at worst, for an insufficient number
  of trials, misleading results are obtained.
• A tradeoff between performance and computational complexity occurs for every estimator,
  e.g. if optimal estimators using non-linear functions of the observations need
  to be used. Thus, the class of (best) linear estimators is of practical interest.

September 7, 2022 Wolfgang Rave Slide 31
