Lecture “Fundamentals of Estimation and Detection”

Part I: Estimation Theory


Part II: Detection Theory

Note: Because detection or decision problems almost always require the estimate of some quantity
on which the detection step is based, we start with estimation theory in this course before
we turn to detection theory.
September 7, 2022 Wolfgang Rave Slide 1
Organisational Issues

Time and location (regular teaching in lecture hall; online, in case Corona gets worse)
• Lecture hall BAR 106: Wed, 16:40 – 18:10, Wolfgang Rave
• Homework Session: Thu, 16:40 – 18:10, R. Rojbi, I. Bizon Franco de Almeida,
A. Rezaei
• Questions during online lectures: collect and ask at the end of the lecture or when
I ask for questions (better ideas are welcome; in the lecture hall raise your hand)
Material on OPAL – web page
• lecture slides
• homework problems
• dates & links for GoToMeeting
• subscribe to learning group 2022/23 ⇒ access to lecture material

Credits for module ‚Vertiefung Mobile Nachrichtensysteme‘ …


• written exam for 2 of 3 lectures (Machine Learning, Fundamentals of Estimation
and Detection, Algorithmen für Mehrantennensysteme; EE students)
• written exam for ‚Fundamentals of Estimation and Detection‘ (NES students)
September 7, 2022 Wolfgang Rave Slide 2
Literature to the Course

The course is based on two standard textbooks by Steven M. Kay:


‘Fundamentals of Statistical Signal Processing: Estimation Theory’
‘Fundamentals of Statistical Signal Processing: Detection Theory’

September 7, 2022 Wolfgang Rave Slide 3


x-y Notation in Estimation Theory

Remarks on notation:
1) While S. Kay uses the θ-x notation, in this course we use the so-called x-y notation
which is customary in the literature. Here x denotes the parameter (or parameter
vector) to be estimated, while y represents the observation(s) or data that are available
to estimate the parameter(s) of interest. Initially, we will start by estimating a
scalar parameter from one or several observations. Then we will generalize our
considerations to estimating a vector parameter x.

2) We also use boldface notation for random variables such as the noise w and the
observations y. In contrast, constant or deterministic quantities are written in
normal font. Since we model the parameter x to be estimated as a constant, which we will
do most of the time during this course (so-called ‘classical’ estimation), it is therefore
written in normal font. Note that this is a modelling assumption. If x is assumed
to be random, the corresponding approach is called ‘Bayesian’ estimation (because
we assume some prior probability density function, PDF, for x).
3) Thus, in equations relating random variables to each other we use boldface. However,
given realizations, which are again deterministic, are written in normal font. This occurs
e.g. for density functions, where we write the argument in normal font, such as f_x(x).
September 7, 2022 Wolfgang Rave Slide 4
Lecture Overview (Content)

1) Part I (Estimation Theory): Minimum Variance Unbiased Estimators (MVUE)
and the Cramer-Rao Lower Bound (CRLB)
   - Unbiased estimators and minimum variance criterion
   - Theorem on the Cramer-Rao Lower Bound (smallest achievable MSE)
   - Transformation of Parameters
   - Signal Processing Example(s)
2) Linear Models
   - Practically important, achieves the CRLB
   - Example(s)
3) General Minimum Variance Unbiased Estimation
   - Sufficient statistics
   - Neyman-Fisher factorization theorem (to find the MVUE)
   - Rao-Blackwell-Lehmann-Scheffé theorem
4) Best Linear Unbiased Estimators (BLUE)
   - Definition and Determination of the BLUE
   - Example(s)
5) Maximum Likelihood Estimation (MLE)
   - Finding the MLE
   - Properties and numerical determination
   - Examples
(Topics 1) to 5) belong to ‘classical’ estimation: the parameter is treated as a deterministic variable.)
6) …
September 7, 2022 Wolfgang Rave Slide 5
Lecture Overview (Content)

6) The ‘Bayesian’ Approach: General Bayesian Estimators
   - Prior knowledge and estimation, choosing a prior PDF
   - Properties of the Gaussian PDF
   - Maximum a-posteriori estimators
7) Linear Bayesian Estimators
(Topics 6) and 7) belong to ‘Bayesian’ estimation: the parameter is treated as a random variable.)

Part II (Detection Theory):


8) Decision Theory for Binary Hypothesis Testing
   - Neyman-Pearson theorem
   - Receiver Operating Characteristics (ROC)
   - Minimum Probability of Error
9) Deterministic Signals
   - (Generalized) Matched Filter
10) Random Signals
   - Estimator-Correlator

September 7, 2022 Wolfgang Rave Slide 6


Important Background Material

Review on Basic Concepts from


Probability Theory

• Random Variables and Random Vectors


• Distribution and Density Functions

September 7, 2022 Wolfgang Rave Slide 7


Probability of an Event

Classical definition of Probability (Pierre-Simon Laplace, 18th century)


• The probability of an event A is defined a priori, without actual experimentation
  (provided all the outcomes are equally likely), as

      P(A) = \frac{\text{number of outcomes favourable to } A}{\text{total number of possible outcomes}}

• Relative frequency definition, given n experiments (trials), as illustrated by the short
  simulation sketch below:

      P(A) = \lim_{n \to \infty} \frac{n_A}{n}

• More rigorous abstract definitions based on set theory can be found in textbooks. As
a standard reference on probability and random variables we recommend the book:
A. Papoulis and S.U. Pillai, ‘Probability, random variables and stochastic processes’,
McGraw Hill (4th edition), 2002
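
As a minimal illustration of the relative frequency definition (a sketch of my own, not part of the
lecture material), one can simulate rolls of a fair die and check that the relative frequency n_A/n
of an event A approaches its classical probability:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    rolls = rng.integers(1, 7, size=n)       # n trials of a fair die (outcomes 1..6)

    # event A = "the face is even"; the classical definition gives P(A) = 3/6 = 0.5
    n_A = np.count_nonzero(rolls % 2 == 0)
    print(n_A / n)                           # relative frequency, close to 0.5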

September 7, 2022 Wolfgang Rave Slide 8


Concept of a Random Variable

Random variable: a function (‘rule’) that associates a number with every
event within an event space:

    event space Ω (e.g. face ω_i of a die)  →  random variable g(ω)  →  real number (e.g. g(ω_i) = 10·i)

    ω_1 → x_1 = g(ω_1),   ω_2 → x_2 = g(ω_2),   …,   ω_M → x_M = g(ω_M)

• Example 1 (rolling a die): for the set of events Ω = {1, 2, 3, 4, 5, 6} the random
  variable (‘rule’) g(ω_i) = 10·i maps the events ω_i to the real numbers x_i ∈ {10, …, 60}
September 7, 2022 Wolfgang Rave Slide 9
Distribution Function of a r.v.

The elements of the set Ω = {1, 2, 3, 4, 5, 6} contained in the event {x ≤ x} change as x
is varied. The probability Pr{x ≤ x} of the event {x ≤ x} is therefore a number that
depends on x.
This number is denoted as F_x(x) and called the (cumulative) distribution function (CDF)
of the (generic) random variable x:

    F_x(x) = Pr{x ≤ x}

Example 2: For a fair die with Pr{x = x_i} = 1/6, the discrete-valued distribution function
is given by the graph below.
[Figure: staircase CDF F_x(x) of the die example, increasing from 0 to 1 in steps of 1/6 at x = 10, 20, …, 60.]
September 7, 2022 Wolfgang Rave Slide 10
Distribution Function of a r.v.

In particular we find:
Fx(35) = 1/2
Fx(30.1) = 1/2
Fx(30) = 1/2
Fx(29.9) = 1/3
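
A minimal numerical check of these values (a sketch, not from the slides), using the fact that the
die CDF is a sum of steps of height 1/6 at x = 10, 20, …, 60:

    import numpy as np

    values = np.arange(10, 61, 10)                     # x_i = 10, 20, ..., 60
    F = lambda x: np.count_nonzero(values <= x) / 6    # F_x(x) = Pr{x <= x} for the fair die

    print(F(35), F(30.1), F(30), F(29.9))              # 0.5 0.5 0.5 0.333...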

Example 3: If a telephone call arrives uniformly at random within the interval [0, T],
the distribution function is continuous-valued and has the following form:

[Figure: F_x(x) increases linearly from 0 at x = 0 to 1 at x = T and remains at 1 for x > T.]
September 7, 2022 Wolfgang Rave Slide 11
Density Function of a r.v.

Interpretation of the distribution function:


    \Pr\{a < x \le b\} = F_x(b) - F_x(a), \qquad \text{with } F_x(-\infty) = 0 \text{ and } F_x(\infty) = 1

The derivative of the distribution function F_x(x) is called the probability density
function (PDF) f_x(x) of the r.v. x. Thus we have

    f_x(x) = \frac{d}{dx} F_x(x)

and the above probability can also be evaluated from

    \Pr\{a < x \le b\} = \int_a^b f_x(x)\,dx = F_x(b) - F_x(a).

Because F_x(x) is monotonically non-decreasing, the derivative has the property f_x(x) ≥ 0.
For a discrete-valued r.v. the density is a weighted sum of Dirac pulses

    f_x(x) = \sum_i p_i\, \delta(x - x_i)

and is usually called probability mass function (PMF).
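
To connect the two representations numerically, here is a small sketch (my own illustration,
assuming T = 10 for concreteness) for the uniform arrival-time example: the probability of an
interval can be obtained either by integrating the density or by differencing the CDF.

    from scipy.integrate import quad

    T = 10.0                                           # assumed value for illustration
    f = lambda x: 1.0 / T if 0.0 <= x <= T else 0.0    # PDF of Example 3 (uniform on [0, T])
    F = lambda x: min(max(x / T, 0.0), 1.0)            # corresponding CDF

    a, b = 2.0, 7.0
    prob_from_pdf, _ = quad(f, a, b)                   # integral of the density over [a, b]
    prob_from_cdf = F(b) - F(a)                        # difference of the CDF
    print(prob_from_pdf, prob_from_cdf)                # both 0.5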


September 7, 2022 Wolfgang Rave Slide 12
Density Function of a r.v.

Example 4: The densities of Examples 2 and 3 are

[Figure, left: f_x(x) of the die example, six Dirac pulses of weight 1/6 at x = 10, 20, …, 60.
 Figure, right: f_x(x) of the arrival-time example, constant at 1/T on [0, T] and zero elsewhere.]

Note that for discrete probability densities one also uses the term probability mass function (PMF).

Interpretation of probabilities as a definite integral of the density function (PDF) and
normalization constraint:

    \Pr\{a \le x \le b\} = \int_a^b f_x(x)\,dx \qquad \text{with} \qquad \int_{-\infty}^{\infty} f_x(x)\,dx = 1
So that the distribution function corresponds to the integral …
September 7, 2022 Wolfgang Rave Slide 13
Joint Densities

The dependency between two real-valued random variables {x, y} is characterized by
their joint probability density function. Let f_{x,y}(x, y) denote the joint (continuous) PDF of
x and y; this function allows us to evaluate probabilities of the form

    P(a \le x \le b,\; c \le y \le d) = \int_c^d \int_a^b f_{x,y}(x, y)\,dx\,dy,

namely, the probability that x and y jointly assume values within the intervals [a, b] and
[c, d], respectively. Let also f_{x|y}(x|y) denote the conditional PDF of x given y; this
function allows us to evaluate probabilities of events of the form

    P(a \le x \le b,\; y = y) = \int_a^b f_{x|y}(x|y)\,dx,

namely, the probability that x assumes values within the interval [a, b] given that y is
fixed at the value y. It is well known that the joint and conditional PDFs of two random
variables are related via Bayes’ rule, which states that

    f_{x,y}(x, y) = f_y(y)\, f_{x|y}(x|y) = f_x(x)\, f_{y|x}(y|x) \qquad \text{(‘conservation law for probability’)}

Note: this essentially allows one density to be expressed in terms of the others; e.g. if we know
f_{x|y}(x|y) and f_y(y), we can evaluate f_{x,y}(x, y), and so on, e.g.

    f_{x|y}(x|y) = \frac{f_x(x)\, f_{y|x}(y|x)}{f_y(y)}
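
A small numerical sketch (my own example with an arbitrary discrete joint distribution) makes this
‘conservation law’ concrete: both factorizations reproduce the same joint distribution.

    import numpy as np

    # joint PMF of two discrete r.v. x (rows) and y (columns); entries sum to 1
    P_xy = np.array([[0.10, 0.20, 0.05],
                     [0.30, 0.15, 0.20]])

    P_x = P_xy.sum(axis=1)             # marginal of x
    P_y = P_xy.sum(axis=0)             # marginal of y
    P_x_given_y = P_xy / P_y           # P(x|y), each column normalized by P(y)
    P_y_given_x = P_xy / P_x[:, None]  # P(y|x), each row normalized by P(x)

    # Bayes' rule: both factorizations reproduce the joint distribution
    print(np.allclose(P_x_given_y * P_y, P_xy))           # True
    print(np.allclose(P_y_given_x * P_x[:, None], P_xy))  # True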
September 7, 2022 Wolfgang Rave Slide 14
Dependent Random Variables / Correlation

The variables {x, y} are said to be independent iff

    f_{x|y}(x|y) = f_x(x) \qquad \text{and} \qquad f_{y|x}(y|x) = f_y(y),

in which case the PDFs of x and y are not modified by conditioning on y and x, respectively.
Otherwise, the variables are said to be dependent. In particular, when the variables are
independent, it follows that E[xy] = E[x] E[y]. It also follows that independent random
variables are uncorrelated, meaning that their cross-correlation is zero, as can be verified
from the definition of the cross-correlation (or cross-covariance) of the ‘centered’ r.v.:

    \sigma_x^2 = E[(x - \mu_x)(x - \mu_x)]

    \sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y = E[x]E[y] - \mu_x \mu_y = 0

    \text{corr. coeff.:} \quad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad -1 \le \rho_{xy} \le 1

The converse statement is not true: uncorrelated random variables can be dependent.
Consider the example x = sin φ, y = cos φ for φ uniformly distributed on [0, 2π]. Then we
have x² + y² = 1, so that x and y are dependent.
However, E[xy] = E[sin φ cos φ] = ½ E[sin 2φ] = 0; this means that x and y are uncorrelated.
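
The sin/cos example is easy to verify by simulation; a minimal sketch (my own) assuming a
uniformly distributed phase:

    import numpy as np

    rng = np.random.default_rng(1)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
    x, y = np.sin(phi), np.cos(phi)

    # cross-covariance is (numerically) zero => uncorrelated ...
    print(np.mean(x * y) - np.mean(x) * np.mean(y))   # ~ 0

    # ... yet the variables are clearly dependent: x**2 + y**2 == 1 exactly
    print(np.max(np.abs(x**2 + y**2 - 1.0)))          # ~ 0 (deterministic relation)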
September 7, 2022 Wolfgang Rave Slide 15
Normal (Gaussian) density for scalar random variables

A r.v. is normally or Gaussian distributed with parameters µ and σ² if its density function
is given by

    f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \qquad \text{(Gaussian density)}

(the standard normal case \mathcal{N}(0,1) corresponds to \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)).

This density will be our standard model for ‘noise’ (additive white Gaussian noise,
AWGN). The distribution function is given by

    F_x(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy = Q\!\left( \frac{\mu - x}{\sigma} \right),

where we used the so-called ‘normal integral’ or ‘Q-function’

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\!\left( -\frac{y^2}{2} \right) dy
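
As a quick numerical cross-check (a sketch of mine, using scipy's survival function of the standard
normal as Q, with illustrative values µ = 2 and σ = 3), the relation F_x(x) = Q((µ − x)/σ) can be
verified directly:

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 2.0, 3.0             # assumed illustrative values
    Q = norm.sf                      # Q(x) = 1 - Phi(x), right-tail of N(0, 1)

    x = np.array([-1.0, 2.0, 5.0])
    F_direct = norm.cdf(x, loc=mu, scale=sigma)   # F_x(x) of N(mu, sigma^2)
    F_via_Q = Q((mu - x) / sigma)                 # Q((mu - x)/sigma)
    print(np.allclose(F_direct, F_via_Q))         # True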
September 7, 2022 Wolfgang Rave Slide 16
Normal scalar random variable / Moments

The ‘normal integral’ follows from integrating a zero-mean Gaussian r.v. with unit variance.
It is tabulated in many mathematical textbooks. Often we use x ~ \mathcal{N}(\mu, \sigma^2) as the
notation for a Gaussian r.v. with mean µ and variance σ².

Moments of a random variable

The moment of order n of a random variable x with support (-∞, ∞) is defined as

    E[x^n] = \int_{-\infty}^{\infty} x^n f_x(x)\,dx

Special cases are

    E[x] = \int_{-\infty}^{\infty} x\, f_x(x)\,dx \qquad \text{(first moment = mean / expectation of x)}

    E[x^2] = \int_{-\infty}^{\infty} x^2 f_x(x)\,dx \qquad \text{(second moment)}

from which the variance of the random variable can be computed as

    \sigma^2 = E[x^2] - (E[x])^2

Note: The variance equals the 2nd central moment or the 2nd moment of the centered r.v. For the
generic Gaussian r.v. from above we find E[x] = µ and E[(x − µ)²] = σ².
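
A short numerical sketch (not part of the lecture, with illustrative values µ = 1.5 and σ = 0.8)
computes the first two moments of a Gaussian by integration and checks σ² = E[x²] − (E[x])²:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    mu, sigma = 1.5, 0.8                                       # assumed illustrative values
    pdf = lambda x: norm.pdf(x, loc=mu, scale=sigma)

    m1, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)        # first moment E[x]
    m2, _ = quad(lambda x: x**2 * pdf(x), -np.inf, np.inf)     # second moment E[x^2]

    print(m1, m2 - m1**2)                                      # ~ mu and ~ sigma**2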
September 7, 2022 Wolfgang Rave Slide 17
Characteristic and moment generating fct.

Although it might appear complicated at first glance, it is sometimes easier to evaluate


moments of a random variable by using the so-called ‘moment generating’ function.
The latter follows from the characteristic function Φ_x(ω) of a r.v., which is by definition
the Fourier transform (apart from the sign convention) of its density function:

    \Phi_x(j\omega) = \int_{-\infty}^{\infty} e^{j\omega x} f_x(x)\,dx = E\!\left[ e^{j\omega x} \right]

If we replace jω by s, the resulting integral is the moment generating function

    \Phi_x(s) = \int_{-\infty}^{\infty} e^{sx} f_x(x)\,dx = E\!\left[ e^{sx} \right].

Differentiating n times with respect to s, we obtain

    \frac{d^n}{ds^n} \Phi_x(s) = \Phi_x^{(n)}(s) = E\!\left[ x^n e^{sx} \right]

Hence

    \Phi_x^{(n)}(0) = E\!\left[ x^n \right].
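
For the Gaussian r.v. the moment generating function is known to be Φ_x(s) = exp(µs + σ²s²/2);
a short symbolic sketch (my own example, using sympy) recovers the first two moments by
differentiating at s = 0:

    import sympy as sp

    s, mu = sp.symbols('s mu', real=True)
    sigma = sp.symbols('sigma', positive=True)
    Phi = sp.exp(mu * s + sigma**2 * s**2 / 2)   # MGF of a Gaussian r.v.

    m1 = sp.diff(Phi, s, 1).subs(s, 0)           # E[x]   = mu
    m2 = sp.diff(Phi, s, 2).subs(s, 0)           # E[x^2] = mu**2 + sigma**2
    print(sp.simplify(m1), sp.simplify(m2))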

September 7, 2022 Wolfgang Rave Slide 18


Right-tail prob. of Gaussian density (Q-fct.)

To compute error probabilities in hypothesis testing (detection problems) involving
Gaussian r.v. we often need to evaluate the (complementary) error function erfc(x) or,
equivalently, the Q-function Q(x).
The area under the Gaussian density (try to identify these areas in the plot of the
Gaussian density on one of the previous slides) can be computed from the error function as

    \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \qquad\qquad \operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt

To understand the normalization of the Gaussian density it is useful to know that

    \int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi} \qquad \text{or} \qquad \int_0^{\infty} e^{-t^2}\,dt = \sqrt{\pi}/2

A useful alternative way of expressing the right-tail probability of a Gaussian r.v. is

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt
Examples of how to express a general Gaussian density in terms of Q(x) appear in the exercises and will
occur repeatedly in the lectures!
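
In numerical work the identity Q(x) = ½·erfc(x/√2), which follows from the substitution t = √2·u
in the integral above, is often the most convenient route; a minimal check of this relation (my own
sketch):

    import numpy as np
    from scipy.special import erfc
    from scipy.stats import norm

    x = np.linspace(-3.0, 5.0, 9)
    Q_from_erfc = 0.5 * erfc(x / np.sqrt(2.0))   # Q(x) = erfc(x / sqrt(2)) / 2
    Q_reference = norm.sf(x)                     # right-tail probability of N(0, 1)
    print(np.allclose(Q_from_erfc, Q_reference)) # True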
September 7, 2022 Wolfgang Rave Slide 19
Intuition on the Q-function

Expressing the right-tail probability of a Gaussian r.v.:

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt

• Probabilities and associated areas (panels 1a, 1b): for the standard normal density,
  Pr[X ≤ a] = 1 − Q(a) and Pr[X > a] = Q(a); by symmetry, Q(−a) = Pr[X > −a] = 1 − Q(a).
• Influence of mean and variance (panels 2a, 2b): scaling and shifting the density give
  Pr[X > a] = Q(a/σ) for X ~ \mathcal{N}(0, σ²) and Pr[X > a] = Q((a − µ)/σ) for X ~ \mathcal{N}(µ, σ²).
• Expressing specific areas (homework problem): e.g. an estimate distributed as \mathcal{N}(A, σ²/N),
  whose error (A minus the estimate) is then \mathcal{N}(0, σ²/N)-distributed, and an estimate
  Ă ~ \mathcal{N}(A/2, σ²/4N) with a correspondingly shifted error distribution.

September 7, 2022 Wolfgang Rave Slide 20


Probability density of a Gaussian random
vector
The PDF of a scalar Gaussian random variable is characterized by mean µ and variance σ²:

    f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2} (x-\mu)^T \sigma^{-2} (x-\mu) \right) \;\leftrightarrow\; \mathcal{N}(\mu, \sigma^2)

For the PDF of a Gaussian random vector, the size p of the random vector and the covariance
matrix R need to be taken into account. For the r.v. x with R_x = E[(x - \mu_x)(x - \mu_x)^T] we have

    f_x(x) = \frac{1}{\sqrt{(2\pi)^p \det R_x}} \exp\!\left( -\frac{1}{2} (x - \mu_x)^T R_x^{-1} (x - \mu_x) \right) \;\leftrightarrow\; \mathcal{N}(\mu_x, R_x)

Similarly to the scalar case, the density is completely characterized by specifying
mean and covariance of the vector, written in shorthand notation as x ~ \mathcal{N}(\mu_x, R_x).

Note how the vector PDF generalizes the PDF of the scalar case:
• (in the argument of the exponent) the inverse of the variance of the scalar r.v. is
  replaced by the inverse of the covariance matrix in the vector case;
• the normalization factor requires the root of the determinant of the covariance
  matrix instead of the root of the variance.
September 7, 2022 Wolfgang Rave Slide 21
Probability density of a Gaussian random
vector
For the PDF of a Gaussian random vector the size p of the random vector and the
covariance matrix R need to be taken into account
    f_x(x) = \frac{1}{\sqrt{(2\pi)^p \det R_x}} \exp\!\left( -\frac{1}{2} (x - \mu_x)^T R_x^{-1} (x - \mu_x) \right) \;\leftrightarrow\; \mathcal{N}(\mu_x, R_x)

Similarly to the scalar case, the density is completely characterized by specifying
mean and covariance of the vector, written in shorthand notation as x ~ \mathcal{N}(\mu_x, R_x).
Note how the vector PDF generalizes the PDF of the scalar case:
• the inverse of the variance of the scalar r.v. is replaced by the inverse of the covariance matrix
in the vector case.
• The normalization factor requires the root of the determinant of the covariance matrix instead
of the root of the variance
Important special case: the covariance matrix R_x reduces to a scaled identity matrix
R_x = σ²I when all components of the random vector are independent and have
the same variance. Then the PDF reduces to

    f_x(x) = \frac{1}{(2\pi\sigma^2)^{p/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{p} (x_i - \mu_i)^2 \right) \;\leftrightarrow\; \mathcal{N}(\mu, \sigma^2 I)
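
To make the generalization concrete, here is a small sketch (my own, with arbitrary illustrative
numbers for µ_x and R_x) that evaluates the vector PDF directly from the formula above and compares
it with scipy's implementation:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, -2.0])                       # illustrative mean vector
    R = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                       # illustrative covariance matrix R_x
    x = np.array([0.5, -1.0])                        # point at which the PDF is evaluated

    p = len(mu)
    d = x - mu
    pdf_formula = np.exp(-0.5 * d @ np.linalg.inv(R) @ d) \
                  / np.sqrt((2.0 * np.pi) ** p * np.linalg.det(R))
    pdf_scipy = multivariate_normal(mean=mu, cov=R).pdf(x)
    print(pdf_formula, pdf_scipy)                    # identical values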

September 7, 2022 Wolfgang Rave Slide 22
End of review

• Learning by doing helps to better understand the more abstract ideas
• Return to this material (or a textbook) when you have forgotten some
  specific detail that becomes relevant for solving a given
  problem (e.g. your homework problems)

September 7, 2022 Wolfgang Rave Slide 23


Part I: Estimation Theory

1) Introduction to Estimation Problems

based on S. M. Kay: ‘Fundamentals of Statistical Signal Processing: Estimation Theory’


(Ch. 1)
September 7, 2022 Wolfgang Rave Slide 24
Estimation in Signal Processing

1.1 Introductory Examples for Estimation Problems


Estimation problems occur in many signal and communication systems and play a
central role in their successful and efficient operation. Examples of such systems and
associated applications are
• RADAR (radio detection and ranging)
• SONAR (sound navigation and ranging)
• Speech recognition
• Image analysis
• Biomedical analysis
• Communication systems
• Control
• Seismology

where all applications share the common problem that one or several parameter values
need to be estimated from a set of observations.

September 7, 2022 Wolfgang Rave Slide 25


Motivational Example 1: RADAR System

[Figure: an airport surveillance RADAR illuminates a target and the radar processing system
records the transmitted and received waveforms; from the round-trip delay τ̂₀ between the
transmit pulse and the received echo, the range estimate follows as R̂ = v·τ̂₀/2, where v denotes
the propagation velocity of the pulse.]
September 7, 2022 Wolfgang Rave Slide 26
Motivational Example 2: SONAR System

[Figure: a passive SONAR with a towed array of sensors (spacing d) below the sea surface listens
to a noisy target; from the relative delay τ_m = (m·d·cos β)/v of the signal received at array
sensor m, the bearing angle estimate follows as cos β̂ = v·τ̂_m/(m·d), where v denotes the speed
of sound in water.]
September 7, 2022 Wolfgang Rave Slide 27
Mathematical Model of an Estimation Problem

1.2 Mathematical Model of an Estimation Problem


In determining a good estimator, the first step is to choose a mathematical model that
relates the observed data y[n], n = 0, 1, …, N−1, to the parameter(s) x to be estimated.
Because the data are inherently random (e.g. due to noise), we describe them by their
Probability Density Function (PDF) f(y[0], y[1], …, y[N−1]; x).

The PDF is parametrized by the unknown parameter x, i.e., we have a class of functions
where each one is different due to a different value of x. We will use a semicolon to
denote this dependence or parametrization.
As an example, if N = 1 and x denotes the mean, the PDF of the data might be

    f(y[0]; x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2\sigma^2} (y[0] - x)^2 \right), \qquad \text{or} \quad y \sim \mathcal{N}(x, \sigma^2),

while for N observations the Gaussian vector PDF occurs that is shown on the next slide.

⇒ The first step is to find/choose the PDF of the data as a function of x: f_{y;x}(y; x)

September 7, 2022 Wolfgang Rave Slide 28


Reference Estimation Problem

‘Running example’: it will be reconsidered under different assumptions such as minimum
variance unbiased estimation, maximum likelihood estimation, best linear unbiased estimation etc.
Data model:

    y[n] = A + w[n], \qquad n = 0, 1, \ldots, N-1; \qquad \text{where } w[n] \sim \mathcal{N}(0, \sigma^2)

The probability density function (PDF) of the observation vector y (and hence also of the
normalized sum ȳ = (1/N) Σ_n y[n], i.e. the sample mean) will be Gaussian, because we assume
the noise to be Gaussian:

    f_y(y; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (y[n] - A)^2 \right), \qquad \text{i.e.} \quad y \sim \mathcal{N}(A\mathbf{1}, \sigma^2 I)

It will turn out that the optimal estimator in the minimum mean-squared error (least mean
squares) sense is given by the sample mean:

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n], \qquad \text{which is linear in } y[n] \;\Rightarrow\; \text{BLUE} = \text{MVUE} = \text{ML}

• Note that the optimality criterion is crucial, as it determines what is optimal.
• Note as well that in the above case the optimal estimator is linear (which is a lucky coincidence).
⇒ The second step is to find the estimator according to some optimality criterion
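
A minimal simulation of the data model (my own sketch, with illustrative values A = 1, σ = 0.1 and
N = 100) generates one data record and computes the sample-mean estimate:

    import numpy as np

    rng = np.random.default_rng(2)
    A, sigma, N = 1.0, 0.1, 100                 # assumed illustrative values

    w = sigma * rng.standard_normal(N)          # w[n] ~ N(0, sigma^2)
    y = A + w                                   # data model y[n] = A + w[n]

    A_hat = np.mean(y)                          # sample mean estimator
    print(A_hat)                                # close to the true value A = 1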
September 7, 2022 Wolfgang Rave Slide 29
Assessing Estimator Performance

1.3 Performance of an Estimator


Considering a typical data set for our standard example (with A = 1 and N = 100), we may
intuitively choose two different possible estimators:

• the sample mean,

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n] = 0.993,

• and the first sample of our data set,

    \check{A} = y[0] = 0.995.

[Figure: one realization of the noisy data y[n], n = 0, …, 99, scattered around the true value
A = 1; vertical axis from 0.7 to 1.3, horizontal axis: sample index n.]

Although the sample mean estimator has better performance over repeated trials due to its
averaging over the noise, in this particular realization the single-sample estimator happens
to produce the more accurate estimate.
⇒ What matters is the performance on average, over many repeated applications of an estimator!
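
The ‘on average’ statement can be illustrated with a short Monte Carlo sketch (mine, not from the
book): over many independent trials the sample mean has an N-times smaller error variance than the
single-sample estimator.

    import numpy as np

    rng = np.random.default_rng(3)
    A, sigma, N, trials = 1.0, 0.1, 100, 10_000         # assumed illustrative values

    y = A + sigma * rng.standard_normal((trials, N))    # 10000 independent data records

    A_hat = y.mean(axis=1)        # sample mean estimator, one value per trial
    A_check = y[:, 0]             # single-sample estimator, one value per trial

    print(np.var(A_hat), sigma**2 / N)   # ~ sigma^2 / N
    print(np.var(A_check), sigma**2)     # ~ sigma^2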
September 7, 2022 Wolfgang Rave Slide 30
Assessing Estimator Performance

1.3 Performance of an Estimator …


sample mean:

    \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} y[n] = 0.993

vs. use of a single observation:

    \check{A} = y[0] = 0.995

Important points to keep in mind:

• An estimator is a random variable. Its performance can only be described statistically,
  i.e. by its PDF.
• Computer simulations can never assess the performance of an estimator conclusively.
  At best, a certain degree of accuracy is achieved; at worst, for an insufficient number
  of trials, misleading results are obtained.
• A tradeoff between performance and computational complexity occurs for every estimator,
  e.g. if optimal estimators using non-linear functions of the observations need
  to be used. Thus, the class of (best) linear estimators is of practical interest.

September 7, 2022 Wolfgang Rave Slide 31
