Estimation and Detection

1 Introduction
Note: Because detection or decision problems almost always require estimating some quantity
on which the detection step is based, we start with estimation theory in this course before
we turn to detection theory.
September 7, 2022 Wolfgang Rave Slide 1
Organisational Issues
Time and location (regular teaching in the lecture hall; online in case the Corona situation worsens)
• Lecture hall BAR 106: Wed, 16:40 – 18:10, Wolfgang Rave
• Homework Session: Thu, 16:40 – 18:10, R. Rojbi, I. Bizon Franco de Almeida,
A. Rezaei
• Questions during online lectures: collect them and ask at the end of the lecture or when
I ask for questions (better ideas are welcome; in the lecture hall, raise your hand)
Material on OPAL – web page
• lecture slides
• homework problems
• dates & links for GoToMeeting
• subscribe to learning group 2022/23 ⇒ access to lecture material
Remarks on notation:
1) While S. Kay uses the θ–x notation, in this course we use the so-called x–y notation,
which is customary in the literature. Here x denotes the parameter (or parameter
vector) to be estimated, while y represents the observation(s) or data that are avail-
able to estimate the parameter(s) of interest. Initially, we will start by estimating a
scalar parameter from one or several observations. Then we will generalize our
considerations to estimating a vector parameter x.
2) We also use boldface notation for random variables such as the noise w and the
observations y. In contrast, constant or deterministic quantities are written in
normal font. Since we model the parameter x to be estimated as a constant most of
the time in this course (so-called ‘classical’ estimation), it is also written in normal
font. Note that this is a modelling assumption: if x is assumed to be random, the
corresponding approach is called ‘Bayesian’ estimation (because we then assume
some prior probability density function, PDF, for x).
3) Thus, in equations relating random variables to each other we use boldface. However,
given realizations, which are again deterministic, are written in normal font. This occurs
e.g. for density functions, where we write the argument in normal font, such as fx(x).
Lecture Overview (Content)
• More rigorous abstract definitions based on set theory can be found in textbooks. As
a standard reference on probability and random variables we recommend the book:
A. Papoulis and S.U. Pillai, ‘Probability, Random Variables and Stochastic Processes’,
McGraw-Hill, 4th edition, 2002
A random variable is a rule g that maps each event i of a set of events {1, 2, …, M} to a
real number:

  1 → x1 = g(1)
  2 → x2 = g(2)
  …
  M → xM = g(M)

• Example 1 (rolling a die): for the set of events Ω = {1, 2, 3, 4, 5, 6} the random
variable (‘rule’) g(i) = 10i maps the events i to the real numbers xi ∈ {10, 20, …, 60}
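As a minimal sketch of Example 1 in code (the function and variable names are purely illustrative):

```python
# Example 1 as code: the random variable is the rule g(i) = 10*i
# mapping the die events {1, ..., 6} to real numbers.

def g(i):
    """Rule mapping event i to a real number."""
    return 10 * i

events = [1, 2, 3, 4, 5, 6]
values = [g(i) for i in events]
print(values)  # [10, 20, 30, 40, 50, 60]
```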
Distribution Function of a r.v.
Example 2: For a fair die with Pr{x(i)} = 1/6, the discrete-valued distribution function
is given by the graph
[Figure: staircase plot of Fx(x) over 0 ≤ x ≤ 60, jumping by 1/6 at each of
x = 10, 20, 30, 40, 50, 60, rising from 0 to 1.]
In particular we find:
Fx(35) = 1/2
Fx(30.1) = 1/2
Fx(30) = 1/2
Fx(29.9) = 1/3
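These values can be checked with a small script (a sketch; the name Fx merely mirrors the notation above):

```python
from fractions import Fraction

# Distribution function of the fair-die r.v. with values 10, 20, ..., 60,
# each taken with probability 1/6: Fx(x) = Pr{die value <= x}.

def Fx(x):
    values = [10, 20, 30, 40, 50, 60]
    return sum(Fraction(1, 6) for v in values if v <= x)

print(Fx(35))    # 1/2
print(Fx(30.1))  # 1/2
print(Fx(30))    # 1/2  (the jump at 30 is included: Fx is right-continuous)
print(Fx(29.9))  # 1/3
```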
Example 3: If a telephone call arrives uniformly at random within the interval [0, T],
the distribution function is continuous-valued and has the following form:

[Figure: Fx(x) increasing linearly from 0 at x = 0 to 1 at x = T.]
Density Function of a r.v.
The derivative of the distribution function Fx(x) is called the probability density func-
tion (PDF) fx(x) of the r.v. x. Thus we have
fx(x) = d/dx Fx(x)
and the above probability can also be evaluated from

Pr{a < x ≤ b} = ∫_a^b fx(x) dx = Fx(b) − Fx(a).
Because Fx(x) is monotonically non-decreasing, the derivative has the property fx(x) ≥ 0.
For discrete-valued r.v. the density is a weighted sum of Dirac pulses

fx(x) = Σ_i p_i δ(x − x_i)
Note that for discrete probability densities one also uses the term probability mass function (PMF).
Note: this essentially allows one density to be expressed in terms of the others; e.g. if we know
fx|y(x|y) and fy(y), we can evaluate fx,y(x, y) = fx|y(x|y) fy(y), and so on. In particular,

fx|y(x|y) = fy|x(y|x) fx(x) / fy(y)
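A quick numeric sanity check of these relations, using a small discrete joint PMF with made-up values (for discrete r.v. the densities become PMFs):

```python
# Check that conditional, marginal and joint PMFs are mutually consistent,
# including Bayes' rule, for an arbitrary small joint PMF p(x, y).

p_joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.4, (1, 1): 0.2,
}

def p_x(x):
    return sum(v for (xi, yi), v in p_joint.items() if xi == x)

def p_y(y):
    return sum(v for (xi, yi), v in p_joint.items() if yi == y)

def p_x_given_y(x, y):
    return p_joint[(x, y)] / p_y(y)

def p_y_given_x(y, x):
    return p_joint[(x, y)] / p_x(x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y), for every pair (x, y)
for x in (0, 1):
    for y in (0, 1):
        lhs = p_x_given_y(x, y)
        rhs = p_y_given_x(y, x) * p_x(x) / p_y(y)
        assert abs(lhs - rhs) < 1e-12

# Joint from conditional times marginal: p(x, y) = p(x|y) p(y)
assert abs(p_x_given_y(0, 1) * p_y(1) - p_joint[(0, 1)]) < 1e-12
```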
Dependent Random Variables / Correlation
in which case the PDFs of x and y are not modified by conditioning on y and x, respectively.
Otherwise, the variables are said to be dependent. In particular, when the variables are
independent, it follows that E[xy] = E[x] E[y]. It also follows that independent random
variables are uncorrelated, meaning that their cross-correlation is zero as can be verified
from the definition of cross-correlation (or cross-covariance):

σx² = E[(x − µx)(x − µx)]
σxy = E[(x − µx)(y − µy)]    (‘centered’ r.v.)

For independent r.v.:
σxy = E[xy] − µx µy = E[x] E[y] − µx µy = 0

corr. coeff.:  ρxy = σxy / (σx σy);   −1 ≤ ρxy ≤ 1
The converse statement is not true: uncorrelated random variables can be dependent.
Consider the example x = sin φ, y = cos φ for φ uniformly distributed in [0, 2π]. Then we
have x² + y² = 1, so that x and y are dependent.
However: E[xy] = E[sin φ cos φ] = ½ E[sin 2φ] = 0; this means that x and y are uncorrelated.
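A Monte Carlo sketch of this example (the sample count and seed are arbitrary choices):

```python
import math
import random

# Monte Carlo check that x = sin(phi), y = cos(phi) with phi ~ U[0, 2*pi]
# are uncorrelated (E[xy] = 0) although clearly dependent (x^2 + y^2 = 1).

random.seed(0)
N = 200_000
samples = [(math.sin(p), math.cos(p))
           for p in (random.uniform(0, 2 * math.pi) for _ in range(N))]

exy = sum(x * y for x, y in samples) / N
print(abs(exy) < 0.01)  # True: empirically uncorrelated
print(all(abs(x * x + y * y - 1) < 1e-12 for x, y in samples))  # True: dependent
```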
Normal (Gaussian) density for scalar random variables
A r.v. is normally or Gaussian distributed with parameters µ and σ² if its density function
is given by

fx(x) = 1/√(2πσ²) · exp(−(x − µ)² / (2σ²))    (Gaussian density)

For µ = 0 and σ² = 1 this reduces to the standard normal density N(0, 1) with
fx(x) = 1/√(2π) · exp(−x²/2).
This density will be our standard model for ‘noise’ (additive white Gaussian noise,
AWGN). The distribution function is given by

Fx(x) = ∫_{−∞}^{x} 1/√(2πσ²) · exp(−(y − µ)² / (2σ²)) dy = 1 − Q((x − µ)/σ)

where we used the so-called ‘normal integral’ or ‘Q-function’

Q(x) = 1/√(2π) · ∫_x^∞ exp(−y²/2) dy
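The Q-function has no closed form, but it can be evaluated via the complementary error function using the standard identity Q(x) = ½ erfc(x/√2); a small sketch:

```python
import math

# Q-function via the complementary error function: Q(x) = 0.5 * erfc(x / sqrt(2)).
# Used here to check the relation Fx(x) = 1 - Q((x - mu) / sigma).

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def Fx(x, mu=0.0, sigma=1.0):
    return 1.0 - Q((x - mu) / sigma)

print(round(Q(0.0), 6))    # 0.5
print(round(Fx(0.0), 6))   # 0.5   (half of the probability mass lies below the mean)
print(round(Fx(1.96), 3))  # 0.975 (the familiar 95% two-sided point)
```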
Normal scalar random variable / Moments
The ‘normal integral’ follows from integrating a zero-mean Gaussian r.v. with unit vari-
ance. It is tabulated in many mathematical textbooks. Often we use x ~ N(µ, σ²) as the
notation for a Gaussian r.v. with mean µ and variance σ².
Note: The variance equals the 2nd central moment, or the 2nd moment of the centered r.v. For the
generic Gaussian r.v. from above we find E[x] = µ and E[(x − µ)²] = σ².
Characteristic and moment generating fct.
Φx(jω) = ∫_{−∞}^{∞} e^{jωx} fx(x) dx = E[e^{jωx}]

If we replace jω by s, the resulting integral is the moment generating function

Φx(s) = ∫_{−∞}^{∞} e^{sx} fx(x) dx = E[e^{sx}].

Hence

Φx^(n)(0) = E[x^n].
Here the Gaussian integral is useful:

∫_{−∞}^{∞} e^{−t²/2} dt = √(2π),   or   ∫_0^∞ e^{−t²/2} dt = √(π/2)
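As a numeric illustration of the moment-generating property (a sketch using the known MGF exp(s²/2) of a standard normal r.v. and finite-difference derivatives):

```python
import math

# Moments from the moment generating function: for x ~ N(0, 1) the MGF is
# phi(s) = exp(s^2 / 2); its n-th derivative at s = 0 equals E[x^n].
# The derivatives are approximated here by central finite differences.

def mgf(s):
    return math.exp(s * s / 2.0)

h = 1e-3
mean = (mgf(h) - mgf(-h)) / (2 * h)                 # approximates E[x]   = 0
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / (h * h)  # approximates E[x^2] = 1

print(abs(mean) < 1e-9)        # True
print(abs(second - 1) < 1e-3)  # True
```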
Note how the vector PDF generalizes the PDF of the scalar case:
• In the argument of the exponent, the inverse of the variance of the scalar r.v. is
replaced by the inverse of the covariance matrix in the vector case.
• The normalization factor requires the square root of the determinant of the covariance
matrix instead of the square root of the variance.
Probability density of a Gaussian random vector
For the PDF of a Gaussian random vector the size p of the random vector and the
covariance matrix Rx need to be taken into account:

fx(x) = 1/((2π)^(p/2) √(det Rx)) · exp(−½ (x − µx)ᵀ Rx⁻¹ (x − µx)),   x ~ N(µx, Rx)
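A minimal numeric sketch of this generalization for p = 2 with a diagonal covariance matrix, in which case the vector PDF must factor into the product of scalar Gaussian densities (all numbers are toy values):

```python
import math

# Gaussian vector PDF for a diagonal covariance Rx = diag(s1^2, s2^2):
# det(Rx) and the quadratic form (x - mu)^T Rx^-1 (x - mu) are trivial,
# and the result must equal the product of the two scalar densities.

def scalar_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def vector_pdf_diag(x, mu, variances):
    p = len(x)
    det = 1.0
    quad = 0.0
    for xi, mi, vi in zip(x, mu, variances):
        det *= vi                     # det(Rx) for diagonal Rx
        quad += (xi - mi) ** 2 / vi   # (x - mu)^T Rx^-1 (x - mu)
    norm = (2 * math.pi) ** (p / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

x = [0.3, -1.2]
mu = [0.0, 0.5]
var = [1.0, 4.0]
joint = vector_pdf_diag(x, mu, var)
product = scalar_pdf(x[0], mu[0], var[0]) * scalar_pdf(x[1], mu[1], var[1])
assert abs(joint - product) < 1e-15  # vector PDF factors for diagonal Rx
```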
All these applications share the common problem that one or several parameter values
need to be estimated from a given set of observations.
Motivational Example 1: RADAR System

[Figure: airport surveillance RADAR feeding a radar processing system; transmitted
pulse and received waveform shown over time.]

Range estimate from the estimated round-trip delay τ̂0 of the transmit pulse:

R̂ = v τ̂0 / 2

where v is the propagation velocity of the wave.
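The range computation can be sketched as follows (the delay value is made up; v is taken as the speed of light, assuming an electromagnetic pulse):

```python
# Range estimate from the measured round-trip delay of a radar pulse:
# R_hat = v * tau0_hat / 2, where the factor 2 accounts for the two-way travel.

v = 3.0e8          # propagation velocity in m/s (speed of light, assumed EM wave)
tau0_hat = 1.0e-4  # estimated round-trip delay in seconds (made-up value)

R_hat = v * tau0_hat / 2
print(R_hat)  # 15000.0  (i.e. a target at 15 km range)
```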
Motivational Example 2: SONAR System
[Figure: passive SONAR with a towed array below the sea surface; a noisy target near
the sea bottom radiates signals that are received at the array sensors (sensor 1 and
sensor 2 outputs shown over time).]
The PDF is parametrized by the unknown parameter x, i.e., we have a class of functions
where each one is different due to a different value of x. We will use a semicolon to
denote this dependence or parametrization.
As an example, if N = 1 and x denotes the mean, the PDF of the data might be

f(y[0]; x) = 1/√(2πσ²) · exp(−(y[0] − x)² / (2σ²)),   i.e. y ~ N(x, σ²)
while for N observations the Gaussian vector PDF occurs that is shown on the next slide.
‚Running example‘: It will be reconsidered under different assumptions such as minimum vari-
ance unbiased estimation, maximum likelihood estimation, best linear unbiased estimation, etc.
Data model:
y[n] = A + w[n],  n = 0, 1, …, N−1;  where w[n] ~ N(0, σ²)
The probability density function (PDF) of the observation vector y = (y[0], …, y[N−1])ᵀ is
Gaussian, because we assume the noise to be Gaussian:

f(y; A) = 1/(2πσ²)^(N/2) · exp(−1/(2σ²) Σ_{n=0}^{N−1} (y[n] − A)²),   i.e. y ~ N(A·1, σ²·I)
It will turn out that the optimal estimator in the minimum mean-squared error (least mean
squares) sense is given by the sample mean:

Â = (1/N) Σ_{n=0}^{N−1} y[n]

Â is linear in y[n]; here the BLUE, MVUE and ML estimators coincide.
• Note that the optimality criterion is crucial, as it determines what ‘optimal’ means.
• Note as well that in the above case the optimal estimator is linear (which is a lucky coincidence).
The second step is to find the estimator according to some optimality criterion.
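A Monte Carlo sketch of the running example, checking that the sample mean is (nearly) unbiased with variance close to σ²/N (all simulation parameters are arbitrary choices):

```python
import random

# 'Running example': estimate the DC level A from y[n] = A + w[n],
# w[n] ~ N(0, sigma^2), using the sample mean A_hat = (1/N) * sum(y[n]).
# Repeating the experiment shows E[A_hat] ~ A and var(A_hat) ~ sigma^2 / N.

random.seed(1)
A, sigma, N, trials = 1.0, 0.5, 100, 2000

estimates = []
for _ in range(trials):
    y = [A + random.gauss(0.0, sigma) for _ in range(N)]
    estimates.append(sum(y) / N)  # sample-mean estimate of A

mean_est = sum(estimates) / trials
var_est = sum((a - mean_est) ** 2 for a in estimates) / trials

print(abs(mean_est - A) < 0.01)              # True: (nearly) unbiased
print(abs(var_est - sigma ** 2 / N) < 5e-4)  # True: variance ~ sigma^2 / N
```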
Assessing Estimator Performance