Week 5 Annotated
Let $Y = Z^2$, where $Z \sim N(0,1)$. Then:
$$F_Y(y) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}) = 2 F_Z(\sqrt{y}) - 1;$$
$$E[Y] = E\left[Z^2\right] = 1;$$
$$Var(Y) = E\left[Y^2\right] - (E[Y])^2 = E\left[Z^4\right] - \left(E\left[Z^2\right]\right)^2 = 3 - 1 = 2.$$
$$F_Y(y) = \Pr\left(-\sqrt{y} \le Z \le \sqrt{y}\right) = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz = 2 \int_{0}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz = 2 \int_{0}^{y} \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{2}\, w^{-1/2} e^{-\frac{1}{2}w}\, dw,$$
using the substitution $w = z^2$, so that $dz = \frac{1}{2} w^{-1/2}\, dw$.
Proof (cont.).
$$F_Y(y) = \int_0^y \frac{1}{\sqrt{2\pi}}\, w^{-1/2} e^{-\frac{1}{2}w}\, dw.$$
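The closed form for this c.d.f., $F_Y(y) = 2\Phi(\sqrt{y}) - 1 = \operatorname{erf}(\sqrt{y/2})$, can be checked against a direct Riemann sum of the integrand above. A minimal sketch (standard library only; the midpoint rule, grid size, and test point are arbitrary choices):

```python
import math

def chi2_1_cdf_closed(y):
    """F_Y(y) = 2*Phi(sqrt(y)) - 1 = erf(sqrt(y/2)), Phi the standard normal c.d.f."""
    return math.erf(math.sqrt(y / 2.0))

def chi2_1_cdf_numeric(y, steps=200_000):
    """Midpoint Riemann sum of w^(-1/2) * exp(-w/2) / sqrt(2*pi) over (0, y)."""
    h = y / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h          # midpoint avoids the integrable singularity at 0
        total += w ** -0.5 * math.exp(-w / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

closed = chi2_1_cdf_closed(2.5)
numeric = chi2_1_cdf_numeric(2.5)
```

The two values agree up to the discretization error of the sum.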
By the fundamental theorem of calculus,
$$\frac{d}{db} \int_a^b f(x)\, dx = f(b),$$
so differentiating $F_Y$ gives the p.d.f. of $Y$.
Let $X = \sum_{i=1}^{n} Y_i$, where the $Y_i \sim \chi^2(1)$ are independent. Then:
$$E[X] = E\left[\sum_{i=1}^{n} Y_i\right] = n\, E[Y_i] = n,$$
$$Var(X) = Var\left(\sum_{i=1}^{n} Y_i\right) = n\, Var(Y_i) = 2n,$$
$$M_X(t) = M_{\sum_{i=1}^{n} Y_i}(t) = \left(M_{Y_i}(t)\right)^n = (1 - 2t)^{-n/2}, \qquad t < 1/2.$$
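The moment results $E[X] = n$ and $Var(X) = 2n$ are easy to sanity-check by simulating sums of squared standard normals. A minimal sketch (seeded `random` module; sample size and tolerances are arbitrary):

```python
import random

random.seed(42)
n = 3            # degrees of freedom
N = 200_000     # number of simulated chi-squared draws

# X = sum of n squared standard normals, i.e. a chi-squared(n) draw
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(N)]

mean = sum(draws) / N
var = sum((d - mean) ** 2 for d in draws) / (N - 1)
# Theory: E[X] = n = 3, Var(X) = 2n = 6
```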
$$f_X(x) = \frac{\lambda^n\, x^{n-1} e^{-\lambda x}}{\Gamma(n)}.$$
Chi-squared probability/cumulative density function
[Figure: chi-squared p.d.f. $f_X(x)$ (left) and c.d.f. $F_X(x)$ (right) for $n = 1, 2, 3, 5, 10, 25$, plotted for $x \in [0, 30]$.]
A $\chi^2(r)$ random variable can be written as $V = \sum_{k=1}^{r} Z_k^2$, where the $Z_k$ are i.i.d. standard normal.
Proof:
Note the p.d.f.s:
$$f_V(v) = \frac{v^{r/2-1}}{2^{r/2}\, \Gamma(r/2)}\, e^{-v/2}, \quad \text{if } 0 \le v < \infty;$$
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad \text{if } -\infty < z < \infty,$$
and the transformation
$$t = g_2(z, v) = \frac{z}{\sqrt{v/r}}, \qquad s = g_1(z, v) = v.$$
With inverse $v = h_1(s,t) = s$ and $z = h_2(s,t) = t\sqrt{s/r}$, defined for $0 \le s < \infty$ and $-\infty < t < \infty$, the Jacobian is:
$$J(s,t) = \det \begin{pmatrix} \dfrac{\partial h_1(s,t)}{\partial s} & \dfrac{\partial h_1(s,t)}{\partial t} \\[2pt] \dfrac{\partial h_2(s,t)}{\partial s} & \dfrac{\partial h_2(s,t)}{\partial t} \end{pmatrix} = \det \begin{pmatrix} 1 & 0 \\[2pt] \dfrac{t\, s^{-1/2}}{2\sqrt{r}} & \sqrt{s/r} \end{pmatrix} = \sqrt{s/r}.$$
The joint density of $(S, T)$ is then:
$$f_{S,T}(s,t) = \frac{s^{r/2-1} e^{-s/2}}{\Gamma(r/2)\, 2^{r/2}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2 s}{2r}} \cdot \sqrt{\frac{s}{r}} = \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}}\, s^{(r+1)/2-1} \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right).$$
5. Therefore, the marginal density of $T$ is given by:
$$f_T(t) = \int_0^\infty f_{S,T}(s,t)\, ds.$$
Substitute
$$s = \frac{2w}{1 + t^2/r}, \quad \text{so that:} \quad dw = \frac{1}{2}\left(1 + \frac{t^2}{r}\right) ds, \qquad ds = \frac{2}{1 + t^2/r}\, dw.$$
So that we have:
$$f_T(t) = \int_0^\infty \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}}\, s^{(r+1)/2-1} \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right) ds = \int_0^\infty \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}} \left(\frac{2w}{1 + \frac{t^2}{r}}\right)^{\frac{r+1}{2}-1} \exp(-w)\, \frac{2}{1 + \frac{t^2}{r}}\, dw.$$
Simplifying:
$$f_T(t) = \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}} \left(\frac{2}{1 + t^2/r}\right)^{(r+1)/2} \int_0^\infty w^{(r+1)/2-1} e^{-w}\, dw = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi r}\, \Gamma(r/2)} \left(\frac{1}{1 + t^2/r}\right)^{(r+1)/2},$$
using $\int_0^\infty x^{\alpha-1} \exp(-x)\, dx = \Gamma(\alpha)$.
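The final density can be sanity-checked numerically: implemented with `math.gamma`, it should integrate to (nearly) 1, and for $r = 1$ it reduces to the Cauchy density with $f(0) = 1/\pi$. A minimal sketch (trapezoidal rule; the grid limits and degrees of freedom are arbitrary choices):

```python
import math

def t_pdf_factory(r):
    """Student-t density with r degrees of freedom, as in the formula above."""
    c = math.gamma((r + 1) / 2) / (math.sqrt(math.pi * r) * math.gamma(r / 2))
    return lambda t: c * (1.0 + t * t / r) ** (-(r + 1) / 2)

pdf5 = t_pdf_factory(5)
# trapezoidal rule on [-50, 50]; the r = 5 tails beyond that are negligible
a, b, steps = -50.0, 50.0, 100_000
h = (b - a) / steps
area = (sum(pdf5(a + i * h) for i in range(steps + 1))
        - 0.5 * (pdf5(a) + pdf5(b))) * h

pdf1 = t_pdf_factory(1)   # r = 1 is the Cauchy density
```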
Student-$t$ p.d.f. and c.d.f.
[Figure: Student-$t$ p.d.f. $f(x)$ (left) and c.d.f. $F(x)$ (right) for $r = 1, 2, 3, 5, 10, 25$, plotted for $x \in [-5, 5]$.]
Snedecor's F distribution
Suppose $U \sim \chi^2(n_1)$ and $V \sim \chi^2(n_2)$ are two independent chi-squared distributed random variables. Then, the random variable:
$$F = \frac{U/n_1}{V/n_2}$$
has an F distribution with $n_1$ and $n_2$ degrees of freedom. Consider the transformation
$$f = \frac{u/n_1}{v/n_2}, \quad g = v; \qquad \text{with inverse} \quad u = f g\, \frac{n_1}{n_2}, \quad v = g.$$
Snedecor's F distribution
3. Jacobian of the transformation:
$$J(f,g) = \det \begin{pmatrix} \partial u/\partial f & \partial u/\partial g \\ \partial v/\partial f & \partial v/\partial g \end{pmatrix} = \det \begin{pmatrix} g\, \dfrac{n_1}{n_2} & f\, \dfrac{n_1}{n_2} \\ 0 & 1 \end{pmatrix} = g\, \frac{n_1}{n_2}.$$
Absolute value of the Jacobian: $|J(f,g)| = g\, \dfrac{n_1}{n_2}$.
Snedecor's F distribution
4. The joint density of $(F, G)$ is:
$$f_{F,G}(f,g) = f_U\!\left(f g\, \frac{n_1}{n_2}\right) f_V(g)\, |J(f,g)| = \frac{\left(f g\, \frac{n_1}{n_2}\right)^{(n_1-2)/2} \exp\left(-\frac{f g\, n_1}{2 n_2}\right)}{2^{n_1/2}\, \Gamma\!\left(\frac{n_1}{2}\right)} \cdot \frac{g^{(n_2-2)/2} \exp\left(-\frac{g}{2}\right)}{2^{n_2/2}\, \Gamma\!\left(\frac{n_2}{2}\right)} \cdot g\, \frac{n_1}{n_2}.$$
5. The marginal of $F$ is obtained by integrating over all possible values of $G$:
$$f_F(f) = \int_0^\infty f_{F,G}(f,g)\, dg = \text{func}(f) \int_0^\infty g^{(n_1+n_2-2)/2} \exp\left(-g\left(\frac{1}{2} + \frac{f n_1}{2 n_2}\right)\right) dg,$$
where
$$\text{func}(f) = \frac{n_1\, (f n_1)^{(n_1-2)/2}}{2^{(n_1+n_2)/2}\, n_2^{n_1/2}\, \Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)}.$$
Continues:
$$f_F(f) = \text{func}(f) \left(\frac{2 n_2}{n_2 + f n_1}\right)^{(n_1+n_2)/2} \Gamma\!\left(\frac{n_1+n_2}{2}\right) = \frac{\Gamma\!\left(\frac{n_1+n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)} \left(\frac{n_1}{n_2}\right)^{n_1/2} f^{n_1/2 - 1} \left(1 + \frac{n_1 f}{n_2}\right)^{-(n_1+n_2)/2}.$$
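The resulting F density can be sanity-checked numerically: it should integrate to (nearly) 1, and its mean should match the known value $n_2/(n_2-2)$ for $n_2 > 2$. A minimal sketch (midpoint rule; the degrees of freedom, grid, and tolerances are arbitrary choices):

```python
import math

def f_pdf_factory(n1, n2):
    """Snedecor F density with (n1, n2) degrees of freedom, as derived above."""
    c = (math.gamma((n1 + n2) / 2)
         / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
         * (n1 / n2) ** (n1 / 2))
    return lambda x: c * x ** (n1 / 2 - 1) * (1.0 + n1 * x / n2) ** (-(n1 + n2) / 2)

pdf = f_pdf_factory(5, 10)
# midpoint rule on (0, 400]; the tail beyond is negligible for these d.o.f.
steps, upper = 200_000, 400.0
h = upper / steps
mids = [(i + 0.5) * h for i in range(steps)]
area = sum(pdf(x) for x in mids) * h
mean = sum(x * pdf(x) for x in mids) * h   # theory: n2/(n2-2) = 10/8 = 1.25
```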
Snedecor's F p.d.f. and c.d.f.
[Figure: F p.d.f. $f_X(x)$ (left) and c.d.f. $F_X(x)$ (right) for $(n_1, n_2) = (2,2), (2,4), (2,6), (2,10), (10,2), (10,10)$, plotted for $x \in [0, 10]$.]
Sample mean and sample variance:
$$\bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k, \qquad S^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left(X_k - \bar{X}\right)^2,$$
and
$$\frac{(n-1)\, S^2}{\sigma^2} \sim \chi^2(n-1): \text{ the sample variance scaled by the population variance.}$$
Claim: $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$.
Proof:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{S^2}{\sigma^2}}} = \frac{Z}{\sqrt{\dfrac{\chi^2_{n-1}}{n-1}}} \sim t_{n-1},$$
since $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $\dfrac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)$ are independent.
We have:
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} \left( (X_i - \bar{X}) + (\bar{X} - \mu) \right)^2 = \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 + n\, (\bar{X} - \mu)^2,$$
because the cross term $2 (\bar{X} - \mu) \sum_{i=1}^{n} (X_i - \bar{X}) = 0$. Dividing by $\sigma^2$:
$$\underbrace{\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}}_{\sum_{i=1}^{n} Z_i^2\, \sim\, \chi^2_n} = \underbrace{\frac{(n-1)\, S^2}{\sigma^2}}_{\sim\, \chi^2(n-1)} + \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{Z^2\, \sim\, \chi^2_1}.$$
Hence $\dfrac{(n-1)\, S^2}{\sigma^2} \sim \chi^2(n-1)$, and therefore $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$.
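The sum-of-squares decomposition in the middle of the proof is a purely algebraic identity, so it can be verified exactly on any data set. A minimal sketch (made-up numbers):

```python
# Verify: sum (x_i - mu)^2 = sum (x_i - xbar)^2 + n * (xbar - mu)^2
x = [2.1, 3.5, 1.7, 4.2, 2.9]   # arbitrary sample
mu = 3.0                         # arbitrary "population mean"
n = len(x)
xbar = sum(x) / n

lhs = sum((xi - mu) ** 2 for xi in x)
rhs = sum((xi - xbar) ** 2 for xi in x) + n * (xbar - mu) ** 2
```

The cross term vanishes because $\sum_i (x_i - \bar{x}) = 0$ by construction.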
Course overview:
- Probability (review): Weeks 1-5 (video lectures: Week 1 VL - Week 5 VL);
- Estimation: Weeks 6-9;
- Hypothesis testing: Weeks 10-12;
- Linear regression: later weeks.
This week
Parameter estimation:
- Method of Moments;
- Maximum Likelihood method;
- Bayesian estimator.
Definition of an Estimator
Problem of statistical estimation: a population has some characteristics that can be described by a r.v. $X$ with density $f_X(\cdot\,|\,\theta)$. The density has an unknown parameter (or set of parameters) $\theta$. We observe values of the random sample $X_1, X_2, \ldots, X_n$ from the population $f_X(\cdot\,|\,\theta)$. Denote these observed sample values by $x_1, x_2, \ldots, x_n$. We then estimate the parameter (or some function of the parameter) based on this random sample.
Definition of an Estimator
Any statistic, i.e., a function $T(X_1, X_2, \ldots, X_n)$ of observable random variables whose values are used to estimate $\tau(\theta)$, where $\tau(\theta)$ is some function of the parameter $\theta$, is called an estimator of $\tau(\theta)$.
A value $\hat{\theta}$ of the statistic, evaluated at the observed sample values $x_1, x_2, \ldots, x_n$, is called a (point) estimate.
For example:
$$T(X_1, X_2, \ldots, X_n) = \bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j \quad \text{(estimator)}; \qquad \hat{\theta} = 0.23 \quad \text{(point estimate)}.$$
Note: $\theta$ can be a vector; then the estimator is a set of equations.
The sample moments are:
$$m_1 = \frac{1}{n} \sum_{j=1}^{n} x_j, \quad m_2 = \frac{1}{n} \sum_{j=1}^{n} x_j^2, \quad \ldots, \quad m_k = \frac{1}{n} \sum_{j=1}^{n} x_j^k.$$
Solving the resulting system provides the point estimate $\hat{\theta}$; e.g., for a binomial sample this gives $\hat{p} = \bar{x}/n$.
$$\underbrace{E\left[X^2\right]}_{\text{population moment}} = \underbrace{\frac{1}{n} \sum_{j=1}^{n} x_j^2}_{\text{sample moment}}.$$
and
$$\hat{\mu} = E[X] = \bar{x}$$
$$\hat{\sigma}^2 = E\left[X^2\right] - (E[X])^2 = \frac{1}{n} \sum_{j=1}^{n} x_j^2 - \bar{x}^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 = \frac{n-1}{n}\, s^2,$$
* using $s^2 = \dfrac{\sum_{j=1}^{n} (x_j - \bar{x})^2}{n-1}$, the sample variance.
Note: $E\left[\hat{\sigma}^2\right] \ne \sigma^2$ (biased estimator); more on this next week.
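The relation $\hat{\sigma}^2 = \frac{n-1}{n} s^2$ between the method-of-moments estimate and the sample variance can be checked directly on any data. A minimal sketch (arbitrary numbers):

```python
x = [4.0, 5.5, 3.8, 6.1, 5.0, 4.6]   # arbitrary sample
n = len(x)
xbar = sum(x) / n

# Method-of-moments estimates
mu_hat = xbar
sigma2_hat = sum(xi ** 2 for xi in x) / n - xbar ** 2

# Unbiased sample variance s^2 (divisor n - 1)
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
```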
The likelihood function of the sample is:
$$L(\theta; \mathbf{x}) = \prod_{j=1}^{n} f_X(x_j\,|\,\theta).$$
$$\frac{\partial L(\theta; \mathbf{x})}{\partial \theta_1} = 0, \quad \frac{\partial L(\theta; \mathbf{x})}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial L(\theta; \mathbf{x})}{\partial \theta_k} = 0.$$
For $k = 2$, the gradient and Hessian are:
$$D(L) = \begin{pmatrix} \dfrac{\partial L}{\partial \theta_1} \\[2pt] \dfrac{\partial L}{\partial \theta_2} \end{pmatrix}, \qquad H(L) = \begin{pmatrix} \dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} \\[2pt] \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \dfrac{\partial^2 L}{\partial \theta_2^2} \end{pmatrix},$$
and the second-order condition for a maximum is:
$$\begin{pmatrix} h_1 & h_2 \end{pmatrix} H(L) \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} < 0, \quad \text{for all } [h_1, h_2] \ne 0.$$
Log-Likelihood function
Generally, maximizing the log-likelihood function is easier. Not surprisingly, we define the log-likelihood function as:
$$\ell(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x}) = \log\left(L(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x})\right) = \log\left(\prod_{j=1}^{n} f_X(x_j\,|\,\theta)\right) = \sum_{j=1}^{n} \log\left(f_X(x_j\,|\,\theta)\right).$$
MLE procedure
The general procedure to find the ML estimator is:
1. Determine the likelihood function $L(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x})$;
2. Determine the log-likelihood function $\ell(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x}) = \log(L(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x}))$;
3. Equate the derivatives of $\ell(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x})$ w.r.t. $\theta_1, \theta_2, \ldots, \theta_k$ to zero (gives a global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum), and check the boundary conditions.
1. For a Poisson($\lambda$) sample, $f_X(x_j\,|\,\lambda) = \dfrac{e^{-\lambda}\, \lambda^{x_j}}{x_j!}$, so:
$$L(\lambda; \mathbf{x}) = \prod_{j=1}^{n} \frac{e^{-\lambda}\, \lambda^{x_j}}{x_j!} = e^{-n\lambda}\, \frac{\lambda^{x_1}\, \lambda^{x_2} \cdots \lambda^{x_n}}{x_1!\, x_2! \cdots x_n!}.$$
2. So that taking the log of both sides, we get:
$$\ell(\lambda; \mathbf{x}) = -n\lambda + \log(\lambda) \sum_{k=1}^{n} x_k - \sum_{k=1}^{n} \log(x_k!).$$
3. Setting the derivative to zero:
$$\frac{\partial}{\partial \lambda}\, \ell(\lambda) = 0 \quad \Longleftrightarrow \quad -n + \frac{1}{\lambda} \sum_{k=1}^{n} x_k = 0,$$
so that:
$$\hat{\lambda} = \frac{1}{n} \sum_{k=1}^{n} x_k = \bar{x}.$$
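That $\hat{\lambda} = \bar{x}$ indeed maximizes the log-likelihood can be verified numerically by evaluating $\ell(\lambda)$ on a grid around the sample mean. A minimal sketch (arbitrary count data):

```python
import math

x = [2, 0, 3, 1, 4, 2, 1]   # arbitrary Poisson-like counts
n = len(x)
xbar = sum(x) / n

def loglik(lam):
    """Poisson log-likelihood: -n*lam + log(lam)*sum(x) - sum(log(x_k!))."""
    return (-n * lam + math.log(lam) * sum(x)
            - sum(math.log(math.factorial(k)) for k in x))

# The log-likelihood at the MLE should beat nearby candidate values
candidates = [xbar + d for d in (-0.5, -0.1, 0.1, 0.5)]
best_is_mle = all(loglik(xbar) > loglik(lam) for lam in candidates)
```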
1. For a Normal($\mu, \sigma^2$) sample, the likelihood is:
$$L(\mu, \sigma; \mathbf{x}) = \prod_{k=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_k - \mu}{\sigma}\right)^2\right).$$
Question: Find the MLE of $\mu$ and $\sigma^2$.
$$\ell(\mu, \sigma; \mathbf{x}) = \log \prod_{k=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_k - \mu}{\sigma}\right)^2\right) = -n \log(\sigma) - \frac{n}{2} \log(2\pi) - \frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - \mu)^2.$$
* using $\log(1/a) = \log(a^{-1}) = -\log(a)$, with $a = \sigma\sqrt{2\pi}$.
$$\frac{\partial}{\partial \mu}\, \ell(\mu, \sigma; \mathbf{x}) = \frac{1}{\sigma^2} \sum_{k=1}^{n} (x_k - \mu) = 0 \quad \Longleftrightarrow \quad \sum_{k=1}^{n} x_k - n\mu = 0 \quad \Longleftrightarrow \quad \hat{\mu} = \bar{x}.$$
$$\frac{\partial}{\partial \sigma}\, \ell(\mu, \sigma; \mathbf{x}) = -\frac{n}{\sigma} + \frac{\sum_{k=1}^{n} (x_k - \mu)^2}{\sigma^3} = 0 \quad \Longleftrightarrow \quad n\sigma^2 = \sum_{k=1}^{n} (x_k - \mu)^2,$$
so that:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2.$$
Gamma($\alpha, \beta$) distribution:
$$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}; \qquad M_X(t) = E\left[e^{tX}\right] = \left(\frac{\beta}{\beta - t}\right)^\alpha; \qquad E[X^r] = \frac{\Gamma(\alpha + r)}{\beta^r\, \Gamma(\alpha)}; \qquad Var(X) = \frac{\alpha}{\beta^2}.$$
Method of moments: equate
$$\mu_1 = M_X^{(1)}(t)\Big|_{t=0} = E[X] = \bar{x} \quad \text{and} \quad \mu_2 = M_X^{(2)}(t)\Big|_{t=0} = E\left[X^2\right] = \frac{1}{n} \sum_{i=1}^{n} x_i^2.$$
For the Gamma distribution:
$$\mu_1 = \frac{\alpha}{\beta} \quad \text{and} \quad \mu_2 = \frac{\alpha\, (\alpha + 1)}{\beta^2} = \mu_1 \left(\frac{1}{\beta} + \mu_1\right).$$
Solving for $\beta$ and $\alpha$:
$$\frac{1}{\beta} = \frac{\mu_2}{\mu_1} - \mu_1 = \frac{\mu_2 - \mu_1^2}{\mu_1} \quad \Longrightarrow \quad \beta = \frac{\mu_1}{\mu_2 - \mu_1^2}, \qquad \alpha = \beta\, \mu_1 = \frac{\mu_1^2}{\mu_2 - \mu_1^2}.$$
Using (step 1.) $\mu_1 = \bar{x}$ and $\mu_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2$, and noting that $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$:
$$\hat{\beta} = \frac{\bar{x}}{\hat{\sigma}^2}, \qquad \hat{\alpha} = \hat{\beta}\, \bar{x} = \frac{\bar{x}^2}{\hat{\sigma}^2}.$$
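These moment estimators can be sanity-checked on simulated Gamma data via `random.gammavariate` (note: its second argument is the scale $1/\beta$, not the rate). A minimal sketch (seeded; the true parameters, sample size, and tolerances are arbitrary choices):

```python
import random

random.seed(7)
alpha, beta = 3.0, 2.0           # true shape and rate
N = 100_000
x = [random.gammavariate(alpha, 1.0 / beta) for _ in range(N)]

m1 = sum(xi for xi in x) / N          # sample mean, ~ alpha/beta = 1.5
m2 = sum(xi ** 2 for xi in x) / N     # second sample moment
sigma2_hat = m2 - m1 ** 2

# Method-of-moments estimates, as derived above
beta_hat = m1 / sigma2_hat            # should be ~ 2.0
alpha_hat = m1 ** 2 / sigma2_hat      # should be ~ 3.0
```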
For the ML estimator, the likelihood is:
$$L(\alpha, \beta; \mathbf{x}) = \prod_{i=1}^{n} \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i},$$
and the first-order conditions are:
$$\frac{\partial}{\partial \alpha}\, \ell(\alpha, \beta; \mathbf{x}) = -n\, \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + n \log(\beta) + \sum_{i=1}^{n} \log(x_i) = 0,$$
$$\frac{\partial}{\partial \beta}\, \ell(\alpha, \beta; \mathbf{x}) = \frac{n\alpha}{\beta} - \sum_{i=1}^{n} x_i = 0 \quad \Longrightarrow \quad \hat{\beta} = \frac{n\, \hat{\alpha}}{\sum_{i=1}^{n} x_i}.$$
For a Uniform$(0, \theta)$ sample, the likelihood is:
$$L(\theta; \mathbf{x}) = \frac{1}{\theta^n} \prod_{k=1}^{n} I_{\{0 \le x_k \le \theta\}},$$
which is zero for $\theta < \max_k x_k$ and decreasing in $\theta$ beyond it.
[Figure: $L(\theta; \mathbf{x})$ as a function of $\theta$.]
[Figure: step plots of $F(\theta)$ after successive observations (1st, 2nd, 4th, 5th), with the order statistic $x_{(2)}$ marked; $\theta \in [0, 0.4]$.]
Introduction
We have seen estimators; to compare them we use a loss function $L(\hat{\theta}\,|\,\theta)$ with:
- $L(\hat{\theta}\,|\,\theta) \ge 0$ for every $\hat{\theta}$;
- $L(\hat{\theta}\,|\,\theta) = 0$ when $\hat{\theta} = \theta$;
- e.g., the squared error loss (mostly used).
The Bayesian estimator minimizes the expected loss:
$$\hat{\theta} = \operatorname*{argmin}_{\hat{\theta}}\, E_\theta\left[E_{\mathbf{x}|\theta}\left[L(\hat{\theta}\,|\,\theta)\right]\right] = \operatorname*{argmin}_{\hat{\theta}}\, E\left[L(\hat{\theta}\,|\,\theta)\right].$$
$$E\left[L(\hat{\theta}\,|\,\theta)\right] = \int\!\!\int L(\hat{\theta}(\mathbf{x}), \theta)\, f_{\mathbf{x},\theta}(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta = \int \underbrace{\left[\int L(\hat{\theta}(\mathbf{x}), \theta)\, \pi(\theta|\mathbf{x})\, d\theta\right]}_{r(\hat{\theta}|\mathbf{x})} f_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x} = \int r(\hat{\theta}|\mathbf{x})\, f_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x}.$$
Under squared error loss, minimizing $r(\hat{\theta}|\mathbf{x})$ for each $\mathbf{x}$ gives:
$$-2 \int \left(\theta - \hat{\theta}(\mathbf{x})\right) \pi(\theta|\mathbf{x})\, d\theta = 0 \quad \Longrightarrow \quad \hat{\theta}_B(\mathbf{x}) = \int \theta\, \pi(\theta|\mathbf{x})\, d\theta, \quad \text{i.e., } \hat{\theta}_B(\mathbf{x}) = E_{\theta|\mathbf{x}}[\theta].$$
Interpretation: the Bayesian estimator under squared error loss is the expectation of the posterior density, i.e., $\hat{\theta}_B = E[\theta\,|\,\mathbf{x}]$!
$$\pi(\theta|\mathbf{x}) = \frac{f_{X|\Theta}(x_1, x_2, \ldots, x_T\,|\,\theta)\, \pi(\theta)}{\int f_{X|\Theta}(x_1, x_2, \ldots, x_T\,|\,\theta)\, \pi(\theta)\, d\theta} \qquad (1)$$
$$= \frac{f_{X|\Theta}(x_1, x_2, \ldots, x_T\,|\,\theta)\, \pi(\theta)}{f_X(x_1, x_2, \ldots, x_T)} \qquad (2)$$
* Using Bayes' formula: $\Pr(A_i|B) = \dfrac{\Pr(B|A_i)\Pr(A_i)}{\sum_{j=1}^{n} \Pr(B|A_j)\Pr(A_j)}$, with $A_1, \ldots, A_n$ a complete partition of $\Omega$.
** Using the LTP: $\Pr(A) = \sum_{i=1}^{n} \Pr(A|B_i)\Pr(B_i)$ (where $B_1, \ldots, B_n$ is a complete partition of $\Omega$, week 1).
Estimation procedure:
1. The prior is Beta$(a, b)$:
$$\pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\, \Gamma(b)}\, \theta^{a-1} (1-\theta)^{b-1}.$$
2. The likelihood of the Bernoulli sample is:
$$f_{X|\Theta}(\mathbf{x}\,|\,\theta) = \theta^{\sum_{j=1}^{T} x_j} (1-\theta)^{T - \sum_{j=1}^{T} x_j} = \theta^{s}\, (1-\theta)^{T-s},$$
where $s = \sum_{j=1}^{T} x_j$, so that:
$$f_{X|\Theta}(\mathbf{x}\,|\,\theta)\, \pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\, \Gamma(b)}\, \theta^{(a+s)-1} (1-\theta)^{(b+T-s)-1}. \qquad (3)$$
$$f_X(\mathbf{x}) = \int_0^1 f_{X|\Theta}(\mathbf{x}\,|\,\theta)\, \pi(\theta)\, d\theta = \frac{\Gamma(a+b)}{\Gamma(a)\, \Gamma(b)} \int_0^1 \theta^{(a+s)-1} (1-\theta)^{(b+T-s)-1}\, d\theta = \frac{\Gamma(a+b)\, \Gamma(a+s)\, \Gamma(b+T-s)}{\Gamma(a)\, \Gamma(b)\, \Gamma(a+b+T)}.$$
**: using $\int_0^1 x^{\alpha-1} (1-x)^{\beta-1}\, dx = B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha+\beta)}$.
Posterior density
Using (2):
$$\pi(\theta|\mathbf{x}) = \frac{f_{X|\Theta}(\mathbf{x}\,|\,\theta)\, \pi(\theta)}{f_X(\mathbf{x})} = \frac{\theta^s (1-\theta)^{T-s}\, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1} (1-\theta)^{b-1}}{\frac{\Gamma(a+b)\, \Gamma(a+s)\, \Gamma(b+T-s)}{\Gamma(a)\, \Gamma(b)\, \Gamma(a+b+T)}} = \frac{\Gamma(a+b+T)}{\Gamma(a+s)\, \Gamma(b+T-s)}\, \theta^{(a+s)-1} (1-\theta)^{(b+T-s)-1},$$
i.e., the posterior is Beta$(a+s,\, b+T-s)$.
The Bayesian estimator is the posterior mean:
$$\hat{\theta}_B = E[\theta\,|\,X = \mathbf{x}] = \frac{a+s}{a+b+T} = \underbrace{\frac{T}{a+b+T}}_{\text{weight sample}} \cdot \underbrace{\frac{s}{T}}_{\text{sample mean}} + \underbrace{\frac{a+b}{a+b+T}}_{\text{weight prior}} \cdot \underbrace{\frac{a}{a+b}}_{\text{prior mean}}.$$
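The weighted-average form of the posterior mean can be verified numerically for any prior parameters and data. A minimal sketch (arbitrary numbers):

```python
a, b = 2.0, 3.0          # Beta(a, b) prior
T, s = 20, 7             # T Bernoulli trials, s successes

# Posterior mean of the Beta(a + s, b + T - s) posterior
posterior_mean = (a + s) / (a + b + T)

# Decomposition: weight_sample * sample mean + weight_prior * prior mean
weight_sample = T / (a + b + T)
weight_prior = (a + b) / (a + b + T)
combined = weight_sample * (s / T) + weight_prior * (a / (a + b))
```

As $T$ grows, the sample weight tends to 1: the data dominate the prior.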
Exercise Normal-Normal
Let $X_1, X_2, \ldots, X_T$ be i.i.d. Normal$(\theta, \sigma_2^2)$, i.e., $(X_i\,|\,\Theta = \theta) \sim \text{Normal}(\theta, \sigma_2^2)$.
Assume the prior density of $\Theta$ is Normal$(m, \sigma_1^2)$, so that:
$$\pi(\theta) = \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left(-\frac{(\theta - m)^2}{2\sigma_1^2}\right).$$
Question: Find the Bayesian estimator for $\theta$.
$$f_{X|\Theta}(\mathbf{x}\,|\,\theta) = \prod_{j=1}^{T} \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left(-\frac{(x_j - \theta)^2}{2\sigma_2^2}\right) = \frac{1}{\left(\sigma_2 \sqrt{2\pi}\right)^T} \exp\left(-\frac{\sum_{j=1}^{T} (x_j - \theta)^2}{2\sigma_2^2}\right).$$
1. Posterior density:
$$\pi(\theta|\mathbf{x}) \propto f_{X|\Theta}(\mathbf{x}|\theta)\, \pi(\theta) \propto \exp\left(-\frac{\sum_{j=1}^{T} (x_j - \theta)^2}{2\sigma_2^2}\right) \exp\left(-\frac{(\theta - m)^2}{2\sigma_1^2}\right)$$
$$= \exp\left(-\frac{\sum_{j=1}^{T} (x_j^2 + \theta^2 - 2\theta x_j)}{2\sigma_2^2} - \frac{\theta^2 + m^2 - 2\theta m}{2\sigma_1^2}\right)$$
$$= \exp\left(-\frac{\sigma_2^2\, (\theta^2 + m^2 - 2\theta m) + \sigma_1^2 \sum_{j=1}^{T} (x_j^2 + \theta^2 - 2\theta x_j)}{2\sigma_2^2 \sigma_1^2}\right)$$
$$\propto \exp\left(-\frac{(\sigma_2^2 + T\sigma_1^2)\, \theta^2 - 2\theta\, (m\sigma_2^2 + T\bar{x}\sigma_1^2)}{2\sigma_2^2 \sigma_1^2}\right) \propto \exp\left(-\frac{\left(\theta - \frac{m\sigma_2^2 + T\bar{x}\sigma_1^2}{\sigma_2^2 + T\sigma_1^2}\right)^2}{2\, \sigma_2^2 \sigma_1^2 / (\sigma_2^2 + T\sigma_1^2)}\right),$$
* and ** dropping factors that are constants given $\mathbf{x}$ (completing the square in $\theta$).
Thus $\theta\,|\,X$ is Normally distributed with mean:
$$\frac{m\sigma_2^2 + T\bar{x}\sigma_1^2}{\sigma_2^2 + T\sigma_1^2} = \frac{\frac{1}{\sigma_1^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, m + \frac{\frac{T}{\sigma_2^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, \bar{x},$$
and variance:
$$\left(\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}\right)^{-1} = \frac{\sigma_2^2 \sigma_1^2}{\sigma_2^2 + T\sigma_1^2}.$$
The Bayesian estimator (the posterior mean) is therefore:
$$\hat{\theta}_B = \frac{\frac{1}{\sigma_1^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, m + \frac{\frac{T}{\sigma_2^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, \bar{x}.$$
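The two forms of the posterior mean (direct ratio vs. precision weighting) and of the posterior variance agree, which can be checked numerically. A minimal sketch (arbitrary numbers):

```python
m, s1_sq = 1.0, 4.0      # prior mean and prior variance
s2_sq = 9.0              # known data variance
T, xbar = 12, 2.5        # sample size and sample mean

# Direct ratio form
mean_direct = (m * s2_sq + T * xbar * s1_sq) / (s2_sq + T * s1_sq)
var_direct = s2_sq * s1_sq / (s2_sq + T * s1_sq)

# Precision-weighting form: precisions 1/s1_sq (prior) and T/s2_sq (data)
prec_prior = 1.0 / s1_sq
prec_data = T / s2_sq
mean_weighted = (prec_prior * m + prec_data * xbar) / (prec_prior + prec_data)
var_prec = 1.0 / (prec_prior + prec_data)
```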
Chebyshev's Inequality
Chebyshev's inequality states that for any random variable $X$ with mean $\mu$ and variance $\sigma^2$, the following probability inequality holds for all $\epsilon > 0$:
$$\Pr(|X - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$
Note that this applies to all distributions, hence also non-symmetric ones! This implies that:
$$\Pr(X - \mu > \epsilon) \le \frac{\sigma^2}{\epsilon^2} \quad \text{and} \quad \Pr(X - \mu < -\epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$
Interesting example: set $\epsilon = k\sigma$; then:
$$\Pr(|X - \mu| > k\sigma) \le \frac{1}{k^2}.$$
This provides us with an upper bound on the probability that $X$ deviates more than $k$ standard deviations from its mean.
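The bound can be illustrated by simulation on a deliberately non-symmetric distribution, e.g., the Exponential(1), whose mean and standard deviation are both 1. A minimal sketch (seeded; sample size is an arbitrary choice):

```python
import random

random.seed(1)
N = 100_000
# Exponential(1): mu = 1, sigma = 1 -- clearly non-symmetric
draws = [random.expovariate(1.0) for _ in range(N)]

mu, sigma, k = 1.0, 1.0, 2.0
frac = sum(1 for d in draws if abs(d - mu) > k * sigma) / N
# Chebyshev guarantees frac <= 1/k^2 = 0.25; the true value,
# Pr(X > 3) = e^(-3) ~ 0.0498, is far below the bound.
```

The bound is loose but universal: it needs only the mean and variance.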
Convergence concepts
Suppose $X_1, X_2, \ldots$ form a sequence of r.v.s. Example: $X_i$ is the sample variance using the first $i$ observations.
$X_n$ is said to converge almost surely (a.s.) to the random variable $X$ as $n \to \infty$ if and only if:
$$\Pr\left(\omega : X_n(\omega) \to X(\omega), \text{ as } n \to \infty\right) = 1,$$
and we write $X_n \xrightarrow{a.s.} X$, as $n \to \infty$.
Sometimes called strong convergence. It means that beyond some point in the sequence, the difference will always be less than some positive $\epsilon$, but that point is random.
OPTIONAL: Also expressed as: $\Pr(|X_n(\omega) - X(\omega)| > \epsilon, \text{ i.o.}) = 0$, where i.o. stands for infinitely often: $\Pr(A_n \text{ i.o.}) = \Pr(\limsup_{n\to\infty} A_n)$.
$X_n$ is said to converge in probability to $X$ if, for every $\epsilon > 0$, $\Pr(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$, and we write $X_n \xrightarrow{p} X$, as $n \to \infty$.
Difference between convergence in probability and convergence almost surely: $\Pr(|X_n - X| > \epsilon)$ goes to zero, instead of eventually equalling zero.
The Law of Large Numbers: $\bar{X}_n \to \mu$, as $n \to \infty$.
Monte Carlo integration: to approximate $I(g) = \int_0^1 g(x)\, dx$, draw $X_1, \ldots, X_n$ i.i.d. Uniform$(0,1)$ and compute:
$$\hat{I}_n(g) = \frac{1}{n} \sum_{k=1}^{n} g(X_k),$$
which converges to
$$E[g(X)] = \int_0^1 g(x) \cdot 1\, dx = \int_0^1 g(x)\, dx = I(g).$$
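A minimal sketch of this Monte Carlo estimator for $g(x) = x^2$ on $[0,1]$, whose exact integral is $1/3$ (seeded; the sample size and tolerance are arbitrary choices):

```python
import random

random.seed(123)
n = 200_000
g = lambda x: x * x

# I_hat_n(g) = (1/n) * sum g(X_k), with X_k ~ Uniform(0, 1)
estimate = sum(g(random.random()) for _ in range(n)) / n
# LLN: estimate -> integral of x^2 over [0, 1] = 1/3
```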
Then, the LLN tells us that the amount each person will end up paying becomes more predictable as the size of the group increases. In effect, this amount will become closer to $\mu$, the average loss each individual expects.
as $n \to \infty$.
This holds for all r.v.s with finite mean and variance, not only normal r.v.s!
Proof and a rewriting of the CLT: see the next slides.
$$\lim_{n \to \infty} \Pr\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\right) = \Phi(x),$$
i.e., the distribution of
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
converges to the standard normal distribution.
1061/1074
2. Recall Sn =
n
P
n
P
Xi , the m.g.f. of Zn =
i=1
S
n
n
Xi
n
i=1
is
obtained by:
= MXi
n
* using MaX (t) = MX (a t) ** using Sn is the sum of n i.i.d.
random variables Xi , thus MPni=1 Xi (t) = MXn i (t).
Note that we only assumed that:
MXi (t) =f t, 2 ;
E [Xi ] =;
Var (Xi ) = 2 < ,
1062/1074
Taylor expansion of the m.g.f. of $Y_i = X_i - \mu$ around zero:
$$M(t) = M(0) + t\, M^{(1)}(t)\Big|_{t=0} + \frac{1}{2} t^2\, M^{(2)}(t)\Big|_{t=0} + O(t^3),$$
where $O(t^3)$ covers all terms $c_k t^k$, with $c_k \in \mathbb{R}$ for $k \ge 3$. Here:
$$M^{(1)}(t)\Big|_{t=0} = E[Y_i] = 0, \quad \text{and} \quad M^{(2)}(t)\Big|_{t=0} = E\left[Y_i^2\right] = Var(Y_i) + (E[Y_i])^2 = \sigma^2.$$
Now we can align the results from the previous two slides:
$$\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty} \left(M_{X_i - \mu}\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^n = \lim_{n\to\infty} \left(\sum_{i=0}^{\infty} \frac{\left(t/(\sigma\sqrt{n})\right)^i}{i!}\, M^{(i)}(t)\Big|_{t=0}\right)^n$$
$$= \lim_{n\to\infty} \left(1 + 0 + \frac{\sigma^2}{2}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 + O\left(\left(\frac{t}{\sigma\sqrt{n}}\right)^3\right)\right)^n = \lim_{n\to\infty} \exp\left(n\left(\frac{t^2}{2n} + O\left(\left(\frac{1}{n}\right)^{3/2}\right)\right)\right) = e^{t^2/2},$$
since $n \cdot O\left(\left(\frac{1}{n}\right)^{3/2}\right) = O\left(\left(\frac{1}{n}\right)^{1/2}\right) \to 0$ as $n \to \infty$.
* using $\log(1+a) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1} a^i}{i} = a + O(a^2)$, with $a = \frac{t^2}{2n} + O\left(\left(\frac{1}{n}\right)^{3/2}\right)$.
$$\bar{X}_n \approx N\left(\mu,\, \sigma^2/n\right) \quad \Longrightarrow \quad n \bar{X}_n \approx N\left(n\mu,\, n\sigma^2\right).$$
$$0.9772 = \Pr\left(400\, \bar{X}_{400} \le 400 \cdot \$10 \text{ million} + 2 \cdot 20 \cdot \$25 \text{ million}\right) = \Pr\left(400\, \bar{X}_{400} \le \$5 \text{ billion}\right).$$
Thus, $\Pr\left(400\, \bar{X}_{400} > \$5 \text{ billion}\right) = 1 - 0.9772 = 0.0228$.
Normal approximation to the binomial: $X \approx N(np,\, npq)$.
Question: What is the probability that $X = 60$ if $X \sim \text{Bin}(1000, 0.06)$? Not in Binomial tables!
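For this question the exact p.m.f. value can be computed with `math.comb` and compared with the normal approximation $N(np, npq)$ using a continuity correction. A minimal sketch:

```python
import math

n, p, x = 1000, 0.06, 60
q = 1 - p

# Exact binomial probability Pr(X = 60)
exact = math.comb(n, x) * p ** x * q ** (n - x)

# Normal approximation with continuity correction:
# Pr(X = 60) ~ Phi((60.5 - np)/sd) - Phi((59.5 - np)/sd)
mu, sd = n * p, math.sqrt(n * p * q)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
approx = Phi((x + 0.5 - mu) / sd) - Phi((x - 0.5 - mu) / sd)
```

Both values are near 0.053; the approximation is accurate because $np$ and $nq$ are large.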
[Figure: Binomial p.m.f. vs. normal approximation: Binomial(5, 0.1) with N(0.5, 0.45); Binomial(10, 0.1) with N(1, 0.9); Binomial(30, 0.1) with N(3, 2.7); Binomial(200, 0.1) with N(20, 18).]
1. We have the m.g.f. of $Z$: $M_Z(t) = \exp\left(t^2/2\right)$.
2. Next, we need to find the m.g.f. of $Z_n = \dfrac{X_n - n}{\sqrt{n}}$, where $X_n \sim \text{Poisson}(n)$. We know (week 2): $M_{X_n}(t) = \exp\left(n\left(e^t - 1\right)\right)$. Thus, using the calculation rules for m.g.f.s, we have:
$$M_{Z_n}(t) = M_{\frac{X_n - n}{\sqrt{n}}}(t) = \exp\left(-\sqrt{n}\, t\right) M_{X_n}\left(t/\sqrt{n}\right) = \exp\left(-\sqrt{n}\, t\right) \exp\left(n\left(e^{t/\sqrt{n}} - 1\right)\right) = \exp\left(-\sqrt{n}\, t + n\left(e^{t/\sqrt{n}} - 1\right)\right).$$
3. Find the limit of $M_{Z_n}(t)$ and prove that it equals $M_Z(t)$:
$$\lim_{n\to\infty} \log\left(M_{Z_n}(t)\right) = \lim_{n\to\infty} \left(-t\sqrt{n} + n\left(e^{t/\sqrt{n}} - 1\right)\right) = \lim_{n\to\infty} \left(-t\sqrt{n} + n\left(1 + \frac{t}{\sqrt{n}} + \frac{1}{2!}\left(\frac{t}{\sqrt{n}}\right)^2 + \frac{1}{3!}\left(\frac{t}{\sqrt{n}}\right)^3 + \ldots - 1\right)\right)$$
$$= \lim_{n\to\infty} \left(\frac{1}{2!}\, t^2 + O\left(\frac{1}{\sqrt{n}}\right)\right) = t^2/2,$$
so $\lim_{n\to\infty} M_{Z_n}(t) = \exp\left(t^2/2\right) = M_Z(t)$.
* using the exponential expansion: $e^a = \sum_{i=0}^{\infty} \frac{a^i}{i!}$, with $a = t/\sqrt{n}$.
[Figure: Poisson p.m.f. vs. normal approximation: Poisson(0.1) with N(0.1, 0.1); Poisson(1) with N(1, 1); Poisson(10) with N(10, 10); Poisson(100) with N(100, 100).]
Parameter estimators
Method of moments:
1. Equate the first $k$ sample moments to the corresponding $k$ population moments;
2. Equate the $k$ population moments to the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.
Maximum likelihood:
1. Determine the likelihood function $L(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x})$;
2. Determine the log-likelihood function $\ell(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x}) = \log(L(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x}))$;
3. Equate the derivatives of $\ell(\theta_1, \theta_2, \ldots, \theta_k; \mathbf{x})$ w.r.t. $\theta_1, \theta_2, \ldots, \theta_k$ to zero (gives a global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum), and check the boundary conditions.
Bayesian:
1. Determine the posterior density of the parameter given the data;
2. Take the expectation of the posterior (the Bayesian estimator under squared error loss).