
ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes

ACTL2002/ACTL5101 Probability and Statistics


© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au

Week 5 Video Lecture Notes



ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: one degree of freedom

Special sampling distributions & sample mean and variance

Special Sampling Distributions: chi-squared distribution
  Chi-squared distribution: one degree of freedom
  Chi-squared distribution: n degrees of freedom
Special Sampling Distributions: student-t distribution
  Jacobian technique and William Gosset (t-distribution)
Special Sampling Distributions: Snedecor's F distribution
  Jacobian technique and Snedecor's F distribution
Distribution of sample mean/variance
  Background
  Fundamental sampling distributions

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: one degree of freedom

Chi-squared distribution: one degree of freedom


Sampling from a normal distribution: independent and identically distributed (i.i.d.) random variables.
Suppose Z \sim N(0, 1); then
    Y = Z^2 \sim \chi^2(1)    (1)
has a chi-squared distribution with one degree of freedom.
Distribution characteristics:
    f_Y(y) = \frac{1}{\sqrt{2\pi y}} \exp(-y/2);
    F_Y(y) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}) = 2 F_Z(\sqrt{y}) - 1;
    E[Y] = E[Z^2] = 1;
    Var(Y) = E[Y^2] - (E[Y])^2 = E[Z^4] - (E[Z^2])^2 = 3 - 1 = 2.

Proof: see next slides.
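A quick numerical check of these characteristics (an added sketch, not from the slides; it assumes NumPy and SciPy are available):

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
z = rng.standard_normal(1_000_000)   # Z ~ N(0, 1)
y = z**2                             # Y = Z^2 should be chi-squared(1)

# Sample mean and variance should be close to E[Y] = 1 and Var(Y) = 2.
print(y.mean(), y.var())

# Compare the empirical c.d.f. with the chi2(1) c.d.f. at a few points.
for q in (0.5, 1.0, 2.0):
    print(q, (y <= q).mean(), stats.chi2.cdf(q, df=1))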

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: one degree of freedom

Prove that Z^2 has a chi-squared distribution with one degree of freedom (using the p.d.f.), with Z a standard normal r.v..
Proof: using the CDF technique (seen last week). Consider:
    F_Y(y) = Pr(Z^2 \le y)
           = Pr(-\sqrt{y} \le Z \le \sqrt{y})
           = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\, dz
           = 2 \int_{0}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\, dz
           \overset{*}{=} 2 \int_{0}^{y} \frac{1}{\sqrt{2\pi}}\, \frac{1}{2} w^{-1/2} e^{-\frac{1}{2}w}\, dw,
* using the change of variable z = \sqrt{w}, so that dz = \frac{1}{2} w^{-1/2}\, dw.

Proof continues on next slide.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: one degree of freedom

Proof (cont.).
    F_Y(y) = \int_0^y \frac{1}{\sqrt{2\pi}} w^{-1/2} e^{-\frac{1}{2}w}\, dw.
Differentiating to get the p.d.f. gives:
    \frac{\partial F_Y(y)}{\partial y} \overset{**}{=} f_Y(y) = \frac{1}{\sqrt{2\pi}} y^{-1/2} e^{-\frac{1}{2}y} = \frac{1}{\Gamma(1/2)\, 2^{1/2}}\, y^{(1-2)/2} e^{-y/2},
** using differentiation of the integral: \frac{\partial}{\partial b}\int_a^b f(x)\, dx = f(b),
which is the density of a \chi^2(1) distributed random variable (see F&T pages 164-169 for tabulated values of the c.d.f.).
Note: Y_i \sim \chi^2(1) \overset{dist}{=} Gamma\left(\frac{1}{2}, \frac{1}{2}\right), so
    M_Y(t) = \left(\frac{1/2}{1/2 - t}\right)^{1/2} = (1 - 2t)^{-1/2}.


ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: n degrees of freedom

Chi-squared distribution: n degrees of freedom


Let Z_i, i = 1, \ldots, n be i.i.d. N(0, 1); then X = \sum_{i=1}^n Z_i^2 has a chi-squared distribution with n d.f.: X \sim \chi^2(n).
Distribution properties:
    f_X(x) = \frac{1}{2^{n/2}\Gamma(n/2)}\, x^{(n-2)/2} e^{-x/2}, if x > 0,
and zero otherwise. Parameter constraint: n = 1, 2, \ldots
    E[X] = E\left[\sum_{i=1}^n Y_i\right] = n\, E[Y_i] = n;
    Var(X) = Var\left(\sum_{i=1}^n Y_i\right) = n\, Var(Y_i) = 2n;
    M_X(t) = M_{\sum_{i=1}^n Y_i}(t) = \left(M_{Y_i}(t)\right)^n = (1 - 2t)^{-n/2}, for t < 1/2.
Proof: * use i = 1, \ldots, n i.i.d. Y_i \sim \chi^2(1).

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: n degrees of freedom

Alternative proof: Recall the p.d.f. of Y:
    f_Y(y) = \frac{1}{2^{1/2}\Gamma(1/2)}\, y^{-1/2} e^{-y/2}.
Recall X \sim Gamma(\alpha, \beta), with p.d.f.:
    f_X(x) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)},
if x \ge 0 and zero otherwise.
For independent Y_1, Y_2, \ldots, Y_n \sim \chi^2(1):
    Y_1 + Y_2 + \ldots + Y_n \sim Gamma\left(\frac{n}{2}, \frac{1}{2}\right) \overset{dist}{=} \chi^2(n),
since the sum of independent Gamma(\alpha_i, \beta) random variables is again Gamma distributed, namely Gamma\left(\sum_{i=1}^n \alpha_i, \beta\right) (see lecture week 2).
See F&T pages 164-169 for tabulated values of the c.d.f.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: chi-squared distribution
Chi-squared distribution: n degrees of freedom

Chi-squared probability/cumulative density function

[Figure: chi-squared p.d.f. (left panel) and c.d.f. (right panel), f_X(x) and F_X(x) for x from 0 to 30, for n = 1, 2, 3, 5, 10, 25.]


ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Jacobian technique and William Gosset


As an illustration of the Jacobian transformation technique, consider deriving the t-distribution (see exercises 4.111, 4.112 and 7.30 in W+(7ed)).
The t-distribution was discovered by William Gosset in 1908. Gosset was a statistician employed by the Guinness brewing company.
Suppose Z \sim N(0, 1) and V \sim \chi^2(r) = \sum_{i=1}^r Z_i^2, where Z_i, i = 1, \ldots, r are i.i.d. and Z, V are independent.
Then, the random variable:
    T = \frac{Z}{\sqrt{V/r}}
has a t-distribution with r degrees of freedom.
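A small simulation sketch (added for illustration, assuming NumPy/SciPy; r is chosen arbitrarily) of the ratio definition above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
r = 5                                      # degrees of freedom
n = 500_000
z = rng.standard_normal(n)                 # Z ~ N(0, 1)
v = rng.chisquare(df=r, size=n)            # V ~ chi-squared(r), independent of Z
t = z / np.sqrt(v / r)                     # T = Z / sqrt(V/r)

# Empirical quantiles of T should match the t(r) distribution.
for p in (0.9, 0.95, 0.99):
    print(p, np.quantile(t, p), stats.t.ppf(p, df=r))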

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Jacobian transformation technique procedure

Recall the procedure to find the joint density of u_1 = g_1(x_1, x_2) and u_2 = g_2(x_1, x_2):
1. Find u_1 = g_1(x_1, x_2) and u_2 = g_2(x_1, x_2).
2. Determine h(u_1, u_2) = g^{-1}(u_1, u_2).
3. Find the absolute value of the Jacobian of the transformation.
4. Multiply that with the joint density of X_1, X_2 evaluated in h_1(u_1, u_2), h_2(u_1, u_2).

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Proof:
Note the p.d.f.s:
    f_V(v) = \frac{v^{r/2-1} e^{-v/2}}{2^{r/2}\Gamma(r/2)}, if 0 \le v < \infty;
    f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}, if -\infty < z < \infty.
1. Define the variables:
    s = g_1(z, v) = v   and   t = g_2(z, v) = \frac{z}{\sqrt{v/r}}.
2. So that this forms a one-to-one transformation with inverse:
    v = h_1(s, t) = s   and   z = h_2(s, t) = t\sqrt{s/r}.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

3. The Jacobian is:
    J(s, t) = \det\begin{pmatrix} \frac{\partial h_1(s,t)}{\partial s} & \frac{\partial h_1(s,t)}{\partial t} \\ \frac{\partial h_2(s,t)}{\partial s} & \frac{\partial h_2(s,t)}{\partial t} \end{pmatrix} = \det\begin{pmatrix} 1 & 0 \\ \frac{1}{2} t\, s^{-1/2}/\sqrt{r} & \sqrt{s/r} \end{pmatrix} = \sqrt{s/r}.
Note that the support is:
    0 < v < \infty   and   -\infty < z < \infty;
    0 < s < \infty   and   -\infty < t < \infty.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Since Z and V are independent, their joint density can be written as:
    f_{Z,V}(z, v) = f_Z(z)\, f_V(v) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} \cdot \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, v^{r/2-1} e^{-v/2}.
4. Using the Jacobian transformation formula above, the joint density of (S, T) is given by:
    f_{S,T}(s, t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} t^2 s/r} \cdot \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, s^{r/2-1} e^{-s/2} \cdot \sqrt{s/r}
                  = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}}\, s^{(r+1)/2-1} \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right).
5. Therefore, the marginal density of T is given by:
    f_T(t) = \int_0^{\infty} f_{S,T}(s, t)\, ds
(continues on next slide).

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Making the transformation:
    w = \frac{s}{2}\left(1 + \frac{t^2}{r}\right), \qquad s = \frac{2w}{1 + t^2/r},
so that:
    dw = \frac{1}{2}\left(1 + \frac{t^2}{r}\right) ds, \qquad ds = \frac{2}{1 + t^2/r}\, dw.
So that we have:
    f_T(t) = \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}}\, s^{(r+1)/2-1} \exp\left(-\frac{s}{2}\left(1 + \frac{t^2}{r}\right)\right) ds
           = \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}} \left(\frac{2w}{1 + t^2/r}\right)^{(r+1)/2-1} \exp(-w)\, \frac{2}{1 + t^2/r}\, dw.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Simplifying:
    f_T(t) = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}} \left(\frac{2}{1 + t^2/r}\right)^{(r+1)/2-1} \frac{2}{1 + t^2/r} \int_0^{\infty} w^{(r+1)/2-1} e^{-w}\, dw
           = \frac{1}{\sqrt{\pi r}\,\Gamma(r/2)\, 2^{(r+1)/2}} \left(\frac{2}{1 + t^2/r}\right)^{(r+1)/2} \int_0^{\infty} w^{(r+1)/2-1} e^{-w}\, dw
           \overset{*}{=} \frac{\Gamma((r+1)/2)}{\Gamma(r/2)\,\sqrt{\pi r}} \left(1 + \frac{t^2}{r}\right)^{-(r+1)/2}, \qquad for -\infty < t < \infty,
* using the Gamma function: \int_0^{\infty} x^{\alpha-1}\exp(-x)\, dx = \Gamma(\alpha).
This is the standard form of the t-distribution (see F&T page 163 for tabulated values of the c.d.f.).

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: student-t distribution
Jacobian technique and William Gosset (t-distribution)

Student-t probability/cumulative density function

[Figure: Student-t p.d.f. (left panel) and c.d.f. (right panel) for x from -5 to 5, for r = 1, 2, 3, 5, 10, 25.]


ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: Snedecor's F distribution
Jacobian technique and Snedecor's F distribution

Snedecor's F distribution
Suppose U \sim \chi^2(n_1) and V \sim \chi^2(n_2) are two independent chi-squared distributed random variables.
Then, the random variable:
    F = \frac{U/n_1}{V/n_2}
has an F distribution with n_1 and n_2 degrees of freedom.
See F&T pages 170-174 for tabulated values of the c.d.f.
Proof: use the Jacobian technique.
1. Define the variables: f = \frac{u/n_1}{v/n_2}, g = v;
2. Inverse transformation: v = g and u = f\, g\, \frac{n_1}{n_2}.
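An analogous simulation sketch (added, assuming NumPy/SciPy; the degrees of freedom are illustrative) for the F ratio:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
n1, n2, n = 4, 8, 500_000
u = rng.chisquare(df=n1, size=n)          # U ~ chi-squared(n1)
v = rng.chisquare(df=n2, size=n)          # V ~ chi-squared(n2), independent of U
f = (u / n1) / (v / n2)                   # F = (U/n1) / (V/n2)

# Compare empirical and theoretical c.d.f. values of F(n1, n2).
for q in (0.5, 1.0, 2.5):
    print(q, (f <= q).mean(), stats.f.cdf(q, dfn=n1, dfd=n2))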

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: Snedecor's F distribution
Jacobian technique and Snedecor's F distribution

Snedecor's F distribution
3. Jacobian of the transformation:
    J(f, g) = \det\begin{pmatrix} \partial v/\partial f & \partial v/\partial g \\ \partial u/\partial f & \partial u/\partial g \end{pmatrix} = \det\begin{pmatrix} 0 & 1 \\ g\,\frac{n_1}{n_2} & f\,\frac{n_1}{n_2} \end{pmatrix} = -g\,\frac{n_1}{n_2}.
Absolute value of the Jacobian: |J(f, g)| = g\,\frac{n_1}{n_2}.
4. Multiply the absolute value of the Jacobian by the joint density (joint density, using independence: f_{U,V}(u, v) = f_U(u)\, f_V(v)):
    f_{U,V}(u, v) = f_U(u)\, f_V(v) = \frac{u^{(n_1-2)/2}}{2^{n_1/2}\Gamma(n_1/2)} \exp\left(-\frac{u}{2}\right) \cdot \frac{v^{(n_2-2)/2}}{2^{n_2/2}\Gamma(n_2/2)} \exp\left(-\frac{v}{2}\right).
Continues on the next slide.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: Snedecor's F distribution
Jacobian technique and Snedecor's F distribution

Snedecor's F distribution

(Cont.) Joint density of F and G (using u = f\, g\, \frac{n_1}{n_2} and v = g):
    f_{F,G}(f, g) = \frac{(f n_1 g/n_2)^{(n_1-2)/2}}{2^{n_1/2}\Gamma(n_1/2)} \exp\left(-\frac{f n_1 g}{2 n_2}\right) \cdot \frac{g^{(n_2-2)/2}}{2^{n_2/2}\Gamma(n_2/2)} \exp\left(-\frac{g}{2}\right) \cdot g\,\frac{n_1}{n_2}.
5. The marginal of F is obtained by integrating over all possible values of G:
    f_F(f) = \int_0^{\infty} f_{F,G}(f, g)\, dg = func(f) \int_0^{\infty} g^{(n_1+n_2-2)/2} \exp\left(-g\left(\frac{1}{2} + \frac{f n_1}{2 n_2}\right)\right) dg,
where func(f) = \frac{n_1\, (f n_1)^{(n_1-2)/2}}{2^{(n_1+n_2)/2}\, n_2^{n_1/2}\, \Gamma(n_1/2)\,\Gamma(n_2/2)}.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: Snedecor's F distribution
Jacobian technique and Snedecor's F distribution

Continues:
    f_F(f) \overset{*}{=} func(f) \left(\frac{2 n_2}{n_2 + f n_1}\right)^{(n_1+n_2-2)/2+1} \int_0^{\infty} x^{(n_1+n_2-2)/2} \exp(-x)\, dx
           \overset{**}{=} func(f) \left(\frac{2 n_2}{n_2 + f n_1}\right)^{(n_1+n_2)/2} \Gamma\left(\frac{n_1+n_2}{2}\right)
           \overset{***}{=} \frac{\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2)\,\Gamma(n_2/2)}\, n_1^{n_1/2}\, n_2^{n_2/2}\, \frac{f^{n_1/2-1}}{(n_2 + f n_1)^{(n_1+n_2)/2}}.
* using the transformation x = g\left(\frac{1}{2} + \frac{f n_1}{2 n_2}\right) = g\,\frac{n_2 + f n_1}{2 n_2}, thus g = \frac{2 n_2}{n_2 + f n_1}\, x and dx = \frac{n_2 + f n_1}{2 n_2}\, dg, thus dg = \frac{2 n_2}{n_2 + f n_1}\, dx.
** using the Gamma function: \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1}\exp(-x)\, dx.
*** using func(f) = \frac{n_1\, (f n_1)^{(n_1-2)/2}}{2^{(n_1+n_2)/2}\, n_2^{n_1/2}\, \Gamma(n_2/2)\,\Gamma(n_1/2)}.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Special Sampling Distributions: Snedecor's F distribution
Jacobian technique and Snedecor's F distribution

Snedecor's F probability/cumulative density function

[Figure: Snedecor's F p.d.f. (left panel) and c.d.f. (right panel), f_X(x) and F_X(x) for x from 0 to 10, for (n_1, n_2) = (2, 2), (2, 4), (2, 6), (2, 10), (10, 2), (10, 10).]


ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Background

Properties of the sample mean and sample variance

Suppose you select values randomly from a population.
Assume selection is with replacement or, alternatively, that the population is large.
The selected values (X_1, \ldots, X_n) are then random variables, all with the same distribution and independent.
Suppose X_1, X_2, \ldots, X_n are n independent r.v. with identical distribution. Define the sample mean by:
    \bar{X} = \frac{1}{n}\sum_{k=1}^n X_k,
and recall the sample variance:
    S^2 = \frac{1}{n-1}\sum_{k=1}^n \left(X_k - \bar{X}\right)^2.


ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

Fundamental sampling distributions


Sampling distributions for i.i.d. normal samples, i.e., X_i \sim N(\mu, \sigma^2).
In the next slides we will prove the following important properties:
- \bar{X} \sim N\left(\mu, \frac{1}{n}\sigma^2\right): sample mean using known population variance.
- T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}: sample mean using sample variance.
- \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1): sample variance using population variance.
- \bar{X} and S^2 are independent (proof given in Exercise 13.93 of W+(7ed)).
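Before proving these properties, they can be illustrated by simulation; this added sketch (assuming NumPy/SciPy, with illustrative values for \mu, \sigma and n) is not part of the slides:

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))     # reps samples of size n

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                    # sample variance with divisor n-1

# X-bar ~ N(mu, sigma^2/n): check mean and variance.
print(xbar.mean(), xbar.var(), sigma**2 / n)

# (n-1) S^2 / sigma^2 ~ chi-squared(n-1): check the mean (= n-1).
print(((n - 1) * s2 / sigma**2).mean(), n - 1)

# T = (X-bar - mu) / (S / sqrt(n)) ~ t(n-1): compare a tail probability.
t_stat = (xbar - mu) / np.sqrt(s2 / n)
print((t_stat > 2.0).mean(), stats.t.sf(2.0, df=n - 1))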

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

Distribution of sample mean (known \sigma^2)

Prove that the distribution of the sample mean given known variance is N(\mu, \sigma^2/n).
We have that X_1, \ldots, X_n are i.i.d. normally distributed variables.
We defined the sample mean by: \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.
Use the MGF technique to find the distribution of \bar{X}:
    M_{\bar{X}}(t) = M_{\sum_{i=1}^n X_i/n}(t) = \prod_{i=1}^n M_{X_i}(t/n) = \left(\exp\left(\mu\frac{t}{n} + \frac{1}{2}\sigma^2\frac{t^2}{n^2}\right)\right)^n = \exp\left(\mu t + \frac{1}{2}\frac{\sigma^2}{n}t^2\right),
which is the m.g.f. of a normal distribution with mean \mu and variance \sigma^2/n.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

Distribution of sample mean (unknown \sigma^2)

The distribution of the sample mean given unknown (population) variance is given by:
    \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.
Proof:
    \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}}{\sqrt{\dfrac{S^2}{\sigma^2}}} \overset{*}{=} \frac{Z}{\sqrt{\dfrac{\chi^2_{n-1}}{n-1}}} \sim t_{n-1},
where Z \sim N(0, 1) is a standard normal r.v..
* Using (n-1) S^2/\sigma^2 \sim \chi^2_{n-1} (proof: see next slides).

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

Distribution of sample variance

Prove that the distribution of the sample variance is given by:
    \frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}.
First note that:
    \frac{(n-1) S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2},
and second note that:
    \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^n Z_i^2 \sim \chi^2_n,
where Z_i \sim N(0, 1), i = 1, \ldots, n are i.i.d. standard normal r.v..

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

We have:
    \underbrace{\frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2}}_{\sum_{i=1}^n Z_i^2 \sim \chi^2_n}
    = \frac{\sum_{i=1}^n \left((X_i - \bar{X}) + (\bar{X} - \mu)\right)^2}{\sigma^2}
    \overset{*}{=} \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} + \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{Z^2 \sim \chi^2_1}.
Hence, the first term on the right is \chi^2_{n-1} (using the gamma sum property / MGF technique).
* Using 2(\bar{X} - \mu)\underbrace{\sum_{i=1}^n (X_i - \bar{X})}_{=0} = 0.

ACTL2002/ACTL5101 Probability and Statistics: Week 5 Video Lecture Notes


Distribution of sample mean/variance
Fundamental sampling distributions

Fundamental sampling distributions


We have now proven the following important properties:
- \bar{X} \sim N\left(\mu, \frac{1}{n}\sigma^2\right);
- T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1};
- \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1).
We will use this for:
- confidence intervals for the population mean and variance;
- testing the population mean and variance;
- parameter uncertainty of a linear regression model.
Notice that, when applying the CLT, we no longer need the X_i to be normally distributed.

ACTL2002/ACTL5101 Probability and Statistics: Week 5

ACTL2002/ACTL5101 Probability and Statistics


© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au

Week 5

ACTL2002/ACTL5101 Probability and Statistics: Week 5


ACTL2002/ACTL5101 Probability and Statistics: Week 5

Last four weeks


Introduction to probability;
Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;
Special univariate distributions (discrete & continuous);
Joint distributions;
Dependence of multivariate distributions;
Functions of random variables.

ACTL2002/ACTL5101 Probability and Statistics: Week 5

This week
Parameter estimation:
- Method of Moments;
- Maximum Likelihood method;
- Bayesian estimator.

Convergence (almost sure, in probability, & in distribution);


Application (important theorems):
- Law of large numbers;
- Central limit theorem.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Parameter estimation
Definition of an estimator

Limit theorems & parameter estimators

Parameter estimation
  Definition of an estimator
Estimator I: the method of moments
  The method of moments
  Example & exercise
Estimator II: maximum likelihood estimator
  Maximum likelihood estimation
  Example & exercise
  Sampling distribution and the bootstrap
Estimator III: Bayesian estimator
  Introduction
  Bayesian estimation
  Example & exercise
Convergence of series
  Chebyshev's Inequality
  Convergence concepts
  Application of strong convergence: Law of Large Numbers
  Application of weak convergence: Central Limit Theorem
  Application of convergence in distribution: Normal Approximation to the Binomial
  Application of convergence in distribution: Normal Approximation to the Poisson
Summary
  Summary

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Parameter estimation
Definition of an estimator

Definition of an Estimator
Problem of statistical estimation: a population has some characteristics that can be described by a r.v. X with density f_X(\cdot\,|\,\theta).
The density has an unknown parameter (or set of parameters) \theta.
We observe values of the random sample X_1, X_2, \ldots, X_n from the population f_X(\cdot\,|\,\theta). Denote the observed sample values by x_1, x_2, \ldots, x_n.
We then estimate the parameter (or some function of the parameter) based on this random sample.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Parameter estimation
Definition of an estimator

Definition of an Estimator
Any statistic, i.e., a function T(X_1, X_2, \ldots, X_n) of observable random variables whose values are used to estimate \tau(\theta), where \tau(\theta) is some function of the parameter \theta, is called an estimator of \tau(\theta).
A value \hat{\theta} of the statistic, evaluated at the observed sample values x_1, x_2, \ldots, x_n, is called a (point) estimate.
For example:
    T(X_1, X_2, \ldots, X_n) = \bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j: estimator;
    \hat{\theta} = 0.23: point estimate.
Note that \theta can be a vector; then the estimator is a set of equations.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator I: the method of moments
The method of moments

The Method of Moments

Example of an estimator: the Method of Moments (MME).
Let X_1, X_2, \ldots, X_n be a random sample from the population with density f_X(\cdot\,|\,\theta), which we will assume has k parameters, say \theta = [\theta_1, \theta_2, \ldots, \theta_k]^{\top}.
The method of moments estimator (MME) procedure is:
1. Equate the first k sample moments to the corresponding k population moments;
2. Equate the k population moments to the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.
The method of moments point estimates (\hat{\theta}) are the estimate values of the estimator corresponding to the data set.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator I: the method of moments
The method of moments

The Method of Moments

Denote the sample moments by:
    m_1 = \frac{1}{n}\sum_{j=1}^n x_j, \quad m_2 = \frac{1}{n}\sum_{j=1}^n x_j^2, \quad \ldots, \quad m_k = \frac{1}{n}\sum_{j=1}^n x_j^k,
and the population moments by:
    \mu_1(\theta_1, \theta_2, \ldots, \theta_k) = E[X], \quad \mu_2(\theta_1, \theta_2, \ldots, \theta_k) = E[X^2], \quad \ldots, \quad \mu_k(\theta_1, \theta_2, \ldots, \theta_k) = E[X^k].
The system of equations to solve for (\theta_1, \theta_2, \ldots, \theta_k) is given by:
    m_j = \mu_j(\theta_1, \theta_2, \ldots, \theta_k), \quad for j = 1, 2, \ldots, k.
Solving this provides us with the point estimate \hat{\theta}.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator I: the method of moments
Example & exercise

Example: MME & Binomial distribution

Suppose X_1, X_2, \ldots, X_n is a random sample from a Bin(n, p) distribution, with known parameter n.
Question: Use the method of moments to find a point estimator of \theta = p.
1. Solution: Equate the population moment to the sample moment:
    E[X] = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x}.
2. Equate the population moment to the parameter (use week 2):
    E[X] = n \cdot p.
3. Then the method of moments estimator is (i.e., solving it):
    \bar{x} = n \cdot p \quad \Rightarrow \quad \hat{p} = \bar{x}/n.
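A minimal numerical illustration of this estimator (added, assuming NumPy; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(seed=5)
n_trials, p_true = 10, 0.3
x = rng.binomial(n_trials, p_true, size=1000)    # random sample from Bin(n, p)

p_hat = x.mean() / n_trials                      # method of moments: p-hat = x-bar / n
print(p_hat)                                     # should be close to 0.3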

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator I: the method of moments
Example & exercise

Exercise: MME & Normal distribution

Suppose X_1, X_2, \ldots, X_n is a random sample from a N(\mu, \sigma^2) distribution.
Question: Use the method of moments to find point estimators of \mu and \sigma^2.
1. Solution: Equate the population moments to the sample moments:
    E[X] (population moment) = \frac{1}{n}\sum_{j=1}^n x_j = \bar{x} (sample moment);
    E[X^2] (population moment) = \frac{1}{n}\sum_{j=1}^n x_j^2 (sample moment).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator I: the method of moments
Example & exercise

Exercise: MME & Normal distribution

2. Equate the population moments to the parameters (use week 2):
    E[X] = \mu \quad and \quad E[X^2] = Var(X) + E[X]^2 = \sigma^2 + \mu^2.
3. The method of moments estimators are:
    \hat{\mu} = E[X] = \bar{x};
    \hat{\sigma}^2 = E[X^2] - (E[X])^2 = \frac{1}{n}\sum_{j=1}^n x_j^2 - \bar{x}^2 = \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2 = \frac{n-1}{n} s^2,
* using s^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}, the sample variance.
Note: E[\hat{\sigma}^2] \ne \sigma^2 (biased estimator); more on this next week.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Maximum Likelihood function

Another example (the one most often used) of an estimator is the maximum likelihood estimator.
First, we need to define the likelihood function.
If x_1, x_2, \ldots, x_n are drawn from a population with a parameter \theta (where \theta could be a vector of parameters), then the likelihood function is given by:
    L(\theta; x_1, x_2, \ldots, x_n) = f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n),
where f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) is the joint probability density of the random variables X_1, X_2, \ldots, X_n.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Maximum Likelihood Estimation

Let L(\theta) = L(\theta; x_1, x_2, \ldots, x_n) be the likelihood function for X_1, X_2, \ldots, X_n.
The set of parameters \hat{\theta} = \hat{\theta}(x_1, x_2, \ldots, x_n) (note: a function of the observed values) that maximizes L(\theta) is the maximum likelihood estimate of \theta.
The random variable \hat{\theta}(X_1, X_2, \ldots, X_n) is called the maximum likelihood estimator.
When X_1, X_2, \ldots, X_n is a random sample from f_X(x|\theta), then the likelihood function is (using the i.i.d. property):
    L(\theta; x_1, x_2, \ldots, x_n) = \prod_{j=1}^n f_X(x_j|\theta),
which is just the product of the densities evaluated at each of the observations in the random sample.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Maximum Likelihood Estimation

If the likelihood function contains k parameters, so that:
    L(\theta_1, \theta_2, \ldots, \theta_k; x) = f_X(x_1|\theta)\, f_X(x_2|\theta) \cdots f_X(x_n|\theta),
then (under certain regularity conditions) the point where the likelihood is a maximum is a solution of the k equations:
    \frac{\partial L(\theta_1, \theta_2, \ldots, \theta_k; x)}{\partial \theta_1} = 0, \quad \frac{\partial L(\theta; x)}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial L(\theta; x)}{\partial \theta_k} = 0.
Normally, the solutions to this system of equations give the global maximum, but to make sure you should usually check the second derivative (or Hessian) conditions and the boundary conditions for a global maximum.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Maximum Likelihood Estimation

Consider the case of estimating two variables, say \theta_1 and \theta_2.
Define the gradient vector:
    D(L) = \begin{pmatrix} \frac{\partial L}{\partial \theta_1} \\ \frac{\partial L}{\partial \theta_2} \end{pmatrix},
and define the Hessian matrix:
    H(L) = \begin{pmatrix} \frac{\partial^2 L}{\partial \theta_1^2} & \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} \\ \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \frac{\partial^2 L}{\partial \theta_2^2} \end{pmatrix}.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Maximum Likelihood Estimation

From calculus we know that the maximizing choice of \theta_1 and \theta_2 should satisfy not only:
    D(L) = 0,
but also that H should be negative definite, which means:
    \begin{pmatrix} h_1 & h_2 \end{pmatrix} \begin{pmatrix} \frac{\partial^2 L}{\partial \theta_1^2} & \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} \\ \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \frac{\partial^2 L}{\partial \theta_2^2} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} < 0,
for all [h_1, h_2] \ne 0.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

Log-Likelihood function
Generally, maximizing the log-likelihood function is easier.
Not surprisingly, we define the log-likelihood function as:
    \ell(\theta_1, \theta_2, \ldots, \theta_k; x) = \log(L(\theta_1, \theta_2, \ldots, \theta_k; x)) = \log\left(\prod_{j=1}^n f_X(x_j|\theta)\right) \overset{*}{=} \sum_{j=1}^n \log(f_X(x_j|\theta)),
* using \log(a \cdot b) = \log(a) + \log(b).
Maximizing the log-likelihood function gives the same parameter estimates as maximizing the likelihood function, because the log is a monotonically increasing function.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Maximum likelihood estimation

MLE procedure
The general procedure to find the ML estimator is:
1. Determine the likelihood function L(\theta_1, \theta_2, \ldots, \theta_k; x);
2. Determine the log-likelihood function \ell(\theta_1, \theta_2, \ldots, \theta_k; x) = \log(L(\theta_1, \theta_2, \ldots, \theta_k; x));
3. Equate the derivatives of \ell(\theta_1, \theta_2, \ldots, \theta_k; x) w.r.t. \theta_1, \theta_2, \ldots, \theta_k to zero (→ global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MLE and Poisson

1. Suppose X_1, X_2, \ldots, X_n are i.i.d. Poisson(\lambda). The likelihood function is given by:
    L(\lambda; x) = \prod_{j=1}^n f_X(x_j|\lambda) = \left(\frac{e^{-\lambda}\lambda^{x_1}}{x_1!}\right)\left(\frac{e^{-\lambda}\lambda^{x_2}}{x_2!}\right)\cdots\left(\frac{e^{-\lambda}\lambda^{x_n}}{x_n!}\right) = e^{-n\lambda}\,\frac{\lambda^{x_1}}{x_1!}\frac{\lambda^{x_2}}{x_2!}\cdots\frac{\lambda^{x_n}}{x_n!}.
2. So that, taking the log of both sides, we get:
    \ell(\lambda; x) = -n\lambda + \log(\lambda)\sum_{k=1}^n x_k - \sum_{k=1}^n \log(x_k!).
Or, equivalently, using the log-likelihood function directly:
    \ell(\lambda; x) = \sum_{j=1}^n \log(f_X(x_j|\lambda)) = \sum_{j=1}^n \left(-\lambda + x_j\log(\lambda) - \log(x_j!)\right).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MLE and Poisson

Now we need to maximize this log-likelihood function with respect to the parameter \lambda.
3. Taking the first order condition (FOC) with respect to \lambda we have:
    \frac{\partial}{\partial\lambda}\ell(\lambda) = 0 \quad \Leftrightarrow \quad -n + \frac{1}{\lambda}\sum_{k=1}^n x_k = 0.
This gives the maximum likelihood estimate (MLE):
    \hat{\lambda} = \frac{1}{n}\sum_{k=1}^n x_k = \bar{x},
which equals the sample mean.
4. Check the second derivative condition to ensure a global maximum.
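A short numerical illustration (added, assuming NumPy; \lambda chosen arbitrarily) of the Poisson MLE and its second-order condition:

import numpy as np

rng = np.random.default_rng(seed=6)
lam_true = 2.5
x = rng.poisson(lam_true, size=2000)

lam_hat = x.mean()          # MLE of lambda is the sample mean
print(lam_hat)

# The second derivative of the log-likelihood at lam_hat is -sum(x)/lam_hat^2 < 0,
# confirming a maximum.
print(-x.sum() / lam_hat**2)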

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Exercise: MLE and Normal

Suppose X_1, X_2, \ldots, X_n are i.i.d. Normal(\mu, \sigma^2), where both parameters are unknown.
The p.d.f. is given by:
    f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right).
1. Thus the likelihood function is given by:
    L(\mu, \sigma; x) = \prod_{k=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x_k - \mu}{\sigma}\right)^2\right).
Question: Find the MLE of \mu and \sigma^2.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Exercise: MLE and Normal

2. Solution: Its log-likelihood function is:
    \ell(\mu, \sigma; x) = \sum_{k=1}^n \log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{x_k - \mu}{\sigma}\right)^2\right)\right) \overset{*}{=} -n\log(\sigma) - \frac{n}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{k=1}^n (x_k - \mu)^2,
* using \log(1/a) = \log(a^{-1}) = -\log(a), with a = \sigma, and \log(1/\sqrt{b}) = \log(b^{-0.5}) = -0.5\log(b), with b = 2\pi.
Take the derivatives w.r.t. \mu and \sigma and set them equal to zero.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

3./4. Then we obtain:
    \frac{\partial}{\partial\mu}\ell(\mu, \sigma; x) = \frac{1}{\sigma^2}\sum_{k=1}^n (x_k - \mu) = 0 \quad \Leftrightarrow \quad \sum_{k=1}^n x_k - n\mu = 0 \quad \Leftrightarrow \quad \hat{\mu} = \bar{x};
    \frac{\partial}{\partial\sigma}\ell(\mu, \sigma; x) = -\frac{n}{\sigma} + \frac{\sum_{k=1}^n (x_k - \mu)^2}{\sigma^3} = 0 \quad \Leftrightarrow \quad n\sigma^2 = \sum_{k=1}^n (x_k - \mu)^2 \quad \Leftrightarrow \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^n (x_k - \bar{x})^2.
See 9.7 and 9.8 of W+(7ed) for further details.
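A brief numerical check (added, assuming NumPy; parameter values are illustrative) that the normal MLEs are the sample mean and the divide-by-n sample variance:

import numpy as np

rng = np.random.default_rng(seed=7)
mu_true, sigma_true = 1.0, 2.0
x = rng.normal(mu_true, sigma_true, size=5000)

mu_hat = x.mean()                        # MLE of mu
sigma2_hat = ((x - mu_hat)**2).mean()    # MLE of sigma^2 (divisor n, not n-1)
print(mu_hat, sigma2_hat)
print(x.var(ddof=1) * (len(x) - 1) / len(x))   # same value via the sample variance s^2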

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MME & MLE and Gamma

You may not always obtain closed-form solutions for the parameter estimates with the maximum likelihood method.
An example of such a problem when estimating the parameters using MLE is the Gamma distribution.
As we will see on the next slides, using MLE yields one parameter estimate in closed form; not so for the second parameter.
To find the MLE one should then compute the estimates numerically by solving a non-linear equation.
This can be done by employing an iterative numerical approximation (e.g. Newton-Raphson).
Application: surrender of mortgages, see Excel.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MME & MLE and Gamma

In such cases an initial value may be needed, so other means of estimation, such as the method of moments, may be used first. Then use that as the starting value.
Question: Consider X_1, X_2, \ldots, X_n i.i.d. Gamma(\alpha, \beta); find the MME of the Gamma distribution.
    f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}; \quad M_X(t) = E\left[e^{tX}\right] = \left(\frac{\beta}{\beta - t}\right)^{\alpha}; \quad E[X^r] = \frac{\Gamma(\alpha + r)}{\beta^r\,\Gamma(\alpha)}; \quad Var(X) = \frac{\alpha}{\beta^2}.
1. Solution: Equate the sample moments to the population moments:
    \mu_1 = M_X^{(1)}(t)\big|_{t=0} = E[X] = \bar{x} \quad and \quad \mu_2 = M_X^{(2)}(t)\big|_{t=0} = E[X^2] = \frac{1}{n}\sum_{i=1}^n x_i^2.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MME & MLE and Gamma

2. Equate the population moments to the parameters:
    \mu_1 = \frac{\alpha}{\beta} \quad and \quad \mu_2 = \frac{\alpha(\alpha+1)}{\beta^2} = \mu_1\,\frac{\alpha+1}{\beta} = \mu_1\left(\mu_1 + \frac{1}{\beta}\right).
3. Therefore, the method of moments estimates are given by:
    \mu_2 = \mu_1^2 + \frac{\mu_1}{\beta} \quad \Rightarrow \quad \beta = \frac{\mu_1}{\mu_2 - \mu_1^2}, \qquad \alpha = \beta\,\mu_1 = \frac{\mu_1^2}{\mu_2 - \mu_1^2}.
So that the estimators are:
    \hat{\alpha} = \frac{\bar{x}^2}{\hat{\sigma}^2} \quad and \quad \hat{\beta} = \frac{\bar{x}}{\hat{\sigma}^2},
* using (step 1.) \mu_1 = \bar{x} and \mu_2 - \mu_1^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 - \bar{x}^2 = \hat{\sigma}^2.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MME & MLE and Gamma

Question: Find the ML estimates.
1. Solution: Now, X_1, X_2, \ldots, X_n are i.i.d. Gamma(\alpha, \beta), so the likelihood function is:
    L(\alpha, \beta; x) = \prod_{i=1}^n \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}.
2. The log-likelihood function is then:
    \ell(\alpha, \beta; x) = -n\log(\Gamma(\alpha)) + n\alpha\log(\beta) + (\alpha - 1)\sum_{i=1}^n \log(x_i) - \beta\sum_{i=1}^n x_i.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MME & MLE and Gamma

3. Maximizing this:
    \frac{\partial}{\partial\alpha}\ell(\alpha, \beta; x) = -n\,\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + n\log(\beta) + \sum_{i=1}^n \log(x_i) = 0;
    \frac{\partial}{\partial\beta}\ell(\alpha, \beta; x) = \frac{n\alpha}{\beta} - \sum_{i=1}^n x_i = 0.
The second equation is easy to solve:
    \hat{\beta} = \frac{n\,\hat{\alpha}}{\sum_{i=1}^n x_i},
but we need numerical (iterative) techniques to solve the first equation.
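One way to carry out the numerical step is sketched below (added; it assumes SciPy and substitutes \hat{\beta} = \hat{\alpha}/\bar{x} into the first FOC, leaving a one-dimensional root search in \alpha; the simulated data and the root bracket are purely illustrative):

import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(seed=8)
# Simulated Gamma data: shape alpha = 3, rate beta = 2 (i.e. scale = 1/beta).
x = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=5000)

# With beta = alpha / x-bar, the remaining FOC in alpha is
#   log(alpha) - digamma(alpha) = log(x-bar) - mean(log x).
rhs = np.log(x.mean()) - np.log(x).mean()

def foc(a):
    return np.log(a) - special.digamma(a) - rhs

# Start from the method-of-moments estimate and bracket the root around it.
a_mme = x.mean()**2 / x.var()
a_hat = optimize.brentq(foc, 0.01 * a_mme, 100 * a_mme)
b_hat = a_hat / x.mean()
print(a_hat, b_hat)    # should be close to (3, 2)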

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MLE and Uniform

Suppose X_1, X_2, \ldots, X_n are i.i.d. U[0, \theta], i.e., f_X(x) = \frac{1}{\theta}, for 0 \le x \le \theta, and zero otherwise. Here the range of x depends on the parameter \theta.
The likelihood function can be expressed as:
    L(\theta; x) = \left(\frac{1}{\theta}\right)^n \prod_{k=1}^n I_{\{0 \le x_k \le \theta\}},
where I_{\{0 \le x_k \le \theta\}} is an indicator function taking the value 1 if x_k \in [0, \theta] and zero otherwise.
Question: How to find the maximum of this likelihood function?

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Example & exercise

Example: MLE and Uniform

[Figure: the likelihood L(\theta; x) is zero for \theta < x_{(n)}, jumps to (1/\theta)^n at \theta = x_{(n)} and decreases thereafter; the ordered sample points x_{(1)}, \ldots, x_{(4)} are marked on the \theta-axis.]

Solution: Because of the non-linearity in the indicator function we cannot use calculus to maximize this function, i.e., setting the FOC equal to zero.
You can maximize it by looking at its properties:
- \prod_{k=1}^n I_{\{0 \le x_k \le \theta\}} can only take the values 0 and 1; note: it takes the value 0 if \theta < x_{(n)} and 1 otherwise;
- (1/\theta)^n is a decreasing function in \theta;
- Hence, the function is maximized at the lowest value of \theta for which \prod_{k=1}^n I_{\{0 \le x_k \le \theta\}} = 1, i.e.:
    \hat{\theta} = \max\{x_1, x_2, \ldots, x_n\} = x_{(n)}.
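A two-line numerical illustration (added, assuming NumPy; \theta chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(seed=9)
theta_true = 4.0
x = rng.uniform(0.0, theta_true, size=200)

theta_hat = x.max()      # MLE is the sample maximum x_(n)
print(theta_hat)         # slightly below 4.0 (the MLE is biased downwards)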


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Sampling distribution and the bootstrap

Sampling distribution and the bootstrap

We might not only be interested in the point estimate, but in the whole distribution of the MLE estimate (parameter uncertainty!).
However, we have no closed-form solution for the MLE estimates. How to obtain their sampling distribution? Use bootstrapping.
Step 1: Generate k samples from Gamma(\hat{\alpha}, \hat{\beta}).
Step 2: Estimate \hat{\alpha}, \hat{\beta} for each of these k samples using MLE.
Step 3: The empirical joint cumulative distribution function of these k parameter estimates is an approximation to the sampling distribution of the MLE estimates.
Quantification of risk: produce histograms of the estimates.
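A minimal parametric-bootstrap sketch of Steps 1-3 (added; it assumes NumPy/SciPy and reuses the profile-likelihood Gamma MLE idea from the earlier sketch; the fitted values \hat{\alpha}, \hat{\beta}, the sample size n and k = 250 are illustrative):

import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(seed=10)

def gamma_mle(x):
    """MLE of (alpha, beta) for a Gamma(alpha, beta) sample, beta a rate parameter."""
    rhs = np.log(x.mean()) - np.log(x).mean()
    a0 = x.mean()**2 / x.var()                 # method-of-moments starting value
    a = optimize.brentq(lambda t: np.log(t) - special.digamma(t) - rhs,
                        0.01 * a0, 100 * a0)
    return a, a / x.mean()

# Pretend these are the MLEs fitted to the observed data.
a_hat, b_hat, n, k = 2.0, 0.5, 300, 250

boot = np.empty((k, 2))
for i in range(k):                             # Steps 1-2: resample and re-estimate
    xb = rng.gamma(shape=a_hat, scale=1.0 / b_hat, size=n)
    boot[i] = gamma_mle(xb)

# Step 3: the empirical distribution of the k estimates approximates the
# sampling distribution; e.g. report means and 95% intervals.
print(boot.mean(axis=0))
print(np.quantile(boot, [0.025, 0.975], axis=0))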

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator II: maximum likelihood estimator
Sampling distribution and the bootstrap

Sampling distribution and bootstrap, k = 250, see Excel

[Figure: approximations of the sampling distributions of \hat{\alpha} and \hat{\beta}: empirical c.d.f.s F(\hat{\alpha}) (left panel) and F(\hat{\beta}) (right panel) of the k = 250 bootstrapped estimates, repeated five times (1st to 5th); the repetitions nearly coincide.]


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Introduction

Introduction
We have seen:
- Method of moments estimator:
  Idea: the first k moments of the estimated special distribution and of the sample are the same.
- Maximum likelihood estimator:
  Idea: the probability of the sample, given a class of distributions, is highest for this set of parameters.
Warning: Bayesian estimation is hard to understand, partly due to the non-standard notation in Bayesian estimates.
Pure Bayesian interpretation: suppose you have, a priori, a prior belief about a distribution;
then you observe data → more information about the distribution.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Introduction

Example frequentist interpretation: Let X_i \sim Ber(\theta) indicate whether individual i lodges a claim with the insurer:
- \sum_{i=1}^T X_i = Y \sim Bin(T, \theta) is the number of car accidents;
- The probability of an insured having a car accident depends on adverse selection;
- A new insurer does not know the amount of adverse selection in its pool;
- Now, let \theta \sim \pi(\theta), with \pi(\theta) \sim Beta(a, b), be the distribution of the risk among individuals (i.e., representing adverse selection);
- Use this for estimating the parameter \theta → what is our prior for \theta?
This is called empirical Bayes.
Similar idea: Bayesian updating, in the case of time-varying parameters:
- Prior: last year's estimated claim distribution;
- Data: this year's claims;
- Posterior: revised estimated claim distribution.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Notation for Bayesian estimation

Under this approach, we assume that \theta is a random quantity with density \pi(\theta), called the prior density.
(This is the usual notation, rather than f_{\Theta}(\theta).)
A sample X = x (= [x_1, x_2, \ldots, x_T]^{\top}) is taken from its population, and the prior density is updated using the information drawn from this sample and applying Bayes' rule.
This updated prior is called the posterior density, which is the conditional density of \theta given the sample X = x: \pi(\theta|x) (= f_{\Theta|X}(\theta|x)).
So we are using a conditional r.v., \Theta|X, associated with the multivariate distribution of \Theta and the X (look back at the lecture notes for week 3).
Use, for example, the mean of the posterior density \pi(\theta|x) as the Bayesian estimator.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, theory

First, let us define a loss function L(\hat{\theta}; \theta) on T, which is an estimator of \tau(\theta), with:
    L(\hat{\theta}; \theta) \ge 0, for every \hat{\theta};
    L(\hat{\theta}; \theta) = 0, when \hat{\theta} = \theta.
Interpretation of the loss function: for reasonable functions we have: a loss function with a lower value ↔ a better estimator.
Examples of the loss function:
- Mean squared error: L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2 (mostly used);
- Absolute error: L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, theory

Next, we define a risk function, the expected loss:
    R_{\hat{\theta}}(\theta) = E_{\hat{\theta}}\left[L(\hat{\theta}; \theta)\right] = \int L(\hat{\theta}(x); \theta)\, f_{X|\Theta}(x|\theta)\, dx.
Note: the estimator is a random variable (e.g. T = \hat{\theta} = \bar{X}, \tau(\theta) = \theta = \mu) depending on the observations.
Interpretation of the risk function: the loss function is a random variable → taking the expectation returns a number given \theta.
Note: R_{\hat{\theta}}(\theta) is a function of \theta (we only know the prior density).
Define the Bayes risk under the prior \pi as:
    B(\hat{\theta}) = E_{\pi}\left[R_{\hat{\theta}}(\theta)\right] = \int R_{\hat{\theta}}(\theta)\, \pi(\theta)\, d\theta.
Goal: minimize the Bayes risk.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, theory

Now, we can introduce the Bayesian estimator for a given loss function, \hat{\theta}_B, for which the following holds:
    E_{\pi}\left[R_{\hat{\theta}_B}(\theta)\right] \le E_{\pi}\left[R_{\hat{\theta}}(\theta)\right], \quad for any \hat{\theta}.
Rewriting, * using reversing the order of the integrals and ** using the law of iterated expectations (week 3), we have:
    \hat{\theta}_B = \arg\min_{\hat{\theta}}\left\{E_{\pi}\left[E_{\hat{\theta}}\left[L(\hat{\theta}|\theta)\right]\right]\right\} \overset{*}{=} \arg\min_{\hat{\theta}}\left\{E_{\hat{\theta}}\left[E_{\pi}\left[L(\hat{\theta}|\theta)\right]\right]\right\} \overset{**}{=} \arg\min_{\hat{\theta}}\left\{E_{\hat{\theta}}\left[L(\hat{\theta})\right]\right\}.
Interpretation: \hat{\theta}_B is the best estimator with respect to the loss function L(\hat{\theta}; \theta).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, estimators

Rewriting the Bayes risk we have:
    B(\hat{\theta}) = \int R_{\hat{\theta}}(\theta)\, \pi(\theta)\, d\theta = \int\int L(\hat{\theta}(x), \theta)\, f_{X|\Theta}(x|\theta)\, dx\, \pi(\theta)\, d\theta
    \overset{*}{=} \int\int L(\hat{\theta}(x), \theta)\, \pi(\theta|x)\, f_X(x)\, dx\, d\theta
    \overset{**}{=} \int \underbrace{\left(\int L(\hat{\theta}(x), \theta)\, \pi(\theta|x)\, d\theta\right)}_{r(\hat{\theta}|x)} f_X(x)\, dx
    = \int r(\hat{\theta}|x)\, f_X(x)\, dx.
Implying: minimizing B(\hat{\theta}) is equivalent to minimizing r(\hat{\theta}|x) for all x.
* using f_{X|\Theta}(x|\theta)\,\pi(\theta) = \pi(\theta|x)\, f_X(x) (Bayes' rule) and ** changing the order of integration.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, estimators

For the squared error loss function (used in *) we have:
    \min_{\hat{\theta}} B(\hat{\theta}) \;\Leftrightarrow\; \hat{\theta} minimizing r(\hat{\theta}|x) for all x \;\Leftrightarrow\; \frac{\partial r(\hat{\theta}|x)}{\partial \hat{\theta}} = 0
    \overset{*}{\Leftrightarrow} \int 2\left(\theta - \hat{\theta}(x)\right)\pi(\theta|x)\, d\theta = 0
    \;\Leftrightarrow\; \hat{\theta}_B(x) = \int \theta\, \pi(\theta|x)\, d\theta
    \;\Leftrightarrow\; \hat{\theta}_B(x) = E_{\theta|x}[\theta].
Interpretation: the Bayesian estimator under the squared error loss function is the expectation of the posterior density, i.e., \hat{\theta}_B = E[\theta|x]!
One can show that for the absolute error loss function:
    \hat{\theta}_B(x) = median(\pi(\theta|x)).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, derivation

The posterior density (i.e., f_{\Theta|X}(\theta|x)) is derived as:
    \pi(\theta|x) \overset{*}{=} \frac{f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta)}{\int f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta)\, d\theta} \overset{**}{=} \frac{f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta)}{f_X(x_1, x_2, \ldots, x_T)}.    (1)
* Using Bayes' formula: Pr(A_i|B) = \frac{Pr(B|A_i)\,Pr(A_i)}{\sum_{j=1}^n Pr(B|A_j)\,Pr(A_j)}, with A_1, \ldots, A_n a complete partition of \Omega.
** Using the LTP: Pr(A) = \sum_{i=1}^n Pr(A|B_i)\,Pr(B_i) (where B_1, \ldots, B_n is a complete partition of \Omega, week 1).
Hence, the denominator is the marginal density of X = [x_1, x_2, \ldots, x_T]^{\top} (= a constant given the observations!).
Note: the events \{\Theta = \theta\} over which \pi(\theta) ranges form a complete partition of the sample space.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Bayesian estimation

Bayesian estimation, derivation

Notation: ∝ means "is proportional to", i.e., f(x) ∝ g(x) ⟺ f(x) = c·g(x).
We have that the posterior is given by:
    \pi(\theta|x) ∝ f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta).    (2)
Either use equation (1) (difficult/tedious integral!) or (2).
Equation (2) can be used to find the posterior density by:
I. Finding c such that c \int f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta)\, d\theta = 1.
II. Finding a (special) distribution that is proportional to f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta). (Fastest way, if possible!)
Estimation procedure:
1. Find the posterior density using (1) (difficult/tedious integral!) or (2).
2. Compute the Bayesian estimator (using the posterior) under a given loss function (under the mean squared loss function: take the expectation of the posterior distribution).


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

Example Bayesian estimation: Bernoulli-Beta

Let X_1, X_2, \ldots, X_T be i.i.d. Bernoulli(\theta), i.e., (X_i | \Theta = \theta) \sim Bernoulli(\theta).
Assume the prior density of \Theta is Beta(a, b), so that:
    \pi(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1 - \theta)^{b-1}.
We know that the conditional density (the density conditional on the true value of \theta) of our data is given by:
    f_{X|\Theta}(x|\theta) = \theta^{x_1}(1-\theta)^{1-x_1}\, \theta^{x_2}(1-\theta)^{1-x_2} \cdots \theta^{x_T}(1-\theta)^{1-x_T} = \theta^{\sum_{j=1}^T x_j}(1-\theta)^{T - \sum_{j=1}^T x_j} \overset{*}{=} \theta^s (1-\theta)^{T-s}.
This is just the likelihood function.
* Simplifying notation, let s = \sum_{j=1}^T x_j.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

1. Easy method: The posterior density, the density of \Theta given X = x, using (2) is proportional to:
    \pi(\theta|x) ∝ f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{(a+s)-1} (1 - \theta)^{(b+T-s)-1}.    (3)
I. The posterior density is also obtainable by finding c such that:
    c \int \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{(a+s)-1} (1 - \theta)^{(b+T-s)-1}\, d\theta = 1;
the posterior density is then c \cdot f_{X|\Theta}(x_1, x_2, \ldots, x_T | \theta)\, \pi(\theta).
II. However, we observe that (3) is proportional to the p.d.f. of Beta(a + s, b + T - s).
1. Tedious method: To find the posterior density using (1) we first need the marginal density of X (next slide).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

The marginal density of X (* using the LTP) is given by:
    f_X(x) \overset{*}{=} \int_0^1 f_{X|\Theta}(x|\theta)\, \pi(\theta)\, d\theta = \int_0^1 \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{(a+s)-1} (1 - \theta)^{(b+T-s)-1}\, d\theta \overset{**}{=} \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \cdot \frac{\Gamma(a + s)\,\Gamma(b + T - s)}{\Gamma(a + b + T)}.
** using \int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\, dx = B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}.
Posterior density using (1):
    \pi(\theta|x) = \frac{f_{X|\Theta}(x|\theta)\, \pi(\theta)}{f_X(x)} = \frac{\theta^s (1-\theta)^{T-s}\, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1}}{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \frac{\Gamma(a+s)\,\Gamma(b+T-s)}{\Gamma(a+b+T)}} = \frac{\Gamma(a + b + T)}{\Gamma(a + s)\,\Gamma(b + T - s)}\, \theta^{(a+s)-1} (1 - \theta)^{(b+T-s)-1},
i.e., the Beta(a + s, b + T - s) density.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

Example Bayesian estimation: Bernoulli-Beta

2. The mean of the r.v. with the above posterior density is then:
    \hat{\theta}_B = E[\theta | X = x] = E[\theta \sim Beta(a + s, b + T - s)] = \frac{a + s}{a + b + T},
which gives the Bayesian estimator of \theta.
We note that we can write the Bayesian estimator as a weighted average of the prior mean (which is a/(a + b)) and the sample mean (which is s/T) as follows:
    \hat{\theta}_B = E[\theta | X = x] = \underbrace{\frac{T}{a + b + T}}_{weight\ sample} \cdot \underbrace{\frac{s}{T}}_{sample\ mean} + \underbrace{\frac{a + b}{a + b + T}}_{weight\ prior} \cdot \underbrace{\frac{a}{a + b}}_{prior\ mean}.
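A short numerical illustration of the Beta-Bernoulli update (added, assuming NumPy/SciPy; prior parameters and data are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)
a, b = 2.0, 8.0                           # Beta(a, b) prior
theta_true = 0.3
T = 50
x = rng.binomial(1, theta_true, size=T)   # Bernoulli data
s = x.sum()

# Posterior is Beta(a + s, b + T - s); Bayesian estimator under squared loss:
post_mean = (a + s) / (a + b + T)
print(post_mean)

# Same number as the weighted average of prior mean and sample mean:
print(T / (a + b + T) * (s / T) + (a + b) / (a + b + T) * (a / (a + b)))

# Posterior median = Bayesian estimator under absolute error loss:
print(stats.beta.median(a + s, b + T - s))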

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

Exercise: Normal-Normal

Let X_1, X_2, \ldots, X_T be i.i.d. Normal(\theta, \sigma_2^2), i.e., (X_i | \Theta = \theta) \sim Normal(\theta, \sigma_2^2).
Assume the prior density of \Theta is Normal(m, \sigma_1^2), so that:
    \pi(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(\theta - m)^2}{2\sigma_1^2}\right).
Question: Find the Bayesian estimator for \theta.
Solution: We know that the conditional density of our data is given by the likelihood function:
    f_{X|\Theta}(x|\theta) = \prod_{j=1}^T \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left(-\frac{(x_j - \theta)^2}{2\sigma_2^2}\right) = \frac{1}{(\sqrt{2\pi}\,\sigma_2)^T} \exp\left(-\frac{\sum_{j=1}^T (x_j - \theta)^2}{2\sigma_2^2}\right).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise

1. Posterior density:
    \pi(\theta|x) ∝ f_{X|\Theta}(x|\theta)\, \pi(\theta) ∝ \exp\left(-\frac{\sum_{j=1}^T (x_j - \theta)^2}{2\sigma_2^2}\right) \exp\left(-\frac{(\theta - m)^2}{2\sigma_1^2}\right)
    = \exp\left(-\frac{\sum_{j=1}^T (x_j - \theta)^2}{2\sigma_2^2} - \frac{(\theta - m)^2}{2\sigma_1^2}\right)
    = \exp\left(-\frac{\sum_{j=1}^T (x_j^2 + \theta^2 - 2\theta x_j)}{2\sigma_2^2} - \frac{\theta^2 + m^2 - 2\theta m}{2\sigma_1^2}\right)
    = \exp\left(-\frac{\sigma_2^2(\theta^2 + m^2 - 2\theta m) + \sigma_1^2\sum_{j=1}^T (x_j^2 + \theta^2 - 2\theta x_j)}{2\sigma_2^2\sigma_1^2}\right)
    \overset{*}{∝} \exp\left(-\frac{(\sigma_2^2 + T\sigma_1^2)\theta^2 - 2\theta(m\sigma_2^2 + T\bar{x}\sigma_1^2)}{2\sigma_2^2\sigma_1^2}\right)
    \overset{**}{∝} \exp\left(-\frac{\left(\theta - \frac{m\sigma_2^2 + T\bar{x}\sigma_1^2}{\sigma_2^2 + T\sigma_1^2}\right)^2}{2\sigma_2^2\sigma_1^2/(\sigma_2^2 + T\sigma_1^2)}\right).

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Estimator III: Bayesian estimator
Example & exercise


*: \exp\left(-\frac{\sigma_2^2 m^2 + \sigma_1^2\sum_{j=1}^T x_j^2}{2\sigma_2^2\sigma_1^2}\right) and **: \exp\left(\frac{\left(\frac{m\sigma_2^2 + T\bar{x}\sigma_1^2}{\sigma_2^2 + T\sigma_1^2}\right)^2}{2\sigma_2^2\sigma_1^2/(\sigma_2^2 + T\sigma_1^2)}\right) are constants given x.

1. Thus \theta|X is normally distributed with mean \frac{m\sigma_2^2 + T\bar{x}\sigma_1^2}{\sigma_2^2 + T\sigma_1^2} and variance \frac{\sigma_2^2\sigma_1^2}{\sigma_2^2 + T\sigma_1^2}. Note that we can rewrite the mean as:
    \frac{\frac{1}{\sigma_1^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, m + \frac{\frac{T}{\sigma_2^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, \bar{x},
and the variance as:
    \left(\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}\right)^{-1}.
2. The Bayesian estimator under both the mean squared loss function and the absolute error loss function is:
    \hat{\theta}_B = \frac{\frac{1}{\sigma_1^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, m + \frac{\frac{T}{\sigma_2^2}}{\frac{1}{\sigma_1^2} + \frac{T}{\sigma_2^2}}\, \bar{x}.


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Convergence of series
Chebyshev's Inequality

Chebyshev's Inequality
Chebyshev's inequality states that for any random variable X with mean \mu and variance \sigma^2, the following probability inequality holds for all \epsilon > 0:
    Pr(|X - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2}.
Note that this applies to all distributions, hence also non-symmetric ones! This implies that:
    Pr(X - \mu > \epsilon) \le \frac{\sigma^2}{\epsilon^2} \quad and \quad Pr(X - \mu < -\epsilon) \le \frac{\sigma^2}{\epsilon^2}.
Interesting example: set \epsilon = k\sigma, then:
    Pr(|X - \mu| > k\sigma) \le \frac{1}{k^2}.
This provides us with an upper bound on the probability that X deviates more than k standard deviations from its mean.

ACTL2002/ACTL5101 Probability and Statistics: Week 5


Convergence of series
Chebyshev's Inequality

Application: Chebyshev's Inequality

The distribution of fire insurance claims does not follow any special (named) distribution.
We do know that the mean claim size in the portfolio is $50 million with a standard deviation of $150 million.
Question: What is an upper bound for the probability that the claim size is larger than $500 million?
Solution: We have, with k\sigma = 450, i.e. k = 3:
    Pr(X - \mu > k\sigma) \le Pr(|X - \mu| > k\sigma) = Pr(|X - 50| > k \cdot 150) \le \frac{1}{k^2} = \frac{1}{9}.
Thus, Pr(X > 500) \le 1/9.
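The bound can be compared with an actual tail probability by simulation; in this added sketch a lognormal claim-size distribution is assumed purely for illustration, with its parameters matched to the stated mean of 50 and standard deviation of 150:

import numpy as np

rng = np.random.default_rng(seed=12)
# Lognormal parameters matched (an assumption for illustration) to mean 50, sd 150 ($m).
sigma2 = np.log(1 + (150 / 50)**2)
mu = np.log(50) - sigma2 / 2
x = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=1_000_000)

print(x.mean(), x.std())                 # roughly 50 and 150
print((x > 500).mean(), 1 / 9)           # empirical tail probability vs Chebyshev bound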


ACTL2002/ACTL5101 Probability and Statistics: Week 5


Convergence of series
Convergence concepts

Convergence concepts

Suppose $X_1, X_2, \ldots$ form a sequence of r.v.'s. Example: $X_i$ is the sample variance using the first $i$ observations.

$X_n$ is said to converge almost surely (a.s.) to the random variable $X$ as $n \to \infty$ if and only if:
$$\Pr\left(\omega : X_n(\omega) \to X(\omega), \text{ as } n \to \infty\right) = 1,$$
and we write $X_n \xrightarrow{a.s.} X$ as $n \to \infty$. Sometimes called strong convergence. It means that beyond some point in the sequence ($\omega$), the difference will always be less than some positive $\varepsilon$, but that point is random.

OPTIONAL: Also expressed as $\Pr\left(|X_n(\omega) - X(\omega)| > \varepsilon, \text{ i.o.}\right) = 0$, where i.o. stands for infinitely often: $\Pr(A_n \text{ i.o.}) = \Pr\left(\limsup_{n\to\infty} A_n\right)$.

Applications: Law of large numbers, Monte Carlo integration.

$X_n$ converges in probability to the random variable $X$ as $n \to \infty$ if and only if, for every $\varepsilon > 0$,
$$\Pr\left(|X_n - X| > \varepsilon\right) \to 0, \quad \text{as } n \to \infty,$$
and we write $X_n \xrightarrow{p} X$ as $n \to \infty$.

Difference between convergence in probability and convergence almost surely: $\Pr\left(|X_n - X| > \varepsilon\right)$ goes to zero instead of equalling zero as $n$ goes to infinity (hence $\xrightarrow{p}$ is weaker than $\xrightarrow{a.s.}$).

$X_n$ converges in distribution to the random variable $X$ as $n \to \infty$ if and only if, for every $x$ at which $F_X$ is continuous,
$$F_{X_n}(x) \to F_X(x), \quad \text{as } n \to \infty,$$
and we write $X_n \xrightarrow{d} X$ as $n \to \infty$. Sometimes called weak convergence.

Convergence of m.g.f.s implies weak convergence.

Applications (see later in lecture):
- Central Limit Theorem;
- $X_n \sim \text{Bin}(n, p)$ and $X \sim N(np,\, np(1-p))$;
- $X_n \sim \text{Poi}(\lambda_n)$, with $\lambda_n \to \infty$, and $X \sim N(\lambda_n, \lambda_n)$.

The Law of Large Numbers

Suppose $X_1, X_2, \ldots, X_n$ are independent random variables with common mean $\mathbb{E}[X_k] = \mu$ and common variance $\text{Var}(X_k) = \sigma^2$, for $k = 1, 2, \ldots, n$.

Define the sequence of sample means as:
$$\bar{X}_n = \frac{1}{n}\sum_{k=1}^{n} X_k.$$

Then, according to the law of large numbers, for any $\varepsilon > 0$ we have:
$$\lim_{n\to\infty} \Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \le \lim_{n\to\infty} \frac{\sigma^2_{\bar{X}_n}}{\varepsilon^2} = \lim_{n\to\infty} \frac{\sigma^2}{n\,\varepsilon^2} = 0.$$

Proof (special case, $X_k \sim N(\mu, \sigma^2)$): $\bar{X}_n - \mu \sim N(0, \sigma^2/n)$, thus when $n \to \infty$ we have $\lim_{n\to\infty} \sigma^2/n = 0$.
General case: when the second moment exists, apply Chebyshev's inequality to $\bar{X}_n$, whose variance $\sigma^2/n \to 0$.

The law of large numbers (LLN) is sometimes written as:
$$\Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \to 0, \quad \text{as } n \to \infty.$$
The result above is sometimes called the (weak) law of large numbers, and sometimes we write $\bar{X}_n \xrightarrow{p} \mu$, because this is the same concept as convergence in probability to a constant.

However, there is also what we call the (strong) law of large numbers, which simply states that the sample mean converges almost surely to $\mu$:
$$\bar{X}_n \xrightarrow{a.s.} \mu, \quad \text{as } n \to \infty.$$

Important result in Probability and Statistics!

Intuitively, the law of large numbers states that the sample mean $\bar{X}_n$ converges to the true value $\mu$. How accurate the estimate is will depend on:
I) how large the sample size is;
II) the variance $\sigma^2$.
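A short sketch (Python) illustrating the LLN numerically; the exponential model for the data and its mean are arbitrary choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

mu = 2.0                                   # true mean of the (assumed) Exponential data
x = rng.exponential(scale=mu, size=100_000)

# Running sample mean after the first n observations.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}:  sample mean = {running_mean[n - 1]:.4f}  (true mean = {mu})")
```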

Application of LLN: Monte Carlo Integration

Suppose we wish to calculate
$$I(g) = \int_0^1 g(x)\,dx,$$
where elementary techniques of integration will not work.

Using the Monte Carlo method, we generate $U[0,1]$ variables, say $X_1, X_2, \ldots, X_n$, and compute:
$$\hat{I}_n(g) = \frac{1}{n}\sum_{k=1}^{n} g(X_k),$$
where $\hat{I}_n(g)$ denotes the approximation of $I(g)$. We have:
$$\hat{I}_n(g) \xrightarrow{a.s.} I(g), \quad \text{as } n \to \infty.$$

Proof: next slide.

Proof: Using the law of large numbers, we have $\hat{I}_n(g) = \frac{1}{n}\sum_{k=1}^{n} g(X_k) \xrightarrow{a.s.} \mathbb{E}[g(X)]$, which is:
$$\mathbb{E}[g(X)] = \int_0^1 g(x)\cdot 1\,dx = \int_0^1 g(x)\,dx = I(g).$$

Try this in Excel using the integral of the standard normal density. How good is your approximation with 100 (1,000; 10,000; 100,000; and 1,000,000) random numbers?

This method is called Monte Carlo integration.
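The same experiment can also be run in Python instead of Excel; the sketch below is one possible version, where the exact value $\Phi(1) - \Phi(0) \approx 0.3413$ is computed only to judge the error.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(seed=4)

def g(x):
    # standard normal density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

exact = 0.5 * erf(1 / sqrt(2))              # Phi(1) - Phi(0) ~ 0.3413

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    u = rng.uniform(0.0, 1.0, size=n)       # X_k ~ U[0, 1]
    approx = g(u).mean()                    # I_n(g) = (1/n) * sum g(X_k)
    print(f"n = {n:>8}: I_n = {approx:.5f}   |error| = {abs(approx - exact):.5f}")
```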

Application of LLN: Pooling of Risks in Insurance

Individuals may be faced with large and unpredictable losses. Insurance may help reduce the financial consequences of such losses by pooling individual risks. This is based on the LLN.

If $X_1, X_2, \ldots, X_n$ are the losses faced by $n$ different individuals, homogeneous enough to have a common distribution, and if these individuals pool together and each agrees to pay:
$$\bar{X}_n = \frac{1}{n}\sum_{k=1}^{n} X_k,$$
then the LLN tells us that the amount each person will end up paying becomes more predictable as the size of the group increases. In effect, this amount will become closer to $\mu$, the average loss each individual expects.

Central Limit Theorem

Suppose $X_1, X_2, \ldots, X_n$ are independent, identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2$. As before, denote the sample mean by $\bar{X}_n$. Then the central limit theorem states:
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1), \quad \text{as } n \to \infty.$$

This holds for all r.v.'s with finite mean and variance, not only normal r.v.'s!

Proof and rewriting of the CLT: see next slides.

Rewriting the Central Limit Theorem

We can write this result as:
$$\lim_{n\to\infty} \Pr\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\right) = \Phi(x),$$
for all $x$, where $\Phi(\cdot)$ denotes the c.d.f. of a standard normal r.v..

Intuitively, for large $n$ the random variable:
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
is approximately standard normally distributed.

The Central Limit Theorem is usually expressed in terms of the standardized sums $S_n = \sum_{k=1}^{n} X_k$. Then the CLT applies to the random variable:
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1), \quad \text{as } n \to \infty.$$

Proof of the Central Limit Theorem

Let $X_1, X_2, \ldots$ be a sequence of independent r.v.'s with mean $\mu$ and variance $\sigma^2$, and denote $S_n = \sum_{i=1}^{n} X_i$. Prove that $Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$ converges to the standard normal distribution.

General procedure to prove $X_n \xrightarrow{d} X$:
1. Find the m.g.f. of $X$: $M_X(t)$;
2. Find the m.g.f. of $X_n$: $M_{X_n}(t)$;
3. Take the limit $n \to \infty$ of the m.g.f. of $X_n$, $\lim_{n\to\infty} M_{X_n}(t)$, and rewrite it. This should equal $M_X(t)$.

Note: the series expansions for log and exp are useful here (see F&T page 2)!

1. Proof: Consider the case with $\mu = 0$ and assume the m.g.f. of $X_i$ exists; then we have $M_Z(t) = \exp\left(t^2/2\right)$.

2. Recall $S_n = \sum_{i=1}^{n} X_i$; the m.g.f. of $Z_n = \frac{S_n}{\sigma\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i}{\sigma\sqrt{n}}$ is obtained by:
$$M_{Z_n}(t) \overset{*}{=} M_{S_n}\!\left(\frac{t}{\sigma\sqrt{n}}\right) \overset{**}{=} \left(M_{X_i}\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^{n},$$
* using $M_{aX}(t) = M_X(a\,t)$; ** using that $S_n$ is the sum of $n$ i.i.d. random variables $X_i$, thus $M_{\sum_{i=1}^{n} X_i}(t) = \left(M_{X_i}(t)\right)^{n}$.

Note that we only assumed that:
$$M_{X_i}(t) = f\left(t, \sigma^2\right); \qquad \mathbb{E}[X_i] = \mu; \qquad \text{Var}(X_i) = \sigma^2 < \infty,$$
hence this holds for any distribution of $X_i$ with mean $\mu$ and finite variance!

Note: $\lim_{n\to\infty} b\, n^{-c} = 0$, for $b \in \mathbb{R}$ and $c > 0$.

Recall from week 1: 1) an m.g.f. uniquely defines a distribution; 2) the m.g.f. is a function of all moments. Consider the Taylor series around zero for any $M(t)$:
$$M(t) = \sum_{i=0}^{\infty} \frac{t^i}{i!} \underbrace{M^{(i)}(t)\big|_{t=0}}_{i\text{-th moment}} = M(0) + t\, M^{(1)}(t)\big|_{t=0} + \frac{1}{2}t^2\, M^{(2)}(t)\big|_{t=0} + O(t^3),$$
where $O(t^3)$ covers all terms $c_k t^k$, with $c_k \in \mathbb{R}$ for $k \ge 3$.

We have $M(0) = \mathbb{E}\left[e^{0\cdot X}\right] = 1$ and, because we assumed that $\mathbb{E}[X_i] = 0$:
$$M^{(1)}_{X_i}(t)\big|_{t=0} = \mathbb{E}[X_i] = 0, \qquad M^{(2)}_{X_i}(t)\big|_{t=0} = \mathbb{E}\left[X_i^2\right] = \text{Var}(X_i) + \left(\mathbb{E}[X_i]\right)^2 = \sigma^2.$$

3. Proof continues on next slide.

Now we can combine the results from the previous two slides:
$$\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty} \left(M_{X_i}\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^{n} = \lim_{n\to\infty} \left(\sum_{i=0}^{\infty} \frac{\left(t/(\sigma\sqrt{n})\right)^i}{i!}\, M^{(i)}_{X_i}(t)\big|_{t=0}\right)^{n}$$
$$= \lim_{n\to\infty} \left(1 + 0 + \frac{1}{2}\left(\frac{t}{\sigma\sqrt{n}}\right)^{2}\sigma^2 + O\!\left(\left(\frac{t}{\sigma\sqrt{n}}\right)^{3}\right)\right)^{n} = \lim_{n\to\infty} \left(1 + \frac{t^2}{2n} + O\!\left(n^{-3/2}\right)\right)^{n}.$$
Taking logarithms:
$$\lim_{n\to\infty} \log\left(M_{Z_n}(t)\right) = \lim_{n\to\infty} n \log\left(1 + \frac{t^2}{2n} + O\!\left(n^{-3/2}\right)\right) \overset{*}{=} \lim_{n\to\infty} n\left(\frac{t^2}{2n} + O\!\left(n^{-3/2}\right)\right) = \frac{t^2}{2},$$
since $n\, O\!\left(n^{-3/2}\right) = O\!\left(n^{-1/2}\right) \to 0$ as $n \to \infty$. Hence $\lim_{n\to\infty} M_{Z_n}(t) = \exp\left(t^2/2\right) = M_Z(t)$.

* using $\log(1+a) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1} a^i}{i} = a + O(a^2)$, with $a = \frac{t^2}{2n} + O\!\left(n^{-3/2}\right)$.
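The limiting behaviour derived above can also be checked by simulation. The sketch below (Python) uses exponential summands, an arbitrary choice since the CLT only requires a finite mean and variance.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

mu = sigma = 1.0                 # Exponential(1) summands: mean = sd = 1
n, reps = 1_000, 10_000

x = rng.exponential(mu, size=(reps, n))
z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # Z_n = (S_n - n*mu)/(sigma*sqrt(n))

# Compare empirical probabilities with the standard normal values
# Phi(-1), Phi(0), Phi(1), Phi(2) = 0.1587, 0.5000, 0.8413, 0.9772.
for c in (-1.0, 0.0, 1.0, 2.0):
    print(f"Pr(Z_n <= {c:+.1f}) ~ {np.mean(z <= c):.4f}")
```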

Application of the CLT: An insurer offers builders risk insurance. It has 400 contracts per year and has offered the product for 9 years. The sample mean of a claim is $10 million and the sample standard deviation is $25 million.

Question: What is the probability that in a year the total claim size is larger than $5 billion?

Solution: Using the CLT (why may we use the sample s.d. here?):
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1), \text{ as } n \to \infty \quad\Rightarrow\quad \bar{X}_n \approx N\!\left(\mu, \sigma^2/n\right) \quad\Rightarrow\quad n\bar{X}_n \approx N\!\left(n\mu, n\sigma^2\right).$$
$$0.9772 = \Phi(2) = \Pr\left(400\,\bar{X}_{400} \le 400 \cdot \$10\text{ million} + 2 \cdot 20 \cdot \$25\text{ million}\right) = \Pr\left(400\,\bar{X}_{400} \le \$5\text{ billion}\right).$$
Thus, $\Pr\left(400\,\bar{X}_{400} > \$5\text{ billion}\right) = 1 - 0.9772 = 0.0228$.
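A short numerical check of this calculation (Python; scipy is an assumed dependency here):

```python
from scipy.stats import norm

n, mean, sd = 400, 10e6, 25e6                 # contracts per year, claim moments in $
total_mean = n * mean                         # 4.0e9
total_sd = (n ** 0.5) * sd                    # 0.5e9
z = (5e9 - total_mean) / total_sd             # = 2
print("Pr(total > $5bn) ~", 1 - norm.cdf(z))  # ~ 0.0228
```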

Normal Approximation to the Binomial

From week 2 we know: a Binomial random variable is the sum of Bernoulli random variables. Let $X_k \sim \text{Bernoulli}(p)$. Then:
$$S = X_1 + X_2 + \ldots + X_n$$
has a Binomial$(n, p)$ distribution.

Applying the Central Limit Theorem, $S$ must be approximately normal with mean $\mathbb{E}[S] = np$ and variance $\text{Var}(S) = npq$, so that approximately, for large $n$, we have:
$$\frac{S - np}{\sqrt{npq}} \approx N(0, 1).$$

Question: What is the probability that $X = 60$ if $X \sim \text{Bin}(1000, 0.06)$? Not in the Binomial tables!

In practice, for large $n$ and for $p$ around 0.5 (in particular when $np > 5$ and $np(1-p) > 5$, or $n > 30$), we can approximate binomial probabilities with the Normal distribution. Use $\mu = np$ and $\sigma^2 = np(1-p)$.

Continuity correction for the binomial: note that a Binomial random variable $X$ takes integer values $k = 0, 1, 2, \ldots$, but the Normal distribution is continuous, so for the value:
$$\Pr(X = k),$$
we require the Normal approximation:
$$\Pr\left(\frac{k - \frac{1}{2} - \mu}{\sigma} < Z < \frac{k + \frac{1}{2} - \mu}{\sigma}\right),$$
and similarly for the probability $\Pr(X \le k)$.
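Applied to the question on the previous slide, the sketch below (Python; scipy assumed) compares the exact $\Pr(X = 60)$ for $X \sim \text{Bin}(1000, 0.06)$ with the continuity-corrected normal approximation.

```python
from scipy.stats import binom, norm

n, p, k = 1000, 0.06, 60
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5    # mu = 60, sigma ~ 7.51

exact = binom.pmf(k, n, p)
# Continuity correction: Pr(X = k) ~ Pr((k - 1/2 - mu)/sigma < Z < (k + 1/2 - mu)/sigma)
approx = norm.cdf((k + 0.5 - mu) / sigma) - norm.cdf((k - 0.5 - mu) / sigma)

print(f"exact Pr(X = 60)          = {exact:.5f}")
print(f"normal approx (corrected) = {approx:.5f}")
```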

[Figure: Normal approximation to the Binomial. The p.m.f.s of Binomial(5, 0.1), Binomial(10, 0.1), Binomial(30, 0.1) and Binomial(200, 0.1) are shown with the approximating N(0.5, 0.45), N(1, 0.9), N(3, 2.7) and N(20, 18) densities; the fit improves as n increases.]
Normal approximation to the Poisson

Approximation of the Poisson by the Normal for large values of $\lambda$. Let $X_n$ be a sequence of Poisson random variables with increasing parameters $\lambda_1, \lambda_2, \ldots$ such that $\lambda_n \to \infty$. We have:
$$\mathbb{E}[X_n] = \lambda_n, \qquad \text{Var}(X_n) = \lambda_n.$$
Standardize the random variable (i.e., subtract the mean and divide by the standard deviation):
$$Z_n = \frac{X_n - \mathbb{E}[X_n]}{\sqrt{\text{Var}(X_n)}} = \frac{X_n - \lambda_n}{\sqrt{\lambda_n}} \xrightarrow{d} Z \sim N(0, 1).$$

Proof: see next slides.

1. We have the m.g.f. of $Z$: $M_Z(t) = \exp\left(t^2/2\right)$.

2. Next, we need to find the m.g.f. of $Z_n$. We know (week 2):
$$M_{X_n}(t) = \exp\left(\lambda_n\left(e^t - 1\right)\right).$$
Thus, using the calculation rules for m.g.f.s, we have:
$$M_{Z_n}(t) = M_{\frac{X_n - \lambda_n}{\sqrt{\lambda_n}}}(t) \overset{*}{=} \exp\left(-\sqrt{\lambda_n}\, t\right) M_{X_n}\!\left(\frac{t}{\sqrt{\lambda_n}}\right) = \exp\left(-\sqrt{\lambda_n}\, t\right) \exp\left(\lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)\right) = \exp\left(-\sqrt{\lambda_n}\, t + \lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)\right),$$
* using $M_{aX+b}(t) = \exp(b\,t)\, M_X(a\,t)$.

3. Find the limit of $M_{Z_n}(t)$ and prove it equals $M_Z(t)$:
$$\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty} \exp\left(-\sqrt{\lambda_n}\, t + \lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)\right)$$
$$\lim_{n\to\infty} \log\left(M_{Z_n}(t)\right) = \lim_{n\to\infty}\left(-t\sqrt{\lambda_n} + \lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)\right) \overset{*}{=} \lim_{n\to\infty}\left(-t\sqrt{\lambda_n} + \lambda_n\left(1 + \frac{t}{\sqrt{\lambda_n}} + \frac{1}{2!}\left(\frac{t}{\sqrt{\lambda_n}}\right)^{2} + \frac{1}{3!}\left(\frac{t}{\sqrt{\lambda_n}}\right)^{3} + \ldots - 1\right)\right)$$
$$= \lim_{n\to\infty}\left(\frac{t^2}{2!} + O\!\left(\frac{1}{\sqrt{\lambda_n}}\right)\right) = \frac{t^2}{2}$$
$$\Rightarrow \lim_{n\to\infty} M_{Z_n}(t) = \exp\left(t^2/2\right) = M_Z(t).$$

* using the exponential expansion $e^a = \sum_{i=0}^{\infty} \frac{a^i}{i!}$, with $a = t/\sqrt{\lambda_n}$.
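A quick numerical illustration of this convergence (Python; scipy assumed), evaluating $\Pr\left(X_n \le \lambda_n + \sqrt{\lambda_n}\right)$ against the limiting value $\Phi(1) \approx 0.8413$:

```python
from scipy.stats import norm, poisson

# Compare Pr(X <= lambda + sqrt(lambda)) with the limiting value Phi(1) ~ 0.8413.
for lam in (1, 10, 100, 1_000, 10_000):
    x = lam + lam ** 0.5
    exact = poisson.cdf(x, lam)                # Poisson tail probability
    approx = norm.cdf((x - lam) / lam ** 0.5)  # = Phi(1)
    print(f"lambda = {lam:>6}:  Poisson cdf = {exact:.4f},  normal approx = {approx:.4f}")
```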

[Figure: Normal approximation to the Poisson. The p.m.f.s of Poisson(0.1), Poisson(1), Poisson(10) and Poisson(100) are shown with the approximating N(0.1, 0.1), N(1, 1), N(10, 10) and N(100, 100) densities; the fit improves as λ increases.]
Summary

Parameter estimators

Method of moments:
1. Equate the first $k$ sample moments to the corresponding $k$ population moments;
2. Express the $k$ population moments in terms of the parameters of the distribution;
3. Solve the resulting system of simultaneous equations.

Maximum likelihood:
1. Determine the likelihood function $L(\theta_1, \theta_2, \ldots, \theta_k; x)$;
2. Determine the log-likelihood function $\ell(\theta_1, \theta_2, \ldots, \theta_k; x) = \log\left(L(\theta_1, \theta_2, \ldots, \theta_k; x)\right)$;
3. Equate the derivatives of $\ell(\theta_1, \theta_2, \ldots, \theta_k; x)$ w.r.t. $\theta_1, \theta_2, \ldots, \theta_k$ to zero ($\Rightarrow$ global/local minimum/maximum);
4. Check whether the second derivative is negative (maximum) and check the boundary conditions.

Bayesian:
1. Find the posterior density, using (1) (difficult/tedious integral!) or (2);
2. Compute the Bayesian estimator under a given loss function.
LLN & CLT

Law of large numbers: Let $X_1, \ldots, X_n$ be independent random variables with equal mean $\mathbb{E}[X_k] = \mu$ and variance $\text{Var}(X_k) = \sigma^2$ for $k = 1, \ldots, n$; then for all $\varepsilon > 0$ we have:
$$\Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \to 0, \quad \text{as } n \to \infty.$$

Central limit theorem: Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with mean $\mathbb{E}[X_k] = \mu$ and variance $\text{Var}(X_k) = \sigma^2$ for $k = 1, \ldots, n$; then:
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1), \quad \text{as } n \to \infty.$$
