Chapter 1. Estimation Methods

Properties of an estimator
• Assume an unknown parameter θ, for example a mean, a variance, the
relationship between two variables, etc.
• An estimator θ̂ is a given strategy or method to estimate θ.
• Using the same method but different samples we may get different estimates
of θ, hence θ̂ can be treated as a random variable with properties such as
mean and variance.

[Figure: example of a distribution of θ̂, a normal distribution centred at E[θ̂]]
Properties of an estimator
• Desired properties of an estimator:
  – Unbiasedness: E[θ̂] = θ
  – Efficiency: θ̂ has minimum variance among unbiased estimators.
  – Consistency: θ̂ converges in probability to the true value of the
    unknown parameter (unlimited data should reveal the true value):

    lim_{n→∞} P(|θ̂n − θ| < δ) = 1 for all δ > 0
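To make these properties concrete, here is a minimal simulation sketch (NumPy; the normal population with mean 2.5 and standard deviation 3, the number of replications and the sample sizes are arbitrary illustrative choices, not from the slides). The sample mean is used as the estimator θ̂ of the population mean θ.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5            # true (unknown) parameter: the population mean
n_replications = 5000  # number of independent samples per sample size

for n in (10, 100, 1000):  # growing sample sizes
    # one estimate of theta per simulated sample: the sample mean
    samples = rng.normal(loc=theta, scale=3.0, size=(n_replications, n))
    estimates = samples.mean(axis=1)
    print(f"n={n:5d}  mean of estimates={estimates.mean():.3f}  "
          f"variance of estimates={estimates.var():.4f}")

# The mean of the estimates stays close to theta (unbiasedness) and their
# variance shrinks towards zero as n grows (consistency).
```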
Estimation methods: Ordinary Least Square (OLS)
Assume a linear relationship between variable yi and K explanatory variables:

  yi = β'xi + εi,   i = 1,...,N

In matrix notation, y = Xβ + ε, with y (N×1), X (N×K), β (K×1) and ε (N×1):

  [ y1 ]   [ x11  x12  .  x1K ] [ β1 ]   [ ε1 ]
  [ y2 ] = [ x21  x22  .  x2K ] [ β2 ] + [ ε2 ]
  [ .  ]   [  .    .   .   .  ] [ .  ]   [ .  ]
  [ yN ]   [ xN1  xN2  .  xNK ] [ βK ]   [ εN ]

If xi1 = 1 for all i then β1 is the intercept term:

  [ y1 ]   [ 1  x12  .  x1K ] [ β1 ]   [ ε1 ]
  [ y2 ] = [ 1  x22  .  x2K ] [ β2 ] + [ ε2 ]
  [ .  ]   [ .   .   .   .  ] [ .  ]   [ .  ]
  [ yN ]   [ 1  xN2  .  xNK ] [ βK ]   [ εN ]
Estimation methods: Ordinary Least Square (OLS)
Example: with intercept and only one explanatory variable: yi = β1 + β2 xi + εi

[Figure: scatter plot of the observations (xi, yi) with a fitted line]
  β1 + β2 xi is the equation of a line with intercept β1 and slope β2.
  εi is the distance between each observation and the line.

Find β1 and β2 that minimize the total distance between the line and all the
observations.
Estimation methods: Ordinary Least Square (OLS)
Estimation: Find β̂ that minimizes the sum of the squared errors.

Sum of the squared errors:  ε'ε = (y − Xβ)'(y − Xβ)
                           (1×N)(N×1)

First order condition:  β̂ = (X'X)⁻¹ X'y
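A minimal sketch of the closed-form estimator β̂ = (X'X)⁻¹X'y in NumPy; the simulated data, sample size and coefficient values are illustrative assumptions rather than anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 3                                    # observations, regressors (incl. intercept)
X = np.column_stack([np.ones(N),                 # first column of ones -> intercept term
                     rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=N)

# First order condition of the least-squares problem: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)
```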
Estimation methods: Ordinary Least Square (OLS)

  var(εi) ≡ σ²

We also need:

  var(β̂) = σ² (X'X)⁻¹

Note: var(β̂) is a K×K matrix:

            [ var(β̂1)        covar(β̂1, β̂2)  .  covar(β̂1, β̂K) ]
  var(β̂) =  [ covar(β̂2, β̂1)  var(β̂2)        .        .        ]
   (K×K)    [       .               .        .        .        ]
            [ covar(β̂K, β̂1)  covar(β̂K, β̂2)  .   var(β̂K)       ]
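A short sketch of this covariance matrix, assuming the usual unbiased residual-variance estimate σ̂² = ε̂'ε̂/(N − K) in place of the unknown σ²; the helper function name and the toy data are illustrative.

```python
import numpy as np

def ols_covariance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimated covariance matrix of the OLS coefficients:
    var(beta_hat) = sigma2_hat * (X'X)^{-1}."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (N - K)   # unbiased estimate of sigma^2
    return sigma2_hat * XtX_inv

# Example with toy data: an intercept column plus two regressors
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=100)
cov = ols_covariance(X, y)
print("standard errors:", np.sqrt(np.diag(cov)))
```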
Estimation methods: Ordinary Least Square (OLS)
Assumptions:
1. Linear relationship
2. E[ε] = 0
3. Homoskedasticity and no autocorrelation:
   var(ε) = σ² IN, where IN is an N×N identity matrix  →  efficiency
4. X and ε are independent: cov(ε, xk) = 0 for all k  →  unbiasedness
5. Columns in X are linearly independent  →  (X'X)⁻¹ exists
6. Normal distribution: ε ~ N(0, σ² IN)
                            (distribution, mean, variance)


Estimation methods: Ordinary Least Square (OLS)

If assumptions are fulfilled:  β̂ ~ N(β, σ²(X'X)⁻¹)

Test of hypothesis on parameter values

To test if the true value of the parameter βk is equal to a given value A
against the alternative hypothesis that it is different from A:

  H0: βk = A   and   H1: βk ≠ A

  t-statistic:  (β̂k − A) / se(β̂k),  which follows a t-distribution with
  (N − K) degrees of freedom,

  where se(β̂k) is the square root of the k-th diagonal element of the
  estimated var(β̂).

Normally we test if the parameter is equal to zero, A = 0.
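A sketch of this test with SciPy (the data are simulated and the N − K degrees of freedom follow the notation above; nothing here is code from the course).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N, K = 120, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # intercept + one regressor
y = X @ np.array([0.3, 1.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

A = 0.0                                    # H0: beta_k = A (here the usual A = 0)
t_stat = (beta_hat - A) / se
p_val = 2 * stats.t.sf(np.abs(t_stat), df=N - K)   # two-sided p-values
print("t-statistics:", t_stat, " p-values:", p_val)
```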

Estimation methods: Generalised Least Square (GLS)

Heteroskedasticity and autocorrelation problems:

  var(ε) = σ²Ω, where Ω ≠ I is an N×N pos. def. matrix

            [ var(ε1)      cov(ε1, ε2)  .  cov(ε1, εN) ]
  var(ε)  = [ cov(ε2, ε1)  var(ε2)      .  cov(ε2, εN) ]
   (N×N)    [     .            .        .       .      ]
            [ cov(εN, ε1)  cov(εN, ε2)  .   var(εN)    ]

                     [ ω11  ω12  .  ω1N ]     Element ω12: i = 1 and j = 2.
  var(ε) = σ²Ω = σ²  [ ω21  ω22  .  ω2N ]
   (N×N)             [  .    .   .   .  ]     Note the matrix is symmetric:
                     [ ωN1  ωN2  .  ωNN ]     ωij = ωji

  ωii ≠ 1 for some i  →  Heteroskedasticity

  ωij ≠ 0 for some i and j  →  Autocorrelation in time-series data or
  cross-sectional correlation in cross-sectional data.

Estimation methods: Generalised Least Square (GLS)

Heteroskedasticity and autocorrelation problems:

Some examples:

              [ 1  0  .  0 ]
  var(ε) = σ² [ 0  1  .  0 ]      ωii = 1  →  Homoskedastic
   (N×N)      [ .  .  .  . ]      ωij = 0  →  no autocorrelation
              [ 0  0  .  1 ]

              [ ω11   0   .   0  ]
  var(ε) = σ² [ 0    ω22  .   0  ]     ωii ≠ 1  →  Heteroskedastic
   (N×N)      [ .     .   .   .  ]     ωij = 0  →  no autocorrelation
              [ 0     0   .  ωNN ]

              [ ω11  ω12   0    .      0   ]
  var(ε) = σ² [ ω21  ω22  ω23   .      0   ]    ωii ≠ 1              →  Heterosk.
   (N×N)      [ 0    ω32  ω33   .      .   ]    ωij ≠ 0 for j = i ∓ 1 →  1st order
              [ .     .    .    .      .   ]                             autocorrelation
              [ 0     0    .  ωN,N−1  ωNN  ]
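The three example structures can be written out directly; the sketch below (NumPy, with arbitrary illustrative values for the ωii and the off-diagonal elements, and N = 5) builds a homoskedastic, a heteroskedastic and a first-order-autocorrelated Ω.

```python
import numpy as np

N = 5

# 1) Homoskedastic, no autocorrelation: Omega = I
omega_homo = np.eye(N)

# 2) Heteroskedastic, no autocorrelation: different variances on the diagonal
omega_hetero = np.diag([1.0, 2.0, 0.5, 3.0, 1.5])

# 3) Heteroskedastic with 1st-order autocorrelation: non-zero terms only on
#    the diagonal and the first off-diagonals (omega_ij = 0 for |i - j| > 1)
diag = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
off = np.array([0.4, 0.3, 0.2, 0.1])            # omega_{i,i+1} = omega_{i+1,i}
omega_ar1 = np.diag(diag) + np.diag(off, k=1) + np.diag(off, k=-1)

print(omega_ar1)
```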

Estimation methods: Generalised Least Square (GLS)

OLS estimation is not efficient in case of heteroskedasticity and
autocorrelation/cross-sectional correlation.

Estimation method: GLS

  β̂GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y
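A minimal sketch of the GLS formula in NumPy, assuming for illustration a known diagonal (purely heteroskedastic) Ω; the data and variance pattern are arbitrary choices.

```python
import numpy as np

def gls(X: np.ndarray, y: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """GLS estimator: beta_hat = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    omega_inv = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

rng = np.random.default_rng(4)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
variances = np.linspace(0.5, 4.0, N)             # heteroskedastic error variances
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(variances))
omega = np.diag(variances)

print("GLS estimates:", gls(X, y, omega))
```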
Some definitions

• Probability distribution for a discrete random variable X:

  fX(x) = prob(X = x), where fX(x) is the probability mass function and x is an outcome.

  0 ≤ fX(x) ≤ 1    and    Σi fX(xi) = 1,   i = 1,...,N

[Figure: two discrete distributions over the outcomes 1–6, one uniform and one
with higher probability for values closer to the mean]
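As a small check of these two conditions, a sketch with a fair six-sided die as the example distribution (matching the uniform bar chart above):

```python
import numpy as np

# Probability mass function of a fair six-sided die: f_X(x) = 1/6 for x = 1,...,6
outcomes = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

assert np.all((pmf >= 0) & (pmf <= 1))   # 0 <= f_X(x) <= 1
assert np.isclose(pmf.sum(), 1.0)        # sum_i f_X(x_i) = 1
print(dict(zip(outcomes.tolist(), pmf.tolist())))
```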
Some definitions
Continuous distribution

[Figure: the discrete distribution from the previous slide next to a continuous
distribution over the same range of outcomes; the density function fX(x) gives
the relative likelihood for X to take a given value]

• Probability distribution for a continuous random variable, i.e. an infinite
  number of possible outcomes:

  prob(a ≤ x ≤ b) = ∫_a^b fX(x) dx          ∫_{−∞}^{∞} fX(x) dx = 1

• Cumulative distribution (or the distribution function):

  FX(x) = ∫_{−∞}^{x} f(t) dt
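A sketch using SciPy's normal distribution as a stand-in for fX(x) (the distribution and the interval endpoints a and b are illustrative): the probability of an interval is obtained by integrating the density, and the same number falls out of differences of the cumulative distribution function.

```python
import numpy as np
from scipy import stats, integrate

dist = stats.norm(loc=3.5, scale=1.0)     # an example continuous distribution
a, b = 2.0, 4.0

# prob(a <= X <= b) = integral of the density f_X(x) from a to b
prob_integral, _ = integrate.quad(dist.pdf, a, b)

# Same probability via the cumulative distribution function F_X(x)
prob_cdf = dist.cdf(b) - dist.cdf(a)

# The density integrates to 1 over the whole real line
total, _ = integrate.quad(dist.pdf, -np.inf, np.inf)

print(prob_integral, prob_cdf, total)
```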

Some definitions

Assume two discrete variables X and Y:

• Joint distribution:        fX,Y(x, y) = prob(X = x, Y = y)

• Marginal distribution:     fX(x) = prob(X = x)

• Conditional distribution:  fX(x|y) = prob(X = x | Y = y)

Relationship:  fX,Y(x, y) = fY(y|x)·fX(x) = fX(x|y)·fY(y)

If X and Y are independent:  fY(y|x) = fY(y) and fX(x|y) = fX(x)

  ⇒  fX,Y(x, y) = fX(x)·fY(y)

Example: Dependent random variables

Assume a box with 3 black (b) and 2 white (w) balls.
We want to have two random draws (X = 1st and Y = 2nd draw) without
replacement from the box:

  1st draw: X                2nd draw: Y

  b with fX(b) = 3/5         b with fY(b|b) = 2/4

  w with fX(w) = 2/5         b with fY(b|w) = 3/4

  fXY(b, b) = fY(b|b)·fX(b) = 2/4 × 3/5 = 3/10

  fXY(w, b) = fY(b|w)·fX(w) = 3/4 × 2/5 = 3/10

Example: Independent random variables

Assume a box with 3 black (b) and 2 white (w) balls.
We want to have two random draws (X = 1st and Y = 2nd draw) with replacement
from the box:

  1st draw: X                2nd draw: Y

  b with fX(b) = 3/5         b with fY(b|b) = 3/5

  w with fX(w) = 2/5         b with fY(b|w) = 3/5

  ⇒  fY(b|b) = fY(b|w) = fY(b)

  fXY(b, b) = fY(b)·fX(b) = 3/5 × 3/5 = 9/25

  fXY(w, b) = fY(b)·fX(w) = 3/5 × 2/5 = 6/25

Estimation methods: Maximum Likelihood (ML)

Requirements for ML: The number of observations is large and their
distribution is known. Observations are independent (as random
draws with replacement) and have the same probability distribution.

Likelihood function of the parameter θ given the observed data:

  L(θ|x1,…,xN) = f(x1,…,xN|θ) = f(x1|θ)·f(x2|θ)·…·f(xN|θ)

  ⇒  L(θ) = Πi f(xi|θ)

Estimation: Find the value of θ that maximizes the likelihood
function (the most probable value for θ given the observations):

  max L(θ)  →  θ̂   or, for simplicity,   max ln L(θ)  →  θ̂
Estimation methods: Maximum Likelihood (ML)
Example, linear regression:  y = Xβ + ε
For each observation i:  yi = β'xi + εi

  xi: K×1 vector,  β: K×1 vector,  and i = 1,...,N

εi is a random variable and we assume εi ~ N(0, σ²).

To estimate the unknown parameters of the model (β and σ²) we need to
maximize the likelihood function for the random variable εi.

What is the likelihood function for εi?


Estimation methods: Maximum Likelihood (ML)
(Proof is not required)
Log likelihood function for εi assuming normal distribution:

  Density function of the normal dist.:  f(εi|μ, σ²) = (2πσ²)^(−1/2) exp(−(εi − μ)²/(2σ²))

  L(μ, σ²) = Πi f(εi|μ, σ²)

           = Πi (2πσ²)^(−1/2) exp(−(εi − μ)²/(2σ²))

           = (2πσ²)^(−N/2) exp(−Σi (εi − μ)²/(2σ²))

  Taking logs:  ln L(μ, σ²) = −(N/2) ln(2πσ²) − Σi (εi − μ)²/(2σ²)

  Since μ = 0 and εi = yi − β'xi:

  ln L(β, σ²) = −(N/2) ln(2πσ²) − Σi (yi − β'xi)²/(2σ²)
Estimation methods: Maximum Likelihood (ML)

Log likelihood function for εi assuming normal distribution, in matrix form:

  ln L(β, σ²) = −(N/2) ln(2πσ²) − (y − Xβ)'(y − Xβ)/(2σ²)

Estimation: Find the values of β and σ² that maximize ln L(β, σ²).
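A sketch of this maximization with SciPy (simulated data, arbitrary starting values): the β that maximizes ln L coincides with the OLS estimate, while the ML estimate of σ² is ε̂'ε̂/N rather than ε̂'ε̂/(N − K).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N, K = 150, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=N)

def neg_log_likelihood(params: np.ndarray) -> float:
    beta, log_sigma2 = params[:K], params[K]     # optimize log(sigma^2) to keep it positive
    sigma2 = np.exp(log_sigma2)
    resid = y - X @ beta
    return 0.5 * N * np.log(2 * np.pi * sigma2) + resid @ resid / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=np.zeros(K + 1))
beta_ml, sigma2_ml = result.x[:K], np.exp(result.x[K])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # closed-form OLS for comparison
print("ML beta:", beta_ml, " OLS beta:", beta_ols, " ML sigma2:", sigma2_ml)
```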
Estimation methods: Maximum Likelihood (ML)

Distribution of the estimated parameters (not required):

  θ̂ ~ N(θ, (1/N) I⁻¹),  where I is the information matrix.

Two methods to obtain I:

i) based on the second derivatives:

   I_2D = −(1/N) Σ_{i=1}^{N} ∂² ln Li / (∂θ ∂θ')

ii) based on the outer product of the first derivatives:

   I_OP = (1/N) Σ_{i=1}^{N} (∂ ln Li / ∂θ)(∂ ln Li / ∂θ)'

We usually use numerical derivatives.
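A sketch of the outer-product method with simple numerical first derivatives (central differences); the exponential model, the sample and the step size are illustrative choices, and with a single parameter the information "matrix" is just a scalar.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=1000)
lam_hat = 1 / x.mean()                           # MLE of the exponential rate

# Per-observation log-likelihood contribution: ln L_i(lambda) = ln(lambda) - lambda * x_i
def log_lik_i(lam: float) -> np.ndarray:
    return np.log(lam) - lam * x

# Numerical first derivatives (central differences), one score per observation
h = 1e-5
scores = (log_lik_i(lam_hat + h) - log_lik_i(lam_hat - h)) / (2 * h)

# Outer-product estimate of the information matrix (a scalar here, since K = 1)
I_op = np.mean(scores ** 2)
var_lam_hat = 1 / (len(x) * I_op)                # var(theta_hat) ~ (1/N) * I^{-1}
print("lambda_hat:", lam_hat, " estimated std. error:", np.sqrt(var_lam_hat))
```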
