Chapter 1. Estimation Methods

Properties of an estimator
• Assume an unknown parameter θ, for example a mean, a variance, the
relationship between two variables, etc.
• An estimator θ̂ is a given strategy or method to estimate θ.
• Using the same method but different samples we may get different estimates
of θ, hence θ̂ can be treated as a random variable with properties such as
mean and variance.

[Figure: example of a distribution of θ̂, a normal distribution centred at E[θ̂]]
Properties of an estimator
• Desired properties of an estimator:
  – Unbiasedness: E[θ̂] = θ
  – Efficiency: θ̂ has minimum variance among unbiased estimators.
  – Consistency: θ̂ converges in probability to the true value of the
    unknown parameter (unlimited data should reveal the true value):

    lim_{n→∞} P(|θ̂n − θ| < δ) = 1 for all δ > 0
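To make these properties concrete, here is a minimal simulation sketch (NumPy; the normal population with mean 2.5 and standard deviation 3, the number of replications and the sample sizes are arbitrary illustrative choices, not from the slides). The sample mean is used as the estimator θ̂ of the population mean θ.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5            # true (unknown) parameter: the population mean
n_replications = 5000  # number of independent samples per sample size

for n in (10, 100, 1000):  # growing sample sizes
    # one estimate of theta per simulated sample: the sample mean
    samples = rng.normal(loc=theta, scale=3.0, size=(n_replications, n))
    estimates = samples.mean(axis=1)
    print(f"n={n:5d}  mean of estimates={estimates.mean():.3f}  "
          f"variance of estimates={estimates.var():.4f}")

# The mean of the estimates stays close to theta (unbiasedness) and their
# variance shrinks towards zero as n grows (consistency).
```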
Estimation methods: Ordinary Least Square (OLS)
Assume a linear relationship between variable yi and K explanatory variables:

  yi = β'xi + εi,   i = 1,...,N

In matrix notation, y = Xβ + ε, with y (N×1), X (N×K), β (K×1) and ε (N×1):

  [ y1 ]   [ x11  x12  .  x1K ] [ β1 ]   [ ε1 ]
  [ y2 ] = [ x21  x22  .  x2K ] [ β2 ] + [ ε2 ]
  [ .  ]   [  .    .   .   .  ] [ .  ]   [ .  ]
  [ yN ]   [ xN1  xN2  .  xNK ] [ βK ]   [ εN ]

If xi1 = 1 for all i then β1 is the intercept term:

  [ y1 ]   [ 1  x12  .  x1K ] [ β1 ]   [ ε1 ]
  [ y2 ] = [ 1  x22  .  x2K ] [ β2 ] + [ ε2 ]
  [ .  ]   [ .   .   .   .  ] [ .  ]   [ .  ]
  [ yN ]   [ 1  xN2  .  xNK ] [ βK ]   [ εN ]
Estimation methods: Ordinary Least Square (OLS)
Example: with intercept and only one explanatory variable: yi = β1 + β2 xi + εi

[Figure: scatter plot of the observations (xi, yi) with a fitted line]
  β1 + β2 xi is the equation of a line with intercept β1 and slope β2.
  εi is the distance between each observation and the line.

Find β1 and β2 that minimize the total distance between the line and all the
observations.
Estimation methods: Ordinary Least Square (OLS)
Estimation: Find β̂ that minimizes the sum of the squared errors.

Sum of the squared errors:  ε'ε = (y − Xβ)'(y − Xβ)
                           (1×N)(N×1)

First order condition:  β̂ = (X'X)⁻¹ X'y
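A minimal sketch of the closed-form estimator β̂ = (X'X)⁻¹X'y in NumPy; the simulated data, sample size and coefficient values are illustrative assumptions rather than anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 3                                    # observations, regressors (incl. intercept)
X = np.column_stack([np.ones(N),                 # first column of ones -> intercept term
                     rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=N)

# First order condition of the least-squares problem: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)
```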
Estimation methods: Ordinary Least Square (OLS)

  var(εi) ≡ σ²

We also need:

  var(β̂) = σ² (X'X)⁻¹

Note: var(β̂) is a K×K matrix:

            [ var(β̂1)        covar(β̂1, β̂2)  .  covar(β̂1, β̂K) ]
  var(β̂) =  [ covar(β̂2, β̂1)  var(β̂2)        .        .        ]
   (K×K)    [       .               .        .        .        ]
            [ covar(β̂K, β̂1)  covar(β̂K, β̂2)  .   var(β̂K)       ]
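A short sketch of this covariance matrix, assuming the usual unbiased residual-variance estimate σ̂² = ε̂'ε̂/(N − K) in place of the unknown σ²; the helper function name and the toy data are illustrative.

```python
import numpy as np

def ols_covariance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimated covariance matrix of the OLS coefficients:
    var(beta_hat) = sigma2_hat * (X'X)^{-1}."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (N - K)   # unbiased estimate of sigma^2
    return sigma2_hat * XtX_inv

# Example with toy data: an intercept column plus two regressors
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=100)
cov = ols_covariance(X, y)
print("standard errors:", np.sqrt(np.diag(cov)))
```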
Estimation methods: Ordinary Least Square (OLS)
Assumptions:
1. Linear relationship
2. E[ε] = 0
3. Homoskedasticity and no autocorrelation:
   var(ε) = σ² IN, where IN is an N×N identity matrix  →  efficiency
4. X and ε are independent: cov(ε, xk) = 0 for all k  →  unbiasedness
5. Columns in X are linearly independent  →  (X'X)⁻¹ exists
6. Normal distribution: ε ~ N(0, σ² IN)
                            (distribution, mean, variance)


Estimation methods: Ordinary Least Square (OLS)

If assumptions are fulfilled:  β̂ ~ N(β, σ²(X'X)⁻¹)

Test of hypothesis on parameter values

To test if the true value of the parameter βk is equal to a given value A
against the alternative hypothesis that it is different from A:

  H0: βk = A   and   H1: βk ≠ A

  t-statistic:  (β̂k − A) / se(β̂k),  which follows a t-distribution with
  (N − K) degrees of freedom,

  where se(β̂k) is the square root of the k-th diagonal element of the
  estimated var(β̂).

Normally we test if the parameter is equal to zero, A = 0.
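A sketch of this test with SciPy (the data are simulated and the N − K degrees of freedom follow the notation above; nothing here is code from the course).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N, K = 120, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # intercept + one regressor
y = X @ np.array([0.3, 1.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

A = 0.0                                    # H0: beta_k = A (here the usual A = 0)
t_stat = (beta_hat - A) / se
p_val = 2 * stats.t.sf(np.abs(t_stat), df=N - K)   # two-sided p-values
print("t-statistics:", t_stat, " p-values:", p_val)
```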

Estimation methods: Generalised Least Square (GLS)

Heteroskedasticity and autocorrelation problems:

  var(ε) = σ²Ω, where Ω ≠ I is an N×N pos. def. matrix

            [ var(ε1)      cov(ε1, ε2)  .  cov(ε1, εN) ]
  var(ε)  = [ cov(ε2, ε1)  var(ε2)      .  cov(ε2, εN) ]
   (N×N)    [     .            .        .       .      ]
            [ cov(εN, ε1)  cov(εN, ε2)  .   var(εN)    ]

                     [ ω11  ω12  .  ω1N ]     Element ω12: i = 1 and j = 2.
  var(ε) = σ²Ω = σ²  [ ω21  ω22  .  ω2N ]
   (N×N)             [  .    .   .   .  ]     Note the matrix is symmetric:
                     [ ωN1  ωN2  .  ωNN ]     ωij = ωji

  ωii ≠ 1 for some i  →  Heteroskedasticity

  ωij ≠ 0 for some i and j  →  Autocorrelation in time-series data or
  cross-sectional correlation in cross-sectional data.

Estimation methods: Generalised Least Square (GLS)

Heteroskedasticity and autocorrelation problems:

Some examples:

              [ 1  0  .  0 ]
  var(ε) = σ² [ 0  1  .  0 ]      ωii = 1  →  Homoskedastic
   (N×N)      [ .  .  .  . ]      ωij = 0  →  no autocorrelation
              [ 0  0  .  1 ]

              [ ω11   0   .   0  ]
  var(ε) = σ² [ 0    ω22  .   0  ]     ωii ≠ 1  →  Heteroskedastic
   (N×N)      [ .     .   .   .  ]     ωij = 0  →  no autocorrelation
              [ 0     0   .  ωNN ]

              [ ω11  ω12   0    .      0   ]
  var(ε) = σ² [ ω21  ω22  ω23   .      0   ]    ωii ≠ 1              →  Heterosk.
   (N×N)      [ 0    ω32  ω33   .      .   ]    ωij ≠ 0 for j = i ∓ 1 →  1st order
              [ .     .    .    .      .   ]                             autocorrelation
              [ 0     0    .  ωN,N−1  ωNN  ]
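The three example structures can be written out directly; the sketch below (NumPy, with arbitrary illustrative values for the ωii and the off-diagonal elements, and N = 5) builds a homoskedastic, a heteroskedastic and a first-order-autocorrelated Ω.

```python
import numpy as np

N = 5

# 1) Homoskedastic, no autocorrelation: Omega = I
omega_homo = np.eye(N)

# 2) Heteroskedastic, no autocorrelation: different variances on the diagonal
omega_hetero = np.diag([1.0, 2.0, 0.5, 3.0, 1.5])

# 3) Heteroskedastic with 1st-order autocorrelation: non-zero terms only on
#    the diagonal and the first off-diagonals (omega_ij = 0 for |i - j| > 1)
diag = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
off = np.array([0.4, 0.3, 0.2, 0.1])            # omega_{i,i+1} = omega_{i+1,i}
omega_ar1 = np.diag(diag) + np.diag(off, k=1) + np.diag(off, k=-1)

print(omega_ar1)
```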

Estimation methods: Generalised Least Square (GLS)

OLS estimation is not efficient in case of heteroskedasticity and
autocorrelation/cross-sectional correlation.

Estimation method: GLS

  β̂GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y
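A minimal sketch of the GLS formula in NumPy, assuming for illustration a known diagonal (purely heteroskedastic) Ω; the data and variance pattern are arbitrary choices.

```python
import numpy as np

def gls(X: np.ndarray, y: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """GLS estimator: beta_hat = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    omega_inv = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

rng = np.random.default_rng(4)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
variances = np.linspace(0.5, 4.0, N)             # heteroskedastic error variances
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(variances))
omega = np.diag(variances)

print("GLS estimates:", gls(X, y, omega))
```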
Some definitions

• Probability distribution for a discrete random variable X:

  fX(x) = prob(X = x), where fX(x) is the probability mass function and x is an outcome.

  0 ≤ fX(x) ≤ 1    and    Σi fX(xi) = 1,   i = 1,...,N

[Figure: two discrete distributions over the outcomes 1–6, one uniform and one
with higher probability for values closer to the mean]
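As a small check of these two conditions, a sketch with a fair six-sided die as the example distribution (matching the uniform bar chart above):

```python
import numpy as np

# Probability mass function of a fair six-sided die: f_X(x) = 1/6 for x = 1,...,6
outcomes = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

assert np.all((pmf >= 0) & (pmf <= 1))   # 0 <= f_X(x) <= 1
assert np.isclose(pmf.sum(), 1.0)        # sum_i f_X(x_i) = 1
print(dict(zip(outcomes.tolist(), pmf.tolist())))
```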
Some definitions
Continuous distribution

[Figure: the discrete distribution from the previous slide next to a continuous
distribution over the same range of outcomes; the density function fX(x) gives
the relative likelihood for X to take a given value]

• Probability distribution for a continuous random variable, i.e. an infinite
  number of possible outcomes:

  prob(a ≤ x ≤ b) = ∫_a^b fX(x) dx          ∫_{−∞}^{∞} fX(x) dx = 1

• Cumulative distribution (or the distribution function):

  FX(x) = ∫_{−∞}^{x} f(t) dt
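A sketch using SciPy's normal distribution as a stand-in for fX(x) (the distribution and the interval endpoints a and b are illustrative): the probability of an interval is obtained by integrating the density, and the same number falls out of differences of the cumulative distribution function.

```python
import numpy as np
from scipy import stats, integrate

dist = stats.norm(loc=3.5, scale=1.0)     # an example continuous distribution
a, b = 2.0, 4.0

# prob(a <= X <= b) = integral of the density f_X(x) from a to b
prob_integral, _ = integrate.quad(dist.pdf, a, b)

# Same probability via the cumulative distribution function F_X(x)
prob_cdf = dist.cdf(b) - dist.cdf(a)

# The density integrates to 1 over the whole real line
total, _ = integrate.quad(dist.pdf, -np.inf, np.inf)

print(prob_integral, prob_cdf, total)
```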

Some definitions

Assume two discrete variables X and Y:

• Joint distribution:        fX,Y(x, y) = prob(X = x, Y = y)

• Marginal distribution:     fX(x) = prob(X = x)

• Conditional distribution:  fX(x|y) = prob(X = x | Y = y)

Relationship:  fX,Y(x, y) = fY(y|x)·fX(x) = fX(x|y)·fY(y)

If X and Y are independent:  fY(y|x) = fY(y) and fX(x|y) = fX(x)

  ⇒  fX,Y(x, y) = fX(x)·fY(y)

Example: Dependent random variables

Assume a box with 3 black (b) and 2 white (w) balls.
We want to have two random draws (X = 1st and Y = 2nd draw) without
replacement from the box:

  1st draw: X                2nd draw: Y

  b with fX(b) = 3/5         b with fY(b|b) = 2/4

  w with fX(w) = 2/5         b with fY(b|w) = 3/4

  fXY(b, b) = fY(b|b)·fX(b) = 2/4 × 3/5 = 3/10

  fXY(w, b) = fY(b|w)·fX(w) = 3/4 × 2/5 = 3/10

Example: Independent random variables

Assume a box with 3 black (b) and 2 white (w) balls.
We want to have two random draws (X = 1st and Y = 2nd draw) with replacement
from the box:

  1st draw: X                2nd draw: Y

  b with fX(b) = 3/5         b with fY(b|b) = 3/5

  w with fX(w) = 2/5         b with fY(b|w) = 3/5

  ⇒  fY(b|b) = fY(b|w) = fY(b)

  fXY(b, b) = fY(b)·fX(b) = 3/5 × 3/5 = 9/25

  fXY(w, b) = fY(b)·fX(w) = 3/5 × 2/5 = 6/25

Estimation methods: Maximum Likelihood (ML)

Requirements for ML: The number of observations is large and their
distribution is known. Observations are independent (as random
draws with replacement) and have the same probability distribution.

Likelihood function of the parameter θ given the observed data:

  L(θ|x1,…,xN) = f(x1,…,xN|θ) = f(x1|θ)·f(x2|θ)·…·f(xN|θ)

  ⇒  L(θ) = Πi f(xi|θ)

Estimation: Find the value of θ that maximizes the likelihood
function (the most probable value for θ given the observations):

  max L(θ)  →  θ̂   or, for simplicity,   max ln L(θ)  →  θ̂
Estimation methods: Maximum Likelihood (ML)
Example, linear regression:  y = Xβ + ε
For each observation i:  yi = β'xi + εi

  xi: K×1 vector,  β: K×1 vector,  and i = 1,...,N

εi is a random variable and we assume εi ~ N(0, σ²).

To estimate the unknown parameters of the model (β and σ²) we need to
maximize the likelihood function for the random variable εi.

What is the likelihood function for εi?


Estimation methods: Maximum Likelihood (ML)
(Proof is not required)
Log likelihood function for εi assuming normal distribution:

  Density function of the normal dist.:  f(εi|μ, σ²) = (2πσ²)^(−1/2) exp(−(εi − μ)²/(2σ²))

  L(μ, σ²) = Πi f(εi|μ, σ²)

           = Πi (2πσ²)^(−1/2) exp(−(εi − μ)²/(2σ²))

           = (2πσ²)^(−N/2) exp(−Σi (εi − μ)²/(2σ²))

  Taking logs:  ln L(μ, σ²) = −(N/2) ln(2πσ²) − Σi (εi − μ)²/(2σ²)

  Since μ = 0 and εi = yi − β'xi:

  ln L(β, σ²) = −(N/2) ln(2πσ²) − Σi (yi − β'xi)²/(2σ²)
Estimation methods: Maximum Likelihood (ML)

Log likelihood function for εi assuming normal distribution, in matrix form:

  ln L(β, σ²) = −(N/2) ln(2πσ²) − (y − Xβ)'(y − Xβ)/(2σ²)

Estimation: Find the values of β and σ² that maximize ln L(β, σ²).
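A sketch of this maximization with SciPy (simulated data, arbitrary starting values): the β that maximizes ln L coincides with the OLS estimate, while the ML estimate of σ² is ε̂'ε̂/N rather than ε̂'ε̂/(N − K).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N, K = 150, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=N)

def neg_log_likelihood(params: np.ndarray) -> float:
    beta, log_sigma2 = params[:K], params[K]     # optimize log(sigma^2) to keep it positive
    sigma2 = np.exp(log_sigma2)
    resid = y - X @ beta
    return 0.5 * N * np.log(2 * np.pi * sigma2) + resid @ resid / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=np.zeros(K + 1))
beta_ml, sigma2_ml = result.x[:K], np.exp(result.x[K])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # closed-form OLS for comparison
print("ML beta:", beta_ml, " OLS beta:", beta_ols, " ML sigma2:", sigma2_ml)
```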
Estimation methods: Maximum Likelihood (ML)

Distribution of the estimated parameters (not required):

  θ̂ ~ N(θ, (1/N) I⁻¹),  where I is the information matrix.

Two methods to obtain I:

i) based on the second derivatives:

   I_2D = −(1/N) Σ_{i=1}^{N} ∂² ln Li / (∂θ ∂θ')

ii) based on the outer product of the first derivatives:

   I_OP = (1/N) Σ_{i=1}^{N} (∂ ln Li / ∂θ)(∂ ln Li / ∂θ)'

We usually use numerical derivatives.
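A sketch of the outer-product method with simple numerical first derivatives (central differences); the exponential model, the sample and the step size are illustrative choices, and with a single parameter the information "matrix" is just a scalar.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=1000)
lam_hat = 1 / x.mean()                           # MLE of the exponential rate

# Per-observation log-likelihood contribution: ln L_i(lambda) = ln(lambda) - lambda * x_i
def log_lik_i(lam: float) -> np.ndarray:
    return np.log(lam) - lam * x

# Numerical first derivatives (central differences), one score per observation
h = 1e-5
scores = (log_lik_i(lam_hat + h) - log_lik_i(lam_hat - h)) / (2 * h)

# Outer-product estimate of the information matrix (a scalar here, since K = 1)
I_op = np.mean(scores ** 2)
var_lam_hat = 1 / (len(x) * I_op)                # var(theta_hat) ~ (1/N) * I^{-1}
print("lambda_hat:", lam_hat, " estimated std. error:", np.sqrt(var_lam_hat))
```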
