
Chapter 2: Statistics

Part I - Estimation

Simon Fraser University


ECON 333
Summer 2023

Disclaimer

I do not allow this content to be published without my consent.


All rights reserved ©2023 Thomas Vigié

Probability vs Statistics

The theory of probability provides us with tools to describe events that are
random (e.g. throwing a die, flipping a coin, tomorrow’s temperature, patterns
of a boss in a video game)
Statistics consists in learning about these random events using data
Data are often, if not always, obtained as a sample taken from a population
of interest
A sample of the population of all Canadian adults
A sample of the population of all SFU students
A sample of the population of all firms
We need other tools to understand what data can tell us

Outline

Population vs samples
Estimators
Desirable properties
Sampling distribution
Moments of a sample proportion
Moments of a sample mean
Asymptotic results
Suggested reading: Chapter 3 in Stock and Watson

Population values vs sample values

The quantities previously studied are population quantities: They correspond to the
entire group of units of interest
In practice, we only observe a sample of that population
So probabilities, averages, etc. derived from a sample are subject to change: Add one
observation, or give me another sample, and the new probabilities and averages will
not be the same
So sample quantities are themselves random! We will talk about sampling
distributions
Hence: E[X], V[X] and P[X] are not random, but X̄, p̂, β̂ and σ̂² are

Sample values: Example

Imagine you work in a factory that produces cheese


The population consists of all the cheese produced that day
One might wonder how salty a particular type of cheese is on average (each unit
of that cheese will not have the exact same salt content)
But we are not going to measure the salt level in each single unit of cheese: We
look at a random sample from different batches
And we use the measurements on that sample to either estimate the true
average level of salt or infer something about it: the former is estimation, the
latter is statistical inference

Random sampling

The way a sample is drawn from a population will affect the properties of an
estimator
If observations are correlated and we don’t know it, we might fail to pick up
that correlation and our estimates might be biased or inconsistent
We also want our sample to be representative of the population we are
interested in
If we collect a sample by calling people at their house between 10 am and 3 pm,
our sample won’t be representative as it misses all the people who are not at
home at those times. The sample will be biased towards people who work from
home or don’t work
The easiest way is to draw the sample randomly (simple random sampling): Each member of the population has an
equal chance of being selected
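As a minimal sketch of what simple random sampling looks like in practice (Python/NumPy, with a made-up population of cheese salt measurements), every unit gets the same chance of being drawn and no unit is drawn twice:

```python
import numpy as np

rng = np.random.default_rng(seed=333)

# Hypothetical population: salt content (in grams) of 10,000 units of cheese
population = rng.normal(loc=1.5, scale=0.2, size=10_000)

# Simple random sample: every unit has the same chance of being selected,
# and no unit appears twice (sampling without replacement)
sample = rng.choice(population, size=100, replace=False)

print("Population mean:", population.mean())
print("Sample mean:    ", sample.mean())
```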

Estimators

An estimator is a rule to compute an estimate of a given quantity given some data
It produces no more than a guess, educated or not
If we had access to the population data, we would use it to find, say, E[X]
Since we only have access to a sample of it, we are going to use an estimator
to compute an estimate of E[X]
Ideally, the bigger the sample, the closer we should get to the truth since the
sample gets closer to the population data
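To make the idea of a “rule” concrete, here is a tiny illustrative sketch (the numbers are made up): the estimator is the function, and the estimate is the number it returns for a particular sample.

```python
def sample_mean(data):
    """Estimator: a rule that maps any sample to a single number (the estimate)."""
    return sum(data) / len(data)

# Two different samples from the same population give two different estimates of E[X]
print(sample_mean([1.4, 1.6, 1.5, 1.7]))
print(sample_mean([1.3, 1.5, 1.6]))
```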

Estimators as random objects

Estimates are sample values, so an estimator is random (remove one observation
in a sample and the estimate will change, let alone changing the whole sample),
and hence has an expectation, a variance, a distribution, etc.
Computing estimates over different samples should tell us something reliable on
average
Example: We want to know the proportion of smokers in Canada. But we don’t
have access to the whole population data. We can get a sample, but how do we
make a good guess about the real proportion of smokers?
Different estimators can be used
We need theoretical results to justify the use of one estimator over the other
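A minimal simulation sketch of the smoking example (assuming, purely for illustration, a true proportion p = 0.20): each new sample of the same size yields a different estimate, which is exactly why the estimator itself is treated as a random variable.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p_true = 0.20   # assumed true proportion of smokers (unknown in practice)
n = 500         # sample size

# Draw several independent samples and compute the sample proportion in each
for s in range(5):
    sample = rng.binomial(1, p_true, size=n)   # 1 = smoker, 0 = non-smoker
    print(f"Sample {s + 1}: p_hat = {sample.mean():.3f}")
```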

Desirable properties

Desirable properties of estimators
Definition: Consistency
An estimator θ̂ of a nonrandom quantity θ is consistent (or asymptotically
unbiased) if it converges in probability towards the value it estimates:
$$\hat{\theta} \xrightarrow{p} \theta$$

where $\xrightarrow{p}$ denotes convergence in probability, i.e. $\forall \varepsilon > 0$, $P\left(|\hat{\theta} - \theta| > \varepsilon\right) \to 0$ as $n \to \infty$

Definition: Unbiasedness
An estimator θ̂ of a nonrandom quantity θ is unbiased if on average, it equals the
true value of the quantity of interest:

E[θ̂] = θ

Desirable properties of estimators (cont’d)

Definition: Asymptotic normality


An estimator θ̂ is said to be asymptotically normal if, as the sample size n → ∞:

$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Omega)$$

where θ and Ω are some nonrandom quantities (the value being estimated and the
asymptotic variance, respectively)

Desirable properties of estimators (cont’d)

Many (most) estimators are biased. For some, we can have an idea of the bias
direction
But being consistent makes them reliable, provided the sample used is big enough:
If an estimator is consistent, the bigger the sample, the more accurate the
estimator, and the closer the estimate is to the true value
Asymptotic normality allows us to make inference about the true value of the
parameter, i.e. to draw probabilistic conclusions about the true parameter, such
as (not) rejecting hypotheses about the true value or building confidence
intervals around the true value
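One way to see the bias/consistency distinction in a simulation (a sketch with an assumed true variance of 4): the variance estimator that divides by n instead of n − 1 (the unbiased s² appears later in this chapter) is biased, but its bias vanishes as n grows, so it is still consistent.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
sigma2 = 4.0      # assumed true variance
reps = 10_000     # Monte Carlo repetitions

for n in (5, 50, 500):
    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    # Biased estimator: divides by n rather than n - 1
    sigma2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    # Its expectation is (n - 1)/n * sigma2, so the bias shrinks with n
    print(f"n = {n:4d}: average estimate = {sigma2_hat.mean():.3f} (truth = {sigma2})")
```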

Sampling distributions

Sampling distributions of estimators

Since estimators are random, they have a distribution: The sampling distribution
Some classic estimators have well documented expectation and variance
But for a fixed sample size n, sampling distributions (the pmf/pdf and cdf) can
be complicated and depend on the distribution of the random variable we take
the average of (which we don’t often know)
For a large sample size n, the sampling distribution can be approximated by
the asymptotic distribution (see asymptotic results)

Sampling distribution of estimators (cont’d)

Definition: i.i.d. random variables


A collection of random variables is said to be Independent and Identically
Distributed (i.i.d.) if the variables are independent and all follow the same distribution.

Assumption: i.i.d. sample

An Independent and Identically Distributed (aka i.i.d.) sample is available:


{yᵢ}, i = 1, ..., n

This assumption allows us to use classical statistical theorems with ease


Is it realistic? Random sampling can meet that assumption
Note: For this assumption, we use lower case letters as they are observations of
the random variables Yᵢ. But you often see Yᵢ being used to talk about sample
values

Sampling distribution of a proportion

Consider a population of people. Some are smokers, others aren’t. Let p be the
true proportion of smokers
We would like to know p, but we can’t observe everyone...
Consider an i.i.d. sample taken from that population
We would like to estimate p. Any guess?
A sensible estimator is the sample proportion p̂
Why is it sensible?

Moments of a sample proportion: Expectation
Let Xᵢ be the random variable equal to 1 if observation i is a smoker, and 0
otherwise. Since there is a proportion p of smokers in the population,
E[Xᵢ] = p and V[Xᵢ] = p(1 − p) (Xᵢ is a Bernoulli r.v.!)
Then the sample proportion is $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$, and

$$
\begin{aligned}
E[\hat{p}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E[X_i] \quad \text{(by linearity of the expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n} p \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= p
\end{aligned}
$$

Moments of a sample proportion: Variance

$$
\begin{aligned}
V[\hat{p}] &= V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] \quad \text{(by the i.i.d. assumption, “independently distributed” part)} \\
&= \frac{1}{n^2} \times n\,V[X_i] \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \frac{1}{n^2} \times n \times p(1 - p) \\
&= \frac{p(1 - p)}{n}
\end{aligned}
$$
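A quick Monte Carlo check of the two results above (a sketch with assumed values p = 0.3 and n = 50): across many samples, the average of p̂ should be close to p and its variance close to p(1 − p)/n.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
p, n, reps = 0.3, 50, 100_000

# Number of smokers in each of `reps` samples of size n, divided by n
p_hats = rng.binomial(n, p, size=reps) / n

print("Mean of p_hat:", p_hats.mean(), "   theory:", p)
print("Var of p_hat: ", p_hats.var(), "   theory:", p * (1 - p) / n)
```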

Moments of a sample average

Consider the cheese factory from before


Consider a sample composed of randomly selected pounds of cheese of the
Ossau-Iraty type (try it, it is delicious!)
We would like to estimate µ, the (population) average level of salt of a given
pound of cheese i, denoted Xi
The variance of Xᵢ is denoted σ²
A sensible estimator is the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Why is it sensible?

Moments of a sample average: Expectation

$$
\begin{aligned}
E[\bar{X}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E[X_i] \quad \text{(by linearity of the expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n} \mu \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \mu
\end{aligned}
$$

Moments of a sample average: Variance

$$
\begin{aligned}
V[\bar{X}] &= V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] \quad \text{(by the i.i.d. assumption, “independently distributed” part)} \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \frac{1}{n^2} \times n \times \sigma^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$
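The same kind of Monte Carlo check for the sample mean, sketched with a deliberately non-normal (exponential) salt distribution and an assumed µ = 1.5: E[X̄] should be close to µ and V[X̄] close to σ²/n.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, n, reps = 1.5, 40, 100_000   # exponential with scale mu: mean mu, variance mu**2

# Each row is one sample of size n; the row mean is one realization of X_bar
x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("Mean of X_bar:", x_bars.mean(), "   theory:", mu)
print("Var of X_bar: ", x_bars.var(), "   theory:", mu**2 / n)
```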

Comments

As n increases, the variance of p̂ and of X̄ decreases


So in the limit, when n goes to infinity (or “asymptotically”), p̂ and X̄ are no
longer random variables: they converge to the true values p and µ per the law of
large numbers, making the sample proportion and the sample mean estimators of choice

Asymptotic results

The Law of large numbers

Ideally, the bigger the sample, the closer our estimate is to the true value, since we
are using more and more data
Then, a big sample allows us to use an estimator which is biased, since it is less
and less biased as the sample size increases
Theorem 1: Weak law of large numbers
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ.
Then the average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges to µ in probability, i.e.:

$$\bar{X} \xrightarrow{p} \mu$$

The Law of large numbers (cont’d)

It is a powerful result: As long as we have an i.i.d. sample with finite expectation,
the sample average gets closer and closer to the true average as n goes to infinity
Note: The sample proportion is a sample average of Bernoulli random
variables, so it is a consistent estimator of the true proportion
Any estimator which is a mean will have that property
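A minimal sketch of the law of large numbers in action (assumed true proportion p = 0.25): the running sample proportion of simulated Bernoulli draws settles down around the true value as n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
p_true = 0.25
x = rng.binomial(1, p_true, size=100_000)

# Sample proportion computed on the first n observations, for every n
running_p_hat = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}: p_hat = {running_p_hat[n - 1]:.4f} (truth = {p_true})")
```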

The central limit theorem

The sampling distribution is difficult to obtain if we don’t know the distribution of
the random variables used (for instance, does salt in Ossau-Iraty cheese follow a
normal distribution?)
Hence, it is difficult to use this distribution to say something about the true
value
But it is crucial when doing inference
Theorem 2: The central limit theorem (CLT)
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ and
finite variance σ². Then the average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ satisfies:

$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$

The central limit theorem (cont’d)

We saw that X̄ converges to µ when n increases
Yet, the CLT says that as n increases, √n(X̄ − µ) converges to a random variable!
It comes from √n: X̄ − µ goes to 0, but √n goes to infinity
Their product in the limit (“∞ × 0”) ends up being a normal random variable!
When the sample size is big enough, we use the CLT to approximate the
distributions of p̂ and X̄
Since the distributions contain the true values p and µ, we will be able to
conduct inference
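A small simulation sketch of the CLT (assuming exponential Xᵢ with µ = σ = 1): even though each Xᵢ is far from normal, the standardized statistic √n(X̄ − µ)/σ behaves like a standard normal, so about 95% of its simulated values fall within ±1.96.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma = 1.0, 1.0          # exponential with scale 1: mean 1, variance 1
n, reps = 200, 50_000

x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (x_bars - mu) / sigma     # should be approximately N(0, 1)

print("Mean of z:", z.mean())                                  # close to 0
print("Std of z: ", z.std())                                   # close to 1
print("Share within +/- 1.96:", np.mean(np.abs(z) <= 1.96))    # close to 0.95
```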

Asymptotic distribution of a sample proportion and mean

Theorem 3: Asymptotic distribution of the sample proportion
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. Bernoulli random variables with parameter p.
Then the sample proportion $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ has the following asymptotic
distribution:

$$\sqrt{n}\,(\hat{p} - p) \xrightarrow{d} N(0, p(1 - p))$$

Theorem 4: Asymptotic distribution of the sample mean
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ and
finite variance σ². Then the sample average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ has the following
asymptotic distribution:

$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$

Asymptotic distribution of the sample mean: known vs
unknown variance

The asymptotic distribution of the sample mean depends on whether σ² is known or not
If it is not known, then it must be estimated, and that additional uncertainty will affect
the asymptotic distribution
An unbiased estimator (see problem set 3) of σ² is

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

And one can show that

$$\sqrt{n}\,\frac{\bar{X} - \mu}{s} \xrightarrow{d} t_{n-1}$$

When the sample size is big enough (typically n > 100), the Student distribution is
approximately equivalent to the normal distribution, and one can use the normal
distribution table
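A small sketch of the unknown-variance case (made-up data, and assuming scipy is available for the quantiles): compute s² with the n − 1 divisor, form the studentized statistic, and note how close the tₙ₋₁ critical value is to the normal one once n is large.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(seed=7)
mu_true, n = 1.5, 150
x = rng.normal(loc=mu_true, scale=0.3, size=n)

x_bar = x.mean()
s2 = np.sum((x - x_bar) ** 2) / (n - 1)                  # unbiased estimator of sigma^2
t_stat = np.sqrt(n) * (x_bar - mu_true) / np.sqrt(s2)    # studentized sample mean

print("s^2 =", s2, "  t statistic =", t_stat)
print("t critical value (97.5%, n-1 df):", t.ppf(0.975, df=n - 1))
print("Normal critical value (97.5%):   ", norm.ppf(0.975))
```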

Asymptotic distribution of a sample proportion and mean
The previous two theorems mean that when the sample size is big enough, we can
approximate the sampling distributions of the sample proportion and the sample mean
in the following way:
$$\sqrt{n}\,\frac{\hat{p} - p}{\sqrt{p(1 - p)}} \xrightarrow{d} N(0, 1)$$

$$\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$

$$\sqrt{n}\,\frac{\bar{X} - \mu}{s} \xrightarrow{d} t_{n-1}$$
Even if the Xi themselves are not normally distributed!
That will be useful for inference
Note: If we knew that the distribution of the Xᵢ were normal, the exact sampling
distribution of the sample mean would also be normal, and there would be no need to
use the asymptotic distribution to approximate it