
Chapter 2: Statistics

Part I - Estimation

Simon Fraser University


ECON 333
Summer 2023

Disclaimer

I do not allow this content to be published without my consent.


All rights reserved ©2023 Thomas Vigié

Probability vs Statistics

The theory of probability provides us with tools to describe events that are
random (e.g. throwing a die, flipping a coin, tomorrow’s temperature, patterns
of a boss in a video game)
Statistics consists in learning about these random events using data
Data are often, if not always, obtained as a sample taken from a population
of interest
A sample of the population of all Canadian adults
A sample of the population of all SFU students
A sample of the population of all firms
We need other tools to understand what data can tell us

Outline

Population vs samples
Estimators
Desirable properties
Sampling distribution
Moments of a sample proportion
Moments of a sample mean
Asymptotic results
Suggested reading: Chapter 3 in Stock and Watson

Population values vs sample values

The quantities previously studied are population quantities: They correspond to the
entire group of units of interest
In practice, we only observe a sample of that population
So probabilities, averages, etc. derived from a sample are subject to change: Add one
observation, or give me another sample, and the new probabilities and averages will
not be the same
So sample quantities are themselves random! We will talk about sampling
distributions
Hence: E[X], V[X] and P[X] are not random, but X̄, p̂, β̂ and σ̂² are

Sample values: Example

Imagine you work in a factory that produces cheese


The population consists of all the cheese produced that day
One might wonder how salty a particular type of cheese is on average (each unit
of that cheese will not have the exact same salt content)
But we are not going to measure the salt level in each single unit of cheese: We
look at a random sample from different batches
And we use the measurements on that sample to either estimate the true
average level of salt or infer something about it: the former is estimation, the
latter is statistical inference

Random sampling

The way a sample is drawn from a population will affect the properties of an
estimator
If observations are correlated and we don’t know it, we might fail to pick up
that correlation and our estimates might be biased or inconsistent
We also want our sample to be representative of the population we are
interested in
If we collect a sample by calling people at their house between 10 am and 3 pm,
our sample won’t be representative as it misses all the people who are not at
home at those times. The sample will be biased towards people who work from
home or don’t work
The easiest way is to draw the sample randomly (simple random sampling): Each member of the population has an
equal chance of being selected
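As a minimal sketch of what simple random sampling looks like in practice (Python/NumPy, with a made-up population of cheese salt measurements), every unit gets the same chance of being drawn and no unit is drawn twice:

```python
import numpy as np

rng = np.random.default_rng(seed=333)

# Hypothetical population: salt content (in grams) of 10,000 units of cheese
population = rng.normal(loc=1.5, scale=0.2, size=10_000)

# Simple random sample: every unit has the same chance of being selected,
# and no unit appears twice (sampling without replacement)
sample = rng.choice(population, size=100, replace=False)

print("Population mean:", population.mean())
print("Sample mean:    ", sample.mean())
```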

Estimators

An estimator is a rule to compute an estimate of a given quantity given some data
It produces no more than a guess, educated or not
If we had access to the population data, we would use it to find, say, E[X]
Since we only have access to a sample of it, we are going to use an estimator
to compute an estimate of E[X]
Ideally, the bigger the sample, the closer we should get to the truth since the
sample gets closer to the population data
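To make the idea of a “rule” concrete, here is a tiny illustrative sketch (the numbers are made up): the estimator is the function, and the estimate is the number it returns for a particular sample.

```python
def sample_mean(data):
    """Estimator: a rule that maps any sample to a single number (the estimate)."""
    return sum(data) / len(data)

# Two different samples from the same population give two different estimates of E[X]
print(sample_mean([1.4, 1.6, 1.5, 1.7]))
print(sample_mean([1.3, 1.5, 1.6]))
```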

Estimators as random objects

Estimates are sample values, so an estimator is random (remove one observation
in a sample and the estimate will change, let alone changing the whole sample),
and hence has an expectation, a variance, a distribution, etc.
Computing estimates over different samples should tell us something reliable on
average
Example: We want to know the proportion of smokers in Canada. But we don’t
have access to the whole population data. We can get a sample, but how do we
make a good guess about the real proportion of smokers?
Different estimators can be used
We need theoretical results to justify the use of one estimator over the other
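A minimal simulation sketch of the smoking example (assuming, purely for illustration, a true proportion p = 0.20): each new sample of the same size yields a different estimate, which is exactly why the estimator itself is treated as a random variable.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p_true = 0.20   # assumed true proportion of smokers (unknown in practice)
n = 500         # sample size

# Draw several independent samples and compute the sample proportion in each
for s in range(5):
    sample = rng.binomial(1, p_true, size=n)   # 1 = smoker, 0 = non-smoker
    print(f"Sample {s + 1}: p_hat = {sample.mean():.3f}")
```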

Desirable properties

Desirable properties of estimators
Definition: Consistency
An estimator θ̂ of a nonrandom quantity θ is consistent (or asymptotically
unbiased) if it converges in probability towards the value it estimates:
$$\hat{\theta} \xrightarrow{p} \theta$$

where $\xrightarrow{p}$ denotes convergence in probability, i.e. $\forall \varepsilon > 0$, $P\left(|\hat{\theta} - \theta| > \varepsilon\right) \to 0$ as $n \to \infty$

Definition: Unbiasedness
An estimator θ̂ of a nonrandom quantity θ is unbiased if on average, it equals the
true value of the quantity of interest:

E[θ̂] = θ

Desirable properties of estimators (cont’d)

Definition: Asymptotic normality


An estimator θ̂ is said to be asymptotically normal if, as the sample size n → ∞:

$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Omega)$$

where θ and Ω are some nonrandom quantities (the value being estimated and the
asymptotic variance, respectively)

Desirable properties of estimators (cont’d)

Many (most) estimators are biased. For some, we can have an idea of the bias
direction
But being consistent makes them reliable, provided the sample used is big enough:
If an estimator is consistent, the bigger the sample, the more accurate the
estimator, and the closer the estimate is to the true value
Asymptotic normality allows us to make inference about the true value of the
parameter, i.e. to draw probabilistic conclusions about the true parameter, such
as (not) rejecting hypotheses about the true value or building confidence
intervals around the true value
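One way to see the bias/consistency distinction in a simulation (a sketch with an assumed true variance of 4): the variance estimator that divides by n instead of n − 1 (the unbiased s² appears later in this chapter) is biased, but its bias vanishes as n grows, so it is still consistent.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
sigma2 = 4.0      # assumed true variance
reps = 10_000     # Monte Carlo repetitions

for n in (5, 50, 500):
    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    # Biased estimator: divides by n rather than n - 1
    sigma2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    # Its expectation is (n - 1)/n * sigma2, so the bias shrinks with n
    print(f"n = {n:4d}: average estimate = {sigma2_hat.mean():.3f} (truth = {sigma2})")
```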

Sampling distributions

Sampling distributions of estimators

Since estimators are random, they have a distribution: The sampling distribution
Some classic estimators have well documented expectation and variance
But for a fixed sample size n, sampling distributions (the pmf/pdf and cdf) can
be complicated and depend on the distribution of the random variable we take
the average of (which we don’t often know)
For a large sample size n, the sampling distribution can be approximated by
the asymptotic distribution (see asymptotic results)

Sampling distribution of estimators (cont’d)

Definition: i.i.d. random variables


A collection of random variables is said to be Independent and Identically
Distributed (i.i.d.) if the variables are independent and all follow the same distribution.

Assumption: i.i.d. sample

An Independent and Identically Distributed (aka i.i.d.) sample is available:


{yᵢ}, i = 1, ..., n

This assumption allows us to use classical statistical theorems with ease


Is it realistic? Random sampling can meet that assumption
Note: For this assumption, we use lower case letters as they are observations of
the random variables Yᵢ. But you often see Yᵢ being used to talk about sample
values

Sampling distribution of a proportion

Consider a population of people. Some are smokers, others aren’t. Let p be the
true proportion of smokers
We would like to know p, but we can’t observe everyone...
Consider an i.i.d. sample taken from that population
We would like to estimate p. Any guess?
A sensible estimator is the sample proportion p̂
Why is it sensible?

Moments of a sample proportion: Expectation
Let Xᵢ be the random variable equal to 1 if observation i is a smoker, and 0
otherwise. Since there is a proportion p of smokers in the population,
E[Xᵢ] = p and V[Xᵢ] = p(1 − p) (Xᵢ is a Bernoulli r.v.!)
Then the sample proportion is $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$, and

$$
\begin{aligned}
E[\hat{p}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E[X_i] \quad \text{(by linearity of the expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n} p \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= p
\end{aligned}
$$

Moments of a sample proportion: Variance

$$
\begin{aligned}
V[\hat{p}] &= V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] \quad \text{(by the i.i.d. assumption, “independently distributed” part)} \\
&= \frac{1}{n^2} \times n\,V[X_i] \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \frac{1}{n^2} \times n \times p(1 - p) \\
&= \frac{p(1 - p)}{n}
\end{aligned}
$$
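A quick Monte Carlo check of the two results above (a sketch with assumed values p = 0.3 and n = 50): across many samples, the average of p̂ should be close to p and its variance close to p(1 − p)/n.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
p, n, reps = 0.3, 50, 100_000

# Number of smokers in each of `reps` samples of size n, divided by n
p_hats = rng.binomial(n, p, size=reps) / n

print("Mean of p_hat:", p_hats.mean(), "   theory:", p)
print("Var of p_hat: ", p_hats.var(), "   theory:", p * (1 - p) / n)
```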

Moments of a sample average

Consider the cheese factory from before


Consider a sample composed of randomly selected pounds of cheese of the
Ossau-Iraty type (try it, it is delicious!)
We would like to estimate µ, the (population) average level of salt of a given
pound of cheese i, denoted Xi
The variance of Xᵢ is denoted σ²
A sensible estimator is the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Why is it sensible?

Moments of a sample average: Expectation

$$
\begin{aligned}
E[\bar{X}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E[X_i] \quad \text{(by linearity of the expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n} \mu \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \mu
\end{aligned}
$$

Moments of a sample average: Variance

$$
\begin{aligned}
V[\bar{X}] &= V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] \quad \text{(by the i.i.d. assumption, “independently distributed” part)} \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 \quad \text{(by the i.i.d. assumption, “identically distributed” part)} \\
&= \frac{1}{n^2} \times n \times \sigma^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$
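The same kind of Monte Carlo check for the sample mean, sketched with a deliberately non-normal (exponential) salt distribution and an assumed µ = 1.5: E[X̄] should be close to µ and V[X̄] close to σ²/n.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, n, reps = 1.5, 40, 100_000   # exponential with scale mu: mean mu, variance mu**2

# Each row is one sample of size n; the row mean is one realization of X_bar
x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("Mean of X_bar:", x_bars.mean(), "   theory:", mu)
print("Var of X_bar: ", x_bars.var(), "   theory:", mu**2 / n)
```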

Comments

As n increases, the variance of p̂ and of X̄ decreases


So in the limit, when n goes to infinity (or “asymptotically”), p̂ and X̄ are no
longer random variables: they converge to the true values p and µ per the law of
large numbers, making the sample proportion and the sample mean estimators of choice

Asymptotic results

The Law of large numbers

Ideally, the bigger the sample, the closer our estimate is to the true value, since we
are using more and more data
Then, a big sample allows us to use an estimator which is biased, since it is less
and less biased as the sample size increases
Theorem 1: Weak law of large numbers
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ.
Then the average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges to µ in probability, i.e.:

$$\bar{X} \xrightarrow{p} \mu$$

The Law of large numbers (cont’d)

It is a powerful result: As long as we have an i.i.d. sample with finite expectation,
the sample average gets closer and closer to the true average as n goes to infinity
Note: The sample proportion is a sample average of Bernoulli random
variables, so it is a consistent estimator of the true proportion
Any estimator which is a mean will have that property
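A minimal sketch of the law of large numbers in action (assumed true proportion p = 0.25): the running sample proportion of simulated Bernoulli draws settles down around the true value as n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
p_true = 0.25
x = rng.binomial(1, p_true, size=100_000)

# Sample proportion computed on the first n observations, for every n
running_p_hat = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}: p_hat = {running_p_hat[n - 1]:.4f} (truth = {p_true})")
```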

The central limit theorem

The sampling distribution is difficult to obtain if we don’t know the distribution of
the random variables used (for instance, does salt in Ossau-Iraty cheese follow a
normal distribution?)
Hence, it is difficult to use this distribution to say something about the true
value
But it is crucial when doing inference
Theorem 2: The central limit theorem (CLT)
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ and
finite variance σ². Then the average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ satisfies:

$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$

The central limit theorem (cont’d)

We saw that X̄ converges to µ when n increases
Yet, the CLT says that as n increases, √n(X̄ − µ) converges to a random variable!
It comes from √n: X̄ − µ goes to 0, but √n goes to infinity
Their product in the limit (“∞ × 0”) ends up being a normal random variable!
When the sample size is big enough, we use the CLT to approximate the
distributions of p̂ and X̄
Since the distributions contain the true values p and µ, we will be able to
conduct inference
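A small simulation sketch of the CLT (assuming exponential Xᵢ with µ = σ = 1): even though each Xᵢ is far from normal, the standardized statistic √n(X̄ − µ)/σ behaves like a standard normal, so about 95% of its simulated values fall within ±1.96.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma = 1.0, 1.0          # exponential with scale 1: mean 1, variance 1
n, reps = 200, 50_000

x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (x_bars - mu) / sigma     # should be approximately N(0, 1)

print("Mean of z:", z.mean())                                  # close to 0
print("Std of z: ", z.std())                                   # close to 1
print("Share within +/- 1.96:", np.mean(np.abs(z) <= 1.96))    # close to 0.95
```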

Asymptotic distribution of a sample proportion and mean

Theorem 3: Asymptotic distribution of the sample proportion
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. Bernoulli random variables with parameter p.
Then the sample proportion $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ has the following asymptotic
distribution:

$$\sqrt{n}\,(\hat{p} - p) \xrightarrow{d} N(0, p(1 - p))$$

Theorem 4: Asymptotic distribution of the sample mean
Let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d. random variables with finite expectation µ and
finite variance σ². Then the sample average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ has the following
asymptotic distribution:

$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$

Asymptotic distribution of the sample mean: known vs
unknown variance

The asymptotic distribution of the sample mean depends on whether σ² is known or not
If it is not known, then it must be estimated, and that additional uncertainty will affect
the asymptotic distribution
An unbiased estimator (see problem set 3) of σ² is

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

And one can show that

$$\sqrt{n}\,\frac{\bar{X} - \mu}{s} \xrightarrow{d} t_{n-1}$$

When the sample size is big enough (typically n > 100), the Student distribution is
approximately equivalent to the normal distribution, and one can use the normal
distribution table
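A small sketch of the unknown-variance case (made-up data, and assuming scipy is available for the quantiles): compute s² with the n − 1 divisor, form the studentized statistic, and note how close the tₙ₋₁ critical value is to the normal one once n is large.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(seed=7)
mu_true, n = 1.5, 150
x = rng.normal(loc=mu_true, scale=0.3, size=n)

x_bar = x.mean()
s2 = np.sum((x - x_bar) ** 2) / (n - 1)                  # unbiased estimator of sigma^2
t_stat = np.sqrt(n) * (x_bar - mu_true) / np.sqrt(s2)    # studentized sample mean

print("s^2 =", s2, "  t statistic =", t_stat)
print("t critical value (97.5%, n-1 df):", t.ppf(0.975, df=n - 1))
print("Normal critical value (97.5%):   ", norm.ppf(0.975))
```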

Asymptotic distribution of a sample proportion and mean
The previous two theorems mean that when the sample size is big enough, we can
approximate the sampling distributions of the sample proportion and the sample mean
in the following way:
$$\sqrt{n}\,\frac{\hat{p} - p}{\sqrt{p(1 - p)}} \xrightarrow{d} N(0, 1)$$

$$\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$

$$\sqrt{n}\,\frac{\bar{X} - \mu}{s} \xrightarrow{d} t_{n-1}$$
Even if the Xi themselves are not normally distributed!
That will be useful for inference
Note: If we knew that the distribution of the Xᵢ were normal, the exact sampling
distribution of the sample mean would also be normal, and there would be no need to
use the asymptotic distribution to approximate it