Distribution of Sample Mean Monte Carlo Simulations: ECO220Y - Intro To Data Analysis and Applied Econometrics
Blanchenay ECO220
Last lecture
• Sample statistics are RVs
– Can find distributions analytically,
theoretically, or empirically
• Proportion in sample is a linear transformation of
binomial
• Can be approximated by a Normal distribution
under certain conditions (large sample size)
• Can use approximation to find probabilities
• Variance goes down with sample size
Firefox for Android – Google Play ratings
$\mu_X = E[X] \approx 4.376$
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \ge 4.5)$?
Expectation of $\bar{X} = \frac{1}{10}(X_1 + \dots + X_{10})$
$E[\bar{X}] = \frac{1}{10} E[X_1 + X_2 + \dots + X_{10}]$
$= \frac{1}{10} \left( E[X_1] + \dots + E[X_{10}] \right) = \frac{1}{10} \cdot 10 \cdot E[X]$
$E[\bar{X}] = E[X] = \mu_X = 4.376$
Still true if we take a sample of $n = 50$?
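Variance of $\bar{X} = \frac{1}{10}(X_1 + \dots + X_{10})$
$V[\bar{X}] = \frac{1}{10^2} V[X_1 + X_2 + \dots + X_{10}]$
Assume independent draws:
$= \frac{1}{10^2} \left( V[X_1] + \dots + V[X_{10}] \right) = \frac{1}{10^2} \cdot 10 \cdot V[X]$
$V[\bar{X}] = \frac{V[X]}{10} = \frac{\sigma_X^2}{10} = 0.124$
$\mathrm{sd}[\bar{X}] = \frac{\sigma_X}{\sqrt{10}}$
Still true if we take a sample of $n = 50$?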
10% Rule
For draws of a sample to be considered
independent:
• Small enough $n$: less than 10% of the population
– Intuition: if you have drawn 50% of the
population, it’s easier to guess the next draw
Summary
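For $\bar{X} = \frac{1}{n}(X_1 + \dots + X_n)$:
• $E[\bar{X}] = E[X] = \mu_X$
• $V[\bar{X}] = \frac{V[X]}{n} = \frac{\sigma_X^2}{n}$
• $\mathrm{sd}[\bar{X}] = \frac{\mathrm{sd}[X]}{\sqrt{n}} = \frac{\sigma_X}{\sqrt{n}}$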
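The summary formulas can be checked numerically with a short Python sketch. The function name `sample_mean_moments` is made up for this illustration. The population variance 1.24 is not stated directly on the slides; it is inferred from $V[\bar{X}] = 0.124$ at $n = 10$. Independent draws are assumed throughout.

```python
import math


def sample_mean_moments(mu, sigma2, n):
    """Return (expectation, variance, sd) of the mean of n independent draws.

    mu and sigma2 are the population mean and variance.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    variance = sigma2 / n          # V[Xbar] = sigma^2 / n
    return mu, variance, math.sqrt(variance)  # sd[Xbar] = sigma / sqrt(n)


# Population values: mu_X = 4.376 from the slides; sigma_X^2 = 1.24 is
# an inferred value (10 * 0.124), not a number quoted in the lecture.
MU_X = 4.376
SIGMA2_X = 1.24

for n in (10, 50):
    mean, var, sd = sample_mean_moments(MU_X, SIGMA2_X, n)
    print(f"n={n}: E={mean:.3f}, V={var:.4f}, sd={sd:.3f}")
```

Moving from $n = 10$ to $n = 50$ leaves the expectation unchanged and divides the variance by 5, which is the sense in which the derivation is "still true" for a larger sample.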
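Distribution of $\bar{X}$?
If $X$ is Normally distributed, $X \sim N(\mu, \sigma^2)$:
• $\bar{X}$ is a linear combination of $n$ Normally
distributed r.v., hence also Normal (for independent draws)
• With previous info:
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
• For each sample size $n$, get another random variable
– If there’s confusion, denote $\bar{X}_{10}$ … $\bar{X}_{50}$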
Central Limit Theorem
For $n$ large enough, the distribution of the sample
mean $\bar{X}$ approximately follows a Normal,
regardless of the original distribution of $X$
• Computation of $E[\bar{X}]$ and $V[\bar{X}]$ still applies
• “$n$ large enough”:
– Rule of thumb: $n \ge 30$
– If distribution of $X$ almost Normal, can use less
– If distribution of $X$ very different from Normal,
probably need more
Average of 𝑛 = 50 dice rolls
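The dice example can be reproduced with a small simulation. This is a sketch under stated assumptions rather than the code behind the original figure: the number of repetitions, the seed, and the helper name `share_within` are arbitrary choices for this illustration. It simulates many averages of 50 fair-die rolls and reports what share falls within 1, 2 and 3 standard deviations of 3.5, which should come out close to the Normal pattern.

```python
import math
import random
import statistics

N_ROLLS = 50          # sample size n from the slide
REPETITIONS = 10_000  # number of simulated samples (an arbitrary choice)

# Exact moments of one fair die roll: mean 3.5, variance 35/12.
MU = 3.5
SIGMA2 = 35 / 12
SD_OF_MEAN = math.sqrt(SIGMA2 / N_ROLLS)  # sd of the average of 50 rolls

rng = random.Random(220)  # fixed seed so the run is reproducible
averages = [
    statistics.fmean(rng.randint(1, 6) for _ in range(N_ROLLS))
    for _ in range(REPETITIONS)
]


def share_within(values, center, half_width):
    """Fraction of values lying within center +/- half_width."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


shares = {k: share_within(averages, MU, k * SD_OF_MEAN) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {share:.1%}")
```

Because an average of dice rolls is discrete, the shares will not match 68.3%, 95.4% and 99.7% exactly, even with many repetitions.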
True CLT
For $n$ large enough, the distribution of a linear
combination of $n$ independent RVs approximately follows a
Normal, regardless of the individual
distributions of the RVs
Empirical method
Monte Carlo simulation
Manhattan Project: secret
development of nuclear weapons
• Q: how far do neutrons
travel through material
(shielding)?
• Von Neumann & Ulam: can’t
solve equations exactly
• Solution: simulate them as
a random process a large
number of times using ENIAC
(first general-purpose electronic computer)
Monte Carlo simulation
1. Fix $n$
2. Draw $n$ values at random from the population
3. Compute the statistic and record it
4. Repeat steps 2 and 3 a large number of times
(e.g. 100,000)
5. Examine the distribution of the recorded statistic
– Graphically
– Numerically (STATA)
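The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's STATA code: the function name `monte_carlo`, its parameters, and the fair-die population are assumptions made for the example. Draws are made with replacement so that they are independent.

```python
import random
import statistics


def monte_carlo(population, n, statistic, repetitions, seed=None):
    """Simulate the sampling distribution of `statistic` for samples of size n.

    Assumption: draws are made with replacement, so they are independent.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    recorded = []
    for _ in range(repetitions):               # step 4: repeat many times
        sample = rng.choices(population, k=n)  # step 2: draw n values at random
        recorded.append(statistic(sample))     # step 3: compute and record
    return recorded


# Hypothetical population: the six faces of a fair die.
die = [1, 2, 3, 4, 5, 6]
sample_means = monte_carlo(die, n=10, statistic=statistics.fmean,
                           repetitions=10_000, seed=1)

# Numerical summary of the simulated distribution of the sample mean.
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("variance of sample means:", round(statistics.pvariance(sample_means), 3))
```

Passing another function as `statistic`, for example `statistics.median` or `max`, gives the simulated distribution of that statistic instead of the sample mean.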
Firefox for Android – Google Play ratings
                    X (Firefox rating)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            3              1       Obs            2,954,953
25%            4              1       Sum of Wgt.    2,954,953
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \ge 4.5)$?
Firefox Monte Carlo simulation
1. Fix $n = 10$
2. Draw 10 values at random from the population
3. Compute the statistic (here $\bar{X}$) and record it
4. Repeat steps 2 and 3 for 1,000,000 samples
Distribution of $\bar{X}$
• 68.0% of values between 4.378 − 0.351 and 4.378 + 0.351
• 96.8% of values within 2 sd
• 99.3% of values within 3 sd
$P(\bar{X} \ge 4.5)$?
If we had assumed $\bar{X} \sim N(4.376, 0.124)$:
$P(\bar{X} \ge 4.5) = P\left(Z \ge \frac{4.5 - 4.376}{\sqrt{0.124}}\right) = 0.362$
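The Normal-approximation probability can be reproduced with the Python standard library alone. In this sketch the helper name `normal_upper_tail` is an assumption for illustration. The key point is that the standardization divides by the standard deviation $\sqrt{0.124}$, not by the variance.

```python
import math


def normal_upper_tail(x, mean, variance):
    """P(Y >= x) for Y ~ N(mean, variance), via the complementary error function."""
    z = (x - mean) / math.sqrt(variance)  # standardize with the sd, not the variance
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values from the slide: Xbar assumed N(4.376, 0.124), threshold 4.5.
p = normal_upper_tail(4.5, mean=4.376, variance=0.124)
print(round(p, 3))  # 0.362
```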
Discrepancies
If we take one sample of 10 and find $\bar{X} \ge 4.5$:
• Sampling error
• Simulation error: chance difference between the
true probability distribution and the simulated
probability distribution
– Use a large number of samples to reduce it
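One way to quantify the simulation error, offered as a supplementary sketch rather than something stated on the slides: if a probability $p$ is estimated by the share $\hat{p}$ of $R$ independent simulated samples in which the event occurs, then $\hat{p}$ is itself a sample proportion. The symbols $R$ and $\hat{p}$ are introduced for this illustration only, and $p \approx 0.362$ is borrowed from the Normal approximation purely as a plug-in value, with $R = 1{,}000{,}000$ as in the Firefox simulation.

```latex
\mathrm{sd}[\hat{p}] = \sqrt{\frac{p(1-p)}{R}}
\qquad\Rightarrow\qquad
\sqrt{\frac{0.362 \times 0.638}{1{,}000{,}000}} \approx 0.0005
```

Under these assumptions, quadrupling the number of simulated samples halves the simulation error.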
Distribution of $\bar{X}_{50}$
• 68.8% of values between 4.376 − 0.351 and 4.378 + 0.351
• 95.2% of values within 2 sd
• 99.6% of values within 3 sd
Reminder: Distribution of $\hat{P}$ as sample size ↑
Precision
$E[\bar{X}] = \mu_X$, $V[\bar{X}] = \frac{\sigma_X^2}{n}$
• Variance of $\bar{X}$ becomes smaller as $n$ increases
Distribution of $\bar{X}$ as $n$ increases
Distribution of $\bar{X}$ (100,000 repetitions)
Sample size $n$   $E(\bar{X})$   $Var(\bar{X})$   % obs. within 1 sd   % obs. within 2 sd   % obs. within 3 sd
1000              4.376          0.0012           68.8                 95.5                 99.7
+∞                4.376          0                68.3                 95.4                 99.7
Sample mean as an estimate of population
mean
• Higher $n$ ⇒ lower $V[\bar{X}]$ ⇒ $\bar{X}$ more precise
estimate of $\mu_X$
Key messages
• For a given $n$: $E[\bar{X}] = E[X]$ and $V[\bar{X}] = \frac{V[X]}{n}$
– 10 percent rule
• Shape: CLT: distribution of $\bar{X}$ approximately Normal if
$n$ large enough (the less Normally distributed $X$ is, the
higher $n$ should be)
• If not sure: Monte Carlo simulations let us
empirically estimate the distribution
– Can be generalized to statistics other than $\bar{X}$
• As $n$ increases, variance of $\bar{X}$ decreases
• Estimate $\mu_X$ using $\bar{X}$