CL202: Introduction To Data Analysis: MB+SCP
MB+SCP
mbhushan,sachinp@iitb.ac.in
Spring 2015
\[
E[\bar{X}] = E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{E[X_1] + \cdots + E[X_n]}{n} = \frac{n\mu}{n} = \mu
\]
E [X̄ ] = µ
X̄ is an unbiased estimator of µ.
A statistic θ̂ is an unbiased estimator of θ if
E[θ̂] = θ
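The unbiasedness of X̄ can be checked numerically. The following is a minimal sketch, assuming a normal population with illustrative values µ = 5 and σ = 2 (these values, the sample size, and the function name are assumptions for the demonstration, not taken from the slides). It averages X̄ over many independent samples; the average should land close to µ.

```python
import random
import statistics

def average_of_sample_means(mu, sigma, n, trials, seed=0):
    """Average the sample mean X-bar over many independent samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means)

# Illustrative (assumed) population parameters and sample size.
estimate = average_of_sample_means(mu=5.0, sigma=2.0, n=10, trials=20000)
print(estimate)  # expected to be close to mu = 5.0
```

Any population with a finite mean would serve here; the normal is used only because it is easy to sample.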
\[
E[\bar{X}] = \mu, \qquad \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}
\]
Theorem
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random
variables each having mean $\mu$ and variance $\sigma^2$. Then for large $n$, the distribution of
\[
X_1 + X_2 + \cdots + X_n
\]
is approximately normal with mean $n\mu$ and variance $n\sigma^2$.
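A small simulation can illustrate the theorem. This sketch assumes an exponential population with rate 1 (so µ = 1 and σ² = 1), chosen only because it is visibly nonnormal; the choice of n = 50 and the function name are likewise assumptions. It checks that the sums have mean near nµ, variance near nσ², and that roughly 68% of them fall within one standard deviation of nµ, as a normal distribution would predict.

```python
import random
import statistics

def simulate_sums(n, trials, seed=1):
    """Draw `trials` sums of n i.i.d. Exponential(1) variables (mu = 1, sigma^2 = 1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

n = 50
sums = simulate_sums(n, trials=20000)

mean_of_sums = statistics.fmean(sums)    # CLT predicts about n * mu = 50
var_of_sums = statistics.variance(sums)  # CLT predicts about n * sigma^2 = 50

# Fraction of sums within one standard deviation of n * mu;
# a normal distribution puts about 68% of its mass there.
sd = (n * 1.0) ** 0.5
within_one_sd = sum(abs(s - n) <= sd for s in sums) / len(sums)

print(mean_of_sums, var_of_sums, within_one_sd)
```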
\[
E[X] = np, \qquad \operatorname{var}(X) = np(1 - p)
\]
np(1 − p) ≥ 10
Thus
\[
P\{X > 150.5\} = P\left\{ \frac{X - (450)(0.3)}{\sqrt{(450)(0.3)(0.7)}} > \frac{150.5 - (450)(0.3)}{\sqrt{(450)(0.3)(0.7)}} \right\}
\]
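The standardization above can be evaluated directly. The sketch below takes n = 450 and p = 0.3 from the expression above, computes the normal approximation using the standard library's `NormalDist`, and compares it with the exact binomial tail P{X ≥ 151}. The comparison with the exact sum is an addition for illustration and is not part of the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 450, 0.3
mean = n * p                # E[X] = np
sd = sqrt(n * p * (1 - p))  # sqrt(var(X)) = sqrt(np(1 - p))

# Standardize the continuity-corrected cutoff 150.5 (z is about 1.59).
z = (150.5 - mean) / sd

# Normal approximation to P{X > 150.5}.
approx = 1 - NormalDist().cdf(z)

# Exact binomial tail P{X >= 151}, for comparison.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(151, n + 1))

print(z, approx, exact)
```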
Using the CLT, the distribution of $\sum_{i=1}^{n} X_i$ is approximately normal when n is large.
Then the distribution of X̄ is also approximately normal, since a constant multiple of a normal RV
is also a normal RV, with mean µ and variance σ²/n.
Hence,
\[
\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
\]
has approximately a standard normal distribution.
CLT does not tell us how large sample size n needs to be for the normal
approximation of X̄ to be valid.
The n required depends on the population distribution of the sample data.
For a binomial population, np(1 − p) ≥ 10 is needed; for a normal population, any n ≥ 1 is fine.
Rule of thumb: a sample size n ≥ 30 works for almost all distributions, i.e. no
matter how nonnormal the underlying population is, the sample mean of a
sample of size at least 30 will be approximately normal.
In most cases, normal approximation will be valid for much smaller sample
sizes.
If $X \sim N(0, 1)$, then $\bar{X} \sim N\left(0, \frac{1}{n}\right)$.
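We compute $E[S^2]$ as follows:
\[
(X_i - \bar{X})^2 = (X_i - \mu + \mu - \bar{X})^2
= (X_i - \mu)^2 + (\mu - \bar{X})^2 + 2(X_i - \mu)(\mu - \bar{X})
\]
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 + \sum_{i=1}^{n} (\mu - \bar{X})^2
+ 2 \sum_{i=1}^{n} (X_i - \mu)(\mu - \bar{X})
\]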
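The last two sums simplify because $\mu - \bar{X}$ does not depend on $i$. A sketch of that intermediate step, using only the symbols already defined in the expansion:

```latex
\[
\sum_{i=1}^{n} (\mu - \bar{X})^2 = n(\bar{X} - \mu)^2,
\qquad
\sum_{i=1}^{n} (X_i - \mu) = n\bar{X} - n\mu = n(\bar{X} - \mu)
\]
\[
2 \sum_{i=1}^{n} (X_i - \mu)(\mu - \bar{X}) = 2(\mu - \bar{X}) \cdot n(\bar{X} - \mu) = -2n(\bar{X} - \mu)^2
\]
\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\]
```

Dividing the last line by $n - 1$ gives the expression for $S^2$ in terms of $\sum (X_i - \mu)^2$ and $(\bar{X} - \mu)^2$.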
We had
\[
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n-1} - \frac{n}{n-1} (\bar{X} - \mu)^2
\]
Taking expectations
\[
E[S^2] = E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right]
= E\left[\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n-1}\right] - E\left[\frac{n}{n-1} (\bar{X} - \mu)^2\right]
\]
But,
\[
E\left[\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n-1}\right] = \frac{n}{n-1} \sigma^2
\]
\[
E[(\bar{X} - \mu)^2] = \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}
\]
This implies
\[
E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right] = \frac{n}{n-1} \sigma^2 - \frac{\sigma^2}{n-1} = \sigma^2
\]
or
\[
E[S^2] = \sigma^2
\]
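The result $E[S^2] = \sigma^2$ can be checked by simulation. This is a minimal sketch, assuming a normal population with σ = 2 and a small sample size n = 5 (both illustrative assumptions, as is the function name). It averages two estimators over many samples: the sum of squared deviations divided by n − 1, and the same sum divided by n. The first should average close to σ² = 4, the second close to (n − 1)σ²/n = 3.2.

```python
import random

def average_variance_estimates(mu, sigma, n, trials, seed=2):
    """Return the averages of the (n-1)-divisor and n-divisor variance estimators."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)             # S^2 as defined in the text
        total_biased += ss / n                     # divisor n, for contrast
    return total_unbiased / trials, total_biased / trials

# Illustrative (assumed) parameters: sigma^2 = 4, n = 5.
avg_s2, avg_biased = average_variance_estimates(mu=0.0, sigma=2.0, n=5, trials=40000)
print(avg_s2, avg_biased)
```

The gap between the two averages is what the n − 1 divisor corrects for.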
Theorem
If $X_1, X_2, \ldots, X_n$ is a sample from a normal population having mean $\mu$ and variance
$\sigma^2$, then $\bar{X}$ and $S^2$ are independent random variables, with $\bar{X}$ being normal with
mean $\mu$ and variance $\sigma^2/n$ and $(n-1)S^2/\sigma^2$ being a chi-square random variable
with $n - 1$ degrees of freedom.
\[
\sqrt{n}\, \frac{\bar{X} - \mu}{S} \sim t_{n-1}
\]
Proof: A t random variable with n degrees of freedom is defined as
\[
\frac{Z}{\sqrt{\chi^2_n / n}}
\]
with $Z$ being a standard normal RV, $\chi^2_n$ being a chi-square RV with n degrees of
freedom, and the two being independent. Consider
\[
\frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n-1) S^2}{\sigma^2 (n-1)}}} = \sqrt{n}\, \frac{\bar{X} - \mu}{S}
\]
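A simulation can illustrate that this statistic behaves like a t random variable with n − 1 degrees of freedom rather than a standard normal. The sketch below assumes a normal population with µ = 3, σ = 2 and n = 10 (illustrative values, not from the slides). It relies on the standard fact that a t random variable with ν > 2 degrees of freedom has variance ν/(ν − 2), so the simulated values should have variance near 9/7, and more than 5% of them should exceed 1.96 in absolute value.

```python
import math
import random

def simulate_t_statistics(mu, sigma, n, trials, seed=3):
    """Simulate sqrt(n) * (X-bar - mu) / S for normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        values.append(math.sqrt(n) * (xbar - mu) / s)
    return values

n = 10
t_values = simulate_t_statistics(mu=3.0, sigma=2.0, n=n, trials=40000)

# Sample mean and sample variance of the simulated statistics.
t_mean = sum(t_values) / len(t_values)
sample_var = sum((t - t_mean) ** 2 for t in t_values) / (len(t_values) - 1)

# Variance of a t variable with nu = n - 1 degrees of freedom: nu / (nu - 2).
theoretical_var = (n - 1) / (n - 3)

# Heavier tails than N(0, 1): for a standard normal this would be about 0.05.
tail_fraction = sum(abs(t) > 1.96 for t in t_values) / len(t_values)

print(sample_var, theoretical_var, tail_fraction)
```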