
CL202: Introduction to Data Analysis

MB+SCP

Mani Bhushan, Sachin Patwardhan


Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076

mbhushan,sachinp@iitb.ac.in

Acknowledgements: Santosh Noronha (some material from his slides)

Spring 2015



Today’s lecture:

Sampling Statistics and their Distributions


Chapter 6 of textbook



Populations and samples

We sample to draw some conclusions about an entire population.


Examples: Exit polls, census.
We assume that the population can be described by a probability distribution.
We assume that the samples give us measurements following this distribution.
We assume that the samples are randomly chosen.



Formally...

If X1, ..., Xn are independent random variables having a common distribution F, i.e., X1, ..., Xn are IID (independent and identically distributed), then we say that they constitute a sample (or a random sample) from the distribution F.
In most applications, F is not completely known, and the problem is to use the samples to infer F.
Parametric inference: F is specified in terms of unknown parameters, e.g., height is Gaussian with unknown mean and variance.
Non-parametric inference: nothing is known about the form of F.



Sample selection

Simple random sample: each member of a random sample is chosen independently and has the same (nonzero) probability of being chosen.
Cluster sampling: if the entire population cannot be uniformly sampled, divide it into clusters and sample some clusters.
It is very difficult to verify that a sample is really a random sample. The population is assumed to be (effectively) infinite.
Sampling strategies are more complicated if the population is actually finite. A minimal sketch of the two schemes follows below.
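A minimal sketch of the two schemes (assuming numpy; the population size, cluster layout, and seed are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
population = np.arange(10_000)          # hypothetical population of unit labels

# Simple random sample: every member has the same chance of selection.
srs = rng.choice(population, size=100, replace=False)

# Cluster sampling: partition the population into clusters, pick a few
# clusters at random, and keep every unit in the chosen clusters.
clusters = population.reshape(100, 100)           # 100 clusters of 100 units
chosen = rng.choice(100, size=5, replace=False)   # sample 5 whole clusters
cluster_sample = clusters[chosen].ravel()

print(len(srs), len(cluster_sample))    # 100 and 500 sampled units
```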



Clinical Trials and Randomization

Clinical trials are randomized:


Block randomization: one block gets the treatment and another the placebo.
Blind trials: patients do not know which treatment they receive.
Double-blind trials: neither the doctors nor the patients know.



Distributions and Parametric Estimation

We wish to infer the parameters of a population distribution.
  - mean, variance, proportion, etc.
We use a statistic derived from the sample data.
  - sample mean, sample variance, etc.
A statistic is a random variable.
  - What distribution does it follow?
  - What are the parameters of its distribution?



Notation

Population parameters are typically in Greek:
  - µ = population mean.
  - σ² = population variance.
The corresponding statistic is not in Greek:
  - X̄ = sample mean.
  - S² = sample variance.
We can also denote an estimate of the parameter θ as θ̂.
Regression:
  - You want: y = αx + β.
  - You get: y = ax + b.



What can we say about the Sample Mean?
Consider a population with mean µ and variance σ².
We have n random samples X1, X2, ..., Xn from this population.
Define the sample mean as:

    X̄ = (X1 + · · · + Xn)/n

X̄ is a random variable, since it is a function of the random variables X1, ..., Xn.
What do we expect from X̄?

    E[Xi] = µ,   var(Xi) = σ²

    E[X̄] = E[(X1 + · · · + Xn)/n] = (E[X1] + · · · + E[Xn])/n = nµ/n = µ



Why do we infer µ using the sample mean?

    E[X̄] = µ

X̄ is an unbiased estimator of µ.
For a statistic θ̂ to be an unbiased estimator of θ,

    E[θ̂] = θ
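A quick empirical check of unbiasedness (a sketch assuming numpy; the exponential population, sample size, and replicate count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 5.0, 20, 100_000     # population mean, sample size, replicates

# Average the sample mean over many replicate samples drawn from an
# exponential population with mean mu: the average should be close to mu.
xbars = rng.exponential(mu, size=(reps, n)).mean(axis=1)
print(xbars.mean())                # ≈ 5.0, consistent with E[X̄] = µ
```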



What is the variance of X̄ ?

We had var(Xi) = σ².

    var(X̄) = var((X1 + · · · + Xn)/n)
           = (1/n²) var(X1 + · · · + Xn)
           = (1/n²) (var(X1) + · · · + var(Xn))    (Independence!)
           = (1/n²) nσ²
           = σ²/n



Xi versus X̄ ?

    E[Xi] = µ,   var(Xi) = σ²

    E[X̄] = µ,   var(X̄) = σ²/n

The variance of the sample mean decreases with increasing n.
Sample more and gain accuracy when using X̄ as an estimate of µ (see the simulation sketch below).
What can we say about the distribution of the sample mean?
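To see the σ²/n decay empirically (a sketch assuming numpy; µ, σ, and the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, reps = 0.0, 2.0, 100_000

# Empirical variance of X̄ for growing n should track σ²/n.
for n in (5, 20, 80):
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, round(xbars.var(), 3), sigma**2 / n)   # empirical vs. theoretical
```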



The Central Limit Theorem

Theorem
Let X1, X2, ..., Xn be a sequence of independent and identically distributed random variables, each having mean µ and variance σ². Then for large n, the distribution of X1 + X2 + · · · + Xn is approximately normal with mean nµ and variance nσ².

i.e., for large n,

    (X1 + · · · + Xn) ∼ N(nµ, nσ²)

    P( ((X1 + · · · + Xn) − nµ) / (σ√n) < x ) ≈ P(Z < x)

with Z being a standard normal random variable.
One of the most powerful results in probability.
Proposed in 1733 by the French mathematician A. de Moivre.
Largely forgotten until Laplace revived it in 1812, using the normal to approximate the binomial.
In 1901, Lyapunov stated it in general terms and proved it formally.

The Central Limit Theorem

Consider the total obtained on rolling several dice.

(Figures omitted: histograms of the total for an increasing number of dice; the distribution of the sum approaches the normal bell shape.)
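A simulation in the spirit of those figures (a sketch assuming numpy and matplotlib; the numbers of dice and rolls are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
rolls = 100_000

# Histogram of the total for an increasing number of dice: even by
# k = 10 the distribution of the sum is visibly bell-shaped.
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, k in zip(axes, (1, 2, 5, 10)):
    totals = rng.integers(1, 7, size=(rolls, k)).sum(axis=1)
    ax.hist(totals, bins=np.arange(k - 0.5, 6 * k + 1.5), density=True)
    ax.set_title(f"total of {k} dice")
plt.tight_layout()
plt.show()
```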


Central Limit Theorem (CLT) and the Binomial

Let X = X1 + · · · + Xn, where Xi is a Bernoulli random variable (1 if success, 0 if failure; probability of success p).

    E[Xi] = p,   var(Xi) = p(1 − p)

    E[X] = np,   var(X) = np(1 − p)

CLT suggests that for large n,

    (X − np) / √(np(1 − p)) ∼ N(0, 1)

Rule of thumb: the binomial can be approximated well by a normal if

    np(1 − p) ≥ 10



Example 6.3c
Q The ideal size of a first-year class at a college is 150 students. The college, knowing from past experience that on average only 30% of those accepted for admission will actually attend, uses a policy of approving the applications of 450 students. Compute the probability that more than 150 first-year students attend this college.
A Let X be the number of students that attend. Then X is a binomial RV with n = 450 and p = 0.3, assuming each student independently decides whether or not to attend.
Note that X (binomial) is discrete while the normal is continuous. Thus we use the continuity correction

    P{X = i} = P{i − 0.5 < X < i + 0.5}

so that P{X > 150} = P{X > 150.5}. Thus

    P{X > 150.5} = P{ (X − (450)(0.3)) / √((450)(0.3)(0.7)) ≥ (150.5 − (450)(0.3)) / √((450)(0.3)(0.7)) }
                 ≈ P{Z ≥ 1.59} = 0.06
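A numerical check of this calculation (a sketch assuming scipy.stats; not part of the original slides):

```python
from scipy import stats

n, p = 450, 0.3
mean, sd = n * p, (n * p * (1 - p)) ** 0.5

exact = stats.binom.sf(150, n, p)             # exact P{X > 150}
approx = stats.norm.sf((150.5 - mean) / sd)   # normal approx., continuity-corrected
print(exact, approx)                          # both ≈ 0.06
```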


Approximate Distribution of Sample Mean

Q Let X1, ..., Xn be a sample from a population having mean µ and variance σ². What is the distribution of the sample mean

      X̄ = (X1 + · · · + Xn)/n ?

A Using the CLT, the distribution of X1 + · · · + Xn is approximately normal when n is large. Then the distribution of X̄ is also normal, since a constant multiple of a normal RV is also a normal RV, with

      E[X̄] = µ;   var(X̄) = σ²/n

  Hence,

      (X̄ − µ)/(σ/√n)

  has a standard normal distribution.



CLT and Sample Size for X̄ to be Normal

The CLT does not tell us how large the sample size n needs to be for the normal approximation of X̄ to be valid.
n depends on the population distribution of the sample data.
For the binomial we need np(1 − p) ≥ 10; for a normal population any n ≥ 1 is fine.
Rule of thumb: a sample size n ≥ 30 works for almost all distributions, i.e., no matter how non-normal the underlying population is, the sample mean of a sample of size at least 30 will be approximately normal (see the simulation sketch below).
In most cases, the normal approximation will be valid for much smaller sample sizes.
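An illustration of the n ≥ 30 rule of thumb (a sketch assuming numpy and scipy; the exponential population is an arbitrary, strongly skewed choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 30, 50_000

# Sample means from a skewed exponential population (skewness 2):
# the skewness of X̄ falls by a factor √n, to about 2/√30 ≈ 0.37,
# so the distribution of X̄ is already close to normal at n = 30.
xbars = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
print(stats.skew(xbars))   # ≈ 0.37
```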



Densities of Sample Means of a Normal Population

 
If X ∼ N(0, 1), then X̄ ∼ N(0, 1/n).

(Figure omitted: densities of X̄ narrowing around 0 as n increases.)



Sample variance

Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and variance σ².
Let X̄ be the sample mean.
The statistic S² defined as

    S² = Σ(Xi − X̄)² / (n − 1),   with the sum over i = 1, ..., n,

is called the sample variance. It is a random variable.
S = √S² is called the sample standard deviation.
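In code (a sketch assuming numpy; the normal population and n = 50 are illustrative), the n − 1 denominator corresponds to numpy's ddof=1:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=50)                # one sample of size n = 50

s2 = ((x - x.mean()) ** 2).sum() / (len(x) - 1)   # S² with n − 1 denominator
print(s2, x.var(ddof=1))                          # identical: ddof=1 divides by n − 1
print(np.sqrt(s2), x.std(ddof=1))                 # sample standard deviation S
```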



E [S 2 ]?

We compute E[S²] as follows (all sums run over i = 1, ..., n):

    (Xi − X̄)² = (Xi − µ + µ − X̄)²
              = (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄)

    Σ(Xi − X̄)² = Σ(Xi − µ)² + Σ(µ − X̄)² + 2Σ(Xi − µ)(µ − X̄)

The middle term is

    Σ(µ − X̄)² = n(µ − X̄)²



E [S 2 ] (Continued)

The last term is

    2Σ(Xi − µ)(µ − X̄) = −2(X̄ − µ)Σ(Xi − µ) = −2n(X̄ − µ)²

Combining the three terms,

    Σ(Xi − X̄)² = Σ(Xi − µ)² + n(µ − X̄)² − 2n(X̄ − µ)²
               = Σ(Xi − µ)² − n(X̄ − µ)²

Dividing by n − 1,

    Σ(Xi − X̄)²/(n − 1) = Σ(Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)



E [S 2 ] (Continued)

We had

    S² = Σ(Xi − X̄)²/(n − 1) = Σ(Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)

Taking expectations,

    E[S²] = E[Σ(Xi − X̄)²/(n − 1)] = E[Σ(Xi − µ)²/(n − 1)] − (n/(n − 1)) E[(X̄ − µ)²]

But

    E[Σ(Xi − µ)²/(n − 1)] = nσ²/(n − 1)

    E[(X̄ − µ)²] = var(X̄) = σ²/n



E [S 2 ] (continued)

This implies

    E[S²] = E[Σ(Xi − X̄)²/(n − 1)] = nσ²/(n − 1) − (n/(n − 1))(σ²/n)
          = nσ²/(n − 1) − σ²/(n − 1)
          = σ²

or

    E[S²] = σ²

Therefore S² is an unbiased estimator of σ².
This is the real reason for having n − 1 in the denominator of the S² expression instead of n, and not really the degrees-of-freedom argument given earlier. (An empirical check follows below.)
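An empirical check of the bias (a sketch assuming numpy; σ² = 4, n = 10, and the replicate count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
print(samples.var(axis=1, ddof=1).mean())  # ≈ 4.0: the n − 1 version is unbiased
print(samples.var(axis=1, ddof=0).mean())  # ≈ 3.6: dividing by n gives (n−1)σ²/n
```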



Sampling Distributions from a Normal Population
Consider the distribution of the statistics (X̄, S²) obtained from samples from a normal population.

Theorem
If X1, X2, ..., Xn is a sample from a normal population having mean µ and variance σ², then X̄ and S² are independent random variables, with X̄ being normal with mean µ and variance σ²/n and (n − 1)S²/σ² being a chi-square random variable with n − 1 degrees of freedom.

In general, X̄ is approximately normal due to the CLT. However, in this case it is exactly normal, since a sum of normal RVs is normal (show this using the moment generating function).
(n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
X̄ and S² are independent random variables.
Proofs of the last two statements are in the book.
The independence of X̄ and S² holds because the population is normal and is not true in general.
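A simulation consistent with the theorem (a sketch assuming numpy; µ, σ, n, and the replicate count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 20.0, 3.0, 15, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2        # should be chi-square with n − 1 = 14 df

# chi-square(14) has mean 14 and variance 28.
print(q.mean(), q.var())
# For a normal population, X̄ and S² are independent (correlation ≈ 0).
print(np.corrcoef(x.mean(axis=1), s2)[0, 1])
```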
Example 6.5a to illustrate use of S 2 distribution

Q The time it takes for a processor to carry out a particular computation is normally distributed with mean 20 seconds and standard deviation 3 seconds. If a sample of 15 such computations is observed, what is the probability that the sample variance is greater than 12?
A n = 15, σ² = 3² = 9. Then

    P{S² > 12} = P{14S²/9 > (14)(12)/9} = P{χ²₁₄ > 18.67}
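To finish the computation numerically (a sketch assuming scipy.stats):

```python
from scipy import stats

print(stats.chi2.sf(18.67, df=14))   # P{χ²₁₄ > 18.67} ≈ 0.18
```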



Corollary
Let X1, X2, ..., Xn be a sample from a normal population with mean µ and variance σ². Let X̄ be the sample mean and S the sample standard deviation. Then

    √n (X̄ − µ)/S ∼ tₙ₋₁

Proof: a t random variable with n degrees of freedom is defined as

    Z / √(χ²ₙ/n)

with Z being a standard normal RV, χ²ₙ being a chi-square RV with n degrees of freedom, and the two being independent. Consider

    [(X̄ − µ)/(σ/√n)] / √((n − 1)S²/(σ²(n − 1))) = √n (X̄ − µ)/S

which is a t RV with n − 1 degrees of freedom, by the earlier theorem.
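A simulation of the corollary (a sketch assuming numpy and scipy; n = 5 and the replicate count are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
mu, sigma, n, reps = 0.0, 1.0, 5, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

# The statistic should follow a t distribution with n − 1 = 4 df, whose
# tails are heavier than the standard normal's: P{|t| > 2.776} ≈ 0.05.
print((np.abs(t) > 2.776).mean(), 2 * stats.t.sf(2.776, df=4))
```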



Summary

For any sample: E[X̄] = µ, E[S²] = σ².
CLT: a sum of IID RVs is approximately normal. As a result, X̄ is approximately normal with mean µ and variance σ²/n.
When sampling from a normal population, X̄ is exactly normal and (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom. Further, X̄ and S² are independent.
Also, √n (X̄ − µ)/S ∼ tₙ₋₁.



THANK YOU

