
CL202: Introduction to Data Analysis

MB+SCP

Mani Bhushan, Sachin Patwardhan


Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076

mbhushan,sachinp@iitb.ac.in

Acknowledgements: Santosh Noronha (some material from his slides)

Spring 2015



Today’s lecture:

Sampling Statistics and their Distributions


Chapter 6 of textbook



Populations and samples

We sample to draw some conclusions about an entire population.


Examples: Exit polls, census.
We assume that the population can be described by a probability distribution.
We assume that the samples give us measurements following this distribution.
We assume that the samples are randomly chosen.



Formally...

If X1, ..., Xn are independent random variables having a common distribution F, i.e., X1, ..., Xn are IID (independent and identically distributed), then we say that they constitute a sample (or a random sample) from the distribution F.
In most applications, F is not completely known, and the problem is to use the samples to infer F.
Parametric inference: F is specified in terms of unknown parameters, e.g., height is Gaussian with unknown mean and variance.
Non-parametric inference: nothing is known about the form of F.



Sample selection

Simple random sample: each member of a random sample is chosen independently and has the same (nonzero) probability of being chosen.
Cluster sampling: if the entire population cannot be uniformly sampled, divide it into clusters and sample some clusters.
It is very difficult to verify that a sample is really a random sample. The population is assumed to be (effectively) infinite.
Sampling strategies are more complicated if the population is actually finite. A minimal sketch of the two schemes follows below.
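A minimal sketch of the two schemes (assuming numpy; the population size, cluster layout, and seed are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
population = np.arange(10_000)          # hypothetical population of unit labels

# Simple random sample: every member has the same chance of selection.
srs = rng.choice(population, size=100, replace=False)

# Cluster sampling: partition the population into clusters, pick a few
# clusters at random, and keep every unit in the chosen clusters.
clusters = population.reshape(100, 100)           # 100 clusters of 100 units
chosen = rng.choice(100, size=5, replace=False)   # sample 5 whole clusters
cluster_sample = clusters[chosen].ravel()

print(len(srs), len(cluster_sample))    # 100 and 500 sampled units
```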



Clinical Trials and Randomization

Clinical trials are randomized:


Block randomization: one block gets the treatment and another the placebo.
Blind trials: patients do not know which treatment they receive.
Double-blind trials: neither the doctors nor the patients know.



Distributions and Parametric Estimation

We wish to infer the parameters of a population distribution.
  - mean, variance, proportion, etc.
We use a statistic derived from the sample data.
  - sample mean, sample variance, etc.
A statistic is a random variable.
  - What distribution does it follow?
  - What are the parameters of its distribution?



Notation

Population parameters are typically in Greek:
  - µ = population mean.
  - σ² = population variance.
The corresponding statistic is not in Greek:
  - X̄ = sample mean.
  - S² = sample variance.
We can also denote an estimate of the parameter θ as θ̂.
Regression:
  - You want: y = αx + β.
  - You get: y = ax + b.



What can we say about the Sample Mean?
Consider a population with mean µ and variance σ².
We have n random samples X1, X2, ..., Xn from this population.
Define the sample mean as:

    X̄ = (X1 + · · · + Xn)/n

X̄ is a random variable, since it is a function of the random variables X1, ..., Xn.
What do we expect from X̄?

    E[Xi] = µ,   var(Xi) = σ²

    E[X̄] = E[(X1 + · · · + Xn)/n] = (E[X1] + · · · + E[Xn])/n = nµ/n = µ



Why do we infer µ using the sample mean?

    E[X̄] = µ

X̄ is an unbiased estimator of µ.
For a statistic θ̂ to be an unbiased estimator of θ,

    E[θ̂] = θ
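A quick empirical check of unbiasedness (a sketch assuming numpy; the exponential population, sample size, and replicate count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 5.0, 20, 100_000     # population mean, sample size, replicates

# Average the sample mean over many replicate samples drawn from an
# exponential population with mean mu: the average should be close to mu.
xbars = rng.exponential(mu, size=(reps, n)).mean(axis=1)
print(xbars.mean())                # ≈ 5.0, consistent with E[X̄] = µ
```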



What is the variance of X̄ ?

We had var(Xi) = σ².

    var(X̄) = var((X1 + · · · + Xn)/n)
           = (1/n²) var(X1 + · · · + Xn)
           = (1/n²) (var(X1) + · · · + var(Xn))    (Independence!)
           = (1/n²) nσ²
           = σ²/n



Xi versus X̄ ?

    E[Xi] = µ,   var(Xi) = σ²

    E[X̄] = µ,   var(X̄) = σ²/n

The variance of the sample mean decreases with increasing n.
Sample more and gain accuracy when using X̄ as an estimate of µ (see the simulation sketch below).
What can we say about the distribution of the sample mean?
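To see the σ²/n decay empirically (a sketch assuming numpy; µ, σ, and the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, reps = 0.0, 2.0, 100_000

# Empirical variance of X̄ for growing n should track σ²/n.
for n in (5, 20, 80):
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, round(xbars.var(), 3), sigma**2 / n)   # empirical vs. theoretical
```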



The Central Limit Theorem

Theorem
Let X1, X2, ..., Xn be a sequence of independent and identically distributed random variables, each having mean µ and variance σ². Then for large n, the distribution of X1 + X2 + · · · + Xn is approximately normal with mean nµ and variance nσ².

i.e., for large n,

    (X1 + · · · + Xn) ∼ N(nµ, nσ²)

    P( ((X1 + · · · + Xn) − nµ) / (σ√n) < x ) ≈ P(Z < x)

with Z being a standard normal random variable.
One of the most powerful results in probability.
Proposed in 1733 by the French mathematician A. de Moivre.
Largely forgotten until Laplace revived it in 1812, using the normal to approximate the binomial.
In 1901, Lyapunov stated it in general terms and proved it formally.

The Central Limit Theorem

Consider the total obtained on rolling several dice.

(Figures omitted: histograms of the total for an increasing number of dice; the distribution of the sum approaches the normal bell shape.)
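A simulation in the spirit of those figures (a sketch assuming numpy and matplotlib; the numbers of dice and rolls are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
rolls = 100_000

# Histogram of the total for an increasing number of dice: even by
# k = 10 the distribution of the sum is visibly bell-shaped.
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, k in zip(axes, (1, 2, 5, 10)):
    totals = rng.integers(1, 7, size=(rolls, k)).sum(axis=1)
    ax.hist(totals, bins=np.arange(k - 0.5, 6 * k + 1.5), density=True)
    ax.set_title(f"total of {k} dice")
plt.tight_layout()
plt.show()
```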


Central Limit Theorem (CLT) and the Binomial

Let X = X1 + · · · + Xn, where Xi is a Bernoulli random variable (1 if success, 0 if failure; probability of success p).

    E[Xi] = p,   var(Xi) = p(1 − p)

    E[X] = np,   var(X) = np(1 − p)

CLT suggests that for large n,

    (X − np) / √(np(1 − p)) ∼ N(0, 1)

Rule of thumb: the binomial can be approximated well by a normal if

    np(1 − p) ≥ 10



Example 6.3c
Q The ideal size of a first-year class at a college is 150 students. The college, knowing from past experience that on average only 30% of those accepted for admission will actually attend, uses a policy of approving the applications of 450 students. Compute the probability that more than 150 first-year students attend this college.
A Let X be the number of students that attend. Then X is a binomial RV with n = 450 and p = 0.3, assuming each student independently decides whether or not to attend.
Note that X (binomial) is discrete while the normal is continuous. Thus we use the continuity correction

    P{X = i} = P{i − 0.5 < X < i + 0.5}

so that P{X > 150} = P{X > 150.5}. Thus

    P{X > 150.5} = P{ (X − (450)(0.3)) / √((450)(0.3)(0.7)) ≥ (150.5 − (450)(0.3)) / √((450)(0.3)(0.7)) }
                 ≈ P{Z ≥ 1.59} = 0.06
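A numerical check of this calculation (a sketch assuming scipy.stats; not part of the original slides):

```python
from scipy import stats

n, p = 450, 0.3
mean, sd = n * p, (n * p * (1 - p)) ** 0.5

exact = stats.binom.sf(150, n, p)             # exact P{X > 150}
approx = stats.norm.sf((150.5 - mean) / sd)   # normal approx., continuity-corrected
print(exact, approx)                          # both ≈ 0.06
```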


Approximate Distribution of Sample Mean

Q Let X1, ..., Xn be a sample from a population having mean µ and variance σ². What is the distribution of the sample mean

      X̄ = (X1 + · · · + Xn)/n ?

A Using the CLT, the distribution of X1 + · · · + Xn is approximately normal when n is large. Then the distribution of X̄ is also normal, since a constant multiple of a normal RV is also a normal RV, with

      E[X̄] = µ;   var(X̄) = σ²/n

  Hence,

      (X̄ − µ)/(σ/√n)

  has a standard normal distribution.



CLT and Sample Size for X̄ to be Normal

The CLT does not tell us how large the sample size n needs to be for the normal approximation of X̄ to be valid.
n depends on the population distribution of the sample data.
For the binomial we need np(1 − p) ≥ 10; for a normal population any n ≥ 1 is fine.
Rule of thumb: a sample size n ≥ 30 works for almost all distributions, i.e., no matter how non-normal the underlying population is, the sample mean of a sample of size at least 30 will be approximately normal (see the simulation sketch below).
In most cases, the normal approximation will be valid for much smaller sample sizes.
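An illustration of the n ≥ 30 rule of thumb (a sketch assuming numpy and scipy; the exponential population is an arbitrary, strongly skewed choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 30, 50_000

# Sample means from a skewed exponential population (skewness 2):
# the skewness of X̄ falls by a factor √n, to about 2/√30 ≈ 0.37,
# so the distribution of X̄ is already close to normal at n = 30.
xbars = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
print(stats.skew(xbars))   # ≈ 0.37
```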



Densities of Sample Means of a Normal Population

 
If X ∼ N(0, 1), then X̄ ∼ N(0, 1/n).

(Figure omitted: densities of X̄ narrowing around 0 as n increases.)



Sample variance

Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and variance σ².
Let X̄ be the sample mean.
The statistic S² defined as

    S² = Σ(Xi − X̄)² / (n − 1),   with the sum over i = 1, ..., n,

is called the sample variance. It is a random variable.
S = √S² is called the sample standard deviation.
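In code (a sketch assuming numpy; the normal population and n = 50 are illustrative), the n − 1 denominator corresponds to numpy's ddof=1:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=50)                # one sample of size n = 50

s2 = ((x - x.mean()) ** 2).sum() / (len(x) - 1)   # S² with n − 1 denominator
print(s2, x.var(ddof=1))                          # identical: ddof=1 divides by n − 1
print(np.sqrt(s2), x.std(ddof=1))                 # sample standard deviation S
```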



E [S 2 ]?

We compute E[S²] as follows (all sums run over i = 1, ..., n):

    (Xi − X̄)² = (Xi − µ + µ − X̄)²
              = (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄)

    Σ(Xi − X̄)² = Σ(Xi − µ)² + Σ(µ − X̄)² + 2Σ(Xi − µ)(µ − X̄)

The middle term is

    Σ(µ − X̄)² = n(µ − X̄)²



E [S 2 ] (Continued)

The last term is

    2Σ(Xi − µ)(µ − X̄) = −2(X̄ − µ)Σ(Xi − µ) = −2n(X̄ − µ)²

Combining the three terms,

    Σ(Xi − X̄)² = Σ(Xi − µ)² + n(µ − X̄)² − 2n(X̄ − µ)²
               = Σ(Xi − µ)² − n(X̄ − µ)²

Dividing by n − 1,

    Σ(Xi − X̄)²/(n − 1) = Σ(Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)



E [S 2 ] (Continued)

We had

    S² = Σ(Xi − X̄)²/(n − 1) = Σ(Xi − µ)²/(n − 1) − n(X̄ − µ)²/(n − 1)

Taking expectations,

    E[S²] = E[Σ(Xi − X̄)²/(n − 1)] = E[Σ(Xi − µ)²/(n − 1)] − (n/(n − 1)) E[(X̄ − µ)²]

But

    E[Σ(Xi − µ)²/(n − 1)] = nσ²/(n − 1)

    E[(X̄ − µ)²] = var(X̄) = σ²/n



E [S 2 ] (continued)

This implies

    E[S²] = E[Σ(Xi − X̄)²/(n − 1)] = nσ²/(n − 1) − (n/(n − 1))(σ²/n)
          = nσ²/(n − 1) − σ²/(n − 1)
          = σ²

or

    E[S²] = σ²

Therefore S² is an unbiased estimator of σ².
This is the real reason for having n − 1 in the denominator of the S² expression instead of n, and not really the degrees-of-freedom argument given earlier. (An empirical check follows below.)
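An empirical check of the bias (a sketch assuming numpy; σ² = 4, n = 10, and the replicate count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
print(samples.var(axis=1, ddof=1).mean())  # ≈ 4.0: the n − 1 version is unbiased
print(samples.var(axis=1, ddof=0).mean())  # ≈ 3.6: dividing by n gives (n−1)σ²/n
```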



Sampling Distributions from a Normal Population
Consider the distribution of the statistics (X̄, S²) obtained from samples from a normal population.

Theorem
If X1, X2, ..., Xn is a sample from a normal population having mean µ and variance σ², then X̄ and S² are independent random variables, with X̄ being normal with mean µ and variance σ²/n and (n − 1)S²/σ² being a chi-square random variable with n − 1 degrees of freedom.

In general, X̄ is approximately normal due to the CLT. However, in this case it is exactly normal, since a sum of normal RVs is normal (show this using the moment generating function).
(n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom.
X̄ and S² are independent random variables.
Proofs of the last two statements are in the book.
The independence of X̄ and S² holds because the population is normal and is not true in general.
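A simulation consistent with the theorem (a sketch assuming numpy; µ, σ, n, and the replicate count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 20.0, 3.0, 15, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2        # should be chi-square with n − 1 = 14 df

# chi-square(14) has mean 14 and variance 28.
print(q.mean(), q.var())
# For a normal population, X̄ and S² are independent (correlation ≈ 0).
print(np.corrcoef(x.mean(axis=1), s2)[0, 1])
```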
Example 6.5a to illustrate use of S 2 distribution

Q The time it takes for a processor to carry out a particular computation is normally distributed with mean 20 seconds and standard deviation 3 seconds. If a sample of 15 such computations is observed, what is the probability that the sample variance is greater than 12?
A n = 15, σ² = 3² = 9. Then

    P{S² > 12} = P{14S²/9 > (14)(12)/9} = P{χ²₁₄ > 18.67}
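To finish the computation numerically (a sketch assuming scipy.stats):

```python
from scipy import stats

print(stats.chi2.sf(18.67, df=14))   # P{χ²₁₄ > 18.67} ≈ 0.18
```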



Corollary
Let X1, X2, ..., Xn be a sample from a normal population with mean µ and variance σ². Let X̄ be the sample mean and S the sample standard deviation. Then

    √n (X̄ − µ)/S ∼ tₙ₋₁

Proof: a t random variable with n degrees of freedom is defined as

    Z / √(χ²ₙ/n)

with Z being a standard normal RV, χ²ₙ being a chi-square RV with n degrees of freedom, and the two being independent. Consider

    [(X̄ − µ)/(σ/√n)] / √((n − 1)S²/(σ²(n − 1))) = √n (X̄ − µ)/S

which is a t RV with n − 1 degrees of freedom, by the earlier theorem.
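A simulation of the corollary (a sketch assuming numpy and scipy; n = 5 and the replicate count are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
mu, sigma, n, reps = 0.0, 1.0, 5, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

# The statistic should follow a t distribution with n − 1 = 4 df, whose
# tails are heavier than the standard normal's: P{|t| > 2.776} ≈ 0.05.
print((np.abs(t) > 2.776).mean(), 2 * stats.t.sf(2.776, df=4))
```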



Summary

For any sample: E[X̄] = µ, E[S²] = σ².
CLT: a sum of IID RVs is approximately normal. As a result, X̄ is approximately normal with mean µ and variance σ²/n.
When sampling from a normal population, X̄ is exactly normal and (n − 1)S²/σ² is a chi-square RV with n − 1 degrees of freedom. Further, X̄ and S² are independent.
Also, √n (X̄ − µ)/S ∼ tₙ₋₁.



THANK YOU

