
QM1a - Session 10

July - Aug 2022

Maya Ganesh, Assistant Professor

Production & Quantitative Methods, IIM Ahmedabad

Recap

Maya Ganesh, Assistant Professor QM1a - Session 10 July - Aug 2022 1 / 32


Sample statistics

A sample statistic is a characteristic of the sample

A population parameter is a characteristic of the population

We use different notations to distinguish between the two groups of numbers



Sample

Sampling variation

Sample mean varies from one sample to another

Sample mean can be (and most likely is) different from the population mean

Sample mean is a random variable
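
The idea that the sample mean changes from sample to sample can be illustrated with a small simulation. This is only a sketch: the population below (10,000 values with mean near 50 and SD near 10), the sample size of 25, and the seed are hypothetical choices, not values from the session.

```python
import random
import statistics

random.seed(10)  # fixed seed so the sketch is reproducible

# Hypothetical population: generated once, then treated as fixed.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Draw five samples of size 25 (without replacement) and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, 25)) for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```

Each run of the loop gives a different sample mean, and none of them is expected to match the population mean exactly.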

Sample mean if population is normally distributed

Suppose that the random variables $X_1, X_2, X_3, \ldots, X_n$ form a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}$ denote their sample mean. Then:

$\bar{X}$ has the normal distribution with mean $\mu$ and variance $\sigma^2 / n$:

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
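
A minimal sketch of what this result implies in practice, using hypothetical values (mean 50, standard deviation 10, sample size 25, threshold 53; none of these come from the slides). It compares the tail probability for a single observation with the tail probability for the sample mean, whose standard deviation is smaller by a factor of the square root of n.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population parameters (not from the slides).
mu, sigma, n = 50.0, 10.0, 25

single = NormalDist(mu=mu, sigma=sigma)                  # one observation X
sample_mean = NormalDist(mu=mu, sigma=sigma / sqrt(n))   # X-bar ~ N(mu, sigma^2 / n)

p_single = 1 - single.cdf(53)     # P(X > 53)
p_mean = 1 - sample_mean.cdf(53)  # P(X-bar > 53)

print(f"P(X > 53)     = {p_single:.4f}")
print(f"P(X-bar > 53) = {p_mean:.4f}")
```

The same threshold is much less likely to be exceeded by an average of 25 observations than by a single observation.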

Central limit theorem

The distribution of the sample mean:

- will be normal when the distribution of data in the population is normal
- will be approximately normal even if the distribution of data in the population is not normal, under some conditions

Mean of the sample mean (i.e. mean of $\bar{X}$) = $\mu$ (the same as the population mean of the raw data)

Standard deviation of the sample mean (i.e. SD of $\bar{X}$) = $\sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size
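
The two formulas above can be checked with a short simulation. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (so its mean and SD are both 1), and the sample size of 40, the 5,000 repetitions, and the seed are arbitrary illustrative choices.

```python
import random
import statistics
from math import sqrt

random.seed(2022)  # fixed seed so the sketch is reproducible

n = 40        # sample size (hypothetical)
reps = 5000   # number of repeated samples (hypothetical)

# Assumed population: exponential with rate 1, a right-skewed, non-normal shape.
mu, sigma = 1.0, 1.0

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.4f} (theory: {mu:.4f})")
print(f"SD of sample means:   {sd_of_means:.4f} (theory: {sigma / sqrt(n):.4f})")
```

Even though the raw data are far from normal, the simulated mean and SD of the sample mean should land close to the theoretical values.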



Central limit theorem - Be careful!

Central limit theorem - Key components

The central limit theorem is valid when:

- Randomization - Sampling is random. The data must be sampled randomly such that every member of the population has the same probability of being selected for the sample.
- Independence - Each data point in the sample is independent of the others.
- 10% condition - It is often cited that a sample should be no more than 10% of the population if sampling is done without replacement.
- Large sample condition - The sample size is large enough.
  - A sample size of 30 is usually considered large enough, but there are more precise conditions (depending on skewness, kurtosis, etc.).
  - Adequate sample size depends on the distribution of the data, primarily its symmetry and the presence of outliers.
  - If the data are quite symmetric and have few outliers, even smaller samples are fine. Otherwise, we need larger samples.
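
To illustrate why skewed data need larger samples, the sketch below measures how skewed the distribution of the sample mean still is for a few sample sizes. The exponential population, the sample sizes (2, 10, 50), the 4,000 repetitions, and the helper names are all assumptions made for this illustration. For this particular population the skewness of the sample mean is known to be 2 divided by the square root of n, so it fades slowly.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

def skew_of_sample_means(n, reps=4000):
    """Skewness of the simulated sample means for samples of size n."""
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

results = {n: skew_of_sample_means(n) for n in (2, 10, 50)}
for n, s in results.items():
    print(f"n = {n:>2}: skewness of sample mean is about {s:.2f}")
```

A skewness near 0 indicates a shape close to symmetric, which is what the normal approximation requires.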

Central limit theorem - Example

Suppose salaries at a very large corporation have a mean of INR 62,000 and a standard deviation of INR 32,000.

If a single employee is randomly selected, what is the probability their salary exceeds INR 66,000?

It is impossible to answer this question without knowing the distribution of salaries!

If 100 employees are randomly selected, what is the probability their average salary exceeds INR 66,000?
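
The second question can be answered approximately with the central limit theorem, since the mean of 100 salaries is roughly normal with mean 62,000 and standard deviation 32,000 / 10 = 3,200. The sketch below works this out; it assumes that a sample of 100 is large enough for the approximation to hold for this salary distribution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 62_000, 32_000  # population mean and SD of salaries (INR), from the example
n = 100                     # sample size, from the example

standard_error = sigma / sqrt(n)      # SD of the sample mean
z = (66_000 - mu) / standard_error    # standardized threshold
p_exceeds = 1 - NormalDist().cdf(z)   # P(X-bar > 66,000) under the CLT approximation

print(f"standard error = {standard_error:.0f}")
print(f"z = {z:.2f}")
print(f"P(average salary > 66,000) is about {p_exceeds:.4f}")
```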

Bernoulli random variables

- Takes 1 with probability $p$
- Takes 0 with probability $1 - p$
- Mean: $p$
- Variance: $p(1 - p)$

What is the distribution of the sum of independent Bernoulli random variables with the same $p$?

Binomial

Sample proportion is also normally distributed
- $n$ is the number of trials and $p$ is the probability of getting 1
- $X$ is the total number of times you get 1
- Clearly, $X \sim \mathrm{Bin}(n, p)$
- Sample proportion, $\hat{p} = \frac{X}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

(Sample proportion is nothing but the sample mean of indicator variables)

- $E[\hat{p}] = E\left[\frac{X}{n}\right] = p$
- $\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{p(1 - p)}{n}$
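
The mean and variance of the sample proportion can be verified exactly from the binomial probabilities, without simulation. The values of n and p below are hypothetical and chosen only for illustration.

```python
from math import comb

n, p = 20, 0.3  # hypothetical number of trials and success probability

# Exact binomial probabilities P(X = k) for k = 0, ..., n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance of the sample proportion X / n, computed from the pmf.
mean_phat = sum((k / n) * pmf[k] for k in range(n + 1))
var_phat = sum((k / n - mean_phat) ** 2 * pmf[k] for k in range(n + 1))

print(f"E[p-hat]   = {mean_phat:.6f} (formula: {p:.6f})")
print(f"Var(p-hat) = {var_phat:.6f} (formula: {p * (1 - p) / n:.6f})")
```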

Normal approximation of the binomial distribution

If $X$ is binomially distributed with parameters $n$ and $p$, then

$X$ has the same distribution as the sum of $n$ independent Bernoulli random variables, each with parameter $p$.

And, from the central limit theorem, the distribution of the mean of Bernoulli random variables approaches a normal distribution as $n$ approaches $\infty$:

$\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$ approximately

and the number of occurrences, $X \sim N(np, np(1 - p))$ approximately.

Essentially, $\mathrm{Bin}(n, p)$ can be approximated by a normal distribution for large $n$.
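
A rough way to see "for large n" at work is to measure the largest gap between the exact binomial CDF and the approximating normal CDF as n grows. The sketch below does this for p = 0.5 and three hypothetical values of n; the function name is made up for the illustration, and no continuity correction is applied here.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_error(n, p):
    """Largest |P(X <= k) - Phi((k - np) / sd)| over k = 0, ..., n."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    binom_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        binom_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(binom_cdf - normal.cdf(k)))
    return worst

errors = {n: max_cdf_error(n, 0.5) for n in (10, 100, 400)}
for n, err in errors.items():
    print(f"n = {n:>3}: largest CDF gap is about {err:.4f}")
```

The gap shrinks as n increases, which is the sense in which the approximation improves.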



Poisson or Normal?

But didn't you say in an earlier session that Poisson approximates Binomial?

- The Poisson approximation to the Binomial is not so good if $p$ is close to 0.5. It is good if $n$ is large and $p$ is close to 0. How close $p$ should be and how large $n$ should be depends on how much accuracy we want. In general, if $p \le 0.1$ and $n \ge 50$, the approximation is quite good. (Rule of thumb: $np < 5$ or $n(1 - p) < 5$)

- The Normal approximation to the Binomial is 'not so good' if $p$ is not close to 0.5 (when $n$ is not sufficiently large). It is good even for moderately large $n$ if $p$ is close to 0.5. How close $p$ should be and how large $n$ should be depends on how much accuracy we want. In general, if $0.4 \le p \le 0.6$ and $n \ge 30$, the approximation is quite satisfactory. (Rule of thumb: $np > 5$ and $n(1 - p) > 5$)

[Figure: Poisson approximation to Binomial]

[Figure: Normal approximation to Binomial]
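
The contrast between the two approximations can be checked numerically. The sketch below compares the largest CDF error of each approximation for two hypothetical cases with n = 100: a small p (0.03) and p = 0.5. The function name is an assumption for this illustration, and the normal approximation is evaluated at k + 0.5, the usual half-unit continuity correction.

```python
from math import comb, exp, sqrt
from statistics import NormalDist

def max_cdf_errors(n, p):
    """Return (Poisson error, Normal error): largest CDF gaps against Bin(n, p)."""
    lam = n * p
    normal = NormalDist(mu=lam, sigma=sqrt(n * p * (1 - p)))
    binom_cdf = 0.0
    pois_pmf = exp(-lam)  # Poisson P(X = 0)
    pois_cdf = 0.0
    pois_err = 0.0
    norm_err = 0.0
    for k in range(n + 1):
        binom_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        pois_cdf += pois_pmf
        pois_pmf *= lam / (k + 1)  # recurrence: P(k + 1) = P(k) * lam / (k + 1)
        pois_err = max(pois_err, abs(binom_cdf - pois_cdf))
        norm_err = max(norm_err, abs(binom_cdf - normal.cdf(k + 0.5)))
    return pois_err, norm_err

small_p = max_cdf_errors(100, 0.03)
mid_p = max_cdf_errors(100, 0.5)

print(f"p = 0.03: Poisson error {small_p[0]:.4f}, Normal error {small_p[1]:.4f}")
print(f"p = 0.50: Poisson error {mid_p[0]:.4f}, Normal error {mid_p[1]:.4f}")
```

With small p the Poisson error should be the smaller of the two, and with p near 0.5 the Normal error should be the smaller one.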


Poisson approximation - Example

A factory puts biscuits into boxes of 100. The probability that a biscuit is broken is 0.03. Find the probability that a box contains 2 broken biscuits.
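
One way to work this example is to use a Poisson distribution with rate 100 × 0.03 = 3 and compare the result with the exact binomial probability. The sketch assumes the question means exactly 2 broken biscuits and that biscuits break independently of one another.

```python
from math import comb, exp, factorial

n, p, k = 100, 0.03, 2  # box size and breakage probability from the example
lam = n * p             # Poisson rate, np = 3

poisson_prob = exp(-lam) * lam**k / factorial(k)          # Poisson approximation
binomial_prob = comb(n, k) * p**k * (1 - p) ** (n - k)    # exact binomial value

print(f"Poisson approximation: {poisson_prob:.4f}")
print(f"Exact binomial:        {binomial_prob:.4f}")
```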



Normal approximation - Example

Based on past experience, 7% of all luncheon vouchers are in error. If a random sample of 400 vouchers is selected, what is the approximate probability that fewer than 25 are in error?
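
A first-pass calculation treats the number of erroneous vouchers as normal with mean np = 28 and variance np(1 - p) = 26.04, and evaluates the normal CDF directly at 25. This sketch deliberately applies no continuity correction and assumes vouchers are in error independently of one another.

```python
from math import sqrt
from statistics import NormalDist

n, p = 400, 0.07  # sample size and error rate from the example

mean = n * p                 # expected number of vouchers in error
sd = sqrt(n * p * (1 - p))   # standard deviation of that count

z = (25 - mean) / sd
p_fewer_than_25 = NormalDist().cdf(z)  # P(X < 25), no continuity correction

print(f"mean = {mean:.2f}, sd = {sd:.4f}, z = {z:.4f}")
print(f"P(X < 25) is about {p_fewer_than_25:.4f}")
```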

Normal approximation - Continuity Correction factor

In order for a continuous distribution (like the normal) to be used to approximate a discrete one (like the binomial), a continuity correction should be used.

- First, remember that a discrete random variable can take on only specified values, whereas a continuous random variable used to approximate it can take on any value whatsoever within an interval around those specified values.

- Second, remember that with a continuous distribution, the probability of obtaining a particular value of a random variable is zero. On the other hand, when the normal approximation is used to approximate a discrete distribution, a continuity correction should be employed so that we can approximate the probability of a specific value of the discrete distribution.
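
The correction amounts to widening each integer value k to the interval from k - 0.5 to k + 0.5. The helper below is a hypothetical sketch (the function name and its string-based interface are not from the slides) that applies the standard half-unit adjustment for each type of event and checks one case against the exact binomial probability.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx(n, p, relation, k):
    """Approximate P(X relation k) for X ~ Bin(n, p) using a half-unit correction."""
    y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    if relation == "==":
        return y.cdf(k + 0.5) - y.cdf(k - 0.5)
    if relation == "<=":
        return y.cdf(k + 0.5)
    if relation == "<":
        return y.cdf(k - 0.5)
    if relation == ">=":
        return 1 - y.cdf(k - 0.5)
    if relation == ">":
        return 1 - y.cdf(k + 0.5)
    raise ValueError(f"unsupported relation: {relation}")

# Hypothetical check: P(X = 50) for X ~ Bin(100, 0.5).
n, p, k = 100, 0.5, 50
exact = comb(n, k) * p**k * (1 - p) ** (n - k)
approx = normal_approx(n, p, "==", k)

print(f"exact P(X = 50):     {exact:.4f}")
print(f"corrected normal:    {approx:.4f}")
```

Without the correction, the normal approximation would assign probability zero to the single value 50.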



Normal approximation - Continuity Correction factor table



Normal approximation - Example - cont.

Based on past experience, 7% of all luncheon vouchers are in error. If a random sample of 400 vouchers is selected, what is the approximate probability that fewer than 25 are in error?

Applying the continuity correction factor,
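
With the correction, "fewer than 25" means at most 24, so the normal CDF is evaluated at 24.5. The sketch below carries out that calculation and, as a sanity check that is not part of the slides, also sums the exact binomial probabilities for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.07  # sample size and error rate from the example

mean = n * p
sd = sqrt(n * p * (1 - p))

# P(X < 25) = P(X <= 24), approximated by P(Y < 24.5) for the normal variable Y.
z_corrected = (24.5 - mean) / sd
p_corrected = NormalDist().cdf(z_corrected)

# Exact binomial probability P(X <= 24), for comparison only.
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(25))

print(f"z with correction = {z_corrected:.4f}")
print(f"corrected normal approximation: {p_corrected:.4f}")
print(f"exact binomial probability:     {p_exact:.4f}")
```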

Example
We load on a plane 100 packages whose weights are independent
random variables that are uniformly distributed between 5 and 50
pounds. What is the probability that the total weight will exceed
3000 pounds?

It is not easy to calculate the CDF of the total weight and the desired
probability, but an approximate answer can be quickly obtained using
the central limit theorem.

We want to calculate $P(S_{100} > 3000)$, where $S_{100}$ is the sum of the weights of the 100 packages.
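
A sketch of the central limit theorem calculation follows. It uses the standard facts that a uniform distribution on [a, b] has mean (a + b) / 2 and variance (b - a)^2 / 12, so each package has mean 27.5 and variance 168.75, and the total has mean 2,750 and variance 16,875. Variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 100                 # number of packages, from the example
low, high = 5.0, 50.0   # weight range in pounds, from the example

mean_one = (low + high) / 2        # mean weight of one package
var_one = (high - low) ** 2 / 12   # variance of one package's weight

# By the CLT, the total weight is approximately normal.
total = NormalDist(mu=n * mean_one, sigma=sqrt(n * var_one))
p_over = 1 - total.cdf(3000)       # P(total weight > 3000)

print(f"mean of total = {n * mean_one:.0f}, sd of total = {sqrt(n * var_one):.2f}")
print(f"P(total > 3000) is about {p_over:.4f}")
```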



Example
The monthly sales of a vegetable vendor average INR 5000, with a standard deviation of INR 1000. Assume that monthly sales are approximately normally distributed. The vegetable vendor decides to source directly from farmers to get fresh produce from next year onwards. If there is a 7% sales increase due to this, what is the probability that monthly sales for January next year will be greater than INR 6000?
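
The answer depends on how the 7% increase is read, and the slide does not say which reading is intended. The sketch below shows both as explicit assumptions: under reading A every month's sales are multiplied by 1.07, so the mean and the standard deviation both scale; under reading B only the mean rises by 7% and the spread is unchanged.

```python
from statistics import NormalDist

mean_sales, sd_sales = 5000.0, 1000.0  # current monthly sales (INR), from the example
growth = 1.07                          # 7% increase, from the example

# Reading A (assumption): sales are scaled by 1.07, so mean and SD both scale.
scaled = NormalDist(mu=growth * mean_sales, sigma=growth * sd_sales)
p_scaled = 1 - scaled.cdf(6000)

# Reading B (assumption): only the mean increases; the SD stays at 1000.
shifted = NormalDist(mu=growth * mean_sales, sigma=sd_sales)
p_shifted = 1 - shifted.cdf(6000)

print(f"Reading A: P(sales > 6000) is about {p_scaled:.4f}")
print(f"Reading B: P(sales > 6000) is about {p_shifted:.4f}")
```

Reading A follows from the rule that multiplying a random variable by a constant multiplies its standard deviation by the same constant.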



SO LONG! :)
