Download as pdf or txt
Download as pdf or txt
You are on page 1of 51

PROBABILITY

DISTRIBUTIONS AND
SAMPLING DISTRIBUTIONS
Dr. Rajeev Saha
Population vs Sample
Population Sample
Population Variance
Sample Variance
Standard Deviation - Population

Population Variance = 𝜎 2
Standard Deviation - Sample

Sample Variance = 𝑆 2
Random Variable
• A random variable is a function that associates a real
number with each element in the sample space.
• If a sample space contains a finite number of possibilities
or an unending sequence with as many elements as there
are whole numbers, it is called a discrete sample space.
• If a sample space contains an infinite number of
possibilities equal to the number of points on a line
segment, it is called a continuous sample space.
• A random variable is called a discrete random variable if
its set of possible outcomes is countable.
• When a random variable can take on values on a
continuous scale, it is called a continuous random
variable.
Discrete Probability Distributions
• The set of ordered pairs (x, f(x)) is a probability function,
probability mass function, or probability distribution
of the discrete random variable X if, for each possible
outcome x,
Example
• Question: A shipment of 20 similar laptop computers to a retail outlet
contains 3 that are defective. If a school makes a random purchase of
2 of these computers, find the probability distribution for the number
of defectives.
• Solution: Let X be a random variable representing defective
computers whose values x are the possible numbers of defective
computers purchased by the school. Then x can only take the
numbers 0, 1, and 2.
Continuous Probability Distributions
• The function f(x) is a probability density function (pdf)
for the continuous random variable X, defined over the set
of real numbers, if
Some Probability Distributions
Discrete Probability Distributions
• Binomial
• Poisson
Continuous Probability Distributions
• Normal
• Exponential
• Chi Square
•t
•F
Normal Distribution
( x )2
e
(2 ) 2
f ( x)  −∞< x < ∞
 2
where π = 3.14159 . . . and e = 2.71828 . . . .

 x2
e /2
When μ=0, σ=1
f ( x) 
2
The Normal Curve
The area under the curve = 1 and the curve is symmetrical.

-4 -3 -2 -1 0 1 2 3 4
The Normal Curve
The Normal Curve
The Normal Curve
The Normal Curve
Standard Normal Distribution
Normal distribution with μ = 0 and σ = 1
Standard Normal Distribution
Normal distribution with μ = 0 and σ = 1
Whenever X assumes a value x, the corresponding value
of Z is given by z = (x − μ)/σ.
Therefore, if X falls between the values x = x1 and x = x2,
the random variable Z will fall between the corresponding
values z1 = (x1 −μ)/σ and z2 = (x2 − μ)/σ.
Standard Normal Distribution

The original and transformed normal distributions


Cumulative Distribution Function
F ( x)  P( X  x)  
x
F ( x)  

f ( x)dx

Useful MS Excel work sheet functions


NORMDIST(x,μ,σ,CUM)
NORMSDIST(z)
Area under the normal curve

84%

-4 -3 -2 -1 0 1 2 3 4
Cumulative Distribution Curve
-3.000 0.001
-2.500 0.006
-1.500 0.067
-1.000 0.159
-0.500 0.309
0.000 0.500
0.500 0.691
1.000 0.841
-4 -3 -2 -1 0 1 2 3 4
1.500 0.933
2.000 0.977
2.500 0.994
3.000 0.999
Example
• The daily demand for a product is normally distributed
with a mean of 350 and a standard deviation of 20. What
is the probability that the demand on any day will be less
than 320
• NORMDIST(320,350,20,1)=0.0668
Using the N(0,1) Distribution
• When X is normally distributed with mean of μ and
standard deviation of σ
• (x – μ)/ σ will be normally distributed with mean of 0 and
SD of 1
• (x – μ)/ σ is denoted by letter z
• (x – μ)/ σ= (320-350)/20= - 1.5
• NORMSDIST(- 1.5) will give the same result
Area under the normal curve
z area z area z area z area
-3.000 0.001 -1.400 0.081 0.200 0.579 1.800 0.964
-2.800 0.003 -1.200 0.115 0.400 0.655 2.000 0.977
-2.600 0.005 -1.000 0.159 0.600 0.726 2.200 0.986
-2.400 0.008 -0.800 0.212 0.800 0.788 2.400 0.992
-2.200 0.014 -0.600 0.274 1.000 0.841 2.600 0.995
-2.000 0.023 -0.400 0.345 1.200 0.885 2.800 0.997
-1.800 0.036 -0.200 0.421 1.400 0.919 3.000 0.999
-1.600 0.055 0.000 0.500 1.600 0.945 3.200 0.999
Sampling Distribution of the mean
• The sampling distribution of the mean refers to the
pattern of sample means that will occur as samples are
drawn from the population at large

• Example
• A study is performed to determine the number of
kilometres the average person in India drives a car in one
day. It is not possible to measure the number of
kilometres driven by every person in the population, so
researcher choose randomly a sample of 10 people and
record how far they have driven.
Sampling Distribution of the mean
• Researcher then randomly sample another 10 drivers in
India and record the same information. This is done a total
of 5 times. The results are displayed in the table below.
Sampling Distribution of the mean
• Each sample has its own mean value, and each value is
different
• We can continue this experiment by selecting and
measuring more samples and observe the pattern of
sample means
• This pattern of sample means represents the sampling
distribution for the number of kilometres the average
person in India drives in a day.
Sampling Distribution of the mean
• The distribution from this example represents the
sampling distribution of the mean because the mean of
each sample was the measurement of interest
• What happens to the sampling distribution if we
increase the sample size?
• Don’t confuse sample size (n) and the number of
samples. In the previous example, the sample size equals
10 and the number of samples was 5.
The Central Limit Theorem
• What happens to the sampling distribution if we increase
the sample size?

• As the sample size (n) gets larger, the sample means


tend to follow a normal probability distribution

• As the sample size (n) gets larger, the sample means


tend to cluster around the true population mean

• Holds true, regardless of the distribution of the population


from which the sample was drawn
Standard Error of the Mean
• As the sample size increases, the distribution of sample
means tends to converge closer together –to cluster
around the true population mean

• Therefore, as the sample size increase, the standard


deviation of the sample means decreases

• The standard error of the mean is the standard deviation


of the sample means
Standard Error of the Mean
• Standard error can be calculated as follows:
Standard Error of the Mean
• In many applications, the true value of (the SD of the
population) is unknown

• SE can be estimated using the sample SD


Using the Central Limit Theorem
• If we know that the sample means follow the normal
probability distribution

• And we can calculate the mean and standard deviation of


that distribution

• We can predict the likelihood that the sample means will


be more or less than certain values
Exponential Distribution
• Exponential Distribution is used in Reliability Analysis the
time for failure of a component or a system
• Probability density function is given by
f(x)=le-lx Where l denotes the average failure rate
F(k)= P(X<k)
=1-e-lk
Exponential Distribution
Example
• An electronic component has an average life of 500
hours. What is the probability that a component will last
for 600 hour or more
l1/500
X is the life of the component
P(X> 600)=1-e(1/500)600 = 0.301
Poisson Distribution
• When time between failures has an exponential
distribution, the number of failures in unit time will have a
Poisson Distribution
• Suppose a random variable X has a Poisson distribution
with average of l . The probability that x will be have a
value of r is given by

r l
le
P(r ) 
r!
Poisson Distribution (Cont)
• Suppose a machine breaks down, on the average,
once in 10 days. What is the probability that it will
break down four times in a month of 30 days
l 3 r l
l e 34 e 3
P(x=4) =  = 0.168
r! r!

Worksheet function
poisson(x,MEAN, CUM) returns
Poisson Probabilities
0.250

0.200
0 0.050 0.050
1 0.149 0.199 0.150

2 0.224 0.423 0.100

3 0.224 0.647 0.050

4 0.168 0.815 0.000


0 2 4 6 8 10 12
5 0.101 0.916
6 0.050 0.966
1.200
7 0.022 0.988
1.000
8 0.008 0.996 0.800

9 0.003 0.999 0.600

10 0.001 1.000 0.400

0.200

0.000
0 2 4 6 8 10 12
Binomial Distribution

Probability of r successes in n
trials is given by

nr
P (r )  Cr P (1  p )
n r

Worksheet function
binomdist(x, n, p, CUM)
Returns Binomial Probabilities
Example
• A coin is tossed 10 times. What is the
probability of 0,1,2,3,4,5,6,7,8,9 and 10 heads?

10  r
P (r )  Cr (0.5) (1  0.5)
10 r
Probability of r heads when a coin is
tossed 10 times
0 0.001 0.001 0.300

0.250
1 0.010 0.011 0.200

2 0.044 0.055 0.150

0.100

3 0.117 0.172 0.050

0.000
4 0.205 0.377 0 2 4 6 8 10 12

5 0.246 0.623
1.200
6 0.205 0.828 1.000

7 0.117 0.945 0.800

0.600

8 0.044 0.989 0.400

9 0.010 0.999 0.200

0.000

10 0.001 1.000 0 2 4 6 8 10 12
Example
• A lot contains 50 products out of which 5 are
defective. What is the probability that a random
sample of 10 will not contain any defective units?
• n=10, p=0.10

50  r
P (r )  Cr (0.1) (1  0.1)
50 r

= BINOMDIST(0, 10, 0.1, 0)


=0.3487
Poisson Assumption of Binomial Distribution

• A Binomial distribution may be approximated by


a Poisson distribution when n is large and p is
small so that np is a small quantity (< 10)
• N=10, p=0.1 np=1
• P(0)= POISSON(0,1,0) = 0.3678
Normal Assumption to Binomial Distribution
• Binomial Distribution with parameters n and p will have:
• Mean of np
• Variance of np(1-p)
• It can be approximated to a normal distribution with the
same parameters
An Example
• A machine produces 10 % defectives. What is the
probability that a sample of 100 products will show 15
defectives or less?
• Binomial distribution
• Mean = 10
• Variance = 100*0.1*0.9=9
• SD=3
• Assuming a normal distribution with same parameters
• Z=(15-10)/3=5/3=1.66
• P( No. of defectives < 15)=0.9515
Example
• The daily demand for a product is is 400 with a
standard deviation of 20 units. What is the minimum
opening inventory required to meet the demand with a
probability of 90%?

• For service rate of 90 %


• Z = NORMINV(0.90)=1.281
• (X-400)/20=1.281
• X=425.6 or 427
Example
• A time to complete an order is normally distributed
with a mean of 100 days with a sd of 10 days. What
delivery time should you suggest to if the risk of
inability to meet the due date is limited to 5 %?

• Survival Function S=0.05


• Cumulative probability function φ = 0.95
• Delivery period = NORMINV(0.95,100,10)
• =116.5
Recommended Text Book
Probability and Statistics for
Engineers and Scientists
(7th Edition)
Ronald Walpole
Raymond H. Myers
Sharon L. Meyers
Keying Ye
Pearson Education Singapore

You might also like