
BUSINESS ANALYTICS

Common Probability Distributions


Introduction
Several specific distributions commonly occur in a variety of business
situations:
◦ Normal distribution—a continuous distribution characterized by a
symmetric bell-shaped curve
◦ Binomial distribution—a discrete distribution that is relevant when
we sample from a population with only two types of members or
when we perform a series of independent, identical experiments
with only two possible outcomes
◦ Poisson distribution—a discrete distribution that describes the
number of events in any period of time
◦ Exponential distribution—a continuous distribution that describes
the times between events
◦ And many others …
The Normal Distribution
The single most important distribution in statistics is the normal
distribution.
◦ It is a continuous distribution and is the basis of the familiar
symmetric bell-shaped curve.
◦ Any particular normal distribution is specified by its mean and
standard deviation.
◦ By changing the mean, the normal curve shifts to the right or left.
◦ By changing the standard deviation, the curve becomes more or
less spread out.
◦ There are really many normal distributions, not just a single one.
◦ The normal distribution is a two-parameter family, where the two
parameters are the mean and standard deviation.
Continuous Distributions and
Density Functions (slide 1 of 2)
For continuous distributions, instead of a list of possible values, there is
a continuum of possible values, such as all values between 0 and 100 or
all values greater than 0.
◦ Instead of assigning probabilities to each individual value in the
continuum, the total probability of 1 is spread over this continuum.
◦ The key to this spreading is called a density function, which acts like a
histogram.
◦ The higher the value of the density function, the more likely this
region of the continuum is.
Continuous Distributions and
Density Functions (slide 2 of 2)
A density function, usually denoted by f(x), specifies the probability
distribution of a continuous random variable X.
◦ The higher f(x) is, the more likely values near x are; f(x) itself is not a probability.
◦ The total area between the graph of f(x) and the horizontal axis, which
represents the total probability, is equal to 1.
◦ f(x) is nonnegative for all possible values of X.
◦ Probabilities are found from a density function as areas under the curve.
The Normal Density
The normal distribution is a continuous distribution with possible values
ranging over the entire number line—from “minus infinity” to “plus infinity.”
◦ Only a relatively small range has much chance of occurring.
◦ The normal density function is actually quite complex, in spite of its “nice”
bell-shaped appearance.
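The formula for the normal density function, where μ and σ are the mean and
standard deviation, is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \qquad -\infty < x < \infty$$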
Standardizing: Z-Values
The standard normal distribution has mean 0 and standard deviation 1,
so it is denoted by N(0,1).
◦ It is also referred to as the Z distribution.
To standardize a variable, subtract its mean and then divide the
difference by the standard deviation:
$$Z = \frac{X - \mu}{\sigma}$$
◦ A Z-value is the number of standard deviations to the right or left of
the mean.
◦ If Z is positive, the original value is to the right of the mean.
◦ If Z is negative, the original value is to the left of the mean.
Example 1: Standardizing.xlsx
Objective: To use Excel to standardize annual returns of various mutual funds.
Solution: Data set includes the annual returns of 30 mutual funds.
Calculate the mean and standard deviation of the annual returns and then
use the standardizing formula to calculate the corresponding Z-value for each fund.
OR calculate the Z-values directly, using Excel’s STANDARDIZE function.
Normal Tables and Z-Values
A common use for Z-values and the standard normal distribution is in
calculating probabilities and percentiles by the traditional method.
◦ This method is based on a table of the standard normal distribution found
in many statistics textbooks. An example of such a table is given below.
◦ The body of the table contains probabilities.
◦ The left and top margins contain possible values.
Normal Calculations in Excel
Two types of calculations are typically made with
normal distributions: finding probabilities and
finding percentiles.
◦ The functions used for normal probability calculations are NORMDIST and
NORMSDIST.
◦ The main difference between these is that the one with the
“S” (for standardized) applies only to N(0, 1) calculations, whereas
NORMDIST applies to any normal distribution.
◦ Percentile calculations that take a probability and
return a value are often called inverse calculations.
◦ The Excel functions for these are named NORMINV and NORMSINV.
◦ Again, the “S” in the second of these indicates that it
applies to the standard normal distribution.
Example 2: Normal Calculations.xlsx
(slide 1 of 2)

Objective: To calculate probabilities and percentiles for standard normal and
general normal distributions in Excel.
Solution: For “less than” probabilities, use NORMDIST or NORMSDIST directly.
For “greater than” probabilities, subtract the NORMDIST or NORMSDIST
function from 1.
For “between” probabilities, subtract the two NORMDIST or NORMSDIST
functions.
For percentile calculations, use the NORMINV or NORMSINV function with the
specified probability as the first argument.
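The four kinds of calculation listed above can also be sketched outside Excel. The following is a minimal Python illustration, assuming the standard library's `statistics.NormalDist` as a stand-in for NORMDIST/NORMSDIST (via `cdf`) and NORMINV/NORMSINV (via `inv_cdf`). The mean, standard deviation, and cutoff values are hypothetical and do not come from the example file.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
X = NormalDist(mu, sigma)   # general normal, the role played by NORMDIST / NORMINV
Z = NormalDist()            # standard normal N(0, 1), the role of NORMSDIST / NORMSINV

p_less = X.cdf(120)                  # "less than": P(X < 120)
p_greater = 1 - X.cdf(120)           # "greater than": subtract from 1
p_between = X.cdf(120) - X.cdf(90)   # "between": difference of two cdf values
pct_95 = X.inv_cdf(0.95)             # percentile (inverse) calculation for X
z_95 = Z.inv_cdf(0.95)               # the same percentile on the Z scale

print(p_less, p_greater, p_between, pct_95, z_95)
```

The last two values are linked by the standardizing formula: `pct_95` equals `mu + sigma * z_95`.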
Example 2: Normal Calculations.xlsx
(slide 2 of 2)
Empirical Rules Revisited
Three empirical rules apply to many data sets:
◦ About 68% of the data fall within one standard deviation of the
mean.
◦ About 95% fall within two standard deviations of the mean.
◦ Almost all fall within three standard deviations of the mean.
For these rules to hold with real data, the distribution of the data must
be at least approximately symmetric and bell-shaped.
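For an exactly normal distribution, the three percentages can be checked directly. This is a small sketch using Python's `statistics.NormalDist` (an assumption of this illustration; the slides themselves use Excel).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# Probability of falling within k standard deviations of the mean.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

Because any normal variable can be standardized, the same probabilities hold for every mean and standard deviation.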
Weighted Sums of Normal
Random Variables
One very attractive property of the normal distribution is that if you
create a weighted sum of normally distributed random variables, the
weighted sum is also normally distributed.
◦ This is true even if the random variables are not independent, provided they are
jointly normally distributed.
◦ If X1 through Xn are n independent and normally distributed random
variables with common mean μ and common standard deviation σ,
then the sum X1 + … + Xn is normally distributed with mean nμ,
variance nσ², and standard deviation √n σ.
◦ More generally, if the independent Xi have means μi and standard deviations σi,
and a1 through an are any constants, then the weighted sum a1X1 + … +
anXn is normally distributed with mean a1μ1 + … + anμn and variance
a1²σ1² + … + an²σn².
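The mean and variance rules for a weighted sum of independent normal variables can be sketched as a small helper. The function name `weighted_sum_dist` and the numbers used below are hypothetical, chosen only to illustrate the formulas.

```python
from math import sqrt
from statistics import NormalDist

def weighted_sum_dist(weights, means, sds):
    """Distribution of a1*X1 + ... + an*Xn for independent normal Xi."""
    mean = sum(a * m for a, m in zip(weights, means))
    variance = sum(a**2 * s**2 for a, s in zip(weights, sds))
    return NormalDist(mean, sqrt(variance))

# Hypothetical case: the plain sum of four independent N(10, 3) variables.
total = weighted_sum_dist([1, 1, 1, 1], [10, 10, 10, 10], [3, 3, 3, 3])
print(total.mean, total.stdev)
```

With all weights equal to 1 this reduces to the common-mean case: mean nμ and standard deviation √n σ.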
Example 3: Personnel Decisions.xlsx
Objective: To determine test scores that can be used to accept or reject job
applicants at ZTel.
Solution: Scores of all applicants are approximately normally distributed with
mean 525 and standard deviation 55.
Calculate the percentage of applicants who are automatic accepts or rejects,
given the current standards of 600 for automatic accept and 425 for
automatic reject.
Find new cutoff values that reject 10% and accept 15% of applicants.
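The two parts of this example can be sketched in Python, assuming `statistics.NormalDist` in place of Excel's NORMDIST and NORMINV. Only the mean, standard deviation, and cutoffs stated above are used.

```python
from statistics import NormalDist

scores = NormalDist(mu=525, sigma=55)

# Current standards: accept at 600 or above, reject at 425 or below.
p_accept = 1 - scores.cdf(600)
p_reject = scores.cdf(425)

# New cutoffs: reject the bottom 10%, accept the top 15%.
new_reject_cutoff = scores.inv_cdf(0.10)
new_accept_cutoff = scores.inv_cdf(0.85)  # 15% of scores lie above this value

print(p_accept, p_reject, new_reject_cutoff, new_accept_cutoff)
```

Note that accepting the top 15% means taking the 85th percentile, not the 15th.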
Example 4: Paper Machine
Settings.xlsx
Objective: To determine the machine settings that result in paper of acceptable
quality at PaperStock Company.
Solution: A given roll of paper must be rejected if its actual fiber content is less
than 19.8 pounds or greater than 20.3 pounds.
The standard deviation of fiber content is 0.10 pound when the process is “good,” but
increases to 0.15 pound when the machine goes “bad.”
Calculate the probability that a given roll is rejected, for a setting of μ = 20, when
the machine is “good” and when it is “bad.”
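The rejection probability is the sum of the two tail areas outside the acceptable range. A minimal Python sketch, assuming fiber content is normally distributed and treating 0.10 and 0.15 as standard deviations:

```python
from statistics import NormalDist

LOWER, UPPER = 19.8, 20.3  # acceptable fiber content range, in pounds

def p_reject(mu, sigma):
    """P(content < LOWER) + P(content > UPPER) for a normal process."""
    d = NormalDist(mu, sigma)
    return d.cdf(LOWER) + (1 - d.cdf(UPPER))

p_good = p_reject(20, 0.10)
p_bad = p_reject(20, 0.15)
print(p_good, p_bad)
```

Because the acceptable range is not centered on 20, the two tails contribute unequally; trying other values of `mu` shows how the setting shifts the balance.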
Example 5: Tax on Stock Returns.xlsx
Objective: To determine the after-tax profit Howard Davis can be 90% certain
of earning.
Solution: Howard is in the 33% tax bracket, so his after-tax profit is 67% of his
before-tax profit. He invests $10,000 in a certain stock, whose annual return is
normally distributed with mean 5% and standard deviation 14%.
Calculate the dollar amount such that Howard’s after-tax profit is 90% certain
to be less than this amount; that is, calculate the 90th percentile of his after-
tax profit.
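Since after-tax profit is a constant multiple of the return, it is also normally distributed, with the mean and standard deviation scaled by the same constant. The sketch below works through that arithmetic in Python; the variable names are illustrative.

```python
from statistics import NormalDist

investment = 10_000
keep_rate = 0.67                 # Howard keeps 67% of before-tax profit
scale = keep_rate * investment   # after-tax dollars per unit of return

# Return ~ N(0.05, 0.14), so after-tax profit ~ N(scale * 0.05, scale * 0.14).
profit = NormalDist(mu=scale * 0.05, sigma=scale * 0.14)

p90 = profit.inv_cdf(0.90)       # 90th percentile of after-tax profit
print(profit.mean, profit.stdev, p90)
```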
Example 6: Oven Demand
Simulation.xlsx (slide 1 of 3)
Objective: To construct and analyze a spreadsheet model for
microwave oven demand over the next 12 years using Excel’s
NORMINV function, and to show how models using the normal
distribution can lead to nonsensical outcomes unless they are
modified appropriately.
Solution: Using historical data, the company assumes that
demand in year 1 is normally distributed with mean 5000 and
standard deviation 1500.
It also assumes that demand in each subsequent year is normally
distributed with mean equal to the actual demand from the
previous year and standard deviation 1500.
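The model described above can be sketched as a short simulation. This Python version mimics the Excel idiom NORMINV(RAND(), mean, sd) with `NormalDist.inv_cdf`; the function name, the seeds, and the number of replications are assumptions of this illustration.

```python
import random
from statistics import NormalDist

def simulate_demand(years=12, first_mean=5000, sd=1500, seed=None):
    """Simulate yearly demand; each year's mean is last year's actual demand."""
    rng = random.Random(seed)
    demands = []
    mean = first_mean
    for _ in range(years):
        u = rng.random()
        while u == 0.0:            # inv_cdf requires 0 < u < 1
            u = rng.random()
        demand = NormalDist(mean, sd).inv_cdf(u)
        demands.append(demand)
        mean = demand              # next year's mean
    return demands

# Share of 500 simulated 12-year paths that contain a negative demand.
runs = [simulate_demand(seed=s) for s in range(500)]
share_negative = sum(min(path) < 0 for path in runs) / len(runs)
print(share_negative)
```

A nonzero share of paths with negative demand is the kind of nonsensical outcome the objective warns about.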
Example 6: Oven Demand
Simulation.xlsx (slide 2 of 3)
Using this model may lead to nonsensical results as shown
below:
Example 6: Oven Demand
Simulation.xlsx (slide 3 of 3)
One way to modify the model is to let the standard
deviation and mean move together. That is, if the mean is low, then
the standard deviation will also be low.
To be even safer, it is possible to truncate the demand distribution at some
nonnegative value such as 250, as shown below.
The Binomial Distribution
The binomial distribution is a discrete distribution that can occur in two
situations:
◦ When sampling from a population with only two types of members
(males and females, for example)
◦ When performing a sequence of identical experiments, each of
which has only two possible outcomes
Consider a situation where there are n independent, identical trials,
where the probability of a success on each trial is p and the probability
of a failure is 1 – p.
◦ Define X to be the random number of successes in the n trials.
◦ Then X has a binomial distribution with parameters n and p, i.e.:
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n$$
Example 7: Binomial Calculations.xlsx
Objective: To use Excel’s BINOMDIST and CRITBINOM functions for calculating
binomial probabilities and percentiles in the context of flashlight batteries.
Solution: Let X be the number of successes in 100 trials of flashlight batteries,
where a success means that the battery is still functioning after eight hours.

Find the probabilities of various events, using the BINOMDIST function, as
shown in the spreadsheet below.
Find the 95th percentile of the distribution of X, using the CRITBINOM
function.
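The ideas behind BINOMDIST and CRITBINOM can be sketched in a few lines of Python built on the binomial formula. The success probability p = 0.6 used below is hypothetical, since the slide does not state it, and the helper names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x), the cumulative form of BINOMDIST."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def binom_percentile(n, p, alpha):
    """Smallest x with P(X <= x) >= alpha (the idea behind CRITBINOM)."""
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += binom_pmf(x, n, p)
        if cumulative >= alpha:
            return x
    return n

n, p = 100, 0.6   # p is a hypothetical value for illustration
p_at_most_55 = binom_cdf(55, n, p)
p_at_least_65 = 1 - binom_cdf(64, n, p)
x95 = binom_percentile(n, p, 0.95)
print(p_at_most_55, p_at_least_65, x95)
```

For "at least" events the complement uses one less than the threshold, because the distribution is discrete.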
Mean and Standard Deviation
of the Binomial Distribution
It can be shown that the mean and standard deviation of a binomial
distribution with parameters n and p are given by the following
equations:
$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}$$
The empirical rules discussed in Chapter 2 also apply, at least
approximately, to the binomial distribution.
◦ There is about a 95% chance that the actual number of successes will be
within two standard deviations of the mean.
◦ There is almost no chance that the number of successes will be more than
three standard deviations from the mean.
The Binomial Distribution in the
Context of Sampling
If sampling is done without replacement, each member of the
population can be sampled only once.
◦ That is, once a person is sampled, his or her name is struck from the
list and cannot be sampled again.
If sampling is done with replacement, then it is possible, although
maybe not likely, to select a given member of the population more than
once.
Most real-world sampling is performed without replacement – see
hypergeometric distribution.
The binomial model applies only to sampling with replacement.
◦ However, if no more than 10% of the population is sampled, the
binomial model can be used safely even if sampling is performed
without replacement.
The Normal Approximation
to the Binomial
If you graph the binomial probabilities, you will see an interesting
phenomenon: the graph begins to look symmetric and bell-shaped when n is
fairly large and p is not too close to 0 or 1.
◦ The normal distribution provides a very good approximation to the binomial under
these conditions.
◦ One practical consequence of the normal approximation to the binomial is that the
empirical rules apply very well to binomial distributions.
Example 8: Beating the Market.xlsx
Objective: To determine the probability of a mutual fund outperforming a
standard market index at least 37 out of 52 weeks.
Solution: The number of weeks where a given fund outperforms the market
index is binomially distributed with n = 52 and p = 0.5. This probability is quite
small (0.00159).

Now let Y be the number of the 400 best mutual funds that beat the market
at least 37 of 52 weeks. Y is also binomially distributed, with parameters
n = 400 and p = 0.00159. The resulting probability that at least one of
these funds does so, P(Y ≥ 1), is nearly 0.5.
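Both probabilities in this example can be reproduced from the binomial formula. The sketch below assumes that the second probability is the chance that at least one of the 400 funds beats the market in at least 37 weeks, and that the funds perform independently.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for a binomial distribution with parameters n and p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# One fund: at least 37 winning weeks out of 52 with p = 0.5.
p_fund = binom_tail(37, 52, 0.5)

# 400 funds: P(Y >= 1) = 1 - P(Y = 0), with Y binomial(400, p_fund).
p_at_least_one = 1 - (1 - p_fund) ** 400

print(round(p_fund, 5), round(p_at_least_one, 3))
```

An event that is very unlikely for any single fund becomes close to a coin flip when 400 funds each get a chance.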
Example 9: Supermarket Spending.xlsx

Objective: To use the normal and binomial distributions to calculate the
typical number of customers who spend at least $100 per day and the
probability that at least 30% of all 500 daily customers spend at least $100.
Solution: Historical data indicate that the amount spent per customer is
normally distributed with mean $85 and standard deviation $30.
If 500 customers shop in a given day, calculate the mean and standard
deviation of the number who spend at least $100.

Then calculate the probability that at least 30% of the 500 customers
spend at least $100. This is the probability that a binomially
distributed random variable, with n = 500 and p = 0.309, is at least 150.
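This example chains a normal calculation into a binomial one. A minimal Python sketch follows; it keeps p unrounded rather than using 0.309, so the results may differ slightly from the spreadsheet.

```python
from math import comb, sqrt
from statistics import NormalDist

# Step 1 (normal): probability that one customer spends at least $100.
p = 1 - NormalDist(mu=85, sigma=30).cdf(100)

# Step 2 (binomial): number of such customers among n = 500.
n = 500
mean = n * p
sd = sqrt(n * p * (1 - p))

# Step 3: P(at least 150 of the 500 spend at least $100).
p_at_least_150 = sum(
    comb(n, k) * p**k * (1 - p)**(n - k) for k in range(150, n + 1)
)

print(p, mean, sd, p_at_least_150)
```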
Example 10: Airline Overbooking.xlsx
(slide 1 of 2)

Objective: To assess the benefits and drawbacks of airline overbooking.


Solution: Assume that the no-show rate is 10%—that is, each ticketed
passenger shows up with probability 0.90.
For a flight with 200 seats, calculate the probability that more than 205
passengers show up; that more than 200 passengers show up; that at least
195 seats are filled; and that at least 190 seats are filled.
Use the BINOMDIST function and a data table to determine the probabilities.
Example 10: Airline Overbooking.xlsx
(slide 2 of 2)

To see how sensitive these probabilities are to the number of tickets issued,
create a one-way data table, as shown at the bottom of the spreadsheet
below.
Example 11: Election Returns.xlsx
Objective: To use a binomial model to determine whether early returns
reflect the eventual winner of an election between two candidates.
Solution: Suppose that a small percentage of the votes have been counted
and the Republican is currently ahead 540 to 460. On what basis can the
networks declare the Republican the winner, if there are millions of voters?

Use a binomial model to see how unlikely the event “at least 540 out of
1000” is, assuming that the Democrat will be the eventual winner.
Example 12: Basketball Simulation.xlsx
Objective: To formulate a
nonbinomial model of basketball
shooting, and to use it to find the
probability of a “450 shooter” making
at least 13 out of 25 shots.
Solution: Assume the shooter makes
45% of his shots in the long run.
Use simulation to create a model that
implies that the shooter gets better
the more shots he makes and worse
the more he misses.
Consider his nth shot. If he has made
his last k shots, assume the
probability of making shot n is 0.45 +
kd1.
If he has missed his last k shots,
assume the probability of making
shot n is 0.45 − kd2.
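The streak-dependent shooting model can be sketched as a simulation. The increments d1 = d2 = 0.015, the number of simulated games, the seed, and the clamping of probabilities to the range 0 to 1 are all assumptions of this sketch; the slide does not specify them.

```python
import random

def simulate_game(rng, shots=25, base=0.45, d1=0.015, d2=0.015):
    """Return shots made when the make probability depends on the current streak."""
    made = 0
    streak = 0  # > 0: consecutive makes, < 0: consecutive misses
    for _ in range(shots):
        if streak >= 0:
            prob = base + streak * d1      # made last k shots: 0.45 + k*d1
        else:
            prob = base + streak * d2      # missed last k shots: 0.45 - k*d2
        prob = min(max(prob, 0.0), 1.0)    # keep it a valid probability (assumption)
        if rng.random() < prob:
            made += 1
            streak = streak + 1 if streak > 0 else 1
        else:
            streak = streak - 1 if streak < 0 else -1
    return made

def prob_at_least(target=13, games=10_000, seed=0, d1=0.015, d2=0.015):
    """Estimate P(at least `target` makes in 25 shots) by simulation."""
    rng = random.Random(seed)
    hits = sum(simulate_game(rng, d1=d1, d2=d2) >= target for _ in range(games))
    return hits / games

estimate = prob_at_least()
print(estimate)
```

Setting d1 = d2 = 0 recovers the ordinary binomial model, which gives a baseline for comparison.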
The Poisson and Exponential
Distributions
In most statistical applications, the Poisson and exponential
distributions play a much less important role than the normal and
binomial distributions.
However, in many applied management science models, the Poisson
and exponential distributions are key distributions.
◦ For example, much of the study of probabilistic inventory models,
queuing models, and reliability models relies heavily on these two
distributions.
The Poisson Distribution
(slide 1 of 3)
The Poisson distribution is a discrete distribution. It usually applies to
the number of events occurring within a specified period of time or
space.
◦ Its possible values are all of the nonnegative integers: 0, 1, 2, and so
on—there is no upper limit.
◦ Even though there is an infinite number of possible values, this
causes no real problems because the probabilities of all sufficiently
large values are essentially 0.
The Poisson distribution is characterized by a single parameter, usually
labeled λ (Greek lambda), which must be positive.
◦ It is both the mean and the variance of the Poisson distribution.
◦ It is often called a rate—arrivals per hour, for example.
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$
The Poisson Distribution
(slide 2 of 3)
All Poisson distributions have the same basic shape as in the figure below.
◦ That is, they first increase and then decrease.
The Poisson Distribution
(slide 3 of 3)
Typical examples of the Poisson distribution:
◦ A bank manager is studying the arrival pattern to the bank. The
events are customer arrivals, the number of arrivals in an hour is
Poisson distributed, and λ represents the expected number of
arrivals per hour.
◦ A retailer is interested in the number of customers who order a
particular product in a week. Then the events are customer orders
for the product, the number of customer orders in a week is Poisson
distributed, and λ is the expected number of orders per week.
In Excel, calculate Poisson probabilities with the POISSON function.

Usually the Poisson distribution applies to the number of events
occurring within a specified period of time or space.
Example 13: Poisson Demand
Distribution.xlsx (slide 1 of 2)
Objective: To model the probability distribution of monthly demand for LED
screen TVs with a particular Poisson distribution.
Solution: Because the histogram of demands from previous months
resembles a Poisson distribution, try modeling the monthly demand with a
Poisson distribution.
The historical average demand per month is about 17, so let the mean
demand per month λ = 17.
Now test the Poisson model by calculating the probabilities of various events.
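Probabilities of this kind can be sketched in Python directly from the Poisson formula with λ = 17. The particular events evaluated below are hypothetical illustrations, not the ones in the example file.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), the cumulative form of Excel's POISSON function."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 17  # historical average monthly demand

p_exactly_17 = poisson_pmf(17, lam)
p_at_most_20 = poisson_cdf(20, lam)
p_more_than_25 = 1 - poisson_cdf(25, lam)

print(p_exactly_17, p_at_most_20, p_more_than_25)
```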
Example 13: Poisson Demand
Distribution.xlsx (slide 2 of 2)
The Exponential Distribution
(slide 1 of 2)
The most common probability distribution used to model the times
between customer arrivals, often called interarrival times, is the
exponential distribution.
◦ In general, the continuous random variable X has an exponential
distribution with parameter λ (with λ > 0) if the density function of
X has the form:
$$f(x) = \lambda e^{-\lambda x}, \qquad x > 0$$
◦ The mean and standard deviation of this distribution are both
equal to the reciprocal of the parameter λ.
◦ The cumulative distribution function:
$$F(x) = P(X \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}$$
◦ For any exponential distribution, the probability to the left of a
given value x > 0 can be calculated with Excel’s EXPONDIST
function.
The Exponential Distribution
(slide 2 of 2)
The exponential density function has the shape shown below.

Because this density function decreases continuously from left to right,
its most likely value is x = 0.
The Exponential Distribution
- Exercise
The magnitude of earthquakes recorded in a region of
North America can be modelled as having an exponential
distribution with a mean 2.4, as measured on the Richter
scale. Find the probability that an earthquake striking this
region will

a) exceed 3.0 on the Richter scale.


b) fall between 2.0 and 3.0 on the Richter scale.
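One way to work the exercise is to use the exponential cumulative distribution function F(x) = 1 − e^(−λx) with λ equal to the reciprocal of the mean. The Python sketch below follows that route; the helper name `expon_cdf` is illustrative.

```python
from math import exp

mean_magnitude = 2.4
lam = 1 / mean_magnitude   # the mean of an exponential distribution is 1/lambda

def expon_cdf(x, lam):
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1 - exp(-lam * x) if x > 0 else 0.0

p_a = 1 - expon_cdf(3.0, lam)                    # a) exceeds 3.0
p_b = expon_cdf(3.0, lam) - expon_cdf(2.0, lam)  # b) falls between 2.0 and 3.0

print(round(p_a, 4), round(p_b, 4))
```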
Other Useful Distributions
We present a few other distributions which arise naturally
and can also be very useful.
Discrete Distributions
Hypergeometric RV (discrete)
The hypergeometric distribution arises when one selects a random sample of size
n, without replacement, from a finite population of size N divided into two classes
consisting of D elements of the first kind and N − D of the second kind. Such a
scheme is called sampling without replacement from a finite dichotomous
population.

DN − D
  
 x n − x 
f (x) =
N 
 
n 
D
Mean: E [ X ] = n   .
N 
N −n D D 
Variance: V [ X ] =  ( n
 N ) (1 − ) .
 N − 1   N 
Example: Suppose an urn contains D = 10 red balls and N − D = 15 white balls.
A random sample of size n = 8 , without replacement, is drawn and the number of
red balls is denoted by X. What is the probability of having x red balls in the
sample?
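The urn example can be worked directly from the hypergeometric formula. The Python sketch below tabulates P(X = x) for every possible x and checks the mean and variance formulas numerically; the function name is illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(X = x): x items of the first kind in a sample of n drawn without replacement."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 25, 10, 8   # 10 red + 15 white balls, sample of 8

# Here x can range from 0 to 8 because there are at least 8 balls of each color.
pmf = {x: hypergeom_pmf(x, N, D, n) for x in range(0, n + 1)}

mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

for x, prob in pmf.items():
    print(x, round(prob, 4))
```

The computed mean and variance should agree with n(D/N) and the finite-population formula given above.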
Geometric RV (discrete)
Definition: A random variable X is said to have a geometric probability
distribution if and only if

$$p(x) = q^{x - 1} p, \qquad x = 1, 2, 3, \ldots, \quad 0 \le p \le 1, \quad q = 1 - p$$

Mean: E(X) = 1/p

Variance: Var(X) = (1 − p)/p²

The random variable X is the number of trials up to and including the first success.

Example: Suppose that the probability of engine malfunction during any 1-hour
period is p = 0.02 . Find the probability that a given engine will survive 2 hours.
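A short Python sketch of this example follows. It reads "survive 2 hours" as no malfunction in either of the first two 1-hour periods, that is P(X ≥ 3), where X is the period of the first malfunction; this interpretation is an assumption of the sketch.

```python
p = 0.02     # probability of malfunction in any 1-hour period
q = 1 - p

def geom_pmf(x, p):
    """P(X = x): first 'success' (here, malfunction) occurs on trial x."""
    return (1 - p) ** (x - 1) * p

# Surviving 2 hours means the first malfunction comes in hour 3 or later.
p_survive_2h = 1 - geom_pmf(1, p) - geom_pmf(2, p)

print(round(p_survive_2h, 4), round(q ** 2, 4))
```

The complement calculation and the direct product q² give the same value, since both periods must be free of malfunction.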

Continuous Distributions
Uniform Probability
Distribution
Example: Random variable x representing the flight time of an airplane
traveling from Chicago to New York City
With every interval of a given length being equally likely, the random
variable x is said to have a uniform probability distribution.
Uniform Probability
Distribution

Uniform Probability Distribution for Flight Time


Uniform Probability
Distribution

The Area Under the Graph Provides the Probability of a
Flight Time Between 120 and 130 Minutes
Uniform Probability
Distribution

The calculation of the expected value and variance for a continuous
random variable is analogous to that for a discrete random variable.
For the uniform continuous probability distribution on the interval from a to b,
the formulas for the expected value and variance are:
$$E(x) = \frac{a + b}{2}, \qquad \mathrm{Var}(x) = \frac{(b - a)^2}{12}$$
Uniform Probability
Distribution - Exercise
If a point is randomly located in an interval (a, b), and if X denotes the
location of the point, then X is assumed to have a uniform distribution
over (a, b). A plant efficiency expert randomly selects a location along a
500-foot assembly line from which to observe the work habits of the
workers on the line. What is the probability that the point she selects

a) is within 25 feet of the end of the line?
b) is within 25 feet of the beginning of the line?
c) is closer to the beginning of the line than to the end of the line?
Triangular Probability
Distribution
Useful when only subjective probability estimates are available
In the triangular probability distribution, we need only specify:
◦ The minimum possible value a
◦ The maximum possible value b
◦ The most likely value (or mode) of the distribution m

If these values can be knowledgeably estimated, then as an
approximation of the actual probability density function, we can
assume that the triangular distribution applies.
Triangular Probability
Distribution

Triangular Probability Distribution for Time Required for Initial
Assessment of Corporate Headquarters Construction
Triangular Probability
Distribution
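Note in the figure that the probability density function has a triangular
shape.
The general form of the triangular probability density function is:
$$f(x) = \begin{cases} \dfrac{2(x - a)}{(b - a)(m - a)}, & a \le x \le m \\ \dfrac{2(b - x)}{(b - a)(b - m)}, & m < x \le b \\ 0, & \text{otherwise} \end{cases}$$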
Triangular Probability
Distribution
The geometry required to find the area under the graph for any given
value is slightly more complex than that required to find the area for a
uniform distribution.
Triangular Probability
Distribution

Triangular Distribution to Determine
P(10 ≤ x ≤ 18) = P(x ≤ 18) – P(x ≤ 10)
Do not worry about the formulae!
Gamma RV
A random variable Y is said to have a gamma distribution with parameters α > 0 and
β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{y^{\alpha - 1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 \le y < \infty \\ 0, & \text{elsewhere,} \end{cases}$$

where

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\,dy.$$

The quantity Γ(α) is known as the gamma function. Direct integration will verify
that Γ(1) = 1. Integration by parts will verify that Γ(α) = (α − 1)Γ(α − 1) for any
α > 1 and that Γ(n) = (n − 1)!, provided that n is a positive integer.

If Y has a gamma distribution with parameters α and β, then

μ = E(Y) = αβ and σ² = Var(Y) = αβ².
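The properties stated above can be checked numerically. The sketch below uses Python's `math.gamma` for Γ and a simple midpoint-rule integration of the density; the parameter values α = 3 and β = 2, the integration range, and the step size are hypothetical choices for illustration.

```python
from math import exp, factorial, gamma

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    if y <= 0:
        return 0.0  # also sidesteps 0 ** (alpha - 1) when alpha < 1
    return y ** (alpha - 1) * exp(-y / beta) / (beta ** alpha * gamma(alpha))

alpha, beta = 3.0, 2.0   # hypothetical parameters

# Midpoint-rule integration over [0, 100]; the tail beyond 100 is negligible here.
step = 0.01
grid = [(i + 0.5) * step for i in range(10_000)]
total = sum(gamma_pdf(y, alpha, beta) for y in grid) * step
mean = sum(y * gamma_pdf(y, alpha, beta) for y in grid) * step
variance = sum((y - mean) ** 2 * gamma_pdf(y, alpha, beta) for y in grid) * step

print(gamma(5), factorial(4))   # Gamma(n) versus (n - 1)! for n = 5
print(total, mean, variance)    # compare with 1, alpha*beta, alpha*beta**2
```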

Gamma RV
◦ Interesting relationship with the Poisson process: a
gamma RV is the waiting time until the αth event
(arrival) occurs.
◦ Example: Length of stay in hospital, A&E, etc.

More Continuous Distributions
Chi-square Distributions: Special case of Gamma distribution with
parameters k/2 and 2. It is also the distribution of the sum of squares of
k independent standard normal variables. Used a lot in inferential
statistics (e.g. Chi-square test)
Beta Distributions: Often used to model the duration of an activity in a
project
Weibull Distributions: Very versatile in terms of shape. Widely used
lifetime distributions in reliability engineering
Lognormal Distributions: Describe variables which can be modelled as
the product of many small independent positive variables. Because of
this, they are widely used to model natural phenomena
