Professional Documents
Culture Documents
Distributions
Distributions
Distributions
Probability distributions are a fundamental concept in statistics. They are used both on a theoretical
level and a practical level.
• To calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests.
• For univariate data, it is often useful to determine a reasonable distributional model for the data.
• Statistical intervals and hypothesis tests are often based on specific distributional assumptions.
Before computing an interval or test based on a distributional assumption, we need to verify that the
assumption is justified for the given data set. In this case, the distribution does not need to be the
best-fitting distribution for the data, but an adequate enough model so that the statistical technique
yields valid conclusions.
• Simulation studies with random numbers generated from using a specific probability distribution are
often needed.
Related Distributions
Probability distributions are typically defined in terms of the probability density function. However, there are a
number of probability functions used in applications.
Probability Density For a continuous function, the probability density function (pdf) is the probability
Function that the variate has the value x. Since for continuous distributions the
probability at a single point is zero, this is often expressed in terms of an
integral between two points.
1
Cumulative The cumulative distribution function (cdf) is the probability that the variable
Distribution takes a value less than or equal to x. That is
Function
2
The horizontal axis is the allowable domain for the given
probability function. Since the vertical axis is a probability, it
must fall between zero and one. It increases from zero to one as
we go from left to right on the horizontal axis.
Percent Point Function The percent point function (ppf) is the inverse of the cumulative
distribution function. For this reason, the percent point function is also
commonly referred to as the inverse distribution function. That is, for a
distribution function we calculate the probability that the variable is less
than or equal to x for a given x. For the percent point function, we start
with the probability and compute the corresponding x for the cumulative
distribution. Mathematically, this can be expressed as
or alternatively
3
Since the horizontal axis is a probability, it goes from zero to
one. The vertical axis goes from the smallest to the largest
value of the cumulative distribution function.
Hazard Function The hazard function is the ratio of the probability density function to the
survival function, S(x).
4
conditional failure density function rather than the hazard
function.
Cumulative Hazard The cumulative hazard function is the integral of the hazard function. It
Function can be interpreted as the probability of failure at time x given survival until
time x.
5
For a survival function, the y value on the graph starts at 1 and
monotonically decreases to zero. The survival function should
be compared to the cumulative distribution function.
Inverse Survival Just as the percent point function is the inverse of the cumulative
Function distribution function, the survival function also has an inverse function.
The inverse survival function can be defined in terms of the percent point
function.
6
probability. Therefore the horizontal axis goes from 0 to 1
regardless of the particular distribution. The appearance is
similar to the percent point function. However, instead of going
from the smallest to the largest value on the vertical axis, it
goes from the largest to the smallest value.
Families of Distributions
Shape Parameters
Many probability distributions are not a single distribution, but are in fact a family of distributions.
This is due to the distribution having one or more shape parameters.
The Weibull distribution is an example of a distribution that has a shape parameter. The following
graph plots the Weibull pdf with the following values for the shape parameter: 0.5, 1.0, 2.0, and 5.0.
The Weibull distribution has a relatively simple distributional form. However, the shape
parameter allows the Weibull to assume a wide variety of shapes. This combination of
simplicity and flexibility in the shape of the Weibull distribution has made it an
effective distributional model in reliability applications. This ability to model a wide
variety of distributional shapes using a relatively simple distributional form is possible
with many other distributional families as well.
The sections on parameter estimation are restricted to the method of moments and
maximum likelihood. This is because the least squares and PPCC and probability plot
7
estimation procedures are generic. The maximum likelihood equations are not listed if
they involve solving simultaneous equations. This is because these methods require
sophisticated computer software to solve. Except where the maximum likelihood
estimates are trivial, you should depend on a statistical software program to compute
them. References are given for those who are interested.
Be aware that different sources may give formulas that are different from those shown
here. In some cases, these are simply mathematically equivalent formulations. In other
cases, a different parameterization may be used
The PPCC plot can be used to estimate the shape parameter of a distribution with a single shape
parameter. After finding the best value of the shape parameter, the probability plot can be used to
estimate the location and scale parameters of a probability distribution
The general formula for the probability density function of the normal distribution is
where is the location parameter and is the scale parameter. The case where = 0
and = 1 is called the standard normal distribution. The equation for the standard
normal distribution is
8
Since the general form of probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section are given for the standard
form of the function.
The following is the plot of the standard normal probability density function.
9
The formula for the percent point function of the normal distribution does not exist in a simple closed
formula. It is computed numerically.
where is the cumulative distribution function of the standard normal distribution and
is the probability density function of the standard normal distribution.
10
The normal cumulative hazard function can be computed from the normal cumulative distribution
function.
The normal survival function can be computed from the normal cumulative distribution function.
The normal inverse survival function can be computed from the normal percent point function.
11
Mean The location parameter .
Median The location parameter .
Mode The location parameter .
Range Infinity in both directions.
Standard Deviation The scale parameter .
Coefficient of Variation
Skewness 0
Kurtosis 3
Probability The general formula for the probability density function of the uniform distribution is
Density
Function
12
Cumulative The formula for the cumulative distribution function of the uniform distribution is
Distribution
Function
Percent Point The formula for the percent point function of the uniform distribution is
Function
13
Hazard Function The formula for the hazard function of the uniform distribution is
Cumulative The formula for the cumulative hazard function of the uniform distribution is
Hazard Function
14
Survival The uniform survival function can be computed from the uniform cumulative
Function distribution function.
Inverse Survival The uniform inverse survival function can be computed from the uniform percent point
Function function.
15
Common Mean (A + B)/2
Statistics Median (A + B)/2
Range B-A
Standard
Deviation
Coefficient of
Variation
Skewness 0
Kurtosis 9/5
Parameter The method of moments estimators for A and B are
Estimation
A=a-h
B=a+h
The maximum likelihood estimators for a and h are
Comments The uniform distribution defines equal probability over a given range for a continuous
distribution. For this reason, it is important as a reference distribution.
16
numbers.
Probability The general formula for the probability density function of the exponential
Density distribution is
Function
Cumulative The formula for the cumulative distribution function of the exponential
Distribution distribution is
Function
17
Percent Point The formula for the percent point function of the exponential distribution is
Function
Hazard Function The formula for the hazard function of the exponential distribution is
18
Cumulative The formula for the cumulative hazard function of the exponential
Hazard Function distribution is
Survival The formula for the survival function of the exponential distribution is
Function
19
Inverse Survival The formula for the inverse survival function of the exponential distribution
Function is
20
Common Mean
Statistics Median
Mode Zero
Range Zero to plus infinity
Standard
Deviation
Coefficient of 1
Variation
Skewness 2
Kurtosis 9
Parameter For the full sample case, the maximum likelihood estimator of the scale
Estimation parameter is the sample mean. Maximum likelihood estimation for the
exponential distribution is discussed in the chapter on reliability (Chapter
8). It is also discussed in chapter 19 of Johnson, Kotz, and Balakrishnan.
Comments The exponential distribution is primarily used in reliability applications. The
exponential distribution is used to model data with a constant failure rate
(indicated by the hazard plot which is simply equal to a constant).
t Distribution
Probability The formula for the probability density function of the t distribution is
Density
Function
21
These plots all have a similar shape. The difference is in the
heaviness of the tails. In fact, the t distribution with equal to 1
is a Cauchy distribution. The t distribution approaches a normal
distribution as becomes large. The approximation is quite good
for values of > 30.
Cumulative The formula for the cumulative distribution function of the t distribution is
Distribution complicated and is not included here. It is given in the Evans, Hastings,
Function and Peacock book.
Percent Point The formula for the percent point function of the t distribution does not
Function exist in a simple closed form. It is computed numerically.
The following are the plots of the t percent point function with
22
the same values of as the pdf plots above.
Other Probability Since the t distribution is typically used to develop hypothesis tests and
Functions confidence intervals and rarely for modeling applications, we omit the
formulas and plots for the hazard, cumulative hazard, survival, and
inverse survival probability functions.
Common Mean 0 (It is undefined for equal to 1.)
Statistics Median 0
Mode 0
Range Infinity in both directions.
Standard
Deviation
23
F Distribution
Probability The F distribution is the ratio of two chi-square distributions with degrees of
Density freedom and , respectively, where each chi-square has first been divided by
Function its degrees of freedom. The formula for the probability density function of the F
distribution is
where and are the shape parameters and is the gamma function. The
formula for the gamma function is
Cumulative The formula for the Cumulative distribution function of the F distribution is
Distribution
Function
where k = / ( + *x) and Ik is the incomplete beta function. The formula for
the incomplete beta function is
The following is the plot of the F cumulative distribution function with the same
values of and as the pdf plots above.
24
Percent Point The formula for the percent point function of the F distribution does not exist in a
Function simple closed form. It is computed numerically.
The following is the plot of the F percent point function with the same
values of and as the pdf plots above.
Other Probability Since the F distribution is typically used to develop hypothesis tests and
Functions confidence intervals and rarely for modeling applications, we omit the formulas
and plots for the hazard, cumulative hazard, survival, and inverse survival
probability functions.
Common The formulas below are for the case where the location parameter is zero and
Statistics the scale parameter is one.
25
Mean
Mode
Coefficient
of Variation
Skewness
Parameter Since the F distribution is typically used to develop hypothesis tests and
Estimation confidence intervals and rarely for modeling applications, we omit any discussion
of parameter estimation.
Comments The F distribution is used in many cases for the critical regions for hypothesis
tests and in determining confidence intervals. Two common examples are the
analysis of variance and the F test to determine if the variances of two
populations are equal.
Chi-Square Distribution
Probability The chi-square distribution results when independent variables with
Density standard normal distributions are squared and summed. The formula for
Function the probability density function of the chi-square distribution is
26
Cumulative The formula for the cumulative distribution function of the chi-square
Distribution distribution is
Function
27
Percent Point The formula for the percent point function of the chi-square distribution
Function does not exist in a simple closed form. It is computed numerically.
Other Probability Since the chi-square distribution is typically used to develop hypothesis
Functions tests and confidence intervals and rarely for modeling applications, we
omit the formulas and plots for the hazard, cumulative hazard, survival,
and inverse survival probability functions.
28
Common Mean
Statistics
Median approximately - 2/3 for large
Mode
Skewness
Kurtosis
Comments The chi-square distribution is used in many cases for the critical regions
for hypothesis tests and in determining confidence intervals. Two
common examples are the chi-square test for independence in an RxC
contingency table and the chi-square test to determine if the standard
deviation of a population is equal to a pre-specified value.
Cauchy Distribution
Probability The general formula for the probability density function of the Cauchy
Density distribution is
Function
29
this section are given for the standard form of the function.
Cumulative The formula for the cumulative distribution function for the Cauchy
Distribution distribution is
Function
Percent Point The formula for the percent point function of the Cauchy distribution is
Function
30
The following is the plot of the Cauchy percent point function.
Hazard Function The Cauchy hazard function can be computed from the Cauchy
probability density and cumulative distribution functions.
Cumulative The Cauchy cumulative hazard function can be computed from the
Hazard Function Cauchy cumulative distribution function.
31
Survival The Cauchy survival function can be computed from the Cauchy
Function cumulative distribution function.
Inverse Survival The Cauchy inverse survival function can be computed from the Cauchy
Function percent point function.
32
Common Mean The mean is undefined.
Statistics Median The location parameter t.
Mode The location parameter t.
Range Infinity in both directions.
Standard The standard deviation is undefined.
Deviation
Coefficient of The coefficient of variation is undefined.
Variation
Skewness The skewness is undefined.
Kurtosis The kurtosis is undefined.
Parameter The likelihood functions for the Cauchy maximum likelihood estimates are
Estimation given in chapter 16 of Johnson, Kotz, and Balakrishnan. These equations
typically must be solved numerically on a computer.
Comments The Cauchy distribution is important as an example of a pathological
case. Cauchy distributions look similar to a normal distribution. However,
they have much heavier tails. When studying hypothesis tests that
assume normality, seeing how the tests perform on data from a Cauchy
distribution is a good indicator of how sensitive the tests are to heavy-tail
departures from normality. Likewise, it is a good check for robust
techniques that are designed to work well under a wide variety of
distributional assumptions.
33
where is the location parameter and is the scale parameter.
The case where = 0 and = 1 is called the standard double
exponential distribution. The equation for the standard
double exponential distribution is
Cumulative The formula for the cumulative distribution function of the double
Distribution exponential distribution is
Function
34
Percent Point The formula for the percent point function of the double exponential
Function distribution is
Hazard Function The formula for the hazard function of the double exponential distribution
is
35
The following is the plot of the double exponential hazard
function.
Cumulative The formula for the cumulative hazard function of the double exponential
Hazard Function distribution is
Survival The double exponential survival function can be computed from the
Function cumulative distribution function of the double exponential distribution.
36
function.
Inverse Survival The formula for the inverse survival function of the double exponential
Function distribution is
37
Common Mean
Statistics Median
Mode
Range Negative infinity to positive infinity
Standard
Deviation
Skewness 0
Kurtosis 6
Coefficient of
Variation
Parameter The maximum likelihood estimators of the location and scale parameters
Estimation of the double exponential distribution are
Weibull Distribution
Probability The formula for the probability density function of the general Weibull distribution is
Density
Function
where is the shape parameter, is the location parameter and is the scale
parameter. The case where = 0 and = 1 is called the standard Weibull
distribution. The case where = 0 is called the 2-parameter Weibull
distribution. The equation for the standard Weibull distribution reduces to
38
Cumulative The formula for the cumulative distribution function of the Weibull distribution is
Distribution
Function
The following is the plot of the Weibull cumulative distribution function with
the same values of as the pdf plots above.
Percent Point The formula for the percent point function of the Weibull distribution is
Function
The following is the plot of the Weibull percent point function with the same
values of as the pdf plots above.
39
Hazard Function The formula for the hazard function of the Weibull distribution is
The following is the plot of the Weibull hazard function with the same values
of as the pdf plots above.
Cumulative The formula for the cumulative hazard function of the Weibull distribution is
Hazard Function
The following is the plot of the Weibull cumulative hazard function with the
same values of as the pdf plots above.
40
Survival The formula for the survival function of the Weibull distribution is
Function
The following is the plot of the Weibull survival function with the same values
of as the pdf plots above.
Inverse Survival The formula for the inverse survival function of the Weibull distribution is
Function
The following is the plot of the Weibull inverse survival function with the same
values of as the pdf plots above.
41
Common The formulas below are with the location parameter equal to zero and the scale
Statistics parameter equal to one.
Mean
Median
Mode
Coefficient of
Variation
Parameter Maximum likelihood estimation for the Weibull distribution is discussed in the Reliability
Estimation chapter (Chapter 8). It is also discussed in Chapter 21 of Johnson, Kotz, and
Balakrishnan.
Comments The Weibull distribution is used extensively in reliability applications to model failure
times.
Lognormal Distribution
Probability A variable X is lognormally distributed if Y = LN(X) is normally distributed
Density with “LN” denoting the natural logarithm. The general formula for the
Function probability density function of the lognormal distribution is
42
where is the shape parameter, is the location parameter and
m is the scale parameter. The case where = 0 and m = 1 is
called the standard lognormal distribution. The case where
equals zero is called the 2-parameter lognormal distribution.
The equation for the standard lognormal distribution is
43
Percent Point The formula for the percent point function of the lognormal distribution is
Function
where is the percent point function of the normal
distribution.
The following is the plot of the lognormal percent point function
with the same values of as the pdf plots above.
Hazard Function The formula for the hazard function of the lognormal distribution is
44
Cumulative The formula for the cumulative hazard function of the lognormal
Hazard Function distribution is
Survival The formula for the survival function of the lognormal distribution is
Function
45
Inverse Survival The formula for the inverse survival function of the lognormal distribution
Function is
Common The formulas below are with the location parameter equal to zero and the
Statistics scale parameter equal to one.
46
Mean
Median Scale parameter m (= 1 if scale parameter
not specified).
Mode
Kurtosis
Coefficient of
Variation
Parameter The maximum likelihood estimates for the scale parameter, m, and the
Estimation shape parameter, , are
and
where
Beta Distribution
Probability The general formula for the probability density function of the beta distribution is
Density
Function
where p and q are the shape parameters, a and b are the lower and
upper bounds, respectively, of the distribution, and B(p,q) is the beta
function. The beta function has the formula
47
Typically we define the general form of a distribution in terms of location
and scale parameters. The beta is different in that we define the general
distribution in terms of the lower and upper bounds. However, the location
and scale parameters can be defined in terms of the lower and upper
limits as follows:
location = a
scale = b - a
Since the general form of probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section are given for the
standard form of the function.
The following is the plot of the beta probability density function for four
different values of the shape parameters.
Cumulative The formula for the cumulative distribution function of the beta distribution is also
Distribution called the incomplete beta function ratio (commonly denoted by Ix) and is defined as
Function
The following is the plot of the beta cumulative distribution function with
the same values of the shape parameters as the pdf plots above.
48
Percent Point The formula for the percent point function of the beta distribution does not exist in a
Function simple closed form. It is computed numerically.
The following is the plot of the beta percent point function with the same
values of the shape parameters as the pdf plots above.
Other Probability Since the beta distribution is not typically used for reliability applications, we omit the
Functions formulas and plots for the hazard, cumulative hazard, survival, and inverse survival
probability functions.
Common The formulas below are for the case where the lower limit is zero and the upper limit
Statistics is one.
49
Mean
Mode
Range 0 to 1
Standard
Deviation
Coefficient of
Variation
Skewness
Parameter First consider the case where a and b are assumed to be known. For this case, the
Estimation method of moments estimates are
where is the sample mean and s2 is the sample variance. If a and b are not 0 and
For the case when a and b are known, the maximum likelihood estimates
can be obtained by solving the following set of equations
DISCRETE DISTRIBUTIONS
Binomial Distribution
Probability The binomial distribution is used when there are exactly two mutually exclusive
Mass Function outcomes of a trial. These outcomes are appropriately labeled “success” and “failure”.
The binomial distribution is used to obtain the probability of observing x successes in
N trials, with the probability of success on a single trial denoted by p. The binomial
distribution assumes that p is fixed for all trials.
50
where
The following is the plot of the binomial probability density function for four
values of p and n = 100.
51
Percent Point The binomial percent point function does not exist in simple closed form. It is
Function computed numerically. Note that because this is a discrete distribution that is only
defined for integer values of x, the percent point function is not smooth in the way the
percent point function typically is for a continuous distribution.
The following is the plot of the binomial percent point function with the
same values of p as the pdf plots above.
52
Common Mean
Statistics Mode
Range 0 to N
Standard
Deviation
Coefficient of
Variation
Skewness
Kurtosis
Comments The binomial distribution is probably the most commonly used discrete distribution.
Parameter The maximum likelihood estimator of p (n is fixed) is
Estimation
Poisson Distribution
Probability Mass The Poisson distribution is used to model the number of events occurring
Function within a given time interval.
53
Cumulative The formula for the Poisson cumulative probability function is
Distribution
Function
Percent Point The Poisson percent point function does not exist in simple closed form. It
Function is computed numerically. Note that because this is a discrete distribution
that is only defined for integer values of x, the percent point function is not
smooth in the way the percent point function typically is for a continuous
distribution.
54
with the same values of as the pdf plots above.
Common Mean
Statistics Mode For non-integer , it is the largest integer
less than . For integer , x = and x =
1 are both the mode.
Range 0 to positive infinity
Standard
Deviation
Coefficient of
Variation
Skewness
Kurtosis
55