Probability distribution

In probability theory and statistics, a probability distribution identifies either the probability of
each value of a random variable (when the variable is discrete), or the probability of the
value falling within a particular interval (when the variable is continuous).[1] The probability
distribution describes the range of possible values that a random variable can attain and the
probability that the value of the random variable is within any (measurable) subset of that range.

[Figure: the normal distribution, often called the "bell curve".]

When the random variable takes values in the set of real numbers, the probability distribution is
completely described by the cumulative distribution function, whose value at each real x is the
probability that the random variable is less than or equal to x.

The concept of the probability distribution, and of the random variables it describes, underlies
the mathematical discipline of probability theory and the science of statistics. There is spread or
variability in almost any value that can be measured in a population (e.g. height of people, durability
of a metal, etc.); almost all measurements are made with some intrinsic error; in physics many
processes are described probabilistically, from the kinetic properties of gases to the quantum
mechanical description of fundamental particles. For these and many other reasons, simple numbers
are often inadequate for describing a quantity, while probability distributions are often more
appropriate.

Many different probability distributions arise in applications. One of the
more important ones is the normal distribution, which is also known as the Gaussian distribution or
the bell curve and approximates many different naturally occurring distributions. The toss of a fair
coin yields another familiar distribution, where the possible values are heads or tails, each with
probability 1/2.

Uniform distribution (discrete)

In probability theory and statistics, the discrete uniform distribution is a discrete probability
distribution that can be characterized by saying that all values of a finite set of possible values are
equally probable.

If a random variable has any of n possible values k1, k2, ..., kn that are equally probable, then it
has a discrete uniform distribution. The probability of any outcome ki is 1/n. A simple example of
the discrete uniform distribution is throwing a fair die. The possible values of k are 1, 2, 3, 4, 5, 6;
and each time the die is thrown, the probability of a given score is 1/6. If two dice are thrown, then
the uniform distribution no longer fits, as values from 2 to 12 have varying probabilities.
In case the values of a random variable with a discrete uniform distribution are real, it is possible to
express the cumulative distribution function in terms of the degenerate distribution; thus

    F(x) = (1/n) * sum over i = 1 to n of H(x − ki),

where the Heaviside step function H(x − x0) is the CDF of the degenerate distribution centered at x0.
This assumes that consistent conventions are used at the transition points.
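
For a quick numerical check of the fair-die example (a minimal sketch assuming Python with SciPy is available; the names are only for illustration):

    from scipy.stats import randint

    # Fair six-sided die: discrete uniform on {1, 2, ..., 6}.
    # scipy's randint(low, high) covers low, ..., high - 1, so use (1, 7).
    die = randint(1, 7)

    print(die.pmf(3))             # 1/6 for any face
    print(die.cdf(4))             # P(score <= 4) = 4/6
    print(die.mean(), die.var())  # 3.5 and 35/12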

Binomial distribution

In probability theory and statistics, the binomial distribution is the discrete probability distribution
of the number of successes in a sequence of n independent yes/no experiments, each of which yields
success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or
Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial
distribution is the basis for the popular binomial test of statistical significance. A binomial
distribution should not be confused with a bimodal distribution.

It is frequently used to model the number of successes in a sample of size n drawn from a population of size N.
Since the samples are not independent (this is sampling without replacement), the resulting
distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n,
the binomial distribution is a good approximation, and widely used.
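
As a small illustration (a sketch assuming SciPy; the values of n and p are arbitrary):

    from scipy.stats import binom

    n, p = 10, 0.5            # 10 independent yes/no trials, success probability 1/2
    X = binom(n, p)

    print(X.pmf(5))           # P(exactly 5 successes) ≈ 0.2461
    print(X.cdf(3))           # P(at most 3 successes)
    print(X.mean(), X.var())  # n*p = 5 and n*p*(1-p) = 2.5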

Hypergeometric distribution

In probability theory and statistics, the hypergeometric distribution is a discrete probability
distribution that describes the number of successes in a sequence of n draws from a finite population
without replacement, just as the binomial distribution describes the number of successes for draws
with replacement.

The notation is illustrated by this contingency table:

                  drawn     not drawn          total
    successes     k         m − k              m
    failures      n − k     N + k − n − m      N − m
    total         n         N − n              N

Here N represents the size of the population, m represents the number of successes in the population,
k represents the number of successful draws observed, and n represents the number of draws.

A random variable X follows the hypergeometric distribution with parameters N, m and n if the
probability is given by

    P(X = k) = C(m, k) C(N − m, n − k) / C(N, n),

where the binomial coefficient C(a, b) is defined to be the coefficient of x^b in the polynomial expansion
of (1 + x)^a.

The probability is positive when k is between max(0, n + m − N) and min(m, n).

The formula can be understood as follows: there are C(N, n) possible samples (without replacement).
There are C(m, k) ways to obtain k defective objects and there are C(N − m, n − k) ways to fill out the
rest of the sample with non-defective objects.

The sum of the probabilities for all possible values of k is equal to 1 as one would expect intuitively;
this is essentially Vandermonde's identity from combinatorics.
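
A brief sketch (assuming SciPy; note that scipy.stats.hypergeom orders its parameters as population size, successes in the population, and number of draws, which differs from the notation in the table above):

    from scipy.stats import hypergeom

    N_pop, m_succ, n_draws = 50, 5, 10   # population, successes in population, draws
    # scipy's order is hypergeom(M, n, N) = (population size, successes, draws)
    X = hypergeom(N_pop, m_succ, n_draws)

    print(X.pmf(1))                                      # P(exactly 1 success in the sample)
    print(sum(X.pmf(k) for k in range(m_succ + 1)))      # probabilities sum to 1 (Vandermonde's identity)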

Negative binomial distribution

In probability and statistics the negative binomial distribution (including the Pascal distribution or
Polya distribution) is a discrete probability distribution. It arises as the probability distribution of
the number of failures in a sequence of Bernoulli trials needed to get a specified (non-random)
number of successes. If one throws a die repeatedly until the third time a "1" appears, then the
probability distribution of the number of non-"1"s that appear before the third "1" is a negative
binomial distribution.

The Pascal distribution and Polya distribution are special cases of the negative binomial. There is a
convention among engineers, climatologists, and others to reserve "negative binomial" in a strict
sense or "Pascal" (after Blaise Pascal) for the case of an integer-valued parameter r, and to
use "Polya" (for George Pólya) for the real-valued case. The Polya distribution more accurately
models occurrences of "contagious" discrete events, like tornado outbreaks, than does the Poisson
distribution.
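
The die example above can be checked numerically (a sketch assuming SciPy, whose nbinom counts the number of failures before the r-th success):

    from scipy.stats import nbinom

    r, p = 3, 1/6        # wait for the third "1"; each throw shows "1" with probability 1/6
    Y = nbinom(r, p)     # number of non-"1" throws before the third "1"

    print(Y.pmf(0))      # the first three throws are all "1": (1/6)**3
    print(Y.mean())      # expected number of non-"1"s: r*(1-p)/p = 15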
Geometric distribution

In probability theory and statistics, the geometric distribution is either of two discrete probability
distributions:

 The probability distribution of the number X of Bernoulli trials needed to get one success,
supported on the set { 1, 2, 3, ...}

 The probability distribution of the number Y = X − 1 of failures before the first success,
supported on the set { 0, 1, 2, 3, ... }

Which of these one calls "the" geometric distribution is a matter of convention and convenience.
These two different geometric distributions should not be confused with each other. Often, the name
shifted geometric distribution is adopted for the former one (distribution of the number X); however,
to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the range
explicitly.

If the probability of success on each trial is p, then the probability that the kth trial is
the first success is

    P(X = k) = (1 − p)^(k−1) p

for k = 1, 2, 3, ....

Equivalently, if the probability of success on each trial is p, then the probability that there are k
failures before the first success is

    P(Y = k) = (1 − p)^k p

for k = 0, 1, 2, 3, ....

In either case, the sequence of probabilities is a geometric sequence.

For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The
probability distribution of the number of times it is thrown is supported on the infinite set { 1, 2, 3, ...
} and is a geometric distribution with p = 1/6.
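
The same die example can be evaluated as follows (a sketch assuming SciPy; scipy.stats.geom uses the first convention, counting trials up to and including the first success):

    from scipy.stats import geom

    p = 1/6
    X = geom(p)          # number of throws up to and including the first "1"

    print(X.pmf(1))      # first throw is a "1": 1/6
    print(X.pmf(3))      # (5/6)**2 * (1/6)
    print(X.mean())      # expected number of throws: 1/p = 6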

Poisson distribution

In probability theory and statistics, the Poisson distribution (pronounced [pwasõ]) is a discrete
probability distribution that expresses the probability of a number of events occurring in a fixed
period of time if these events occur with a known average rate and independently of the time since
the last event. The Poisson distribution can also be used for the number of events in other specified
intervals such as distance, area or volume.
The distribution was first introduced by Siméon-Denis Poisson (1781–1840) and published, together
with his probability theory, in 1838 in his work Recherches sur la probabilité des jugements en
matière criminelle et en matière civile ("Research on the Probability of Judgments in Criminal and
Civil Matters"). The work focused on certain random variables N that count, among other things, the
number of discrete occurrences (sometimes called "arrivals") that take place during a time-interval of
given length.

If the expected number of occurrences in this interval is λ, then the probability that there are exactly
k occurrences (k being a non-negative integer, k = 0, 1, 2, ...) is equal to

    f(k; λ) = λ^k e^(−λ) / k!,

where

 e is the base of the natural logarithm (e = 2.71828...)


 k is the number of occurrences of an event - the probability of which is given by the function
 k! is the factorial of k
 λ is a positive real number, equal to the expected number of occurrences that occur during the
given interval. For instance, if the events occur on average 4 times per minute, and you are
interested in the number of events occurring in a 10 minute interval, you would use as your
model a Poisson distribution with λ = 10×4 = 40.

As a function of k, this is the probability mass function. The Poisson distribution can be derived as a
limiting case of the binomial distribution.
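
A short sketch (assuming SciPy) that evaluates the pmf for the λ = 40 example above and illustrates the binomial limiting case:

    from scipy.stats import poisson, binom

    lam = 40                          # 4 events per minute over a 10-minute interval
    print(poisson.pmf(40, lam))       # probability of exactly 40 events

    # Poisson as a limit of the binomial: many trials, small success probability
    n, p = 100000, lam / 100000
    print(binom.pmf(40, n, p))        # close to the Poisson value above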

The Poisson distribution can be applied to systems with a large number of possible events, each of
which is rare. A classic example is the nuclear decay of atoms.

The Poisson distribution is sometimes called a Poissonian, analogous to the term Gaussian for a
Gauss or normal distribution.

Bernoulli distribution
In probability theory and statistics, the Bernoulli distribution, named after Swiss scientist Jacob
Bernoulli, is a discrete probability distribution, which takes value 1 with success probability p and
value 0 with failure probability q = 1 − p. So if X is a random variable with this distribution, we have:

    P(X = 1) = p,   P(X = 0) = q = 1 − p.

The probability mass function f of this distribution is

    f(k; p) = p        if k = 1,
    f(k; p) = 1 − p    if k = 0.

This can also be expressed as

    f(k; p) = p^k (1 − p)^(1−k)    for k in {0, 1}.

The expected value of a Bernoulli random variable X is E[X] = p, and its variance is Var(X) = p(1 − p) = pq.

The excess kurtosis goes to infinity for high and low values of p, but for p = 1/2 the Bernoulli distribution
has a lower excess kurtosis than any other probability distribution, namely −2.

The Bernoulli distribution is a member of the exponential family.
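
A tiny sketch (assuming SciPy):

    from scipy.stats import bernoulli

    p = 0.3
    X = bernoulli(p)

    print(X.pmf(1), X.pmf(0))   # p and 1 - p
    print(X.mean(), X.var())    # p = 0.3 and p*(1-p) = 0.21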

Normal distribution

In probability theory and statistics, the normal distribution or Gaussian distribution is a continuous
probability distribution that describes data that cluster around a mean or average. The graph of the
associated probability density function is bell-shaped, with a peak at the mean, and is known as the
Gaussian function or bell curve. The Gaussian distribution is one of many things named after
Carl Friedrich Gauss, who used it to analyze astronomical data,[1] and determined the formula for
its probability density function. However, Gauss was not the first to study this distribution or the
formula for its density function—that had been done earlier by Abraham de Moivre.

The normal distribution can be used to describe, at least approximately, any variable that tends to
cluster around the mean. For example, the heights of adult males in the United States are roughly
normally distributed, with a mean of about 70 in (1.8 m). Most men have a height close to the mean,
though a small number of outliers have a height significantly above or below the mean. A histogram
of male heights will appear similar to a bell curve, with the correspondence becoming closer if more
data are used.

By the central limit theorem, the sum of a large number of independent random variables is
distributed approximately normally. For this reason, the normal distribution is used throughout
statistics, natural science, and social science[2] as a simple model for complex phenomena. For
example, the observational error in an experiment is usually assumed to follow a normal distribution,
and the propagation of uncertainty is computed using this assumption.
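
A brief sketch of the heights example (assuming SciPy; the standard deviation of 3 in is an assumed value for illustration, not taken from the text):

    from scipy.stats import norm

    mean_in, sd_in = 70.0, 3.0        # mean from the text; the 3 in spread is assumed
    height = norm(loc=mean_in, scale=sd_in)

    print(height.pdf(70))                      # density at the mean (the peak of the bell curve)
    print(height.cdf(76) - height.cdf(64))     # P(height within two standard deviations) ≈ 0.954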
Gamma distribution

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous
probability distributions. It has a scale parameter θ and a shape parameter k. If k is an integer then the
distribution represents the sum of k independent exponentially distributed random variables, each of
which has a mean of θ (which is equivalent to a rate parameter of 1/θ).

The gamma distribution is frequently a probability model for waiting times; for instance, in life
testing, the waiting time until death is a random variable that is frequently modeled with a gamma
distribution.[1]

Characterization

A random variable X that is gamma-distributed with scale θ and shape k is denoted

    X ~ Γ(k, θ)  or  X ~ Gamma(k, θ).
Probability density function


The probability density function of the gamma distribution can be expressed in terms of the gamma
function, parameterized in terms of a shape parameter k and scale parameter θ. Both k and θ are
positive values.

The equation defining the probability density function of a gamma-distributed random variable x is

    f(x; k, θ) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k)    for x > 0.

Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α = k and
an inverse scale parameter β = 1/θ, called a rate parameter:

    g(x; α, β) = β^α x^(α−1) e^(−βx) / Γ(α)    for x > 0.

If α is a positive integer, then Γ(α) = (α − 1)!.
Both parameterizations are common because either can be more convenient depending on the
situation.

[Figure: the gamma PDF over k and x, with θ set to 1, 2, 3, 4, 5 and 6.]

Cumulative distribution function

The cumulative distribution function is the regularized gamma function:

    F(x; k, θ) = γ(k, x/θ) / Γ(k),

where γ(k, x/θ) is the lower incomplete gamma function.

It can also be expressed as follows, if k is an integer (i.e., the distribution is an Erlang distribution)[2]:

    F(x; k, θ) = 1 − e^(−βx) * sum over i = 0 to k − 1 of (βx)^i / i!,

where β = 1/θ.
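
A compact sketch of the two parameterizations (assuming SciPy, which uses the shape/scale form):

    from scipy.stats import gamma

    k, theta = 3, 2.0                 # shape and scale
    X = gamma(a=k, scale=theta)       # equivalently alpha = 3, beta = 1/theta = 0.5

    print(X.pdf(4.0))                 # density at x = 4
    print(X.cdf(4.0))                 # regularized lower incomplete gamma at x/theta = 2
    print(X.mean(), X.var())          # k*theta = 6 and k*theta**2 = 12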

Exponential distribution

In probability theory and statistics, the exponential distributions are a class of continuous
probability distributions. They describe the times between events in a Poisson process, i.e. a process
in which events occur continuously and independently at a constant average rate.

Characterization

The probability density function (pdf) of an exponential distribution is

    f(x; λ) = λ e^(−λx)    for x ≥ 0,
    f(x; λ) = 0            for x < 0.

Here λ > 0 is the parameter of the distribution, often called the rate parameter. The distribution is
supported on the interval [0, ∞). If a random variable X has this distribution, we write X ~ Exp(λ).

Cumulative distribution function

The cumulative distribution function is given by

    F(x; λ) = 1 − e^(−λx)    for x ≥ 0.
Alternative parameterization

A commonly used alternative parameterization is to define the probability density function (pdf) of
an exponential distribution as

    f(x; β) = (1/β) e^(−x/β)    for x ≥ 0,

where β > 0 is a scale parameter of the distribution and is the reciprocal of the rate parameter λ
defined above. In this specification, β is a survival parameter in the sense that if a random variable X
is the duration of time that a given biological or mechanical system manages to survive and
X ~ Exponential(β) then E[X] = β. That is to say, the expected duration of survival of the system is β
units of time. The parameterisation involving the "rate" parameter arises in the context of events
arriving at a rate λ, when the time between events (which might be modelled using an exponential
distribution) has a mean of β = 1/λ.

The alternative specification is sometimes more convenient than the one given above, and some
authors will use it as a standard definition. This alternative specification is not used here.
Unfortunately this gives rise to a notational ambiguity. In general, the reader must check which of
these two specifications is being used if an author writes "X ~ Exponential(λ)", since either the
notation in the previous (using λ) or the notation in this section (here, using β to avoid confusion)
could be intended.
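
A sketch contrasting the two conventions (assuming SciPy, whose expon is parameterized by the scale β = 1/λ):

    from scipy.stats import expon

    lam = 2.0                    # rate: events arrive twice per unit time on average
    beta = 1 / lam               # scale / survival parameter
    T = expon(scale=beta)        # time between events

    print(T.pdf(0.0))            # lambda * exp(0) = 2.0
    print(T.cdf(1.0))            # 1 - exp(-lambda * 1)
    print(T.mean())              # beta = 0.5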
Chi-square distribution

In probability theory and statistics, the chi-square distribution (also chi-squared or χ2-
distribution) is one of the most widely used theoretical probability distributions in inferential
statistics, e.g., in statistical significance tests.[2][3][4][5] A random variable is said to have a chi-square
distribution if it equals the sum of the squares of a set of statistically independent standard Gaussian
random variables.

The best-known situations in which the chi-square distribution is used are the common chi-square
tests for goodness of fit of an observed distribution to a theoretical one, and of the independence of
two criteria of classification of qualitative data. Many other statistical tests also lead to a use of this
distribution, like Friedman's analysis of variance by ranks.
Definition

If X1, ..., Xk are k independent, normally distributed random variables with mean 0 and variance 1, then the
random variable

    Q = X1² + X2² + ... + Xk²

is distributed according to the chi-square distribution with k degrees of freedom. This is usually
written

    Q ~ χ²(k).

The chi-square distribution has one parameter: k, a positive integer that specifies the number of
degrees of freedom (i.e. the number of Xi's).

The chi-square distribution is a special case of the gamma distribution.

Characteristics


Probability density function

A probability density function of the chi-square distribution is

    f(x; k) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2))    for x ≥ 0,

where Γ denotes the Gamma function, which has closed-form values at the half-integers.

For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-
square distribution.

Cumulative distribution function


Its cumulative distribution function is:

    F(x; k) = γ(k/2, x/2) / Γ(k/2) = P(k/2, x/2),

where γ(k, z) is the lower incomplete Gamma function and P(k, z) is the regularized Gamma function.

Tables of this distribution — usually in its cumulative form — are widely available and the function
is included in many spreadsheets and all statistical packages.
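
For example, the usual table entries can be reproduced with a statistical package (a sketch assuming SciPy):

    from scipy.stats import chi2

    k = 5                              # degrees of freedom
    print(chi2.cdf(11.07, k))          # ≈ 0.95: the usual table entry for the 5% critical value
    print(chi2.ppf(0.95, k))           # inverse lookup: ≈ 11.07
    print(chi2.mean(k), chi2.var(k))   # k = 5 and 2k = 10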

Additivity

It follows from the definition of the chi-square distribution that the sum of independent chi-square
variables is also chi-square distributed. Specifically, if Q1, ..., Qn are independent chi-square
variables with k1, ..., kn degrees of freedom, respectively, then Q = Q1 + ... + Qn is chi-square
distributed with k1 + ... + kn degrees of freedom.

Information entropy

The information entropy is given by

    H = k/2 + ln(2 Γ(k/2)) + (1 − k/2) ψ(k/2),

where ψ(x) is the Digamma function.

Noncentral moments

The moments about zero of a chi-square distribution with k degrees of freedom are given by[6][7]

    E[X^m] = k (k + 2) (k + 4) ... (k + 2m − 2) = 2^m Γ(m + k/2) / Γ(k/2).

Cumulants
The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the
characteristic function:

    κn = 2^(n−1) (n − 1)! k.

Asymptotic properties

By the central limit theorem, because the chi-square distribution is the sum of k independent random
variables, it converges to a normal distribution for large k (k > 50 is "approximately normal"
according to [8]). Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of

    (X − k) / √(2k)

tends to a standard normal distribution. However, convergence is slow, as the
skewness is √(8/k) and the excess kurtosis is 12/k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some
examples are:

 If X ~ χ²(k), then √(2X) is approximately normally distributed with mean √(2k − 1) and
unit variance (result credited to R. A. Fisher).

 If X ~ χ²(k), then (X/k)^(1/3) is approximately normally distributed with mean 1 − 2/(9k) and
variance 2/(9k) (Wilson and Hilferty, 1931).

Weibull distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution.
It is named after Waloddi Weibull who described it in detail in 1951, although it was first identified
by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe the size distribution of
particles. The probability density function of a Weibull random variable x is[1]:

    f(x; k, λ) = (k/λ) (x/λ)^(k−1) e^(−(x/λ)^k)    for x ≥ 0,
    f(x; k, λ) = 0                                 for x < 0,

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution. Its
complementary cumulative distribution function is a stretched exponential function. The Weibull
distribution is related to a number of other probability distributions; in particular, it interpolates
between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2).

The Weibull distribution is often used in the field of life data analysis due to its flexibility—it can
mimic the behavior of other statistical distributions such as the normal and the exponential. If the
failure rate decreases over time, then k < 1. If the failure rate is constant over time, then k = 1. If the
failure rate increases over time, then k > 1.

An understanding of the failure rate may provide insight as to what is causing the failures:

 A decreasing failure rate would suggest "infant mortality". That is, defective items fail early
and the failure rate decreases over time as they fall out of the population.
 A constant failure rate suggests that items are failing from random events.

 An increasing failure rate suggests "wear out" - parts are more likely to fail as time goes on.
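
A small sketch of the three failure-rate regimes above (assuming SciPy; the scale value is an arbitrary example):

    from scipy.stats import weibull_min

    lam = 1000.0                        # scale: characteristic life (assumed units, e.g. hours)
    for k in (0.5, 1.0, 1.5):           # decreasing, constant, increasing failure rate
        X = weibull_min(c=k, scale=lam)
        print(k, X.mean(), X.cdf(500))  # mean life and P(failure before 500 hours)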

Student's t-distribution

In probability and statistics, Student's t-distribution (or simply the t-distribution) is a probability
distribution that arises in the problem of estimating the mean of a normally distributed population
when the sample size is small. It is the basis of the popular Student's t-tests for the statistical
significance of the difference between two sample means, and for confidence intervals for the
difference between two population means. The Student's t-distribution is a special case of the
generalised hyperbolic distribution.

The derivation of the t-distribution was first published in 1908 by William Sealy Gosset, while he
worked at a Guinness Brewery in Dublin. Due to proprietary issues, the paper was written under the
pseudonym Student. The t-test and the associated theory became well-known through the work of
R.A. Fisher, who called the distribution "Student's distribution".

Student's distribution arises when (as in nearly all practical statistical work) the population standard
deviation is unknown and has to be estimated from the data. Quite often, however, textbook
problems will treat the population standard deviation as if it were known and thereby avoid the need
to use the Student's t-test. These problems are generally of two kinds: (1) those in which the sample
size is so large that one may treat a data-based estimate of the variance as if it were certain, and (2)
those that illustrate mathematical reasoning, in which the problem of estimating the standard
deviation is temporarily ignored because that is not the point that the author or instructor is then
explaining.

Etymology

The "Student's" distribution was actually published in 1908 by William Sealy Gosset. Gosset,
however, was employed at a brewery that forbade members of its staff publishing scientific papers
due to an earlier paper containing trade secrets. To circumvent this restriction, Gosset used the name
"Student", and consequently the distribution was named "Student's t-distribution". [2]

Characterization

Student's t-distribution is the probability distribution of the ratio

    T = Z / √(V / ν),

where

 Z is normally distributed with expected value 0 and variance 1;

 V has a chi-square distribution with ν degrees of freedom;
 Z and V are independent.

For any given constant μ, the quantity (Z + μ) / √(V / ν) is a random variable with a noncentral t-distribution with
noncentrality parameter μ.

Probability density function

Student's t-distribution has the probability density function

    f(t) = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) * (1 + t²/ν)^(−(ν + 1)/2),

where ν is the number of degrees of freedom and Γ is the Gamma function.

For ν even,

    Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) = (ν − 1)(ν − 3) ... 5 · 3 / (2 √ν (ν − 2)(ν − 4) ... 4 · 2).

For ν odd,

    Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) = (ν − 1)(ν − 3) ... 4 · 2 / (π √ν (ν − 2)(ν − 4) ... 5 · 3).

The overall shape of the probability density function of the t-distribution resembles the bell shape of
a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As
the number of degrees of freedom grows, the t-distribution approaches the normal distribution with
mean 0 and variance 1.

The following images show the density of the t-distribution for increasing values of ν, with the normal
distribution shown as a blue line for comparison. Note that the t-distribution (red line) becomes
closer to the normal distribution as ν increases.

Derivation
Suppose X1, ..., Xn are independent random variables that are normally distributed with expected
value μ and variance σ². Let

    X̄n = (X1 + X2 + ... + Xn) / n

be the sample mean, and

    Sn² = (1 / (n − 1)) * sum over i = 1 to n of (Xi − X̄n)²

be the sample variance. It can be shown that the random variable

    V = (n − 1) Sn² / σ²

has a chi-square distribution with n − 1 degrees of freedom. It is readily shown that the quantity

    Z = (X̄n − μ) / (σ / √n)

is normally distributed with mean 0 and variance 1, since the sample mean X̄n is normally distributed
with mean μ and standard error σ/√n. Moreover, it is possible to show that these two random
variables—the normally distributed Z and the chi-square-distributed V—are independent.
Consequently the pivotal quantity

    T = (X̄n − μ) / (Sn / √n),

which differs from Z in that the exact standard deviation σ is replaced by the random variable Sn, has
a Student's t-distribution as defined above. Notice that the unknown population variance σ² does not
appear in T, since it was in both the numerator and the denominator, so it cancelled. Technically,
(n − 1) Sn² / σ² has a χ²(n − 1) distribution by Cochran's theorem. Gosset's work showed that T has the
probability density function

    f(t) = Γ(n/2) / (√((n − 1)π) Γ((n − 1)/2)) * (1 + t²/(n − 1))^(−n/2),

with ν equal to n − 1.

This may also be written as

    f(t) = (1 / (√ν B(1/2, ν/2))) * (1 + t²/ν)^(−(ν + 1)/2),

where B is the Beta function.

The distribution of T is now called the t-distribution. The parameter ν is called the number of
degrees of freedom. The distribution depends on ν, but not μ or σ; the lack of dependence on μ and σ
is what makes the t-distribution important in both theory and practice.

Gosset's result can be stated more generally. (See, for example, Hogg and Craig, Sections 4.4 and
4.8.) Let Z have a normal distribution with mean 0 and variance 1. Let V have a chi-square
distribution with ν degrees of freedom. Further suppose that Z and V are independent (see Cochran's
theorem). Then the ratio

    Z / √(V / ν)

has a t-distribution with ν degrees of freedom.

Cumulative distribution function

The cumulative distribution function is given by an incomplete beta function; for t > 0,

    F(t) = 1 − (1/2) I(x(t); ν/2, 1/2),

with

    x(t) = ν / (t² + ν),

where I(x; a, b) denotes the regularized incomplete beta function.
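
A sketch (assuming SciPy) showing the t density approaching the standard normal as ν grows:

    from scipy.stats import t, norm

    for nu in (1, 5, 30, 100):
        print(nu, t.pdf(0, nu), t.cdf(1.96, nu))   # peak height and P(T <= 1.96)

    print(norm.pdf(0), norm.cdf(1.96))             # limiting normal values: 0.3989..., 0.975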
F-distribution

In probability theory and statistics, the F-distribution is a continuous probability distribution.[1][2][3][4]
It is also known as Snedecor's F distribution or the Fisher-Snedecor distribution (after R.A.
Fisher and George W. Snedecor). The F-distribution arises frequently as the null distribution of a test
statistic, especially in likelihood-ratio tests, perhaps most notably in the analysis of variance; see F-
test.
Characterization

A random variate of the F-distribution arises as the ratio of two chi-squared variates:

    X = (U1 / d1) / (U2 / d2),
where

 U1 and U2 have chi-square distributions with d1 and d2 degrees of freedom respectively, and

 U1 and U2 are independent (see Cochran's theorem for an application).

The probability density function of an F(d1, d2) distributed random variable is given by

    f(x; d1, d2) = (1 / B(d1/2, d2/2)) * (d1/d2)^(d1/2) * x^(d1/2 − 1) * (1 + (d1/d2) x)^(−(d1 + d2)/2)

for real x ≥ 0, where d1 and d2 are positive integers, and B is the beta function.

The cumulative distribution function is

    F(x; d1, d2) = I(d1 x / (d1 x + d2); d1/2, d2/2),

where I(x; a, b) is the regularized incomplete beta function.

The expectation of F(d1, d2) is d2 / (d2 − 2) for d2 > 2, and its variance is
2 d2² (d1 + d2 − 2) / (d1 (d2 − 2)² (d2 − 4)) for d2 > 4; the excess kurtosis is finite only when d2 > 8.

The F-distribution is a particular parameterisation of the beta prime distribution, which is also called
the beta distribution of the second kind.
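
A brief sketch of the F-distribution as a null distribution (assuming SciPy; the statistic and degrees of freedom are arbitrary examples):

    from scipy.stats import f

    d1, d2 = 3, 20                # numerator and denominator degrees of freedom
    F_stat = 3.10                 # an example test statistic, e.g. from an analysis of variance

    print(f.sf(F_stat, d1, d2))   # one-sided p-value: P(F >= 3.10) under the null
    print(f.ppf(0.95, d1, d2))    # 5% critical value ≈ 3.10
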
Probability Distribution

Henry M. Estremadura, 2007-0371
ES 85
TFr 2:30-4:00
First Semester A.Y. 2009 - 2010

28-October-2009

Mindanao State University – General Santos City
College of Engineering