Topic 4: Some Special Distributions. Rohini Somanathan, Course 003, 2015-2016
The discrete uniform distribution

$$f(x) = \frac{1}{N}\, I_{\{1,2,\dots,N\}}(x)$$

Moments:
$$\mu = \sum_x x\, f(x) = \frac{1}{N}\cdot\frac{N(N+1)}{2} = \frac{N+1}{2}$$
$$\sigma^2 = \sum_x x^2 f(x) - \mu^2 = \frac{1}{N}\cdot\frac{N(N+1)(2N+1)}{6} - \left(\frac{N+1}{2}\right)^2 = \frac{N^2-1}{12}$$

MGF:
$$M_X(t) = \sum_{j=1}^{N} \frac{e^{jt}}{N}$$

Applications: experiments with equally likely outcomes (dice, coins, ...). Can you think of applications in economics?
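A quick numerical check of these moment formulas, as a minimal Python sketch (the value N = 6, a die, is an arbitrary choice):

```python
# Check the discrete uniform mean and variance formulas for N = 6 (a die).
N = 6
support = range(1, N + 1)
f = 1 / N  # f(x) = 1/N on {1, ..., N}

mean = sum(x * f for x in support)
var = sum(x**2 * f for x in support) - mean**2

assert abs(mean - (N + 1) / 2) < 1e-12     # (N+1)/2 = 3.5
assert abs(var - (N**2 - 1) / 12) < 1e-12  # (N^2-1)/12 = 35/12
print(mean, var)
```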
The Bernoulli distribution

$$f(x) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\}$$

Moments: $\mu = p$, $\sigma^2 = \sum_x x^2 f(x) - \mu^2 = p(1-p)$

MGF: $E(e^{tX}) = e^t p + e^0 (1-p) = pe^t + (1-p)$

Applications: experiments with two possible outcomes: success or failure, defective or not defective, male or female, etc.
The binomial distribution

$$f(x; n, p) = \begin{cases} \dbinom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, 2, \dots, n \\ 0 & \text{otherwise} \end{cases}$$

Notice that since $\sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x} = (a+b)^n$, we have $\sum_{x=0}^{n} f(x) = [p + (1-p)]^n = 1$, so this is a genuine density function.

MGF: The MGF is given by
$$\sum_x e^{tx} f(x) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} = [(1-p) + pe^t]^n$$
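The closed form can be checked against the defining sum directly; a minimal Python sketch with arbitrary illustrative values n = 10, p = 0.3, t = 0.5:

```python
# Verify the binomial MGF [(1-p) + p*e^t]^n against a direct sum over the p.m.f.
from math import comb, exp

n, p, t = 10, 0.3, 0.5
mgf_sum = sum(exp(t * x) * comb(n, x) * p**x * (1 - p)**(n - x)
              for x in range(n + 1))
mgf_closed = ((1 - p) + p * exp(t))**n
assert abs(mgf_sum - mgf_closed) < 1e-9
print(mgf_sum, mgf_closed)
```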
The multinomial distribution

$$f(x_1, \dots, x_m; n, p_1, \dots, p_m) = \begin{cases} \dfrac{n!}{\prod_{i=1}^{m} x_i!} \displaystyle\prod_{i=1}^{m} p_i^{x_i} & x_i = 0, 1, 2, \dots, n, \ \ \sum_{i=1}^{m} x_i = n \\ 0 & \text{otherwise} \end{cases}$$

MGF: $M_X(t) = \left( \displaystyle\sum_{i=1}^{m} p_i e^{t_i} \right)^n$
The geometric and negative binomial distributions

The geometric distribution gives the number of failures before the first success in a sequence of independent Bernoulli trials: $f(x) = pq^x$ for $x = 0, 1, 2, 3, \dots$ (where $q = 1-p$), with $\mu = \frac{q}{p}$ and $\sigma^2 = \frac{q}{p^2}$.

The negative binomial is just a sum of $r$ geometric variables, and the MGF is therefore $\left(\frac{p}{1-qe^t}\right)^r$; the corresponding mean and variance are $\mu = \frac{rq}{p}$ and $\sigma^2 = \frac{rq}{p^2}$.

The geometric distribution is memoryless, so the conditional probability of $k+t$ failures given at least $k$ failures is the unconditional probability of $t$ failures:
$$P(X = k+t \mid X \geq k) = P(X = t)$$
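A minimal Python sketch of the memoryless property, with arbitrary values of p, k, and t:

```python
# Illustrate memorylessness of the geometric distribution:
# P(X = k + t | X >= k) should equal P(X = t).
p = 0.3
q = 1 - p

def pmf(x):
    return p * q**x  # X = number of failures before the first success

k, t = 4, 2
p_tail = q**k                      # P(X >= k) = q^k
conditional = pmf(k + t) / p_tail  # P(X = k+t | X >= k)
assert abs(conditional - pmf(t)) < 1e-12
print(conditional, pmf(t))
```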
The Poisson distribution

$$f(x; \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & x = 0, 1, 2, \dots \ \ (\lambda > 0) \\ 0 & \text{otherwise} \end{cases}$$

This is a valid density, since $\sum_{x=0}^{\infty} \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1$.

Moments: $\mu = \sigma^2 = \lambda$

MGF: $E(e^{tX}) = \displaystyle\sum_{x=0}^{\infty} \frac{e^{tx}\, e^{-\lambda}\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{\lambda(e^t-1)}$

The MGF can be used to get the first and second moments about the origin, $\lambda$ and $\lambda^2 + \lambda$, so the mean and the variance are both $\lambda$.

We can also use the product of $k$ Poisson MGFs to show that the sum of $k$ independently distributed Poisson variables has a Poisson distribution with mean $\lambda_1 + \dots + \lambda_k$.
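A simulation sketch of this additivity result (the rates and sample size below are arbitrary illustrative choices):

```python
# The sum of independent Poisson variables is Poisson with the summed mean.
import numpy as np

rng = np.random.default_rng(0)
lams = [1.0, 2.5, 0.5]
total = sum(rng.poisson(lam, size=100_000) for lam in lams)

# Mean and variance should both be close to 1.0 + 2.5 + 0.5 = 4.0.
print(total.mean(), total.var())
```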
A Poisson process
Suppose that the number of type A outcomes that occur over a fixed interval of time, $[0, t]$, follows a process in which:

1. The probability that precisely one type A outcome will occur in a small interval of time $\Delta t$ is approximately proportional to the length of the interval:
$$g(1, \Delta t) = \lambda \Delta t + o(\Delta t)$$
where $o(\Delta t)$ denotes a function of $\Delta t$ having the property that $\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$.

2. The probability that two or more type A outcomes will occur in a small interval of time $\Delta t$ is negligible:
$$\sum_{x=2}^{\infty} g(x, \Delta t) = o(\Delta t)$$

3. The numbers of type A outcomes that occur in nonoverlapping time intervals are independent events.

These conditions imply a process which is stationary over the period of observation, i.e. the probability of an occurrence must be the same over the entire period, with neither busy nor quiet intervals. Under these conditions, the number of outcomes in $[0, t]$ has a Poisson distribution with mean $\lambda t$.
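A simulation sketch of these conditions: splitting $[0, t]$ into many small intervals, each holding one outcome with probability $\lambda \Delta t$, produces counts whose mean and variance match the Poisson mean $\lambda t$ (the values of $\lambda$, $t$, and $\Delta t$ below are arbitrary):

```python
# Approximate a Poisson process by Bernoulli trials on small intervals.
import numpy as np

rng = np.random.default_rng(0)
lam, t, dt = 2.0, 4.0, 1e-3
n_intervals = int(t / dt)
reps = 5_000

# Each small interval holds one outcome with probability lam*dt, independently.
totals = np.array([(rng.random(n_intervals) < lam * dt).sum()
                   for _ in range(reps)])
print(totals.mean(), totals.var())  # both should be close to lam*t = 8.0
```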
The Poisson as a limiting distribution

Write the binomial density as
$$f(x; n, p) = \frac{\prod_{i=1}^{x}(n-i+1)}{x!}\, p^x (1-p)^{n-x}$$
Now substitute $p = \frac{\lambda}{n}$ and let $n \to \infty$ to get
$$\begin{aligned}
\lim_{n\to\infty} f(x; n, p) &= \lim_{n\to\infty} \frac{\prod_{i=1}^{x}(n-i+1)}{x!} \left(\frac{\lambda}{n}\right)^x \left(1-\frac{\lambda}{n}\right)^{n-x} \\
&= \frac{\lambda^x}{x!} \lim_{n\to\infty} \left[\frac{n\,(n-1)\cdots(n-x+1)}{n^x}\right] \left(1-\frac{\lambda}{n}\right)^n \left(1-\frac{\lambda}{n}\right)^{-x} \\
&= \frac{e^{-\lambda}\lambda^x}{x!}
\end{aligned}$$
(using $\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^n = e^{-\lambda}$ and the property that the limit of a product is the product of the limits).
Rules of thumb: the Poisson approximation is close to the binomial probabilities when $n \geq 20$ and $p \leq .05$, and excellent when $n \geq 100$ and $np \leq 10$.
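A quick numerical look at these two regimes, comparing the probability functions pointwise (the parameter values are illustrative):

```python
# Compare binomial and Poisson probabilities in the two rule-of-thumb regimes.
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

for n, p in [(20, 0.05), (100, 0.1)]:
    lam = n * p
    max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam))
                  for x in range(n + 1))
    print(f"n={n}, p={p}: largest pointwise gap = {max_gap:.5f}")
```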
The hypergeometric distribution

Suppose we draw $n$ objects without replacement from a population of $M$ objects partitioned into $m$ types, with $K_i$ objects of type $i$ (so $\sum_{i=1}^{n} x_i = n$ and $\sum_{i=1}^{m} K_i = M$):
$$f(x_1, \dots, x_m; K_1, \dots, K_m, n) = \frac{\prod_{j=1}^{m} \binom{K_j}{x_j}}{\binom{M}{n}}$$
The uniform distribution

$$f(x) = \frac{1}{b-a}\, I_{[a,b]}(x), \qquad \mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}$$

Applications:
- to construct the probability space of an experiment in which any outcome in the interval $[a, b]$ is equally likely.
- to generate random samples from other distributions (based on the probability integral transformation); this is part of your first lab assignment, and a sketch follows below.
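A minimal sketch of the probability integral transformation (illustrative only, not the lab assignment itself): if $U \sim \text{Uniform}(0,1)$, then $F^{-1}(U)$ has c.d.f. $F$. Here the target is the exponential c.d.f. $F(x) = 1 - e^{-x/\beta}$:

```python
# Inverse-c.d.f. sampling: exponential draws from uniform draws.
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0
u = rng.random(100_000)       # U ~ Uniform(0, 1)
x = -beta * np.log(1 - u)     # F^{-1}(u) for the exponential distribution

print(x.mean(), x.var())      # should be close to beta = 2 and beta^2 = 4
```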
The gamma function

The gamma function is defined as
$$\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y}\, dy \qquad (1)$$

If $\alpha = 1$, $\Gamma(\alpha) = \int_0^\infty e^{-y}\, dy = \left[-e^{-y}\right]_0^\infty = 1$.

If $\alpha > 1$, we can integrate (1) by parts, setting $u = y^{\alpha-1}$ and $dv = e^{-y}\, dy$ and using the formula $\int u\, dv = uv - \int v\, du$ to get $\left[-y^{\alpha-1} e^{-y}\right]_0^\infty + (\alpha-1)\int_0^\infty y^{\alpha-2} e^{-y}\, dy$.

The first term in the above expression is zero because the exponential function goes to zero faster than any polynomial, and we obtain
$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$$
and for any integer $\alpha > 1$, we have
$$\Gamma(\alpha) = (\alpha-1)(\alpha-2)(\alpha-3)\cdots(3)(2)(1)\,\Gamma(1) = (\alpha-1)!$$
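Both identities are easy to check numerically with the standard-library gamma function:

```python
# Check Gamma(a) = (a-1)*Gamma(a-1) and Gamma(n) = (n-1)!.
from math import gamma, factorial, isclose

for a in [1.5, 2.7, 5.0]:
    assert isclose(gamma(a), (a - 1) * gamma(a - 1))

for n in range(1, 8):
    assert isclose(gamma(n), factorial(n - 1))
print("recursion and factorial identities hold")
```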
The gamma distribution

Substituting $y = \frac{x}{\beta}$ (with $\beta > 0$) in the gamma function gives $\Gamma(\alpha) = \int_0^\infty \left(\frac{x}{\beta}\right)^{\alpha-1} e^{-x/\beta}\, \frac{1}{\beta}\, dx$, or, rearranging,
$$1 = \int_0^\infty \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}\, dx$$
so the integrand is a density. The gamma density with parameters $\alpha$ and $\beta$ is
$$f(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}\, I_{(0,\infty)}(x)$$
[Figure: Gamma p.d.f. for parameter values $(a, b) = (0.1, 0.1), (1, 1), (2, 2), (3, 3)$.]
Gamma moments and MGF

Moments: Let $X$ have the gamma distribution with parameters $\alpha$ and $\beta$. For $k = 1, 2, \dots$,
$$E(X^k) = \beta^k\, \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)} = \beta^k\, \alpha(\alpha+1)\cdots(\alpha+k-1)$$
so in particular $\mu = \alpha\beta$ and $\sigma^2 = E(X^2) - \mu^2 = \alpha\beta^2$.

MGF:
$$\begin{aligned}
M_X(t) &= \int_0^\infty e^{tx}\, \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}\, dx \\
&= \frac{1}{\Gamma(\alpha)\beta^\alpha} \int_0^\infty x^{\alpha-1} e^{-\left(\frac{1}{\beta}-t\right)x}\, dx \\
&= \frac{1}{\Gamma(\alpha)\beta^\alpha} \cdot \frac{\Gamma(\alpha)}{\left(\frac{1}{\beta}-t\right)^{\alpha}} \qquad \text{(by setting } y = \left(\tfrac{1}{\beta}-t\right)x \text{ in the expression for } \Gamma(\alpha)\text{)} \\
&= \frac{1}{(1-\beta t)^{\alpha}} \qquad \text{for } t < \frac{1}{\beta}
\end{aligned}$$
Gamma applications
- Survival analysis.
- The waiting time till the $r$th event/success: If $X$ is the time that passes until the first success, then $X$ has a gamma distribution with $\alpha = 1$ and $\beta = \frac{1}{\lambda}$. This is known as an exponential distribution. If, instead, we are interested in the time taken for the $r$th success, this has a gamma density with $\alpha = r$ and $\frac{1}{\beta} = \lambda$.
- Related to the Poisson distribution: If $Y$, the number of events in a given time period $t$, has a Poisson density with parameter $\lambda$, the rate of success is given by $\theta = \frac{\lambda}{t}$.

Example: A bottling plant breaks down, on average, twice every four weeks. We want the probability that the number of breakdowns $X \leq 3$ in the next four weeks. We have $\lambda = 2$ and the breakdown rate $\theta = \frac{1}{2}$ per week.
$$P(X \leq 3) = \sum_{i=0}^{3} e^{-2}\, \frac{2^i}{i!} = .135 + .271 + .271 + .18 = .857$$

Suppose we wanted the probability that the machine does not break down in the next four weeks. The time taken until the first breakdown, $x$, must therefore be more than four weeks. This follows a gamma distribution with $\alpha = 1$ and $\beta = \frac{1}{\theta} = 2$:
$$P(X \geq 4) = \int_4^\infty \tfrac{1}{2}\, e^{-x/2}\, dx = \left[-e^{-x/2}\right]_4^\infty = e^{-2} = .135$$
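Both numbers in this example can be reproduced in a few lines of Python:

```python
# Bottling-plant example: P(X <= 3) for X ~ Poisson(2), and the probability
# of no breakdown in four weeks via the exponential tail.
from math import exp, factorial

lam = 2.0  # mean breakdowns per four weeks
p_at_most_3 = sum(exp(-lam) * lam**i / factorial(i) for i in range(4))
p_no_breakdown = exp(-2)  # tail of the exponential: integral of (1/2)e^{-x/2} from 4

print(round(p_at_most_3, 3))    # 0.857
print(round(p_no_breakdown, 3)) # 0.135
```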
Sums and scalings of gamma variables

If $X_1, \dots, X_n$ are independent with $X_i \sim \text{Gamma}(\alpha_i, \beta)$, then
$$\sum_{i=1}^{n} X_i \sim \text{Gamma}\left(\sum_{i=1}^{n} \alpha_i,\ \beta\right)$$
If $X \sim \text{Gamma}(\alpha, \beta)$, then $Y = cX \sim \text{Gamma}(\alpha, c\beta)$.

Both these can be easily proved using the gamma MGF and applying the MGF uniqueness theorem. In the first case the MGF of $Y = \sum_{i=1}^{n} X_i$ is the product of the individual MGFs, i.e.
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} (1-\beta t)^{-\alpha_i} = (1-\beta t)^{-\sum_{i=1}^{n} \alpha_i} \qquad \text{for } t < \frac{1}{\beta}$$
For the second result, $M_Y(t) = M_{cX}(t) = M_X(ct) = (1-c\beta t)^{-\alpha}$ for $t < \frac{1}{c\beta}$.
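A simulation sketch of the additivity result (the shape values and β below are arbitrary):

```python
# Independent Gamma(alpha_i, beta) draws with a common scale beta
# sum to Gamma(sum alpha_i, beta).
import numpy as np

rng = np.random.default_rng(0)
alphas, beta = [1.0, 2.0, 0.5], 2.0
total = sum(rng.gamma(shape=a, scale=beta, size=100_000) for a in alphas)

a_sum = sum(alphas)                 # 3.5
print(total.mean(), a_sum * beta)   # mean should be alpha*beta = 7.0
print(total.var(), a_sum * beta**2) # variance should be alpha*beta^2 = 14.0
```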
The exponential distribution

This is the gamma density with $\alpha = 1$:
$$f(x; \beta) = \frac{1}{\beta}\, e^{-x/\beta}\, I_{(0,\infty)}(x)$$

Moments: $\mu = \beta$, $\sigma^2 = \beta^2$

MGF: $M_X(t) = (1-\beta t)^{-1}$ for $t < \frac{1}{\beta}$
The chi-square distribution

This is the gamma density with $\alpha = \frac{v}{2}$ and $\beta = 2$:
$$f(x; v) = \frac{1}{\Gamma\!\left(\frac{v}{2}\right) 2^{v/2}}\, x^{\frac{v}{2}-1} e^{-\frac{x}{2}}\, I_{(0,\infty)}(x)$$

Moments: $\mu = v$, $\sigma^2 = 2v$

MGF: $M_X(t) = (1-2t)^{-\frac{v}{2}}$ for $t < \frac{1}{2}$

Applications:
- Notice that for $v = 2$, the chi-square density is equivalent to the exponential density with $\beta = 2$. It is therefore decreasing for this value of $v$ and hump-shaped for higher values.
- The $\chi^2_v$ is especially useful in problems of statistical inference because if we have $v$ independent random variables $X_i \sim N(0,1)$, then $\sum_{i=1}^{v} X_i^2 \sim \chi^2_v$. Many of the estimators we use in our models fit this case (i.e. they can be expressed as sums of squares of independent normal variables).
The normal distribution

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, I_{(-\infty,+\infty)}(x)$$

MGF: $M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$

The MGF can be used to derive the moments: $E(X) = \mu$ and the variance is $\sigma^2$.

As can be seen from the p.d.f., the distribution is symmetric around $\mu$, where it achieves its maximum value. This is therefore also the median and the mode of the distribution.

The normal distribution with zero mean and unit variance is known as the standard normal distribution and is of the form $f(x; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, I_{(-\infty,+\infty)}(x)$.

The tails of the distribution are thin: 68% of the total probability lies within one $\sigma$ of the mean, 95.4% within $2\sigma$, and 99.7% within $3\sigma$.
Deriving the normal MGF

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$
Completing the square in the exponent, $tx - \frac{(x-\mu)^2}{2\sigma^2} = \mu t + \frac{\sigma^2 t^2}{2} - \frac{[x-(\mu+\sigma^2 t)]^2}{2\sigma^2}$, so
$$M_X(t) = C\, e^{\mu t + \frac{\sigma^2 t^2}{2}} \qquad \text{where} \qquad C = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{[x-(\mu+\sigma^2 t)]^2}{2\sigma^2}}\, dx = 1$$
since the integrand is a normal density with $\mu$ replaced by $(\mu + \sigma^2 t)$.
Normal moments from the MGF

$$M(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$$
$$M'(t) = M(t)\,(\mu + \sigma^2 t)$$
$$M''(t) = M(t)\,(\mu + \sigma^2 t)^2 + M(t)\,\sigma^2$$
Evaluating at $t = 0$: $E(X) = M'(0) = \mu$ and $E(X^2) = M''(0) = \mu^2 + \sigma^2$, so $\text{Var}(X) = \sigma^2$.
Transformations of Normals...1

RESULT 1: If $X \sim N(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} \sim N(0,1)$. This is the linear transformation $aX + b$ with $a = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$. Therefore
$$M_Z(t) = e^{bt}\, M_X(at) = e^{-\frac{\mu}{\sigma}t}\, e^{\frac{\mu}{\sigma}t + \frac{t^2}{2}} = e^{\frac{t^2}{2}}$$
which is the MGF of a standard normal variable.

An important implication of the above result is that if we are interested in any distribution in this class of normal distributions, we only need to be able to compute integrals for the standard normal; these are the tables you'll see at the back of most textbooks.

Example: The kilometres per litre of fuel achieved by a new Maruti model is $X \sim N(17, .25)$. What is the probability that a new car will achieve between 16 and 18 kilometres per litre?

Answer: $P(16 \leq X \leq 18) = P\left(\frac{16-17}{.5} \leq z \leq \frac{18-17}{.5}\right) = P(-2 \leq z \leq 2) = 1 - 2(.0228) = .9544$
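The same number can be computed from the standard normal c.d.f., here built from the error function:

```python
# Reproduce the Maruti example: X ~ N(17, 0.25), so sigma = 0.5.
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p = phi((18 - 17) / 0.5) - phi((16 - 17) / 0.5)
print(round(p, 4))  # 0.9545 (the slides' .9544 reflects table rounding)
```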
Transformations of Normals...2
RESULT 2: Let $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, where $a$ and $b$ are given constants and $a \neq 0$; then $Y$ has a normal distribution with mean $a\mu + b$ and variance $a^2\sigma^2$.

Proof: The MGF of $Y$ can be expressed as $M_Y(t) = e^{bt}\, e^{a\mu t + \frac{1}{2}\sigma^2 a^2 t^2} = e^{(a\mu+b)t + \frac{1}{2}(a\sigma)^2 t^2}$. This is simply the MGF for a normal distribution with mean $a\mu + b$ and variance $a^2\sigma^2$.

RESULT 3: If $X_1, \dots, X_k$ are independent and $X_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$, then $Y = X_1 + \dots + X_k$ has a normal distribution with mean $\mu_1 + \dots + \mu_k$ and variance $\sigma_1^2 + \dots + \sigma_k^2$.

Proof: Write the MGF of $Y$ as the product of the MGFs of the $X_i$'s and gather linear and squared terms separately to get the desired result.

We can combine these two results to derive the distribution of the sample mean:

RESULT 4: Suppose that the random variables $X_1, \dots, X_n$ form a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}_n$ denote the sample mean. Then $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
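A simulation sketch of RESULT 4 (the parameter values are arbitrary):

```python
# The mean of n draws from N(mu, sigma^2) has variance sigma^2/n.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 17.0, 0.5, 25
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

print(xbar.mean(), mu)          # close to 17.0
print(xbar.var(), sigma**2 / n) # close to 0.25/25 = 0.01
```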
The square of a standard normal

If $X \sim N(0,1)$ and $Y = X^2$, then
$$\begin{aligned}
M_Y(t) = E\!\left(e^{X^2 t}\right) &= \int_{-\infty}^{\infty} e^{x^2 t}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2(1-2t)}\, dx \\
&= \frac{1}{\sqrt{1-2t}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{1-2t}}}\, e^{-\frac{1}{2}\left(x\sqrt{1-2t}\right)^2}\, dx \\
&= \frac{1}{\sqrt{1-2t}} \qquad \text{for } t < \frac{1}{2}
\end{aligned}$$
(the integrand in the third line is a normal density with mean $0$ and variance $\frac{1}{1-2t}$, so it integrates to one).

The MGF obtained is that of a $\chi^2$ random variable with $v = 1$, since the $\chi^2$ MGF is given by $(1-2t)^{-\frac{v}{2}}$.
Sums of squares of standard normals

If $X_1, \dots, X_n$ are independent $N(0,1)$ variables, then $Y = \sum_{i=1}^{n} X_i^2$ has a $\chi^2_n$ distribution.

Proof:
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i^2}(t) = \prod_{i=1}^{n} (1-2t)^{-\frac{1}{2}} = (1-2t)^{-\frac{n}{2}} \qquad \text{for } t < \frac{1}{2}$$
which is the MGF of a $\chi^2$ random variable with $v = n$. This is the reason that the parameter $v$ is called the degrees of freedom: there are $n$ freely varying random variables whose sum of squares represents a $\chi^2_v$-distributed random variable. This also follows directly from gamma additivity.
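A simulation sketch: the empirical mean and variance of such a sum of squares should be close to $n$ and $2n$, the $\chi^2_n$ moments (the value of n is arbitrary):

```python
# Sum of squares of n independent N(0,1) variables has chi-square moments.
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)

print(y.mean(), n)     # close to 5
print(y.var(), 2 * n)  # close to 10
```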
The bivariate normal distribution

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\; e^{-\frac{q}{2}}$$
where
$$q = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]$$
The multivariate normal distribution

$$f(x) = \frac{|B|^{-\frac{1}{2}}}{(2\pi)^{\frac{n}{2}}}\; e^{-\frac{1}{2}(x-a)'\, B^{-1}\, (x-a)}$$
where $a$ is the $n \times 1$ vector of means and $B$ is the $n \times n$ symmetric, positive definite covariance matrix ($x'Bx > 0$ for $x \neq 0$), with $\frac{n(n+1)}{2}$ distinct elements.

Moments: $\mu = a$, $\text{Cov}(X) = B$

Applications: statistical inference in the classical linear regression model... and, with large samples, in other models.
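A minimal sampling sketch for the bivariate case, using numpy's multivariate normal generator (the mean vector and covariance matrix below are arbitrary):

```python
# Draw from a bivariate normal and check the sample moments against a and B.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0])
B = np.array([[2.0, 0.6],
              [0.6, 1.0]])  # symmetric, positive definite

x = rng.multivariate_normal(mean=a, cov=B, size=100_000)
print(x.mean(axis=0))           # close to a
print(np.cov(x, rowvar=False))  # close to B
```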
Additional distributions that we'll use mainly for inference are the Student's t-distribution and the F-distribution. We'll introduce these in the second half of the course.