
Lecture 6

Moments
Introduction

• The nth moment of an r.v. X is E(X^n). In this lecture, we explore how the moments of an r.v. shed light on its distribution. We have already seen that the first two moments are useful since they provide the mean E(X) and variance E(X^2) − (EX)^2, which are important summaries of the average value of X and how spread out its distribution is. But there is much more to a distribution than its mean and variance.
Summaries of a distribution
• The mean is called a measure of central tendency because it tells us something about the center of a distribution, specifically its center of mass. Other measures of central tendency that are commonly used in statistics are the median and the mode, which we now define.
• Median: We say that c is a median of an r.v. X if P(X ≤ c) ≥ 1/2 and P(X ≥ c) ≥ 1/2. (The simplest way this can happen is if the CDF of X hits 1/2 exactly at c, but we know that some CDFs have jumps.)
• Mode: For a discrete r.v. X, we say that c is a mode of X if it maximizes the PMF: P(X = c) ≥ P(X = x) for all x. For a continuous r.v. X with PDF f, we say that c is a mode if it maximizes the PDF: f(c) ≥ f(x) for all x.
• Intuitively, the median is a value c such that half the mass of the distribution falls on either side of c (or as close to half as possible, for discrete r.v.s), and the mode is a value that has the greatest mass or density out of all values in the support of X. If the CDF F is a continuous, strictly increasing function, then F^{-1}(1/2) is the median (and is unique). A numerical sketch of these definitions follows below.
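As a concrete illustration of these definitions (an addition, not part of the original slides), the sketch below finds a median and the mode of a Bin(10, 0.3) distribution directly from its PMF; the choice of distribution and the use of scipy are assumptions made only for the example.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
xs = np.arange(n + 1)
pmf = binom.pmf(xs, n, p)
cdf = np.cumsum(pmf)

# Median: any c with P(X <= c) >= 1/2 and P(X >= c) >= 1/2.
medians = [int(c) for c in xs if cdf[c] >= 0.5 and 1 - cdf[c] + pmf[c] >= 0.5]

# Mode: value(s) maximizing the PMF.
modes = xs[pmf == pmf.max()]

print("median(s):", medians)   # expected: [3]
print("mode(s):", modes)       # expected: [3]
```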
Summaries of a distribution (contd.)
Theorem 1.
Let X be an r.v. with mean µ.
• The value of c that minimizes the mean squared error E(X − c)^2 is c = µ.
• Proof. We will first prove a useful identity:

  E(X − c)^2 = Var(X) + (µ − c)^2.

  This can be checked by expanding everything out, but a faster way is to use the fact that adding a constant doesn't affect the variance:

  Var(X) = Var(X − c) = E(X − c)^2 − (E(X − c))^2 = E(X − c)^2 − (µ − c)^2.

• It follows that c = µ is the unique choice that minimizes E(X − c)^2, since that choice makes (µ − c)^2 = 0 and any other choice makes (µ − c)^2 > 0. A small simulation check appears below.
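As a small check (added here, not in the original slides), the sketch below verifies the identity E(X − c)^2 = Var(X) + (µ − c)^2 on the empirical distribution of a simulated sample; the Exponential sample is an arbitrary choice, and the minimum over c is visibly attained at the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10**6)   # any distribution would do
mu, var = x.mean(), x.var()

for c in [0.0, 1.0, mu, 3.5]:
    mse = np.mean((x - c) ** 2)              # E(X - c)^2 for the sample
    identity = var + (mu - c) ** 2           # Var(X) + (mu - c)^2
    print(f"c = {c:6.3f}   MSE = {mse:8.4f}   Var + (mu-c)^2 = {identity:8.4f}")
# The two columns match, and the MSE is smallest when c equals the mean.
```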
Summaries of a distribution (contd.)

Figure 1: Two PDFs with mean 2 and variance 12. The light curve is Normal and symmetric; the dark curve is Log-Normal and right-skewed.
The Normal curve is symmetric about 2, so its mean, median, and mode are all 2. In contrast, the Log-Normal is heavily skewed to the right; this means its right tail is very long compared to its left tail. It has mean 2, but median 1 and mode 0.25. From the mean and variance alone, we would not be able to capture the difference between the asymmetry of the Log-Normal and the symmetry of the Normal. We will learn that a standard measure of the asymmetry of a distribution is based on the third moment.
Summaries of a distribution (contd.)

Figure 2: Left: N(0, 1) PDF (light) and a scaled t_3 PDF (dark). Both have mean 0 and variance 1, but the latter has a sharper peak and heavier tails. Right: magnified view of the right-tail behavior.
The previous example considered the difference between symmetric and asymmetric distributions, but symmetric distributions with the same mean and variance can also look very different. The left plot in the figure above shows two symmetric PDFs with mean 0 and variance 1. The dark curve has a sharper peak and heavier tails than the light curve. The tail behavior is magnified in the right plot, where it is easy to see that the dark curve decays to 0 much more gradually, making outcomes far out in the tail much more likely than for the light curve. A standard measure of the peakedness of a distribution and the heaviness of its tails is based on the fourth moment.
Interpreting moments

• Kinds of moments: Let X be an r.v. with mean µ and variance σ^2. For any positive integer n, the nth moment of X is E(X^n), the nth central moment is E((X − µ)^n), and the nth standardized moment is E(((X − µ)/σ)^n).
• Skewness: The skewness of an r.v. X with mean µ and variance σ^2 is the third standardized moment of X:

  Skew(X) = E(((X − µ)/σ)^3).

  A simulation sketch of the sample skewness is given below.
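To make the definition concrete (an added sketch, not from the original slides), the code below estimates the third standardized moment from simulated draws; the Log-Normal parameters are arbitrary and chosen only to produce a clearly right-skewed sample.

```python
import numpy as np

def skewness(x):
    """Sample analogue of the third standardized moment E(((X - mu)/sigma)^3)."""
    mu, sigma = x.mean(), x.std()
    return np.mean(((x - mu) / sigma) ** 3)

rng = np.random.default_rng(0)
normal = rng.normal(loc=2.0, scale=1.0, size=10**6)         # symmetric
lognormal = rng.lognormal(mean=0.0, sigma=0.9, size=10**6)  # right-skewed

print(skewness(normal))      # expected: approximately 0
print(skewness(lognormal))   # expected: clearly positive
```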
Interpreting moments (contd.)
• Symmetry of an r.v.: We say that an r.v. X has a symmetric distribution about µ if X − µ has the same distribution as µ − X. We also say that X is symmetric or that the distribution of X is symmetric.
• The number µ in the above definition must be E(X) if the mean exists, since

  E(X) − µ = E(X − µ) = E(µ − X) = µ − E(X)

  simplifies to E(X) = µ.
• The number µ is also a median of the distribution, since if X − µ has the same distribution as µ − X, then

  P(X − µ ≤ 0) = P(µ − X ≤ 0), so P(X ≤ µ) = P(X ≥ µ),

  which implies that

  P(X ≤ µ) = 1 − P(X > µ) ≥ 1 − P(X ≥ µ) = 1 − P(X ≤ µ),

  showing that P(X ≤ µ) ≥ 1/2 and P(X ≥ µ) ≥ 1/2.
• Intuitively, symmetry means that the PDF of X to the left of µ is the mirror image of the PDF of X to the right of µ (for X continuous; the same holds for the PMF if X is discrete).
Interpreting moments (contd.)

Proposition 1.
Odd central moments of a symmetric distribution: Let X be symmetric about its mean µ. Then for any odd number m, the mth central moment E(X − µ)^m is 0 if it exists.
Proof. Since X − µ has the same distribution as µ − X, they have the same mth moment (if it exists):

  E(X − µ)^m = E(µ − X)^m.

Let Y = (X − µ)^m. Then (µ − X)^m = (−(X − µ))^m = (−1)^m Y = −Y, so the above equation just says E(Y) = −E(Y). So E(Y) = 0.
This leads us to consider using an odd standardized moment as a measure of the skew of a distribution. The first standardized moment is always 0, so the third standardized moment is taken as the definition of skewness. Positive skewness indicates a long right tail relative to the left tail, and negative skewness indicates the reverse.
Interpreting moments (contd.)
• Another important descriptive feature of a distribution is how heavy (or long) its tails are. For a given variance, is the variability explained more by a few rare (extreme) events or by a moderate number of moderate deviations from the mean?
• This is an important consideration for risk management in finance: for many financial assets, the distribution of returns has a heavy left tail caused by rare but severe crisis events, and failure to account for these rare events can have disastrous consequences, as demonstrated by the 2008 financial crisis.
• The kurtosis of an r.v. X with mean µ and variance σ^2 is a shifted version of the fourth standardized moment of X:

  Kurt(X) = E(((X − µ)/σ)^4) − 3.

  The reason for subtracting 3 is that this makes any Normal distribution have kurtosis 0. However, some sources define the kurtosis without the −3, in which case they call our version "excess kurtosis". A simulation sketch follows below.
• For interpretation of the kurtosis measure, see Moors (1986) and DeCarlo (1997).
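As an added illustration (not part of the original slides), the sketch below estimates excess kurtosis from simulated draws; a Normal sample should come out near 0 and a standard Laplace sample near 3, its known excess kurtosis.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample analogue of E(((X - mu)/sigma)^4) - 3."""
    mu, sigma = x.mean(), x.std()
    return np.mean(((x - mu) / sigma) ** 4) - 3

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.normal(size=10**6)))    # expected: close to 0
print(excess_kurtosis(rng.laplace(size=10**6)))   # expected: close to 3
```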
Moment generating functions
• Moment generating function: The moment generating function (MGF) of an r.v. X is M(t) = E(e^{tX}), as a function of t, if this is finite on some open interval (−a, a) containing 0. Otherwise we say the MGF of X does not exist.
• What is the interpretation of t? The answer is that t has no particular interpretation; it is just a bookkeeping device that we introduce in order to be able to use calculus instead of working with a discrete sequence of moments.
• Note that M(0) = 1 for any valid MGF M; whenever you compute an MGF, plug in 0 and see if you get 1, as a quick check!
Moment generating functions

• Bernoulli MGF: For X ∼ Bern(p), e^{tX} takes on the value e^t with probability p and the value 1 with probability q, so M(t) = E(e^{tX}) = pe^t + q. Since this is finite for all values of t, the MGF is defined on the entire real line.
• Geometric MGF: For X ∼ Geom(p),

  M(t) = E(e^{tX}) = Σ_{k=0}^∞ e^{tk} q^k p = p Σ_{k=0}^∞ (qe^t)^k = p / (1 − qe^t)

  for qe^t < 1, i.e., for t in (−∞, log(1/q)).
• Uniform MGF: Let U ∼ Unif(a, b). Then the MGF of U is

  M(t) = E(e^{tU}) = (1/(b − a)) ∫_a^b e^{tu} du = (e^{tb} − e^{ta}) / (t(b − a))

  for t ≠ 0, and M(0) = 1.
A numerical sanity check of these three formulas is sketched below.
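As a quick numerical check (added here, not part of the original slides), the sketch below compares each closed-form MGF with a Monte Carlo estimate of E(e^{tX}) at a single t; note that numpy's geometric sampler has support {1, 2, ...}, so we subtract 1 to match the Geom(p) convention above (support {0, 1, 2, ...}).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 0.3, 0.7
a, b = 1.0, 4.0
t = 0.2                                     # satisfies q*e^t < 1

bern = rng.binomial(1, p, size=10**6)
geom = rng.geometric(p, size=10**6) - 1     # shift support to start at 0
unif = rng.uniform(a, b, size=10**6)

print(np.mean(np.exp(t * bern)), p * np.exp(t) + q)
print(np.mean(np.exp(t * geom)), p / (1 - q * np.exp(t)))
print(np.mean(np.exp(t * unif)), (np.exp(t * b) - np.exp(t * a)) / (t * (b - a)))
# In each row the Monte Carlo estimate should be close to the formula.
```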


Moment generating functions (contd.)
The next three theorems give three reasons why the MGF is important. First,
the MGF encodes the moments of an r.v. Second, the MGF of an r.v.
determines its distribution, like the CDF and PMF/PDF. Third, MGFs make it
easy to find the distribution of a sum of independent r.v.s.
Theorem 2.
Moments via derivatives of the MGF: Given the MGF of X, we can get the nth moment of X by evaluating the nth derivative of the MGF at 0:

  E(X^n) = M^(n)(0).

• Proof: This can be seen by noting that the Taylor expansion of M(t) about 0 is

  M(t) = Σ_{n=0}^∞ M^(n)(0) t^n / n!,

  while on the other hand, we also have

  M(t) = E(e^{tX}) = E( Σ_{n=0}^∞ X^n t^n / n! ) = Σ_{n=0}^∞ E(X^n) t^n / n!.

  Matching the coefficients of the two expansions, we get E(X^n) = M^(n)(0). A symbolic check is sketched below.
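As an added symbolic check (not from the original slides), the sketch below uses sympy to differentiate the Geometric MGF derived earlier and read off the first few moments; the n = 1 case should simplify to q/p = (1 − p)/p, the mean of Geom(p) with support starting at 0.

```python
import sympy as sp

t, p = sp.symbols('t p')
q = 1 - p
M = p / (1 - q * sp.exp(t))      # Geometric MGF, valid for q*e^t < 1

# nth moment = nth derivative of the MGF evaluated at t = 0
for n in range(1, 4):
    print(n, sp.simplify(sp.diff(M, t, n).subs(t, 0)))
```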


Moment generating functions (contd.)

Theorem 3.
MGF determines the distribution: The MGF of a random variable
determines its distribution: if two r.v.s have the same MGF, they must
have the same distribution. In fact, if there is even a tiny interval (−a, a)
containing 0 on which the MGFs are equal, then the r.v.s must have the
same distribution.

Theorem 4.
MGF of a sum of independent r.v.s: If X and Y are independent, then the MGF of X + Y is the product of the individual MGFs:

  M_{X+Y}(t) = M_X(t) M_Y(t).

This is true because if X and Y are independent, then e^{tX} and e^{tY} are independent, so

  E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}).
Moment generating functions (contd.)

Using the last theorem, we can easily get the MGFs of the Binomial and Negative Binomial.
• Binomial MGF: The MGF of a Bern(p) r.v. is pe^t + q, so the MGF of a Bin(n, p) r.v. (a sum of n independent Bernoullis) is

  M(t) = (pe^t + q)^n.

• Negative Binomial MGF: We know the MGF of a Geom(p) r.v. is p/(1 − qe^t) for qe^t < 1, so the MGF of X ∼ NBin(r, p) (a sum of r independent Geometrics) is

  M(t) = (p / (1 − qe^t))^r

  for qe^t < 1.
A short symbolic check of the Binomial MGF appears below.
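As an added check (not part of the original slides), the sketch below differentiates the Binomial MGF symbolically; the first two derivatives at 0 should reproduce E(X) = np and E(X^2) = np + n(n − 1)p^2, hence Var(X) = np(1 − p).

```python
import sympy as sp

t, n, p = sp.symbols('t n p')
q = 1 - p
M = (p * sp.exp(t) + q) ** n                     # Binomial MGF from above

EX = sp.simplify(sp.diff(M, t, 1).subs(t, 0))    # expected: n*p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # expected: n*p + n*(n-1)*p**2
print(EX, EX2, sp.simplify(EX2 - EX**2))         # variance: n*p*(1 - p)
```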
Moment generating functions (contd.)
Proposition 2.
MGF of a location-scale transformation: If X has MGF M(t), then the MGF of a + bX is

  E(e^{t(a+bX)}) = e^{at} E(e^{btX}) = e^{at} M(bt).

For example, let's use this proposition to help obtain the MGFs of the Normal and Exponential distributions.
• Normal MGF: The MGF of a standard Normal r.v. Z is

  M_Z(t) = E(e^{tZ}) = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z^2/2} dz.

  After completing the square, we have

  M_Z(t) = e^{t^2/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−t)^2/2} dz = e^{t^2/2},

  using the fact that the N(t, 1) PDF integrates to 1. Thus, the MGF of X = µ + σZ ∼ N(µ, σ^2) is

  M_X(t) = e^{µt} M_Z(σt) = e^{µt} e^{(σt)^2/2} = e^{µt + σ^2 t^2/2}.

• Exponential MGF: The MGF of X ∼ Expo(1) is

  M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} e^{−x} dx = ∫_0^∞ e^{−x(1−t)} dx = 1/(1 − t)   for t < 1.

  So the MGF of Y = X/λ ∼ Expo(λ) is M_Y(t) = M_X(t/λ) = λ/(λ − t) for t < λ.
Numerical checks of both MGFs are sketched below.
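As a quick numerical check (added here, not from the original slides), the sketch below compares the closed-form Normal and Exponential MGFs with Monte Carlo estimates of E(e^{tX}); the parameter values µ = 1, σ = 2, λ = 1.5, t = 0.3 are arbitrary, chosen so that t < λ.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 0.3

mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=10**6)
print(np.mean(np.exp(t * x)), np.exp(mu * t + sigma**2 * t**2 / 2))

lam = 1.5                                    # need t < lam
y = rng.exponential(scale=1/lam, size=10**6)
print(np.mean(np.exp(t * y)), lam / (lam - t))
# Each Monte Carlo estimate should agree with the formula to a few decimals.
```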
Finding all the moments via MGFs
• Exponential moments: Let X ∼ Expo(1). The MGF of X is M(t) = 1/(1 − t) for t < 1. Expanding as a geometric series,

  M(t) = 1/(1 − t) = Σ_{n=0}^∞ t^n = Σ_{n=0}^∞ n! · t^n / n!.

  On the other hand, we know that E(X^n) is the coefficient of t^n / n! in the Taylor expansion of M(t):

  M(t) = Σ_{n=0}^∞ E(X^n) t^n / n!.

• Thus we can match coefficients to conclude that E(X^n) = n! for all n. We not only did not have to do a LOTUS integral, but also we did not, for example, have to take 10 derivatives to get the 10th moment: we got all the moments at once.
• To find the moments of Y ∼ Expo(λ), use a scale transformation: we can express Y = X/λ where X ∼ Expo(1). Therefore Y^n = X^n/λ^n and

  E(Y^n) = n!/λ^n.

• In particular, we have found the mean and variance of Y:

  E(Y) = 1/λ,   Var(Y) = E(Y^2) − (EY)^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.

A simulation check of E(Y^n) = n!/λ^n is sketched below.
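As a final added illustration (not from the original slides), the sketch below checks E(Y^n) = n!/λ^n by simulation for a few small n; λ = 2 is an arbitrary choice, and higher moments need large samples for stable estimates.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
lam = 2.0
y = rng.exponential(scale=1/lam, size=10**7)   # Expo(lam) has mean 1/lam

for n in range(1, 5):
    print(n, np.mean(y**n), factorial(n) / lam**n)
# Each Monte Carlo estimate should be close to n!/lam^n.
```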
