
7.2: The Method of Moments

Basic Theory
The Method
Suppose that we have a basic random experiment with an observable, real-valued random variable X. The distribution of X has k unknown real-valued parameters, or equivalently, a parameter vector θ = (θ_1, θ_2, …, θ_k) taking values in a parameter space, a subset of R^k. As usual, we repeat the experiment n times to generate a random sample of size n from the distribution of X.

\[ X = (X_1, X_2, \ldots, X_n) \tag{7.2.1} \]

Thus, X is a sequence of independent random variables, each with the distribution of X. The method of moments is a technique
for constructing estimators of the parameters that is based on matching the sample moments with the corresponding distribution
moments. First, let
\[ \mu^{(j)}(\theta) = E(X^j), \quad j \in \mathbb{N}_+ \tag{7.2.2} \]

so that μ^(j)(θ) is the jth moment of X about 0. Note that we are emphasizing the dependence of these moments on the vector of parameters θ. Note also that μ^(1)(θ) is just the mean of X, which we usually denote simply by μ. Next, let

\[ M^{(j)}(X) = \frac{1}{n} \sum_{i=1}^n X_i^j, \quad j \in \mathbb{N}_+ \tag{7.2.3} \]

so that M^(j)(X) is the jth sample moment about 0. Equivalently, M^(j)(X) is the sample mean for the random sample (X_1^j, X_2^j, …, X_n^j) from the distribution of X^j. Note that we are emphasizing the dependence of the sample moments on the sample X. Note also that M^(1)(X) is just the ordinary sample mean, which we usually denote simply by M (or by M_n if we wish to emphasize the dependence on the sample size). From our previous work, we know that M^(j)(X) is an unbiased and consistent estimator of μ^(j)(θ) for each j. Here's how the method works:

To construct the method of moments estimators (W_1, W_2, …, W_k) for the parameters (θ_1, θ_2, …, θ_k) respectively, we consider the equations

\[ \mu^{(j)}(W_1, W_2, \ldots, W_k) = M^{(j)}(X_1, X_2, \ldots, X_n) \tag{7.2.4} \]

consecutively for j ∈ N_+ until we are able to solve for (W_1, W_2, …, W_k) in terms of (M^(1), M^(2), …).

The equations for j ∈ {1, 2, …, k} give k equations in k unknowns, so there is hope (but no guarantee) that the equations can be solved for (W_1, W_2, …, W_k) in terms of (M^(1), M^(2), …, M^(k)). In fact, sometimes we need equations with j > k; Exercise 28 below gives a simple example. The method of moments can be extended to parameters associated with bivariate or more general multivariate distributions, by matching sample product moments with the corresponding distribution product moments. The method of moments also sometimes makes sense when the sample variables (X_1, X_2, …, X_n) are not independent, but at least are identically distributed. The hypergeometric model below is an example of this.

Of course, the method of moments estimators depend on the sample size n ∈ N_+. We have suppressed this so far, to keep the notation simple. But in the applications below, we put the notation back in because we want to discuss asymptotic behavior.
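As a concrete illustration of the moment-matching recipe, here is a minimal Python sketch (NumPy assumed; the helper name `sample_moment` and the example data are ours, not from the text). It only computes the sample moments M^(j); solving the resulting equations for the parameters is done case by case in the examples below.

```python
import numpy as np

def sample_moment(x, j):
    """j-th sample moment about 0: M^(j) = (1/n) * sum of x_i^j."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** j)

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # placeholder data

M1 = sample_moment(x, 1)   # ordinary sample mean M
M2 = sample_moment(x, 2)   # second sample moment M^(2)
print(M1, M2)              # these feed the equations mu^(j)(W) = M^(j)
```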

Estimates for the Mean and Variance


Estimating the mean and variance of a distribution are the simplest applications of the method of moments. Throughout this subsection, we assume that we have a basic real-valued random variable X with μ = E(X) ∈ R and σ^2 = var(X) ∈ (0, ∞). Occasionally we will also need σ_4 = E[(X − μ)^4], the fourth central moment. We sample from the distribution of X to produce a sequence X = (X_1, X_2, …) of independent variables, each with the distribution of X. For each n ∈ N_+, X_n = (X_1, X_2, …, X_n) is a random sample of size n from the distribution of X. We start by estimating the mean, which is essentially trivial by this method.


Suppose that the mean μ is unknown. The method of moments estimator of μ based on X_n is the sample mean

\[ M_n = \frac{1}{n} \sum_{i=1}^n X_i \tag{7.2.5} \]

1. E(M_n) = μ, so M_n is unbiased for n ∈ N_+.
2. var(M_n) = σ^2/n for n ∈ N_+, so M = (M_1, M_2, …) is consistent.

Proof
It does not get any more basic than this. The method of moments works by matching the distribution mean with the sample mean. The facts that E(M_n) = μ and var(M_n) = σ^2/n for n ∈ N_+ are properties that we have seen several times before.

Estimating the variance of the distribution, on the other hand, depends on whether the distribution mean μ is known or unknown. First we will consider the more realistic case when the mean is also unknown. Recall that for n ∈ {2, 3, …}, the sample variance based on X_n is

\[ S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - M_n)^2 \tag{7.2.6} \]

Recall also that E(S_n^2) = σ^2, so S_n^2 is unbiased for n ∈ {2, 3, …}, and that var(S_n^2) = (1/n)(σ_4 − ((n − 3)/(n − 1)) σ^4), so S^2 = (S_2^2, S_3^2, …) is consistent.

Suppose that the mean μ and the variance σ^2 are both unknown. For n ∈ N_+, the method of moments estimator of σ^2 based on X_n is

\[ T_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - M_n)^2 \tag{7.2.7} \]

1. bias(T_n^2) = −σ^2/n for n ∈ N_+, so T^2 = (T_1^2, T_2^2, …) is asymptotically unbiased.
2. mse(T_n^2) = (1/n^3)[(n − 1)^2 σ_4 − (n^2 − 5n + 3) σ^4] for n ∈ N_+, so T^2 is consistent.

Proof
As before, the method of moments estimator of the distribution mean μ is the sample mean M_n. On the other hand, σ^2 = μ^(2) − μ^2, and hence the method of moments estimator of σ^2 is T_n^2 = M_n^(2) − M_n^2, which simplifies to the result above. Note that T_n^2 = ((n − 1)/n) S_n^2 for n ∈ {2, 3, …}.

1. Note that E(T_n^2) = ((n − 1)/n) E(S_n^2) = ((n − 1)/n) σ^2, so bias(T_n^2) = ((n − 1)/n) σ^2 − σ^2 = −σ^2/n.
2. Recall that mse(T_n^2) = var(T_n^2) + bias^2(T_n^2). But var(T_n^2) = ((n − 1)/n)^2 var(S_n^2). The result follows from substituting var(S_n^2) given above and bias(T_n^2) in part (a).

Hence T_n^2 is negatively biased and on average underestimates σ^2. Because of this result, T_n^2 is referred to as the biased sample variance, to distinguish it from the ordinary (unbiased) sample variance S_n^2.
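As a small numerical aside (a sketch, assuming NumPy; the helper names `unbiased_var` and `biased_var` are ours), the two estimators differ only in the divisor, which NumPy exposes through its `ddof` argument.

```python
import numpy as np

def unbiased_var(x):
    """Ordinary sample variance S^2 (divides by n - 1)."""
    return np.var(x, ddof=1)

def biased_var(x):
    """Method of moments estimator T^2 (divides by n), i.e. M^(2) - M^2."""
    return np.var(x, ddof=0)

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=50)
print(unbiased_var(x), biased_var(x))   # T^2 = (n-1)/n * S^2
```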

Next let's consider the usually unrealistic (but mathematically interesting) case where the mean is known, but not the variance.

Suppose that the mean μ is known and the variance σ^2 is unknown. For n ∈ N_+, the method of moments estimator of σ^2 based on X_n is

\[ W_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \tag{7.2.8} \]

1. E(W_n^2) = σ^2, so W_n^2 is unbiased for n ∈ N_+.
2. var(W_n^2) = (1/n)(σ_4 − σ^4) for n ∈ N_+, so W^2 = (W_1^2, W_2^2, …) is consistent.

Proof
These results follow since W_n^2 is the sample mean corresponding to a random sample of size n from the distribution of (X − μ)^2.

We compared the sequence of estimators S^2 with the sequence of estimators W^2 in the introductory section on Estimators. Recall that var(W_n^2) < var(S_n^2) for n ∈ {2, 3, …} but var(S_n^2)/var(W_n^2) → 1 as n → ∞. There is no simple, general relationship between mse(T_n^2) and mse(S_n^2) or between mse(T_n^2) and mse(W_n^2), but the asymptotic relationship is simple.

mse(T_n^2)/mse(W_n^2) → 1 and mse(T_n^2)/mse(S_n^2) → 1 as n → ∞.

Proof
In light of the previous remarks, we just have to prove one of these limits. The first limit is simple, since the coefficients of σ_4 and σ^4 in mse(T_n^2) behave asymptotically like 1/n and −1/n respectively, matching the coefficients in mse(W_n^2) = (σ_4 − σ^4)/n.

It also follows that if both μ and σ^2 are unknown, then the method of moments estimator of the standard deviation σ is T = √(T^2). In the unlikely event that μ is known but σ is unknown, the method of moments estimator of σ is W = √(W^2).

Estimating Two Parameters


There are several important special distributions with two parameters; some of these are included in the computational exercises below. With two parameters, we can derive the method of moments estimators by matching the distribution mean and variance with the sample mean and variance, rather than matching the distribution mean and second moment with the sample mean and second moment. This alternative approach sometimes leads to easier equations. To set up the notation, suppose that a distribution on R has parameters a and b. We sample from the distribution to produce a sequence of independent variables X = (X_1, X_2, …), each with the common distribution. For n ∈ N_+, X_n = (X_1, X_2, …, X_n) is a random sample of size n from the distribution. Let M_n, M_n^(2), and T_n^2 denote the sample mean, second-order sample mean, and biased sample variance corresponding to X_n, and let μ(a, b), μ^(2)(a, b), and σ^2(a, b) denote the mean, second-order mean, and variance of the distribution.

If the method of moments estimators U_n and V_n of a and b, respectively, can be found by solving the first two equations

\[ \mu(U_n, V_n) = M_n, \quad \mu^{(2)}(U_n, V_n) = M_n^{(2)} \tag{7.2.9} \]

then U_n and V_n can also be found by solving the equations

\[ \mu(U_n, V_n) = M_n, \quad \sigma^2(U_n, V_n) = T_n^2 \tag{7.2.10} \]

Proof
Recall that σ^2(a, b) = μ^(2)(a, b) − μ^2(a, b). In addition, T_n^2 = M_n^(2) − M_n^2. Hence the equations μ(U_n, V_n) = M_n, σ^2(U_n, V_n) = T_n^2 are equivalent to the equations μ(U_n, V_n) = M_n, μ^(2)(U_n, V_n) = M_n^(2).

Because of this result, the biased sample variance T_n^2 will appear in many of the estimation problems for special distributions that we consider below.

Special Distributions
The Normal Distribution
The normal distribution with mean μ ∈ R and variance σ^2 ∈ (0, ∞) is a continuous distribution on R with probability density function g given by

\[ g(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad x \in \mathbb{R} \tag{7.2.11} \]

This is one of the most important distributions in probability and statistics, primarily because of the central limit theorem. The normal distribution is studied in more detail in the chapter on Special Distributions.

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the normal distribution with mean μ and variance σ^2. From our general work above, we know that if μ is unknown then the sample mean M is the method of moments estimator of μ, and if in addition σ^2 is unknown then the method of moments estimator of σ^2 is T^2. On the other hand, in the unlikely event that μ is known, then W^2 is the method of moments estimator of σ^2. Our goal is to see how the comparisons above simplify for the normal distribution.

Mean square errors of S_n^2 and T_n^2.

1. mse(T_n^2) = ((2n − 1)/n^2) σ^4
2. mse(S_n^2) = (2/(n − 1)) σ^4
3. mse(T_n^2) < mse(S_n^2) for n ∈ {2, 3, …}

Proof
Recall that for the normal distribution, σ_4 = 3σ^4. Substituting this into the general results gives parts (a) and (b). Part (c) follows from (a) and (b). Of course the asymptotic relative efficiency is still 1, from our previous theorem.

Thus, S^2 and T^2 are multiples of one another; S^2 is unbiased, but when the sampling distribution is normal, T^2 has smaller mean square error. Surprisingly, T^2 has smaller mean square error even than W^2.
Mean square errors of T^2 and W^2.

1. mse(W_n^2) = (2/n) σ^4
2. mse(T_n^2) < mse(W_n^2) for n ∈ {2, 3, …}

Proof
Again, since the sampling distribution is normal, σ_4 = 3σ^4. Substituting this into the general formula for var(W_n^2) gives part (a).

Run the normal estimation experiment 1000 times for several values of the sample size n and the parameters μ and σ. Compare the empirical bias and mean square error of S^2 and of T^2 to their theoretical values. Which estimator is better in terms of bias? Which estimator is better in terms of mean square error?
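Since the interactive experiment is not available in this text version, here is a rough Python substitute (NumPy assumed; the replication count and parameter values are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 2.0, 10, 1000

S2 = np.empty(reps)
T2 = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu, sigma, size=n)
    S2[r] = np.var(x, ddof=1)   # unbiased sample variance
    T2[r] = np.var(x, ddof=0)   # biased (method of moments) version

for name, est in [("S^2", S2), ("T^2", T2)]:
    bias = est.mean() - sigma**2
    mse = np.mean((est - sigma**2) ** 2)
    print(f"{name}: empirical bias = {bias:.3f}, empirical MSE = {mse:.3f}")
# Theory: mse(S^2) = 2*sigma^4/(n-1), mse(T^2) = (2n-1)*sigma^4/n^2
```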


Next we consider estimators of the standard deviation σ. As noted in the general discussion above, T = √(T^2) is the method of moments estimator when μ is unknown, while W = √(W^2) is the method of moments estimator in the unlikely event that μ is known. Another natural estimator, of course, is S = √(S^2), the usual sample standard deviation. The following sequence, defined in terms of the gamma function, turns out to be important in the analysis of all three estimators.

Consider the sequence

\[ a_n = \sqrt{\frac{2}{n}}\, \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)}, \quad n \in \mathbb{N}_+ \tag{7.2.12} \]

Then 0 < a_n < 1 for n ∈ N_+ and a_n ↑ 1 as n ↑ ∞.

First, assume that μ is known, so that W is the method of moments estimator of σ.

For n ∈ N_+,

1. E(W) = a_n σ
2. bias(W) = (a_n − 1) σ
3. var(W) = (1 − a_n^2) σ^2
4. mse(W) = 2(1 − a_n) σ^2

Proof
Recall that U^2 = n W^2 / σ^2 has the chi-square distribution with n degrees of freedom, and hence U has the chi distribution with n degrees of freedom. Solving gives

\[ W = \frac{\sigma}{\sqrt{n}} U \tag{7.2.13} \]

From the formulas for the mean and variance of the chi distribution we have

\[ E(W) = \frac{\sigma}{\sqrt{n}} E(U) = \frac{\sigma}{\sqrt{n}} \sqrt{2}\, \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)} = \sigma a_n \]
\[ \operatorname{var}(W) = \frac{\sigma^2}{n} \operatorname{var}(U) = \frac{\sigma^2}{n} \left\{ n - [E(U)]^2 \right\} = \sigma^2 (1 - a_n^2) \]

Thus W is negatively biased as an estimator of σ but asymptotically unbiased and consistent. Of course we know that in general (regardless of the underlying distribution), W^2 is an unbiased estimator of σ^2, and so W is negatively biased as an estimator of σ. In the normal case, since a_n involves no unknown parameters, the statistic W/a_n is an unbiased estimator of σ.
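A small sketch of this correction factor in Python, assuming SciPy's log-gamma function (to avoid overflow for large n); the helper name `a` and the parameter values are ours:

```python
import numpy as np
from scipy.special import gammaln

def a(n):
    """a_n = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2), computed on the log scale."""
    return np.sqrt(2.0 / n) * np.exp(gammaln((n + 1) / 2) - gammaln(n / 2))

mu, sigma = 0.0, 1.5
rng = np.random.default_rng(3)
x = rng.normal(mu, sigma, size=20)
n = len(x)

W = np.sqrt(np.mean((x - mu) ** 2))   # method of moments estimator of sigma (mu known)
print(a(n), W, W / a(n))              # W / a_n is unbiased for sigma in the normal case
```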

Next we consider the usual sample standard deviation S .

For n ∈ {2, 3, …},

1. E(S) = a_{n−1} σ
2. bias(S) = (a_{n−1} − 1) σ
3. var(S) = (1 − a_{n−1}^2) σ^2
4. mse(S) = 2(1 − a_{n−1}) σ^2

Proof
Recall that V^2 = (n − 1) S^2 / σ^2 has the chi-square distribution with n − 1 degrees of freedom, and hence V has the chi distribution with n − 1 degrees of freedom. The proof now proceeds just as in the previous theorem, but with n − 1 replacing n.

As with W, the statistic S is negatively biased as an estimator of σ but asymptotically unbiased, and also consistent. Since a_{n−1} involves no unknown parameters, the statistic S/a_{n−1} is an unbiased estimator of σ. Note also that, in terms of bias and mean square error, S with sample size n behaves like W with sample size n − 1. Finally we consider T, the method of moments estimator of σ when μ is unknown.

For n ∈ {2, 3, …},

1. E(T) = √((n − 1)/n) a_{n−1} σ
2. bias(T) = (√((n − 1)/n) a_{n−1} − 1) σ
3. var(T) = ((n − 1)/n) (1 − a_{n−1}^2) σ^2
4. mse(T) = (2 − 1/n − 2√((n − 1)/n) a_{n−1}) σ^2

Proof
Since T^2 = ((n − 1)/n) S^2, we have T = √((n − 1)/n) S, and the results follow from the previous theorem together with mse(T) = var(T) + bias^2(T).

The Bernoulli Distribution


Recall that an indicator variable is a random variable X that takes only the values 0 and 1. The distribution of X is known as the Bernoulli distribution, named for Jacob Bernoulli, and has probability density function g given by

\[ g(x) = p^x (1 - p)^{1-x}, \quad x \in \{0, 1\} \tag{7.2.14} \]

where p ∈ (0, 1) is the success parameter. The mean of the distribution is p and the variance is p(1 − p).

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the Bernoulli distribution with unknown success parameter p. Since the mean of the distribution is p, it follows from our general work above that the method of moments estimator of p is M, the sample mean. In this case, the sample X is a sequence of Bernoulli trials, and M has a scaled version of the binomial distribution with parameters n and p:

\[ P\left(M = \frac{k}{n}\right) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k \in \{0, 1, \ldots, n\} \tag{7.2.15} \]

Note that since X^k = X for every k ∈ N_+, it follows that μ^(k) = p and M^(k) = M for every k ∈ N_+. So any of the method of moments equations would lead to the sample mean M as the estimator of p. Although very simple, this is an important application, since Bernoulli trials are found embedded in all sorts of estimation problems, such as empirical probability density functions and empirical distribution functions.

The Geometric Distribution


The geometric distribution on N_+ with success parameter p ∈ (0, 1) has probability density function g given by

\[ g(x) = p (1 - p)^{x-1}, \quad x \in \mathbb{N}_+ \tag{7.2.16} \]

The geometric distribution on N_+ governs the number of trials needed to get the first success in a sequence of Bernoulli trials with success parameter p. The mean of the distribution is μ = 1/p.

Suppose that X = (X_1, X_2, …, X_n) is a random sample of size n from the geometric distribution on N_+ with unknown success parameter p. The method of moments estimator of p is

\[ U = \frac{1}{M} \tag{7.2.17} \]

Proof
The method of moments equation for U is 1/U = M.
The geometric distribution on N with success parameter p ∈ (0, 1) has probability density function

\[ g(x) = p (1 - p)^x, \quad x \in \mathbb{N} \tag{7.2.18} \]

This version of the geometric distribution governs the number of failures before the first success in a sequence of Bernoulli trials. The mean of the distribution is μ = (1 − p)/p.

Suppose that X = (X_1, X_2, …, X_n) is a random sample of size n from the geometric distribution on N with unknown parameter p. The method of moments estimator of p is

\[ U = \frac{1}{M + 1} \tag{7.2.19} \]

Proof
The method of moments equation for U is (1 − U)/U = M.
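A quick Python check of both conventions (a sketch, assuming NumPy; the seed and p value are arbitrary). Note that NumPy's `Generator.geometric` draws the number of trials, so the failures version is obtained by subtracting 1.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3

x_trials = rng.geometric(p, size=10_000)      # geometric on N_+ (number of trials)
x_failures = x_trials - 1                     # geometric on N (number of failures)

U_trials = 1.0 / x_trials.mean()              # U = 1/M
U_failures = 1.0 / (x_failures.mean() + 1.0)  # U = 1/(M + 1)
print(U_trials, U_failures)                   # both should be close to p = 0.3
```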

The Negative Binomial Distribution


More generally, the negative binomial distribution on N with shape parameter k ∈ (0, ∞) and success parameter p ∈ (0, 1) has probability density function

\[ g(x) = \binom{x + k - 1}{k - 1} p^k (1 - p)^x, \quad x \in \mathbb{N} \tag{7.2.20} \]

If k is a positive integer, then this distribution governs the number of failures before the kth success in a sequence of Bernoulli trials with success parameter p. However, the distribution makes sense for general k ∈ (0, ∞). The negative binomial distribution is studied in more detail in the chapter on Bernoulli Trials. The mean of the distribution is k(1 − p)/p and the variance is k(1 − p)/p^2. Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the negative binomial distribution on N with shape parameter k and success parameter p.

If k and p are unknown, then the corresponding method of moments estimators U and V are

\[ U = \frac{M^2}{T^2 - M}, \quad V = \frac{M}{T^2} \tag{7.2.21} \]

Proof
Matching the distribution mean and variance to the sample mean and variance gives the equations

\[ U\,\frac{1 - V}{V} = M, \quad U\,\frac{1 - V}{V^2} = T^2 \tag{7.2.22} \]

Dividing the first equation by the second gives V = M/T^2, and substituting into the first equation then gives U = M^2/(T^2 − M).
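Here is a hedged numerical sketch (NumPy assumed; the true parameter values are arbitrary). Note that NumPy's `Generator.negative_binomial(n, p)` uses `n` for the shape parameter k and returns the number of failures, matching the convention above, and that `var()` with its default gives the biased sample variance T^2.

```python
import numpy as np

rng = np.random.default_rng(5)
k_true, p_true = 4.0, 0.4

x = rng.negative_binomial(k_true, p_true, size=20_000)
M = x.mean()
T2 = x.var()                 # biased sample variance (ddof=0 by default)

U = M**2 / (T2 - M)          # method of moments estimator of k
V = M / T2                   # method of moments estimator of p
print(U, V)                  # should be near (4.0, 0.4)
```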

As usual, the results are nicer when one of the parameters is known.

Suppose that k is known but p is unknown. The method of moments estimator V_k of p is

\[ V_k = \frac{k}{M + k} \tag{7.2.23} \]

Proof
Matching the distribution mean to the sample mean gives the equation

\[ k\,\frac{1 - V_k}{V_k} = M \tag{7.2.24} \]

Suppose that k is unknown but p is known. The method of moments estimator of k is

\[ U_p = \frac{p}{1 - p} M \tag{7.2.25} \]

1. E(U_p) = k, so U_p is unbiased.
2. var(U_p) = k/[n(1 − p)], so U_p is consistent.

Proof
Matching the distribution mean to the sample mean gives the equation U_p (1 − p)/p = M.

1. E(U_p) = (p/(1 − p)) E(M) = (p/(1 − p)) · k(1 − p)/p = k.
2. var(U_p) = (p/(1 − p))^2 var(M) = (p/(1 − p))^2 · (1/n) var(X) = (p/(1 − p))^2 · (1/n) · k(1 − p)/p^2 = k/[n(1 − p)].

The Poisson Distribution


The Poisson distribution with parameter r ∈ (0, ∞) is a discrete distribution on N with probability density function g given by

\[ g(x) = e^{-r} \frac{r^x}{x!}, \quad x \in \mathbb{N} \tag{7.2.26} \]

The mean and variance are both r. The distribution is named for Simeon Poisson and is widely used to model the number of “random points” in a region of time or space. The parameter r is proportional to the size of the region, with the proportionality constant playing the role of the average rate at which the points are distributed in time or space. The Poisson distribution is studied in more detail in the chapter on the Poisson Process.

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the Poisson distribution with parameter r. Since r is the mean, it follows from our general work above that the method of moments estimator of r is the sample mean M.

The Gamma Distribution


The gamma distribution with shape parameter k ∈ (0, ∞) and scale parameter b ∈ (0, ∞) is a continuous distribution on (0, ∞) with probability density function g given by

\[ g(x) = \frac{1}{\Gamma(k)\, b^k} x^{k-1} e^{-x/b}, \quad x \in (0, \infty) \tag{7.2.27} \]

The gamma probability density function has a variety of shapes, and so this distribution is used to model various types of positive random variables. The gamma distribution is studied in more detail in the chapter on Special Distributions. The mean is μ = kb and the variance is σ^2 = kb^2.

Suppose now that X = (X_1, X_2, …, X_n) is a random sample from the gamma distribution with shape parameter k and scale parameter b.

Suppose that k and b are both unknown, and let U and V be the corresponding method of moments estimators. Then

\[ U = \frac{M^2}{T^2}, \quad V = \frac{T^2}{M} \tag{7.2.28} \]

Proof
Matching the distribution mean and variance with the sample mean and variance leads to the equations UV = M, UV^2 = T^2. Solving gives the results.

The method of moments estimators of k and b given in the previous exercise are complicated, nonlinear functions of the sample mean M and the biased sample variance T^2. Thus, computing the bias and mean square errors of these estimators are difficult problems that we will not attempt. However, we can judge the quality of the estimators empirically, through simulations.
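A small simulation sketch in Python (NumPy assumed; the parameter values and sample size are arbitrary) that computes U = M^2/T^2 and V = T^2/M and checks them against the true values:

```python
import numpy as np

rng = np.random.default_rng(6)
k_true, b_true = 2.5, 3.0

x = rng.gamma(shape=k_true, scale=b_true, size=50_000)
M = x.mean()
T2 = x.var()           # biased sample variance T^2

U = M**2 / T2          # method of moments estimator of the shape k
V = T2 / M             # method of moments estimator of the scale b
print(U, V)            # should be near (2.5, 3.0)
```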
When one of the parameters is known, the method of moments estimator of the other parameter is much simpler.

Suppose that k is unknown, but b is known. The method of moments estimator of k is

\[ U_b = \frac{M}{b} \tag{7.2.29} \]

1. E(U_b) = k, so U_b is unbiased.
2. var(U_b) = k/n, so U_b is consistent.

Proof
If b is known, then the method of moments equation for U_b is b U_b = M. Solving gives (a). Next, E(U_b) = E(M)/b = kb/b = k, so U_b is unbiased. Finally, var(U_b) = var(M)/b^2 = kb^2/(n b^2) = k/n.

Suppose that b is unknown, but k is known. The method of moments estimator of b is

\[ V_k = \frac{M}{k} \tag{7.2.30} \]

1. E(V_k) = b, so V_k is unbiased.
2. var(V_k) = b^2/(kn), so V_k is consistent.

Proof
If k is known, then the method of moments equation for V_k is k V_k = M. Solving gives (a). Next, E(V_k) = E(M)/k = kb/k = b, so V_k is unbiased. Finally, var(V_k) = var(M)/k^2 = kb^2/(n k^2) = b^2/(kn).

Run the gamma estimation experiment 1000 times for several different values of the sample size n and the parameters k and b. Note the empirical bias and mean square error of the estimators U, V, U_b, and V_k. One would think that the estimators when one of the parameters is known should work better than the corresponding estimators when both parameters are unknown; but investigate this question empirically.

The Beta Distribution


The beta distribution with left parameter a ∈ (0, ∞) and right parameter b ∈ (0, ∞) is a continuous distribution on (0, 1) with probability density function g given by

\[ g(x) = \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}, \quad 0 < x < 1 \tag{7.2.31} \]

The beta probability density function has a variety of shapes, and so this distribution is widely used to model various types of random variables that take values in bounded intervals. The beta distribution is studied in more detail in the chapter on Special Distributions. The first two moments are μ = a/(a + b) and μ^(2) = a(a + 1)/[(a + b)(a + b + 1)].

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the beta distribution with left parameter a and right parameter b.

Suppose that a and b are both unknown, and let U and V be the corresponding method of moments estimators. Then

\[ U = \frac{M\,(M - M^{(2)})}{M^{(2)} - M^2}, \quad V = \frac{(1 - M)(M - M^{(2)})}{M^{(2)} - M^2} \tag{7.2.32} \]

Proof
The method of moments equations for U and V are

\[ \frac{U}{U + V} = M, \quad \frac{U(U + 1)}{(U + V)(U + V + 1)} = M^{(2)} \tag{7.2.33} \]

Solving gives the result.

The method of moments estimators of a and b given in the previous exercise are complicated nonlinear functions of the sample moments M and M^(2). Thus, we will not attempt to determine the bias and mean square errors analytically, but you will have an opportunity to explore them empirically through a simulation.
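As an illustrative check, here is a short Python sketch (NumPy assumed; the true parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
a_true, b_true = 2.0, 5.0

x = rng.beta(a_true, b_true, size=50_000)
M = x.mean()
M2 = np.mean(x ** 2)                      # second sample moment M^(2)

U = M * (M - M2) / (M2 - M**2)            # method of moments estimator of a
V = (1 - M) * (M - M2) / (M2 - M**2)      # method of moments estimator of b
print(U, V)                               # should be near (2.0, 5.0)
```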

Suppose that a is unknown, but b is known. Let U_b be the method of moments estimator of a. Then

\[ U_b = b\,\frac{M}{1 - M} \tag{7.2.34} \]

Proof
If b is known, then the method of moments equation for U_b as an estimator of a is U_b/(U_b + b) = M. Solving for U_b gives the result.

Suppose that b is unknown, but a is known. Let V_a be the method of moments estimator of b. Then

\[ V_a = a\,\frac{1 - M}{M} \tag{7.2.35} \]

Proof
If a is known, then the method of moments equation for V_a as an estimator of b is a/(a + V_a) = M. Solving for V_a gives the result.

Run the beta estimation experiment 1000 times for several different values of the sample size n and the parameters a and b. Note the empirical bias and mean square error of the estimators U, V, U_b, and V_a. One would think that the estimators when one of the parameters is known should work better than the corresponding estimators when both parameters are unknown; but investigate this question empirically.

The following problem gives a distribution with just one parameter, but the second moment equation from the method of moments is needed to derive an estimator.

Suppose that X = (X_1, X_2, …, X_n) is a random sample from the symmetric beta distribution, in which the left and right parameters are equal to an unknown value c ∈ (0, ∞). The method of moments estimator of c is

\[ U = \frac{1 - 2 M^{(2)}}{4 M^{(2)} - 1} \tag{7.2.36} \]

Proof
Note that the mean μ of the symmetric distribution is 1/2, independently of c, and so the first equation in the method of moments is useless. However, matching the second distribution moment to the second sample moment leads to the equation

\[ \frac{U + 1}{2(2U + 1)} = M^{(2)} \tag{7.2.37} \]

Solving gives the result.
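A quick numerical sanity check of this one-parameter example (a sketch, assuming NumPy; the true value of c is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
c_true = 3.0

x = rng.beta(c_true, c_true, size=100_000)
M2 = np.mean(x ** 2)                 # second sample moment; the sample mean is ~1/2 and carries no information about c

U = (1 - 2 * M2) / (4 * M2 - 1)      # method of moments estimator of c from the second moment
print(U)                             # should be near 3.0
```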

The Pareto Distribution


The Pareto distribution with shape parameter a ∈ (0, ∞) and scale parameter b ∈ (0, ∞) is a continuous distribution on (b, ∞) with probability density function g given by

\[ g(x) = \frac{a b^a}{x^{a+1}}, \quad b \le x < \infty \tag{7.2.38} \]

The Pareto distribution is named for Vilfredo Pareto and is a highly skewed and heavy-tailed distribution. It is often used to model income and certain other types of positive random variables. The Pareto distribution is studied in more detail in the chapter on Special Distributions. If a > 2, the first two moments of the Pareto distribution are μ = ab/(a − 1) and μ^(2) = ab^2/(a − 2).

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the Pareto distribution with shape parameter a > 2 and scale parameter b > 0.

Suppose that a and b are both unknown, and let U and V be the corresponding method of moments estimators. Then

\[ U = 1 + \sqrt{\frac{M^{(2)}}{M^{(2)} - M^2}} \tag{7.2.39} \]
\[ V = \frac{M^{(2)}}{M} \left(1 - \sqrt{\frac{M^{(2)} - M^2}{M^{(2)}}}\right) \tag{7.2.40} \]

Proof
The method of moments equations for U and V are

\[ \frac{U V}{U - 1} = M \tag{7.2.41} \]
\[ \frac{U V^2}{U - 2} = M^{(2)} \tag{7.2.42} \]

Solving for U and V gives the results.

As with our previous examples, the method of moments estimators are complicated nonlinear functions of M and M^(2), so computing the bias and mean square error of the estimators is difficult. Instead, we can investigate the bias and mean square error empirically, through a simulation.
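A rough Python sketch of such a check (NumPy assumed; parameter values are arbitrary). Note that NumPy's `Generator.pareto(a)` draws from the Lomax (Pareto II) form, so, per NumPy's documentation, `b * (1 + Generator.pareto(a))` gives the classical Pareto distribution on (b, ∞) used here.

```python
import numpy as np

rng = np.random.default_rng(9)
a_true, b_true = 3.0, 2.0

# Classical Pareto on (b, inf): b * (1 + Lomax(a)).
x = b_true * (1.0 + rng.pareto(a_true, size=100_000))

M = x.mean()
M2 = np.mean(x ** 2)

U = 1.0 + np.sqrt(M2 / (M2 - M**2))                 # estimator of the shape a
V = (M2 / M) * (1.0 - np.sqrt((M2 - M**2) / M2))    # estimator of the scale b
print(U, V)                                         # should be near (3.0, 2.0)
```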

Run the Pareto estimation experiment 1000 times for several different values of the sample size n and the parameters a and b. Note the empirical bias and mean square error of the estimators U and V.

When one of the parameters is known, the method of moments estimator for the other parameter is simpler.

Suppose that a is unknown, but b is known. Let U_b be the method of moments estimator of a. Then

\[ U_b = \frac{M}{M - b} \tag{7.2.43} \]

Proof
If b is known, then the method of moments equation for U_b as an estimator of a is b U_b/(U_b − 1) = M. Solving for U_b gives the result.

Suppose that b is unknown, but a is known. Let V_a be the method of moments estimator of b. Then

\[ V_a = \frac{a - 1}{a} M \tag{7.2.44} \]

1. E(V_a) = b, so V_a is unbiased.
2. var(V_a) = b^2/[na(a − 2)], so V_a is consistent.

Proof
If a is known, then the method of moments equation for V_a as an estimator of b is a V_a/(a − 1) = M. Solving for V_a gives (a). Next, E(V_a) = ((a − 1)/a) E(M) = ((a − 1)/a) · ab/(a − 1) = b, so V_a is unbiased. Finally, var(V_a) = ((a − 1)/a)^2 var(M) = ((a − 1)/a)^2 · (1/n) · ab^2/[(a − 1)^2 (a − 2)] = b^2/[na(a − 2)].

The Uniform Distribution


The (continuous) uniform distribution with location parameter a ∈ R and scale parameter h ∈ (0, ∞) has probability density function g given by

\[ g(x) = \frac{1}{h}, \quad x \in [a, a + h] \tag{7.2.45} \]

The distribution models a point chosen “at random” from the interval [a, a + h]. The mean of the distribution is μ = a + h/2 and the variance is σ^2 = h^2/12. The uniform distribution is studied in more detail in the chapter on Special Distributions.

Suppose now that X = (X_1, X_2, …, X_n) is a random sample of size n from the uniform distribution.

Suppose that a and h are both unknown, and let U and V denote the corresponding method of moments estimators. Then

\[ U = M - \sqrt{3}\, T, \quad V = 2\sqrt{3}\, T \tag{7.2.46} \]

Proof
Matching the distribution mean and variance to the sample mean and variance leads to the equations U + V/2 = M and V^2/12 = T^2. Solving gives the result.
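A quick numerical sanity check in Python (NumPy assumed; T is the square root of the biased sample variance, and the true parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(10)
a_true, h_true = 1.0, 4.0

x = rng.uniform(a_true, a_true + h_true, size=100_000)
M = x.mean()
T = x.std()                      # sqrt of the biased sample variance (ddof=0)

U = M - np.sqrt(3.0) * T         # method of moments estimator of the location a
V = 2.0 * np.sqrt(3.0) * T       # method of moments estimator of the scale h
print(U, V)                      # should be near (1.0, 4.0)
```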

As usual, we get nicer results when one of the parameters is known.

Suppose that a is known and h is unknown, and let V_a denote the method of moments estimator of h. Then

\[ V_a = 2(M - a) \tag{7.2.47} \]

1. E(V_a) = h, so V_a is unbiased.
2. var(V_a) = h^2/(3n), so V_a is consistent.

Proof
Matching the distribution mean to the sample mean leads to the equation a + V_a/2 = M. Solving gives the result.

1. E(V_a) = 2[E(M) − a] = 2(a + h/2 − a) = h
2. var(V_a) = 4 var(M) = 4 · (h^2/12)/n = h^2/(3n)

Suppose that h is known and a is unknown, and let U_h denote the method of moments estimator of a. Then

\[ U_h = M - \frac{h}{2} \tag{7.2.48} \]

1. E(U_h) = a, so U_h is unbiased.
2. var(U_h) = h^2/(12n), so U_h is consistent.

Proof
Matching the distribution mean to the sample mean leads to the equation U_h + h/2 = M. Solving gives the result.

1. E(U_h) = E(M) − h/2 = a + h/2 − h/2 = a
2. var(U_h) = var(M) = h^2/(12n)

The Hypergeometric Model


Our basic assumption in the method of moments is that the sequence of observed random variables X = (X_1, X_2, …, X_n) is a random sample from a distribution. However, the method makes sense, at least in some cases, when the variables are identically distributed but dependent. In the hypergeometric model, we have a population of N objects with r of the objects type 1 and the remaining N − r objects type 0. The parameter N, the population size, is a positive integer. The parameter r, the type 1 size, is a nonnegative integer with r ≤ N. These are the basic parameters, and typically one or both is unknown. Here are some typical examples:

1. The objects are devices, classified as good or defective.
2. The objects are persons, classified as female or male.
3. The objects are voters, classified as for or against a particular candidate.
4. The objects are wildlife of a particular type, either tagged or untagged.

We sample n objects from the population at random, without replacement. Let X_i be the type of the ith object selected, so that our sequence of observed variables is X = (X_1, X_2, …, X_n). The variables are identically distributed indicator variables, with P(X_i = 1) = r/N for each i ∈ {1, 2, …, n}, but are dependent since the sampling is without replacement. The number of type 1 objects in the sample is Y = ∑_{i=1}^n X_i. This statistic has the hypergeometric distribution with parameters N, r, and n, and has probability density function given by

\[ P(Y = y) = \frac{\binom{r}{y} \binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} \frac{r^{(y)} (N - r)^{(n-y)}}{N^{(n)}}, \quad y \in \{\max\{0, n + r - N\}, \ldots, \min\{n, r\}\} \tag{7.2.49} \]

The hypergeometric model is studied in more detail in the chapter on Finite Sampling Models.

As above, let X = (X_1, X_2, …, X_n) be the observed variables in the hypergeometric model with parameters N and r.
Then
1. The method of moments estimator of p = r/N is M = Y /n , the sample mean.
2. The method of moments estimator of r with N known is U = N M = N Y /n .
3. The method of moments estimator of N with r known is V = r/M = rn/Y if Y > 0 .
Proof
These results all follow simply from the fact that E(X) = P(X = 1) = r/N .

In the voter example (3) above, typically N and r are both unknown, but we would only be interested in estimating the ratio
p = r/N . In the reliability example (1), we might typically know N and would be interested in estimating r . In the wildlife

example (4), we would typically know r and would be interested in estimating N . This example is known as the capture-
recapture model.
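A tiny capture-recapture illustration in Python (NumPy assumed; the population values are made up): r tagged animals out of N total, a sample of n drawn without replacement, and the estimate V = rn/Y of the population size.

```python
import numpy as np

rng = np.random.default_rng(11)
N_true, r, n = 1000, 100, 80          # population size, tagged count, recapture sample size

population = np.array([1] * r + [0] * (N_true - r))
sample = rng.choice(population, size=n, replace=False)   # sampling without replacement
Y = sample.sum()                                         # tagged animals recaptured

if Y > 0:
    V = r * n / Y                     # method of moments estimator of N (r known)
    print(Y, V)                       # V should be roughly 1000
```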
Clearly there is a close relationship between the hypergeometric model and the Bernoulli trials model above. In fact, if the
sampling is with replacement, the Bernoulli trials model would apply rather than the hypergeometric model. In addition, if the
population size N is large compared to the sample size n, the hypergeometric model is well approximated by the Bernoulli
trials model.
