Chapter 14
Maximum Likelihood Estimation
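1. The density of the maximum is

$$f(z) = n\left(\frac{z}{\theta}\right)^{n-1}\frac{1}{\theta}, \quad 0 < z < \theta.$$

Therefore, the expected value is

$$E[z] = \int_0^\theta \frac{n z^n}{\theta^n}\,dz = \left[\frac{\theta^{n+1}}{n+1}\right]\left[\frac{n}{\theta^n}\right] = \frac{n\theta}{n+1}.$$

The variance is found likewise:

$$E[z^2] = \int_0^\theta z^2\, n\left(\frac{z}{\theta}\right)^{n-1}\frac{1}{\theta}\,dz = \frac{n\theta^2}{n+2},$$

so $\mathrm{Var}[z] = E[z^2] - (E[z])^2 = n\theta^2/[(n+1)^2(n+2)]$. Using mean squared convergence, we see that $\lim_{n\to\infty} E[z] = \theta$ and $\lim_{n\to\infty} \mathrm{Var}[z] = 0$, so that plim $z = \theta$.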
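The two moments above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names and the midpoint-rule integration are illustrative assumptions, and the values of n and theta are arbitrary.

```python
def max_uniform_moments(n, theta):
    """Closed-form mean and variance of the maximum of n U(0, theta) draws."""
    mean = n * theta / (n + 1)
    var = n * theta**2 / ((n + 1) ** 2 * (n + 2))
    return mean, var


def max_uniform_moments_numeric(n, theta, steps=20_000):
    """Midpoint-rule integration against f(z) = n (z/theta)^(n-1) / theta."""
    h = theta / steps
    m1 = 0.0
    m2 = 0.0
    for j in range(steps):
        z = (j + 0.5) * h
        dens = n * (z / theta) ** (n - 1) / theta
        m1 += z * dens * h
        m2 += z * z * dens * h
    return m1, m2 - m1 * m1


# As n grows, the mean approaches theta and the variance approaches zero.
for n in (4, 40, 400):
    print(n, max_uniform_moments(n, 2.0))
```

Both routines should agree closely, and the printed sequence illustrates the mean squared convergence used in the argument.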

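2. The log-likelihood is $\ln L = -n\ln\theta - (1/\theta)\sum_{i=1}^n x_i$. The maximum likelihood estimator is obtained as the solution to

$$\partial \ln L/\partial\theta = -n/\theta + (1/\theta^2)\sum_{i=1}^n x_i = 0, \quad\text{or}\quad \hat\theta_{ML} = (1/n)\sum_{i=1}^n x_i = \bar{x}.$$

The asymptotic variance of the MLE is

$$\{-E[\partial^2 \ln L/\partial\theta^2]\}^{-1} = \left\{-E\left[n/\theta^2 - (2/\theta^3)\sum_{i=1}^n x_i\right]\right\}^{-1}.$$

To find the expected value of this random variable, we need $E[x_i] = \theta$, so that $-E[\partial^2 \ln L/\partial\theta^2] = -n/\theta^2 + 2n\theta/\theta^3 = n/\theta^2$. Therefore, the asymptotic variance is $\theta^2/n$. The asymptotic distribution is normal with mean $\theta$ and this variance.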
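As a small illustration of the result, the estimator and its estimated asymptotic variance can be computed as follows. This is a sketch under assumptions: the function names are hypothetical, the data are made up, and the variance is evaluated at the estimate rather than at the true parameter.

```python
def exponential_score(theta, xs):
    """d lnL / d theta = -n/theta + (1/theta^2) * sum(x)."""
    n = len(xs)
    return -n / theta + sum(xs) / theta**2


def exponential_mle(xs):
    """Return the MLE (the sample mean) and theta^2/n evaluated at the MLE."""
    n = len(xs)
    theta_hat = sum(xs) / n
    asy_var = theta_hat**2 / n
    return theta_hat, asy_var


# Made-up observations, for illustration only.
sample = [1.0, 2.0, 3.0]
theta_hat, asy_var = exponential_mle(sample)
```

The score evaluated at `theta_hat` is zero, which is the likelihood equation in the solution.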

3. The log-likelihood is

$$\ln L = n\ln\theta - (\beta+\theta)\sum_{i=1}^n y_i + \ln\beta\sum_{i=1}^n x_i + \sum_{i=1}^n x_i \ln y_i - \sum_{i=1}^n \ln(x_i!).$$

The first and second derivatives are

$$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^n y_i$$
$$\partial\ln L/\partial\beta = -\sum_{i=1}^n y_i + \sum_{i=1}^n x_i/\beta$$
$$\partial^2\ln L/\partial\theta^2 = -n/\theta^2$$
$$\partial^2\ln L/\partial\beta^2 = -\sum_{i=1}^n x_i/\beta^2$$
$$\partial^2\ln L/\partial\beta\,\partial\theta = 0.$$

Therefore, the maximum likelihood estimators are $\hat\theta_{ML} = 1/\bar{y}$ and $\hat\beta = \bar{x}/\bar{y}$, and the asymptotic covariance matrix is the inverse of

$$E\begin{bmatrix} n/\theta^2 & 0 \\ 0 & \sum_{i=1}^n x_i/\beta^2 \end{bmatrix}.$$

In order to complete the derivation, we will require the expected value of $\sum_{i=1}^n x_i = nE[x_i]$. In order to obtain $E[x_i]$, it is necessary to obtain the marginal distribution of $x_i$, which is

$$f(x) = \int_0^\infty \theta e^{-(\beta+\theta)y}(\beta y)^x/x!\,dy = \beta^x(\theta/x!)\int_0^\infty e^{-(\beta+\theta)y}y^x\,dy.$$

This is $\beta^x(\theta/x!)$ times a gamma integral. This is $f(x) = \beta^x(\theta/x!)[\Gamma(x+1)]/(\beta+\theta)^{x+1}$. But $\Gamma(x+1) = x!$, so the expression reduces to

$$f(x) = [\theta/(\beta+\theta)][\beta/(\beta+\theta)]^x.$$

Thus, x has a geometric distribution with parameter $\lambda = \theta/(\beta+\theta)$. (This is the distribution of the number of failures before the first success in independent trials, each with success probability $\lambda$.) Finally, we require the expected value of $x_i$, which is

$$E[x] = [\theta/(\beta+\theta)]\sum_{x=0}^\infty x[\beta/(\beta+\theta)]^x = \beta/\theta.$$

Then, the required asymptotic covariance matrix is

$$\begin{bmatrix} n/\theta^2 & 0 \\ 0 & n(\beta/\theta)/\beta^2 \end{bmatrix}^{-1} = \begin{bmatrix} \theta^2/n & 0 \\ 0 & \beta\theta/n \end{bmatrix}.$$

The maximum likelihood estimator of $\theta/(\beta+\theta)$ is

$$\widehat{\theta/(\beta+\theta)} = (1/\bar{y})/[\bar{x}/\bar{y} + 1/\bar{y}] = 1/(1+\bar{x}).$$

Its asymptotic variance is obtained using the variance of a nonlinear function:

$$V = [\beta/(\beta+\theta)^2]^2(\theta^2/n) + [-\theta/(\beta+\theta)^2]^2(\beta\theta/n) = \beta\theta^2/[n(\beta+\theta)^3].$$

(The asymptotic variance could also be obtained as $[-1/(1+E[x])^2]^2\,\mathrm{Asy.Var}[\bar{x}]$.)
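The two routes to this asymptotic variance can be compared numerically. This is a minimal sketch, not part of the original solution. The function names and parameter values are hypothetical, and the second route assumes the standard geometric variance $(1-\lambda)/\lambda^2$, which equals $\beta(\beta+\theta)/\theta^2$ here.

```python
def variance_delta_method(theta, beta, n):
    """Delta method for g(theta, beta) = theta / (beta + theta)."""
    g_theta = beta / (beta + theta) ** 2      # dg/dtheta
    g_beta = -theta / (beta + theta) ** 2     # dg/dbeta
    var_theta = theta**2 / n                  # Asy.Var of theta-hat
    var_beta = beta * theta / n               # Asy.Var of beta-hat
    return g_theta**2 * var_theta + g_beta**2 * var_beta


def variance_closed_form(theta, beta, n):
    """beta * theta^2 / [n (beta + theta)^3]."""
    return beta * theta**2 / (n * (beta + theta) ** 3)


def variance_via_xbar(theta, beta, n):
    """[-1/(1+E[x])^2]^2 * Var[x]/n, with assumed geometric moments."""
    mean_x = beta / theta
    var_x = beta * (beta + theta) / theta**2  # assumption: (1-lam)/lam^2
    return (-1.0 / (1.0 + mean_x) ** 2) ** 2 * var_x / n


# Hypothetical parameter values, for illustration only.
theta, beta, n = 2.0, 3.0, 50
values = (variance_delta_method(theta, beta, n),
          variance_closed_form(theta, beta, n),
          variance_via_xbar(theta, beta, n))
```

All three entries of `values` should coincide up to rounding.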
For part (c), we just note that $\lambda = \theta/(\beta+\theta)$. For a sample of observations on x, the log-likelihood would be

$$\ln L = n\ln\lambda + \ln(1-\lambda)\sum_{i=1}^n x_i$$
$$\partial\ln L/\partial\lambda = n/\lambda - \sum_{i=1}^n x_i/(1-\lambda).$$

A solution is obtained by first noting that at the solution, $(1-\lambda)/\lambda = \bar{x} = 1/\lambda - 1$. The solution for $\lambda$ is, thus, $\hat\lambda = 1/(1+\bar{x})$. Of course, this is what we found in part (b), which makes sense.
For part (d),

$$f(y|x) = \frac{f(x,y)}{f(x)} = \frac{\theta e^{-(\beta+\theta)y}(\beta y)^x(\beta+\theta)^x(\beta+\theta)}{x!\,\theta\,\beta^x}.$$

Cancelling terms and gathering the remaining like terms leaves $f(y|x) = (\beta+\theta)[(\beta+\theta)y]^x e^{-(\beta+\theta)y}/x!$, so the density has the required form with $\lambda = (\beta+\theta)$. The integral is $[\lambda^{x+1}/x!]\int_0^\infty e^{-\lambda y}y^x\,dy$. This integral is a gamma integral which equals $\Gamma(x+1)/\lambda^{x+1}$, which is the reciprocal of the leading scalar, so the product is 1. The log-likelihood function and its derivatives are

$$\ln L = n\ln\lambda - \lambda\sum_{i=1}^n y_i + \ln\lambda\sum_{i=1}^n x_i + \sum_{i=1}^n x_i\ln y_i - \sum_{i=1}^n \ln x_i!$$
$$\partial\ln L/\partial\lambda = \left(\sum_{i=1}^n x_i + n\right)/\lambda - \sum_{i=1}^n y_i$$
$$\partial^2\ln L/\partial\lambda^2 = -\left(\sum_{i=1}^n x_i + n\right)/\lambda^2.$$

Therefore, the maximum likelihood estimator of $\lambda$ is $(1+\bar{x})/\bar{y}$ and the asymptotic variance, conditional on the xs, is $\mathrm{Asy.Var}[\hat\lambda] = (\lambda^2/n)/(1+\bar{x})$.
 
Part (e). We can obtain f(y) by summing over x in the joint density. First, we write the joint density as $f(x,y) = \theta e^{-\theta y}e^{-\beta y}(\beta y)^x/x!$. The sum is, therefore,

$$f(y) = \theta e^{-\theta y}\sum_{x=0}^\infty e^{-\beta y}(\beta y)^x/x!.$$

The sum is that of the probabilities for a Poisson distribution, so it equals 1. This produces the required result. The maximum likelihood estimator of $\theta$ and its asymptotic variance are derived from

$$\ln L = n\ln\theta - \theta\sum_{i=1}^n y_i$$
$$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^n y_i$$
$$\partial^2\ln L/\partial\theta^2 = -n/\theta^2.$$

Therefore, the maximum likelihood estimator is $1/\bar{y}$ and its asymptotic variance is $\theta^2/n$.

Since we found f(y) by factoring f(x,y) into f(y)f(x|y) (apparently, given our result), the answer follows immediately. Just divide the expression used in part (e) by f(y). This is a Poisson distribution with parameter $\beta y$. The log-likelihood function and its first derivative are

$$\ln L = -\beta\sum_{i=1}^n y_i + \ln\beta\sum_{i=1}^n x_i + \sum_{i=1}^n x_i\ln y_i - \sum_{i=1}^n \ln x_i!$$
$$\partial\ln L/\partial\beta = -\sum_{i=1}^n y_i + \sum_{i=1}^n x_i/\beta,$$

from which it follows that $\hat\beta = \bar{x}/\bar{y}$.

4. The log-likelihood and its two first derivatives are

$$\log L = n\log\alpha + n\log\beta + (\beta-1)\sum_{i=1}^n \log x_i - \alpha\sum_{i=1}^n x_i^\beta$$
$$\partial\log L/\partial\alpha = n/\alpha - \sum_{i=1}^n x_i^\beta$$
$$\partial\log L/\partial\beta = n/\beta + \sum_{i=1}^n \log x_i - \alpha\sum_{i=1}^n (\log x_i)x_i^\beta.$$

Since the first likelihood equation implies that at the maximum, $\hat\alpha = n/\sum_{i=1}^n x_i^\beta$, one approach would be to scan over the range of $\beta$ and compute the implied value of $\alpha$. Two practical complications are the allowable range of $\beta$ and the starting values to use for the search.
The second derivatives are

$$\partial^2\ln L/\partial\alpha^2 = -n/\alpha^2$$
$$\partial^2\ln L/\partial\beta^2 = -n/\beta^2 - \alpha\sum_{i=1}^n (\log x_i)^2 x_i^\beta$$
$$\partial^2\ln L/\partial\alpha\,\partial\beta = -\sum_{i=1}^n (\log x_i)x_i^\beta.$$

If we had estimates in hand, the simplest way to estimate the expected values of the Hessian would be to evaluate the expressions above at the maximum likelihood estimates, then compute the negative inverse. First, since the expected value of $\partial\ln L/\partial\alpha$ is zero, it follows that $E[x_i^\beta] = 1/\alpha$. Now,

$$E[\partial\ln L/\partial\beta] = n/\beta + E\left[\sum_{i=1}^n \log x_i\right] - \alpha E\left[\sum_{i=1}^n (\log x_i)x_i^\beta\right] = 0$$

as well. Divide by n, and use the fact that every term in a sum has the same expectation, together with $\alpha = 1/E[x_i^\beta]$, to obtain

$$1/\beta + E[\ln x_i] - E[(\ln x_i)x_i^\beta]/E[x_i^\beta] = 0.$$

Now, multiply through by $E[x_i^\beta]$ to obtain $E[x_i^\beta]/\beta = E[(\ln x_i)x_i^\beta] - E[\ln x_i]E[x_i^\beta]$, or $1/(\alpha\beta) = \mathrm{Cov}[\ln x_i, x_i^\beta]$.
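The scan over $\beta$ described in this solution can be sketched as follows. This is an illustrative sketch only: the function names, the grid, and the data are hypothetical and are not the data used in the exercises.

```python
import math


def alpha_given_beta(beta, xs):
    """Implied alpha from the first likelihood equation: n / sum(x^beta)."""
    return len(xs) / sum(x**beta for x in xs)


def weibull_loglik(alpha, beta, xs):
    """logL = n log(alpha) + n log(beta) + (beta-1) sum(log x) - alpha sum(x^beta)."""
    n = len(xs)
    return (n * math.log(alpha) + n * math.log(beta)
            + (beta - 1.0) * sum(math.log(x) for x in xs)
            - alpha * sum(x**beta for x in xs))


def scan_beta(xs, grid):
    """Evaluate the concentrated log-likelihood on a grid and keep the best point."""
    best = None
    for beta in grid:
        alpha = alpha_given_beta(beta, xs)
        ll = weibull_loglik(alpha, beta, xs)
        if best is None or ll > best[2]:
            best = (beta, alpha, ll)
    return best


# Hypothetical positive observations and an arbitrary search grid.
xs = [0.4, 0.9, 1.3, 0.7, 2.1, 0.2, 1.6, 1.1]
grid = [0.5 + 0.01 * k for k in range(251)]
beta_hat, alpha_hat, ll_hat = scan_beta(xs, grid)
```

The grid bounds reflect the practical complication noted in the text: the allowable range of $\beta$ has to be chosen by the analyst, and a finer grid or a line search could replace the fixed step used here.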

5. As suggested in the previous problem, we can concentrate the log-likelihood over $\alpha$. From $\partial\log L/\partial\alpha = 0$, we find that at the maximum, $\alpha = 1/[(1/n)\sum_{i=1}^n x_i^\beta]$. Thus, we scan over different values of $\beta$ to seek the value which maximizes logL as given above, where we substitute this expression for each occurrence of $\alpha$. Values of $\beta$ and the log-likelihood for a range of values of $\beta$ are listed and shown in the figure below.
β logL
0.1 -62.386
0.2 -49.175
0.3 -41.381
0.4 -36.051
0.5 -32.122
0.6 -29.127
0.7 -26.829
0.8 -25.098
0.9 -23.866
1.0 -23.101
1.05 -22.891
1.06 -22.863
1.07 -22.841
1.08 -22.823
1.09 -22.809
1.10 -22.800
1.11 -22.796
1.12 -22.797
1.2 -22.984
1.3 -23.693

The maximum occurs at $\beta$ = 1.11. The implied value of $\alpha$ is 1.179. The negative of the second derivatives matrix at these values and its inverse are

$$I(\hat\alpha,\hat\beta) = \begin{bmatrix} 25.55 & 9.6506 \\ 9.6506 & 27.7552 \end{bmatrix} \quad\text{and}\quad I^{-1}(\hat\alpha,\hat\beta) = \begin{bmatrix} .04506 & -.2673 \\ -.2673 & .04148 \end{bmatrix}.$$

The Wald statistic for the hypothesis that $\beta$ = 1 is W = (1.11 - 1)²/.041477 = .276. The critical value for a test of size .05 is 3.84, so we would not reject the hypothesis.
If $\beta$ = 1, then $\hat\alpha = n/\sum_{i=1}^n x_i$ = 0.88496. The distribution specializes to the exponential distribution if $\beta$ = 1, so the restricted log-likelihood would be

$$\log L_r = n\log\alpha - \alpha\sum_{i=1}^n x_i = n(\log\alpha - 1) \text{ at the MLE.}$$

$\log L_r$ at $\alpha$ = .88496 is -22.44435. The likelihood ratio statistic is $-2\log\lambda$ = 2(23.10068 - 22.44435) = 1.3126. Once again, this is a small value. To obtain the Lagrange multiplier statistic, we would compute

$$\begin{bmatrix}\partial\log L/\partial\alpha & \partial\log L/\partial\beta\end{bmatrix}\begin{bmatrix}-\partial^2\log L/\partial\alpha^2 & -\partial^2\log L/\partial\alpha\,\partial\beta \\ -\partial^2\log L/\partial\alpha\,\partial\beta & -\partial^2\log L/\partial\beta^2\end{bmatrix}^{-1}\begin{bmatrix}\partial\log L/\partial\alpha \\ \partial\log L/\partial\beta\end{bmatrix}$$

at the restricted estimates of $\alpha$ = .88496 and $\beta$ = 1. Making the substitutions from above, at these values, we would have

$$\partial\log L/\partial\alpha = 0$$
$$\partial\log L/\partial\beta = n + \sum_{i=1}^n \log x_i - (1/\bar{x})\sum_{i=1}^n x_i\log x_i = 9.400342$$
$$\partial^2\log L/\partial\alpha^2 = -n\bar{x}^2 = -25.54955$$
$$\partial^2\log L/\partial\beta^2 = -n - (1/\bar{x})\sum_{i=1}^n x_i(\log x_i)^2 = -30.79486$$
$$\partial^2\log L/\partial\alpha\,\partial\beta = -\sum_{i=1}^n x_i\log x_i = -8.265.$$

The lower right element in the inverse matrix is .041477. The LM statistic is, therefore, (9.40032)²(.041477) = 2.9095. This is also well under the critical value for the chi-squared distribution, so the hypothesis is not rejected on the basis of any of the three tests.

6. a. The full log likelihood is $\log L = \sum_i \log f_{yx}(y,x|\alpha,\beta)$.

b. By factoring the density, we obtain the equivalent $\log L = \sum_i [\log f_{y|x}(y|x,\alpha,\beta) + \log f_x(x|\alpha)]$.

c. We can solve the first order conditions in each case. From the marginal distribution for x,

$$\sum_i \partial\log f_x(x|\alpha)/\partial\alpha = 0$$

provides a solution for $\alpha$. From the joint distribution, factored into the conditional plus the marginal, we have

$$\sum_i [\partial\log f_{y|x}(y|x,\alpha,\beta)/\partial\alpha + \partial\log f_x(x|\alpha)/\partial\alpha] = 0$$
$$\sum_i \partial\log f_{y|x}(y|x,\alpha,\beta)/\partial\beta = 0.$$

d. The asymptotic variance obtained from the first estimator would be the negative inverse of the expected second derivative, $\mathrm{Asy.Var}[a] = \{-E[\sum_i \partial^2\log f_x(x|\alpha)/\partial\alpha^2]\}^{-1}$. Denote this $A_{\alpha\alpha}^{-1}$. Now, consider the second estimator for $\alpha$ and $\beta$ jointly. The negative of the expected Hessian is shown below. Note that the $A_{\alpha\alpha}$ from the marginal distribution appears there, as the marginal distribution appears in the factored joint distribution.

$$-E\left[\frac{\partial^2\ln L}{\partial\binom{\alpha}{\beta}\partial\binom{\alpha}{\beta}'}\right] = \begin{bmatrix} B_{\alpha\alpha} & B_{\alpha\beta} \\ B_{\beta\alpha} & B_{\beta\beta} \end{bmatrix} + \begin{bmatrix} A_{\alpha\alpha} & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} A_{\alpha\alpha} + B_{\alpha\alpha} & B_{\alpha\beta} \\ B_{\beta\alpha} & B_{\beta\beta} \end{bmatrix}$$

The asymptotic covariance matrix for the joint estimator is the inverse of this matrix. To compare this to the asymptotic variance for the marginal estimator of $\alpha$, we need the upper left element of this matrix. Using the formula for the partitioned inverse, we find that this upper left element in the inverse is

$$[(A_{\alpha\alpha}+B_{\alpha\alpha}) - (B_{\alpha\beta}B_{\beta\beta}^{-1}B_{\beta\alpha})]^{-1} = [A_{\alpha\alpha} + (B_{\alpha\alpha} - B_{\alpha\beta}B_{\beta\beta}^{-1}B_{\beta\alpha})]^{-1}$$

which is smaller than $A_{\alpha\alpha}^{-1}$ as long as the second term is positive.

e. (Unfortunately, this is an error in the text.) In the preceding expression, $B_{\alpha\beta}$ is the cross derivative. Even if it is zero, the asymptotic variance from the joint estimator is still smaller, being $[A_{\alpha\alpha} + B_{\alpha\alpha}]^{-1}$. This makes sense. If $\alpha$ appears in the conditional distribution, then there is additional information in the factored joint likelihood that is not in the marginal distribution, and this produces the smaller asymptotic variance.

7. The log likelihood for the Poisson model is

$$\log L = -n\lambda + \log\lambda\sum_i y_i - \sum_i \log y_i!$$

The expected value of 1/n times this function with respect to the true distribution is

$$E_0[(1/n)\log L] = -\lambda + \log\lambda\,E_0[\bar{y}] - E_0\left[(1/n)\sum_i \log y_i!\right]$$

The first expectation is $\lambda_0$. The second expectation can be left implicit since it will not affect the solution for $\lambda$; it is a function of the true $\lambda_0$. Maximizing this function with respect to $\lambda$ produces the necessary condition

$$\partial E_0[(1/n)\log L]/\partial\lambda = -1 + \lambda_0/\lambda = 0$$

which has solution $\lambda = \lambda_0$, which was to be shown.
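A quick numerical check of this result: the following sketch evaluates the expected log-likelihood, omitting the term that does not involve $\lambda$, on a grid and locates its maximizer. The function names, the grid, and the value of the true parameter are illustrative assumptions.

```python
import math


def expected_loglik_poisson(lam, lam0):
    """-lam + lam0 * log(lam); the E0[(1/n) sum log y!] term is omitted
    because it does not depend on lam."""
    return -lam + lam0 * math.log(lam)


def argmax_on_grid(lam0, grid):
    """Grid point with the largest expected log-likelihood."""
    return max(grid, key=lambda lam: expected_loglik_poisson(lam, lam0))


# Hypothetical true parameter and search grid (0.1, 0.2, ..., 10.0).
grid = [k / 10 for k in range(1, 101)]
best = argmax_on_grid(2.5, grid)
```

Because the function is strictly concave in $\lambda$, the grid maximizer sits at the grid point that equals the true value when that value is on the grid.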

8. The log likelihood for a sample from the normal distribution is

$$\log L = -(n/2)\log 2\pi - (n/2)\log\sigma^2 - [1/(2\sigma^2)]\sum_i (y_i - \mu)^2.$$

$$E_0[(1/n)\log L] = -(1/2)\log 2\pi - (1/2)\log\sigma^2 - [1/(2\sigma^2)]E_0\left[(1/n)\sum_i (y_i - \mu)^2\right].$$

The expectation term equals $E_0[(y_i - \mu)^2] = E_0[(y_i - \mu_0)^2] + (\mu_0 - \mu)^2 = \sigma_0^2 + (\mu_0 - \mu)^2$. Collecting terms,

$$E_0[(1/n)\log L] = -(1/2)\log 2\pi - (1/2)\log\sigma^2 - [1/(2\sigma^2)][\sigma_0^2 + (\mu_0 - \mu)^2]$$

To see where this is maximized, note first that the term $(\mu_0 - \mu)^2$ enters negatively as a quadratic, so the maximizing value of $\mu$ is obviously $\mu_0$. Since this term is then zero, we can ignore it, and look for the $\sigma^2$ that maximizes $-(1/2)\log 2\pi - (1/2)\log\sigma^2 - \sigma_0^2/(2\sigma^2)$. The factor of 1/2 is irrelevant, as is the leading constant, so we wish to minimize (after changing sign) $\log\sigma^2 + \sigma_0^2/\sigma^2$ with respect to $\sigma^2$. Equating the first derivative to zero produces $1/\sigma^2 = \sigma_0^2/(\sigma^2)^2$ or $\sigma^2 = \sigma_0^2$, which gives us the result.

9. The log likelihood for the classical normal regression model is

$$\log L = \sum_i -(1/2)[\log 2\pi + \log\sigma^2 + (1/\sigma^2)(y_i - x_i'\beta)^2]$$

If we reparameterize this in terms of $\theta = 1/\sigma$ and $\delta = \beta/\sigma$, then after a bit of manipulation,

$$\log L = \sum_i -(1/2)[\log 2\pi - \log\theta^2 + (\theta y_i - x_i'\delta)^2]$$

The first order conditions for maximizing this with respect to $\theta$ and $\delta$ are

$$\partial\log L/\partial\theta = n/\theta - \sum_i y_i(\theta y_i - x_i'\delta) = 0$$
$$\partial\log L/\partial\delta = \sum_i x_i(\theta y_i - x_i'\delta) = 0$$

Solve the second equation for $\delta$, which produces $\delta = \theta(X'X)^{-1}X'y = \theta b$. Insert this implicit solution into the first equation to produce $n/\theta = \sum_i y_i(\theta y_i - \theta x_i'b)$. By taking $\theta$ outside the summation and multiplying the entire expression by $\theta$, we obtain $n = \theta^2\sum_i y_i(y_i - x_i'b)$ or $\theta^2 = n/[\sum_i y_i(y_i - x_i'b)]$. This is an analytic solution for $\theta$ that is only in terms of the data; b is a sample statistic. Inserting the square root of this result into the solution for $\delta$ produces the second result we need. By pursuing this a bit further, you can show that the solution for $\theta^2$ is just $n/e'e$ from the original least squares regression, and the solution for $\delta$ is just b times this solution for $\theta$. The second derivatives matrix is

$$\partial^2\log L/\partial\theta^2 = -n/\theta^2 - \sum_i y_i^2$$
$$\partial^2\log L/\partial\delta\,\partial\delta' = -\sum_i x_i x_i'$$
$$\partial^2\log L/\partial\delta\,\partial\theta = \sum_i x_i y_i.$$

We'll obtain the expectations conditioned on X. $E[y_i|x_i]$ is $x_i'\beta$ from the original model, which equals $x_i'\delta/\theta$. $E[y_i^2|x_i] = (1/\theta^2)(x_i'\delta)^2 + 1/\theta^2$. (The cross term has expectation zero.) Summing over observations and collecting terms, we have, conditioned on X,

$$E[\partial^2\log L/\partial\theta^2|X] = -2n/\theta^2 - (1/\theta^2)\delta'X'X\delta$$
$$E[\partial^2\log L/\partial\delta\,\partial\delta'|X] = -X'X$$
$$E[\partial^2\log L/\partial\delta\,\partial\theta|X] = (1/\theta)X'X\delta$$

The negative inverse of the matrix of expected second derivatives is

$$\mathrm{Asy.Var}[d, h] = \begin{bmatrix} X'X & -(1/\theta)X'X\delta \\ -(1/\theta)\delta'X'X & (1/\theta^2)[2n + \delta'X'X\delta] \end{bmatrix}^{-1}$$

(The off diagonal term does not vanish here as it does in the original parameterization.)
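The claim that the solution for $\theta^2$ equals $n/e'e$ can be illustrated with a one-regressor example. This is a sketch under assumptions: the helper names and the data are hypothetical, and the regression includes a constant so that the least squares residuals are orthogonal to the fitted values.

```python
def ols_simple(xs, ys):
    """Least squares intercept and slope for y = b0 + b1*x + e."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def theta_squared_two_ways(xs, ys):
    """Compare n / sum(y_i * e_i) with n / e'e, where e_i = y_i - x_i'b."""
    b0, b1 = ols_simple(xs, ys)
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    n = len(ys)
    via_first_order_condition = n / sum(y * e for y, e in zip(ys, resid))
    via_residual_sum_of_squares = n / sum(e * e for e in resid)
    return via_first_order_condition, via_residual_sum_of_squares


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3]
theta2_foc, theta2_ee = theta_squared_two_ways(xs, ys)
```

The two values agree because $\sum_i y_i e_i = \sum_i e_i^2 + \sum_i \hat{y}_i e_i$ and the last sum is zero for least squares residuals.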

The first derivatives of the log likelihood function are $\partial\log L/\partial\mu = -(1/2\sigma^2)\sum_i -2(y_i - \mu)$. Equating this to zero produces the vector of means for the estimator of $\mu$. The first derivative with respect to $\sigma^2$ is

$$\partial\log L/\partial\sigma^2 = -nM/(2\sigma^2) + [1/(2\sigma^4)]\sum_i (y_i - \mu)'(y_i - \mu).$$

Each term in the sum is $\sum_m (y_{im} - \mu_m)^2$. We already deduced that the estimators of $\mu_m$ are the sample means. Inserting these in the solution for $\sigma^2$ and solving the likelihood equation produces the solution given in the problem. The second derivatives of the log likelihood are

$$\partial^2\log L/\partial\mu\,\partial\mu' = (1/\sigma^2)\sum_i -I$$
$$\partial^2\log L/\partial\mu\,\partial\sigma^2 = (1/2\sigma^4)\sum_i -2(y_i - \mu)$$
$$\partial^2\log L/\partial\sigma^2\partial\sigma^2 = nM/(2\sigma^4) - (1/\sigma^6)\sum_i (y_i - \mu)'(y_i - \mu)$$

The expected value of the first term is $(-n/\sigma^2)I$. The second term has expectation zero. Each term in the summation in the third term has expectation $M\sigma^2$, so the summation has expected value $nM\sigma^2$. Adding gives the expectation for the third term of $-nM/(2\sigma^4)$. Assembling these in a block diagonal matrix, then taking the negative inverse produces the result given earlier.
For the Wald test, the restriction is

$$H_0: \mu - \mu^0 i = 0.$$

The unrestricted estimator of $\mu$ is $\bar{x}$. The variance of $\bar{x}$ is given above, so the Wald statistic is simply $(\bar{x} - \mu^0 i)'\,\mathrm{Var}[(\bar{x} - \mu^0 i)]^{-1}(\bar{x} - \mu^0 i)$. Inserting the covariance matrix given above produces the suggested statistic.

The asymptotic variance of the MLE is, in fact, equal to the Cramer-Rao Lower Bound for the variance of
a consistent, asymptotically normally distributed estimator, so this completes the argument.
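In Example 4.7, we proposed a regression with a gamma distributed disturbance,

$$y_i = \alpha + x_i'\beta + \varepsilon_i$$
$$f(\varepsilon_i) = [\lambda^P/\Gamma(P)]\,\varepsilon_i^{P-1}\exp(-\lambda\varepsilon_i), \quad \varepsilon_i > 0,\ \lambda > 0,\ P > 2.$$

(The fact that $\varepsilon_i$ is nonnegative will shift the constant term, as shown in Example 4.7. The need for the restriction on P will emerge shortly.) It will be convenient to assume the regressors are measured in deviations from their means, so $\sum_i x_i = 0$. The OLS estimator of $\beta$ remains unbiased and consistent in this model, with variance

$$\mathrm{Var}[b|X] = \sigma^2(X'X)^{-1}$$

where $\sigma^2 = \mathrm{Var}[\varepsilon_i|X] = P/\lambda^2$. [You can show this by using gamma integrals to verify that $E[\varepsilon_i|X] = P/\lambda$ and $E[\varepsilon_i^2|X] = P(P+1)/\lambda^2$. See B-39 and (E-1) in Section E2.3. A useful device for obtaining the variance is $\Gamma(P) = (P-1)\Gamma(P-1)$.] We will now show that in this model, there is a more efficient consistent estimator of $\beta$. (As we saw in Example 4.7, the constant term in this regression will be biased because $E[\varepsilon_i|X] = P/\lambda$; a estimates $\alpha + P/\lambda$. In what follows, we will focus on the slope estimators.)
The log likelihood function is

$$\ln L = \sum_{i=1}^n \left[P\ln\lambda - \ln\Gamma(P) + (P-1)\ln\varepsilon_i - \lambda\varepsilon_i\right]$$

The likelihood equations are

$$\partial\ln L/\partial\alpha = \sum_i [-(P-1)/\varepsilon_i + \lambda] = 0,$$
$$\partial\ln L/\partial\beta = \sum_i [-(P-1)/\varepsilon_i + \lambda]x_i = 0,$$
$$\partial\ln L/\partial\lambda = \sum_i [P/\lambda - \varepsilon_i] = 0,$$
$$\partial\ln L/\partial P = \sum_i [\ln\lambda - \psi(P) + \ln\varepsilon_i] = 0.$$

(The function $\psi(P) = d\ln\Gamma(P)/dP$ is defined in Section E2.3.) To show that these expressions have expectation zero, we use the gamma integral once again to show that $E[1/\varepsilon_i] = \lambda/(P-1)$. We used the result $E[\ln\varepsilon_i] = \psi(P) - \ln\lambda$ in Example 13.5. To show that $E[\partial\ln L/\partial\beta] = 0$, we only require $E[1/\varepsilon_i] = \lambda/(P-1)$ because $x_i$ and $\varepsilon_i$ are independent. The second derivatives and their expectations are found as follows: Using the gamma integral once again, we find $E[1/\varepsilon_i^2] = \lambda^2/[(P-1)(P-2)]$. And, recall that $\sum_i x_i = 0$. Thus, conditioned on X, we have

$$-E[\partial^2\ln L/\partial\alpha^2] = E[\textstyle\sum_i (P-1)(1/\varepsilon_i^2)] = n\lambda^2/(P-2),$$
$$-E[\partial^2\ln L/\partial\alpha\,\partial\beta] = E[\textstyle\sum_i (P-1)(1/\varepsilon_i^2)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial\alpha\,\partial\lambda] = E[\textstyle\sum_i (-1)] = -n,$$
$$-E[\partial^2\ln L/\partial\alpha\,\partial P] = E[\textstyle\sum_i (1/\varepsilon_i)] = n\lambda/(P-1),$$
$$-E[\partial^2\ln L/\partial\beta\,\partial\beta'] = E[\textstyle\sum_i (P-1)(1/\varepsilon_i^2)x_i x_i'] = \textstyle\sum_i [\lambda^2/(P-2)]x_i x_i' = [\lambda^2/(P-2)](X'X),$$
$$-E[\partial^2\ln L/\partial\beta\,\partial\lambda] = E[\textstyle\sum_i (-1)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial\beta\,\partial P] = E[\textstyle\sum_i (1/\varepsilon_i)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial\lambda^2] = E[\textstyle\sum_i (P/\lambda^2)] = nP/\lambda^2,$$
$$-E[\partial^2\ln L/\partial\lambda\,\partial P] = E[\textstyle\sum_i (-1/\lambda)] = -n/\lambda,$$
$$-E[\partial^2\ln L/\partial P^2] = E[\textstyle\sum_i \psi'(P)] = n\psi'(P).$$

Since the expectations of the cross partials with respect to $\beta$ and the other parameters are all zero, it follows that the asymptotic covariance matrix for the MLE of $\beta$ is simply

$$\mathrm{Asy.Var}[\hat\beta_{MLE}] = \{-E[\partial^2\ln L/\partial\beta\,\partial\beta']\}^{-1} = [(P-2)/\lambda^2](X'X)^{-1}.$$

Recall, the asymptotic covariance matrix of the ordinary least squares estimator is

$$\mathrm{Asy.Var}[b] = [P/\lambda^2](X'X)^{-1}.$$

(Note that the MLE is ill defined if P is less than 2.) Thus, the ratio of the variance of the MLE of any element of $\beta$ to that of the corresponding element of b is (P-2)/P, which is the result claimed in Example 4.7.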

1. a. For both probabilities, the symmetry implies that 1 - F(t) = F(-t). In either model, then,

Prob(y=1) = F(t) and Prob(y=0) = 1 - F(t) = F(-t).

These are combined in $\mathrm{Prob}(Y=y_i) = F[(2y_i-1)t_i]$ where $t_i = x_i'\beta$. Therefore,

$$\ln L = \sum_i \ln F[(2y_i-1)x_i'\beta]$$

b. The likelihood equations are

$$\partial\ln L/\partial\beta = \sum_{i=1}^n \frac{f[(2y_i-1)x_i'\beta]}{F[(2y_i-1)x_i'\beta]}(2y_i-1)x_i = 0$$

where $f[(2y_i-1)x_i'\beta]$ is the density function. For the logit model, f = F(1-F). So, for the logit model,

$$\partial\ln L/\partial\beta = \sum_{i=1}^n \{1 - F[(2y_i-1)x_i'\beta]\}(2y_i-1)x_i = 0$$

Evaluating this expression for $y_i = 0$, we get simply $-F(x_i'\beta)x_i$. When $y_i = 1$, the term is $[1 - F(x_i'\beta)]x_i$. It follows that both cases are $[y_i - F(x_i'\beta)]x_i$, so the likelihood equations for the logit model are

$$\partial\ln L/\partial\beta = \sum_{i=1}^n [y_i - \Lambda(x_i'\beta)]x_i = 0.$$

For the probit model, $F[(2y_i-1)x_i'\beta] = \Phi[(2y_i-1)x_i'\beta]$ and $f[(2y_i-1)x_i'\beta] = \phi[(2y_i-1)x_i'\beta]$, which does not simplify further, save for that the factor $(2y_i-1)$ inside $\phi$ may be dropped since $\phi(t) = \phi(-t)$. Therefore,

$$\partial\ln L/\partial\beta = \sum_{i=1}^n \frac{\phi[(2y_i-1)x_i'\beta]}{\Phi[(2y_i-1)x_i'\beta]}(2y_i-1)x_i = 0$$

c. For the logit model, the result is very simple.

$$\partial^2\ln L/\partial\beta\,\partial\beta' = -\sum_{i=1}^n \Lambda(x_i'\beta)[1 - \Lambda(x_i'\beta)]x_i x_i'.$$


For the probit model, the result is more complicated. We will use the result that

$$d\phi(t)/dt = -t\phi(t).$$

It follows, then, that $d[\phi(t)/\Phi(t)]/dt = [-\phi(t)/\Phi(t)][t + \phi(t)/\Phi(t)]$. Using this result directly, it follows that

$$\partial^2\ln L/\partial\beta\,\partial\beta' = -\sum_{i=1}^n \left\{\frac{\phi[(2y_i-1)x_i'\beta]}{\Phi[(2y_i-1)x_i'\beta]}\right\}\left\{(2y_i-1)x_i'\beta + \frac{\phi[(2y_i-1)x_i'\beta]}{\Phi[(2y_i-1)x_i'\beta]}\right\}(2y_i-1)^2 x_i x_i'$$

This actually simplifies somewhat because $(2y_i-1)^2 = 1$ for both values of $y_i$ and $\phi[(2y_i-1)x_i'\beta] = \phi(x_i'\beta)$.

d. Denote by H the actual second derivatives matrix derived in the previous part. Then, Newton's method is

$$\hat\beta(j+1) = \hat\beta(j) - \left\{H[\hat\beta(j)]\right\}^{-1}\left\{\frac{\partial\ln L[\hat\beta(j)]}{\partial\hat\beta(j)}\right\}$$

where the terms on the right hand side indicate first and second derivatives evaluated at the “previous” estimate of $\beta$.
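The Newton iteration in part d can be sketched for the logit model with a constant and a single regressor. This is an illustrative sketch, not the program used for the applications: the function names are hypothetical, the data are made up, and no step-size control is included.

```python
import math


def logistic(t):
    """Lambda(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def score_and_hessian(b0, b1, xs, ys):
    """Logit score sum[(y - Lambda) x] and Hessian -sum[Lambda(1-Lambda) x x']
    for the regressor vector (1, x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        g0 += y - p
        g1 += (y - p) * x
        w = p * (1.0 - p)
        h00 -= w
        h01 -= w * x
        h11 -= w * x * x
    return (g0, g1), (h00, h01, h11)


def newton_logit(xs, ys, tol=1e-12, max_iter=50):
    """Iterate beta(j+1) = beta(j) - H^{-1} g, starting from zero."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_hessian(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Explicit 2x2 inverse applied to the score vector.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Made-up binary outcomes that are not perfectly separated by x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0_hat, b1_hat = newton_logit(xs, ys)
```

For the logit model the same loop is also the method of scoring, since the Hessian does not involve the outcomes. A production implementation would normally add a check for separation and a line search.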

e. The method of scoring uses the expected Hessian instead of the actual Hessian in the iterations. The methods are the same for the logit model, since the Hessian does not involve $y_i$. The methods are different for the probit model, since the expected Hessian does not equal the actual one. For the logit model

$$-[E(H)]^{-1} = \left\{\sum_{i=1}^n \Lambda(x_i'\beta)[1 - \Lambda(x_i'\beta)]x_i x_i'\right\}^{-1}$$

For the probit model, we need first to obtain the expected value. To obtain this, we take the expected value, with Prob(y=0) = 1 - $\Phi$ and Prob(y=1) = $\Phi$. The expected value of the ith term in the negative Hessian is the expected value of the term,

$$\left\{\frac{\phi[(2y_i-1)x_i'\beta]}{\Phi[(2y_i-1)x_i'\beta]}\right\}\left\{(2y_i-1)x_i'\beta + \frac{\phi[(2y_i-1)x_i'\beta]}{\Phi[(2y_i-1)x_i'\beta]}\right\}x_i x_i'$$

Copyright © 2018 Pearson Education, Inc. 126

This is

 [xi]  [xi]   [xi]  [xi] 

[−xi]   −xi +  xi xi + [xi]   xi +  xi xi
 [−xi]  [−xi]   [xi ]  [xi ] 

 [xi] [xi] 
= [xi]  −xi + + xi +  xi xi
 [−xi] [xi] 

 [xi  ] [xi ] 
= [xi  ]  +  xi xi
 [− xi ] [ xi ] 
2 1 1 
= ( [xi  ])  +  x i x
 [− xi ] [ xi ] 
2  [ xi  ] + [ − xi  ] 
= ( [xi  ])   x i x
 [− xi ][ xi ] 
 ( [xi]) 

=  x x
 [1 −  (xi ) ( xi)  i
 

? Application 14.1 Part f.
Namelist ; x = one,age,educ,hsat,female,married $
LOGIT ; Lhs = Doctor ; Rhs = X $
Calc ; L1 = logl $
| Binary Logit Model for Binary Choice |
| Dependent variable DOCTOR |
| Number of observations 27326 |
| Log likelihood function -16405.94 |
| Number of parameters 6 |
| Info. Criterion: AIC = 1.20120 |
| Info. Criterion: BIC = 1.20300 |
| Restricted log likelihood -18019.55 |
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
---------+Characteristics in numerator of Prob[Y = 1]
Constant| 1.82207669 .10763712 16.928 .0000
AGE | .01235692 .00124643 9.914 .0000 43.5256898
EDUC | -.00569371 .00578743 -.984 .3252 11.3206310
HSAT | -.29276744 .00686076 -42.673 .0000 6.78542607
FEMALE | .58376753 .02717992 21.478 .0000 .47877479
MARRIED | .03550015 .03173886 1.119 .2634 .75861817

Matr ; bw = b(5:6) ; vw = varb(5:6,5:6) $
Matrix ; list ; WaldStat = bw'<vw>bw $
Calc ; list ; ctb(.95,2) $
LOGIT ; Lhs = Doctor ; Rhs = One,age,educ,hsat $
Calc ; L0 = logl $
Calc ; List ; LRStat = 2*(l1-l0) $
Matrix WALDSTAT has 1 rows and 1 columns.

1| 461.43784
--> Calc ; list ; ctb(.95,2) $
| Listed Calculator Results |
Result = 5.991465
--> Calc ; L0 = logl $
--> Calc ; List ; LRStat = 2*(l1-l0) $
| Listed Calculator Results |
LRSTAT = 467.336374
Logit ; Lhs = Doctor ; Rhs = X ; Start = b,0,0 ; Maxit = 0 $
| Binary Logit Model for Binary Choice |
| Maximum Likelihood Estimates |
| Model estimated: May 17, 2007 at 11:49:42PM.|
| Dependent variable DOCTOR |
| Weighting variable None |
| Number of observations 27326 |
| Iterations completed 1 |
| LM Stat. at start values 466.0288 |
| LM statistic kept as scalar LMSTAT |
| Log likelihood function -16639.61 |
| Number of parameters 6 |
| Info. Criterion: AIC = 1.21830 |
| Finite Sample: AIC = 1.21830 |
| Info. Criterion: BIC = 1.22010 |
| Info. Criterion:HQIC = 1.21888 |
| Restricted log likelihood -18019.55 |
| McFadden Pseudo R-squared .0765802 |
| Chi squared 2759.883 |
| Degrees of freedom 5 |
| Prob[ChiSqd > value] = .0000000 |
| Hosmer-Lemeshow chi-squared = 23.44388 |
| P-value= .00284 with = 8 |

h. The restricted log likelihood given with the initial results equals -18019.55. This is the log
likelihood for a model that contains only a constant term. The log likelihood for the model is
-16405.94. Twice the difference is about 3,200, which vastly exceeds the critical chi squared with
5 degrees of freedom. The hypothesis would be rejected.
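The arithmetic in part h can be reproduced directly from the two log likelihoods reported in the output. This is a small sketch: the function name is hypothetical, and the 5 percent critical value for 5 degrees of freedom is taken from a standard chi-squared table rather than from the output.

```python
def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic: twice the difference in log likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


LOGL_FULL_MODEL = -16405.94      # reported log likelihood function
LOGL_CONSTANT_ONLY = -18019.55   # reported restricted log likelihood
CHI2_CRITICAL_5PCT_DF5 = 11.07   # assumption: tabled 95th percentile, 5 d.f.

lr = lr_statistic(LOGL_FULL_MODEL, LOGL_CONSTANT_ONLY)
reject = lr > CHI2_CRITICAL_5PCT_DF5
```

The same function applied to the unrestricted log likelihood and the value -16639.61 reported for the run with `Maxit = 0` gives a figure close to the LRSTAT value listed in the output.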

Application 14.2.
The results for the Poisson model are shown in Table 14.11 with the results for the geometric model.
Poisson Regression
Dependent variable DOCVIS
Log likelihood function -104607.04078
Restricted log likelihood -108662.13583
Chi squared [ 4](P= .000) 8110.19010

Significance level .00000
McFadden Pseudo R-squared .0373184
Estimation based on N = 27326, K = 5
Inf.Cr.AIC = 209224.1 AIC/N = 7.657
Chi- squared =255585.13599 RsqP= .0802
G - squared =156175.50075 RsqD= .0494
Overdispersion tests: g=mu(i) : 22.155
Overdispersion tests: g=mu(i)^2: 22.450
| Standard Prob. 95% Confidence
DOCVIS| Coefficient Error z |z|>Z* Interval
Constant| 1.04805*** .02718 38.56 .0000 .99478 1.10132
AGE| .01840*** .00033 55.46 .0000 .01775 .01905
EDUC| -.04325*** .00173 -25.01 .0000 -.04663 -.03986
INCOME| -.52070*** .02195 -23.73 .0000 -.56371 -.47768
HHKIDS| -.16094*** .00796 -20.23 .0000 -.17653 -.14534
Partial derivatives of expected val. with
respect to the vector of characteristics.
Effects are averaged over individuals.
Observations used for means are All Obs.
Sample average conditional mean 3.1835
Scale Factor for Marginal Effects 3.1835
| Partial Standard Prob. 95% Confidence
DOCVIS| Effect Error z |z|>Z* Interval
AGE| .05858*** .00107 54.50 .0000 .05648 .06069
EDUC| -.13767*** .00552 -24.93 .0000 -.14850 -.12685
INCOME| -1.65765*** .07009 -23.65 .0000 -1.79503 -1.52027
HHKIDS| -.49998*** .02415 -20.70 .0000 -.54732 -.45264 #

Application 14.3
? Mixture of normals problem
create ; y1=rnn(1,1) ; y2 = rnn(5,1) $
create ; c = rnu(0,1) $
create ; if(c < .3)y=y1 ; (else) y=y2 $
calc ; yb = xbr(y) ; sb = sdv(y) $
calc ; yb1=.9*yb ; sb1 = .9*sb
; yb2=1.1*yb ; sb2=1.1*sb $
maximize
; labels = lambda,mu1,s1,mu2,s2
; start = .5, yb1,sb1,yb2,sb2
; fcn = log(lambda*1/s1*n01((y-mu1)/s1) + (1-lambda)*1/s2*n01((y-mu2)/s2)) $

create ; y1=rnn(1,1) ; y2 = rnn(2,1) $
create ; c = rnu(0,1) $
create ; if(c < .3)y=y1 ; (else) y=y2 $
calc ; yb = xbr(y) ; sb = sdv(y) $
calc ; yb1=.9*yb ; sb1 = .9*sb
; yb2=1.1*yb ; sb2=1.1*sb $
maximize
; labels = lambda,mu1,s1,mu2,s2
; start = .5, yb1,sb1,yb2,sb2
; fcn = log(lambda*1/s1*n01((y-mu1)/s1) + (1-lambda)*1/s2*n01((y-mu2)/s2)) $

First try uses the same starting values for the two segments. Iterations never get started.
|-> calc ; yb = xbr(y) ; sb = sdv(y) $
|-> maximize
; labels = lambda,mu1,s1,mu2,s2
; start = .5, yb,sb,yb,sb
; fcn = log(lambda*1/s1*n01((y-mu1)/s1) + (1-lambda)*1/s2*n01((y-
mu2)/s2)) $
Iterative procedure has converged
NOTE: Convergence in initial iterations is rarely
at a true function optimum. This may not be a
solution (especially if initial iterations stopped).
Exit from iterative procedure. 3 iterations completed.
Convergence values:
Gradient Norm: Tolerance= .1000D-05, current value= .4896D-07
Function Change: Tolerance= .0000D+00, current value= .1958D-08
Parameter Change: Tolerance= .0000D+00, current value= .1691D-07
Smallest abs. param. change from start value = .0000D+00
At least one parameter did not leave start value.
Normal exit: 3 iterations. Status=0, F= .2135387D+04
Error 143: Models - estimated variance matrix of estimates is singular
Error 447: Current estimated covariance matrix for slopes is singular

Second try with better starting values.

|-> calc ; yb1=.9*yb ; sb1 = .9*sb
; yb2=1.1*yb ; sb2=1.1*sb $
|-> maximize
; labels = lambda,mu1,s1,mu2,s2
; start = .5, yb1,sb1,yb2,sb2
; fcn = log(lambda*1/s1*n01((y-mu1)/s1) + (1-lambda)*1/s2*n01((y-
mu2)/s2)) $
Iterative procedure has converged
Normal exit: 19 iterations. Status=0, F= .1942574D+04

User Defined Optimization
Dependent variable Function
Log likelihood function -1942.57381
Estimation based on N = 1000, K = 5
Inf.Cr.AIC = 3895.1 AIC/N = 3.895
| Standard Prob. 95% Confidence
UserFunc| Coefficient Error z |z|>Z* Interval
LAMBDA| .30986*** .01608 19.27 .0000 .27835 .34138
MU1| .99937*** .06079 16.44 .0000 .88023 1.11850
S1| .88624*** .05345 16.58 .0000 .78149 .99100
MU2| 4.91473*** .04250 115.65 .0000 4.83144 4.99802
S2| .98467*** .03351 29.38 .0000 .91899 1.05036
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 19, 2017 at 04:45:10 PM

Third try moves segments closer together. Results are a bit less precise.
|-> sample;1-1000$
|-> calc;ran(12345)$
|-> create ; y1=rnn(1,1) ; y2 = rnn(2,1) $
|-> create ; c = rnu(0,1) $
|-> create ; if(c < .3)y=y1 ; (else) y=y2 $
|-> calc ; yb = xbr(y) ; sb = sdv(y) $
|-> calc ; yb1=.9*yb ; sb1 = .9*sb

; yb2=1.1*yb ; sb2=1.1*sb $
|-> maximize
; labels = lambda,mu1,s1,mu2,s2
; start = .5, yb1,sb1,yb2,sb2
; fcn = log(lambda*1/s1*n01((y-mu1)/s1) + (1-lambda)*1/s2*n01((y-
mu2)/s2)) $
Iterative procedure has converged
Normal exit: 24 iterations. Status=0, F= .1461581D+04

User Defined Optimization
Dependent variable Function
Log likelihood function -1461.58073
Estimation based on N = 1000, K = 5
Inf.Cr.AIC = 2933.2 AIC/N = 2.933
| Standard Prob. 95% Confidence
UserFunc| Coefficient Error z |z|>Z* Interval
LAMBDA| .21683 .25504 .85 .3952 -.28304 .71670
MU1| .59512 .43775 1.36 .1740 -.26285 1.45310
S1| .70248*** .17154 4.10 .0000 .36627 1.03870
MU2| 1.92993*** .32142 6.00 .0000 1.29997 2.55990
S2| .93527*** .11326 8.26 .0000 .71329 1.15726
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Aug 19, 2017 at 04:46:37 PM
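The function passed to the maximizer in this application is the log of a two-component normal mixture density. A minimal Python sketch of the same log-likelihood follows. It is not a translation of the estimation routine itself, and the function names and test values are hypothetical.

```python
import math


def n01(z):
    """Standard normal density, mirroring n01(.) in the listing above."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mixture_loglik(params, ys):
    """Sum over observations of
    log(lambda/s1 * n01((y-mu1)/s1) + (1-lambda)/s2 * n01((y-mu2)/s2))."""
    lam, mu1, s1, mu2, s2 = params
    total = 0.0
    for y in ys:
        dens = (lam / s1 * n01((y - mu1) / s1)
                + (1.0 - lam) / s2 * n01((y - mu2) / s2))
        total += math.log(dens)
    return total


# Made-up observations, for illustration only.
ys = [0.2, 1.4, 4.8, 5.3, 0.9]
same_components_a = mixture_loglik((0.2, 2.0, 1.5, 2.0, 1.5), ys)
same_components_b = mixture_loglik((0.8, 2.0, 1.5, 2.0, 1.5), ys)
```

When the two components start from identical means and standard deviations, the function value does not depend on lambda at all, so `same_components_a` equals `same_components_b`. That flat direction is one plausible reading of why the first try above reports that a parameter did not leave its start value and that the covariance matrix is singular.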

Chapter 15
Simulation Based Estimation and
Inference and Random Parameter
1. Exponential: The pdf is $f(x) = \theta\exp(-\theta x)$. The CDF is

$$F(x) = \int_0^x \theta\exp(-\theta t)\,dt = \theta\left\{\left[-\frac{1}{\theta}\exp(-\theta x)\right] - \left[-\frac{1}{\theta}\exp(-\theta\cdot 0)\right]\right\} = 1 - \exp(-\theta x).$$

We would draw observations from the U(0,1) population, say $F_i$, and equate these to $F(x_i)$. Inverting the function, we find that $1 - F_i = \exp(-\theta x_i)$, or $x_i = -(1/\theta)\ln(1 - F_i)$.
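The inversion can be sketched in a few lines. This is an illustration under assumptions: the function names, the seed, and the parameter value are hypothetical, and Python's standard `random` module stands in for the U(0,1) generator.

```python
import math
import random


def exponential_from_uniform(u, theta):
    """Invert F(x) = 1 - exp(-theta x): x = -(1/theta) ln(1 - u)."""
    return -math.log(1.0 - u) / theta


def exponential_cdf(x, theta):
    """F(x) = 1 - exp(-theta x)."""
    return 1.0 - math.exp(-theta * x)


def draw_sample(n, theta, seed=12345):
    """Draw n exponential variates by transforming U(0,1) draws."""
    rng = random.Random(seed)
    return [exponential_from_uniform(rng.random(), theta) for _ in range(n)]


# Hypothetical parameter value; the sample mean should be near 1/theta.
sample = draw_sample(20_000, theta=2.0)
sample_mean = sum(sample) / len(sample)
```

Since `random()` returns values in [0, 1), the argument of the logarithm is always positive.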

2. Weibull. If the survival function is $S(x) = \lambda p\exp[-(\lambda x)^p]$, then we may equate random draws from the uniform distribution, $S_i$, to this function (a draw of $S_i$ is the same as a draw of $F_i = 1 - S_i$). Solving for $x_i$, we find $\ln S_i = \ln(\lambda p) - (\lambda x_i)^p$, so $x_i = (1/\lambda)[\ln(\lambda p) - \ln S_i]^{1/p}$.

3. The derivative of the simulated sum of squares is

$$\frac{\partial S(\beta,\sigma_u)}{\partial\binom{\beta}{\sigma_u}} = \sum_i\sum_t -2\left[y_{it} - (1/R)\sum_r h(x_{it}'\beta + \sigma_u v_{ir})\right]\left[(1/R)\sum_r h'(x_{it}'\beta + \sigma_u v_{ir})\binom{x_{it}}{v_{ir}}\right]$$

To estimate the asymptotic covariance matrix, we can rely on the results for the nonlinear least squares estimator. The gradient is of the form $\sum_i\sum_t -2e_{it}x_{it}^0$. The approach taken in Theorem 7.2 would be as follows:

$$\hat\sigma^2 = \frac{1}{nT}\sum_i\sum_t\left[y_{it} - (1/R)\sum_r h(x_{it}'\hat\beta + \hat\sigma_u v_{ir})\right]^2.$$

The counterpart to $(X'X)^{-1}$ is

$$\left\{\sum_i\sum_t\left[(1/R)\sum_r h'(x_{it}'\beta + \sigma_u v_{ir})\binom{x_{it}}{v_{ir}}\right]\left[(1/R)\sum_r h'(x_{it}'\beta + \sigma_u v_{ir})\binom{x_{it}}{v_{ir}}\right]'\right\}^{-1}$$

 

? Application 15.1. Monte Carlo Simulation
? Set seed of RNG for replicability
Calc ; Ran(123579) $
? Sample size is 50. Generate x(i) and z(i) held fixed
Sample ; 1 - 50 $
Create ; xi = rnn(0,1) ; zi = rnn(0,1) $
Namelist ; X = one,xi,zi ; X0 = one,xi $
? Moment Matrices
Matrix ; XXinv = <X'X> ; X0X0inv = <X0'X0> $
Matrix ; Waldi = init(1000,1,0) $
Matrix ; LMi = init(1000,1,0) $

? Procedure studies the LM statistic
Proc = LM (c) $
? Three kinds of disturbances
Create ?; Eps = Rnt(5) ? Nonnormal distribution
; vi=exp(.2*xi) ; eps = vi*rnn(0,1) ? Heteroscedasticity
?;eps= Rnn(0,1) ? Standard normal distribution
; y = 0 + xi + c*zi +eps $
Matrix ; b0 = X0X0inv*X0'y $
Create ; e0 = y - X0'b0 $
Matrix ; g = X'e0 $
Calc ; lmstat = qfr(g,xxinv)/(e0'e0/n) ; i = i + 1 $
Matrix ; Lmi (i) = lmstat $
EndProc $
Calc ; i = 0 ; gamma = -1 $
Exec ; Proc=LM(gamma) ; n = 1000 $
create;LMv=lmi $
create;reject=LMv > 3.84$
Calc ; List ; Type1 = xbr(reject) ; pwr = 1-Type1 $

? Procedure studies the Wald statistic
Calc ; type = 1 or 2 or 3 … set this before you run the program. $
Proc = Wald(c) $
Create ; if(type=1)Eps = Rnn(0,1) ? Standard normal distribution
; if(type=2)vi=exp(.2*xi) ? eps = vi*rnn(0,1) ? Heteroscedasticity
; if(type=3)eps= Rnt(5) ? Nonnormal distribution
; y = 0 + xi + c*zi +eps $
Matrix ; b0=XXinv*X'y $
Create ; e0=y-X'b0$
Calc ; ss0 = e0'e0/(47)
; v0 = ss0*xxinv(3,3)
; wald0=(b0(3))^2/v0
; i=i+1 $
Matrix ; Waldi(i)=Wald0 $
EndProc $
? Set the values for the simulation
Calc ; i = 0 ; gamma = 0 ; type=1 $
Sample ; 1-50 $
Exec ; Proc=Wald(gamma) ; n = 1000 $
create;Waldv=Waldi $
create;reject=Waldv > 3.84$

Calc ; List ; Type1 = xbr(reject) ; pwr = 1-Type1 $

To carry out the simulation, execute the procedure for different values of “gamma” and “type.” Summarize
the results with a table or plot of the rejection probabilities as a function of gamma.

Application 15.2
regress ; lhs = lwage
matrix ; be=b(1:2) ; ve = varb(1:2,1:2) $
? Delta method
wald ; fn1 = -c1/(2*c2)
; parameters=be
; covariance=ve
; labels = c1,c2 $

? Bootstrapping $
proc $
regress ; quietly ; lhs = lwage ; rhs = exp,expsq,one,wks,occ,ind,south,smsa,ms,fem,union,ed $
calc ; mx = -b(1)/(2*b(2)) $
endproc $
exec;n=100 ; bootstrap = mx $

Ordinary least squares regression ............
LHS=LWAGE    Mean                 =    6.67635
             Standard deviation   =     .46151
----------   No. of observations  =       4165   DegFreedom   Mean square
Regression   Sum of Squares       =    373.138           11      33.92168
Residual     Sum of Squares       =    513.767         4153        .12371
Total        Sum of Squares       =    886.905         4164        .21299
----------   Standard error of e  =     .35172   Root MSE          .35122
Fit          R-squared            =     .42072   R-bar squared     .41919
Model test   F[ 11, 4153]         =  274.20378   Prob F > F*       .00000
        |                 Standard            Prob.    95% Confidence
   LWAGE|  Coefficient       Error        z  |z|>Z*       Interval
     EXP|    .03991***      .00217    18.36   .0000     .03565     .04417
   EXPSQ|   -.00067***   .4776D-04   -14.10   .0000    -.00077    -.00058
Constant|   5.22697***      .07170    72.90   .0000    5.08644    5.36749
     WKS|    .00431***      .00109     3.96   .0001     .00217     .00644
     OCC|   -.14531***      .01474    -9.86   .0000    -.17420    -.11642
     IND|    .04986***      .01187     4.20   .0000     .02660     .07311
   SOUTH|   -.06754***      .01251    -5.40   .0000    -.09206    -.04302
    SMSA|    .14010***      .01205    11.62   .0000     .11647     .16372
      MS|    .06387***      .02061     3.10   .0019     .02348     .10426
     FEM|   -.38103***      .02521   -15.12   .0000    -.43043    -.33163
   UNION|    .08833***      .01287     6.86   .0000     .06310     .11356
      ED|    .05786***      .00263    22.04   .0000     .05272     .06301
WALD procedure. Estimates and standard errors for nonlinear functions and
joint test of nonlinear restrictions.
Wald Statistic = 2001.26761
Prob. from Chi-squared[ 1] = .00000
Functions are computed at means of variables
        |                 Standard            Prob.    95% Confidence
WaldFcns|     Function       Error        z  |z|>Z*       Interval
 Fncn(1)|   29.6227***      .66217    44.74   .0000    28.3249    30.9205
Completed 100 bootstrap iterations.
Results of bootstrap estimation of model.
Model has been reestimated 100 times.
The statistics shown below are centered
around the original estimate based on
the original full sample of observations.
Result is MX = 29.62271
Bootstrap samples have 4165 observations.
   Estimate   RtMnSqDev    Skewness    Kurtosis
   29.62271      .77585     1.18455     5.29601
Minimum = 28.14656   Maximum = 32.90254
        |                 Standard            Prob.    95% Confidence
BootStrp|  Coefficient       Error        z  |z|>Z*       Interval
      MX|   29.6227***      .77585    38.18   .0000    28.1021    31.1433
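The standard error reported by the WALD procedure for the function -c1/(2*c2) comes from the delta method. A sketch of the calculation follows, writing the two coefficients as β₁ (on EXP) and β₂ (on EXPSQ) and the elements of their estimated covariance matrix `ve` as v₁₁, v₁₂, v₂₂. This notation is an assumption introduced here, not taken from the program.

```latex
\[
g(\beta_1,\beta_2) = -\frac{\beta_1}{2\beta_2},
\qquad
\frac{\partial g}{\partial \beta_1} = -\frac{1}{2\beta_2},
\qquad
\frac{\partial g}{\partial \beta_2} = \frac{\beta_1}{2\beta_2^{2}},
\]
\[
\widehat{\mathrm{Var}}\bigl[g(\hat\beta_1,\hat\beta_2)\bigr]
\approx
\left(\frac{\partial g}{\partial \beta_1}\right)^{2} v_{11}
+ 2\,\frac{\partial g}{\partial \beta_1}\frac{\partial g}{\partial \beta_2}\, v_{12}
+ \left(\frac{\partial g}{\partial \beta_2}\right)^{2} v_{22},
\]
% all derivatives are evaluated at the least squares estimates
```

The bootstrap standard error (.77585) is an alternative estimate of the same quantity. It is somewhat larger than the delta method value (.66217), and the bootstrap replicates are skewed to the right.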

Chapter 16
Bayesian Estimation and Inference
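a. The likelihood function is

$$L(\mathbf{y}\mid\lambda) = \prod_{i=1}^{n} f(y_i\mid\lambda) = \prod_{i=1}^{n}\frac{\exp(-\lambda)\lambda^{y_i}}{\Gamma(y_i+1)} = \exp(-n\lambda)\,\lambda^{\Sigma_i y_i}\prod_{i=1}^{n}\frac{1}{\Gamma(y_i+1)}.$$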

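b. The posterior is

$$p(\lambda\mid y_1,\ldots,y_n) = \frac{p(y_1,\ldots,y_n\mid\lambda)\,p(\lambda)}{\int_0^{\infty} p(y_1,\ldots,y_n\mid\lambda)\,p(\lambda)\,d\lambda}.$$

The product of factorials will fall out. This leaves

$$\begin{aligned}
p(\lambda\mid y_1,\ldots,y_n) &= \frac{\exp(-n\lambda)\,\lambda^{\Sigma_i y_i}\,(1/\lambda)}{\int_0^{\infty}\exp(-n\lambda)\,\lambda^{\Sigma_i y_i}\,(1/\lambda)\,d\lambda} \\
&= \frac{\exp(-n\lambda)\,\lambda^{(\Sigma_i y_i)-1}}{\int_0^{\infty}\exp(-n\lambda)\,\lambda^{(\Sigma_i y_i)-1}\,d\lambda} \\
&= \frac{\exp(-n\lambda)\,\lambda^{n\bar{y}-1}}{\int_0^{\infty}\exp(-n\lambda)\,\lambda^{n\bar{y}-1}\,d\lambda} \\
&= \frac{n^{n\bar{y}}\exp(-n\lambda)\,\lambda^{n\bar{y}-1}}{\Gamma(n\bar{y})},
\end{aligned}$$

where we have used the gamma integral at the last step. The posterior defines a two parameter gamma distribution, $G(n, n\bar{y})$.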

c. The estimator of $\lambda$ is the mean of the posterior. There is no need to do the integration. This falls simply out of the posterior density, $E[\lambda\mid\mathbf{y}] = n\bar{y}/n = \bar{y}$.

d. The posterior variance also drops out simply; it is $n\bar{y}/n^2 = \bar{y}/n$.
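The posterior mean and variance can be checked numerically. The Python sketch below uses a small set of hypothetical counts (not data from the text), reads the moments off the gamma posterior, and compares them with a midpoint-rule integration of the posterior kernel. The function names are illustrative assumptions.

```python
import math

def poisson_posterior(y):
    """Gamma posterior for a Poisson mean under the prior p(lambda) = 1/lambda.

    Returns (shape, rate, mean, variance), with shape = sum(y) = n*ybar and rate = n.
    Requires sum(y) > 0 so that the posterior is proper.
    """
    n = len(y)
    shape = sum(y)
    if shape <= 0:
        raise ValueError("sum(y) must be positive for a proper posterior")
    rate = n
    return shape, rate, shape / rate, shape / rate ** 2

def numeric_moments(shape, rate, steps=20000):
    """Midpoint-rule mean and variance of the kernel exp(-rate*x) * x**(shape-1)."""
    upper = shape / rate + 12.0 * math.sqrt(shape) / rate  # far into the right tail
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        k = math.exp(-rate * x + (shape - 1) * math.log(x))
        m0 += k
        m1 += x * k
        m2 += x * x * k
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2

y = [2, 0, 3, 1, 4]          # hypothetical counts, not from the text
shape, rate, mean, var = poisson_posterior(y)
num_mean, num_var = numeric_moments(shape, rate)
ybar = sum(y) / len(y)
print(mean, ybar)            # closed-form posterior mean and the sample mean
print(var, ybar / len(y))    # closed-form posterior variance and ybar/n
```

The closed-form values agree with the sample mean and with the sample mean divided by n, and the numerical integration reproduces both to within the accuracy of the grid.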

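a. $p(F_i\mid K_i,\theta) = \binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}$, so the log likelihood function is

$$\ln L(\theta\mid\mathbf{y}) = \sum_{i=1}^{n}\left[\ln\binom{K_i}{F_i} + F_i\ln\theta + (K_i-F_i)\ln(1-\theta)\right].$$

The MLE is obtained by setting $\partial\ln L(\theta\mid\mathbf{y})/\partial\theta = \Sigma_i[F_i/\theta - (K_i-F_i)/(1-\theta)] = 0$. Multiply both sides by $\theta(1-\theta)$ to obtain

$$\Sigma_i[(1-\theta)F_i - \theta(K_i-F_i)] = 0.$$

A line of algebra reveals that the solution is $\hat\theta = (\Sigma_i F_i)/(\Sigma_i K_i) = 0.651596$.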
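For completeness, the line of algebra can be written out as follows, with θ denoting the binomial success probability. The terms in θΣᵢFᵢ cancel, which leaves a linear equation in θ.

```latex
\[
\sum_i \bigl[(1-\theta)F_i - \theta(K_i - F_i)\bigr]
= \sum_i F_i - \theta\sum_i F_i - \theta\sum_i K_i + \theta\sum_i F_i
= \sum_i F_i - \theta\sum_i K_i = 0
\quad\Longrightarrow\quad
\hat\theta = \frac{\sum_i F_i}{\sum_i K_i}.
\]
```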

b. The posterior density is

$$p(\theta\mid\mathbf{y}) = \frac{\left[\prod_{i=1}^{n}\binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}}{\displaystyle\int_0^1\left[\prod_{i=1}^{n}\binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta}.$$

This simplifies considerably. The combinatorials and gamma functions fall out, leaving

$$\begin{aligned}
p(\theta\mid\mathbf{y}) &= \frac{\left[\prod_{i=1}^{n}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\left[\prod_{i=1}^{n}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\theta^{a-1}(1-\theta)^{b-1}\,d\theta}
= \frac{\theta^{\Sigma_i F_i}(1-\theta)^{\Sigma_i(K_i-F_i)}\,\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\theta^{\Sigma_i F_i}(1-\theta)^{\Sigma_i(K_i-F_i)}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta} \\
&= \frac{\theta^{(\Sigma_i F_i)+(a-1)}(1-\theta)^{[\Sigma_i(K_i-F_i)]+(b-1)}}{\int_0^1\theta^{(\Sigma_i F_i)+(a-1)}(1-\theta)^{[\Sigma_i(K_i-F_i)]+(b-1)}\,d\theta}.
\end{aligned}$$

The denominator is a beta integral, so the posterior density is

$$p(\theta\mid\mathbf{y}) = \frac{\Gamma[(a+\Sigma_i F_i) + (b+\Sigma_i(K_i-F_i))]}{\Gamma[a+\Sigma_i F_i]\,\Gamma[b+\Sigma_i(K_i-F_i)]}\,\theta^{(a+\Sigma_i F_i)-1}(1-\theta)^{[b+\Sigma_i(K_i-F_i)]-1}.$$

The argument of the gamma function in the numerator simplifies slightly, since $\Sigma_i F_i + \Sigma_i(K_i-F_i) = \Sigma_i K_i$:

$$p(\theta\mid\mathbf{y}) = \frac{\Gamma[a+b+\Sigma_i K_i]}{\Gamma[a+\Sigma_i F_i]\,\Gamma[b+\Sigma_i(K_i-F_i)]}\,\theta^{(a+\Sigma_i F_i)-1}(1-\theta)^{[b+\Sigma_i(K_i-F_i)]-1}.$$

c-e. The posterior distribution is a beta distribution with parameters $a^* = a+\Sigma_i F_i$ and $b^* = b+\Sigma_i(K_i-F_i)$.
The mean of this beta random variable is $a^*/(a^*+b^*) = (a+\Sigma_i F_i)/(a+b+\Sigma_i K_i)$. In the data, $\Sigma_i F_i = 49$ and $\Sigma_i K_i = 75$. For the values given, the posterior means are
(a=1,b=1): Result = .647668
(a=2,b=2): Result = .643939
(a=1,b=2): Result = .639386
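The posterior mean formula is simple enough to evaluate directly. The Python sketch below is illustrative only; the function name is an assumption. The printed results above, and the MLE of 0.651596, are reproduced to six digits when 49 and 75.2 are used for the two sums. The value 75.2 is an inference from those printed figures (the text gives it rounded to 75), not a number stated in the solution.

```python
def beta_posterior_mean(a, b, sum_f, sum_k):
    """Mean of the Beta(a*, b*) posterior, a* = a + sum_f, b* = b + (sum_k - sum_f)."""
    a_star = a + sum_f
    b_star = b + (sum_k - sum_f)
    return a_star / (a_star + b_star)

# 49 is the value given in the text; 75.2 is inferred from the printed results
# (assumption), since the text reports it rounded to 75.
SUM_F, SUM_K = 49.0, 75.2

mle = SUM_F / SUM_K
means = {(a, b): beta_posterior_mean(a, b, SUM_F, SUM_K)
         for (a, b) in [(1, 1), (2, 2), (1, 2)]}

print(f"MLE = {mle:.6f}")
for (a, b), m in means.items():
    print(f"(a={a},b={b}): Result = {m:.6f}")
```

Each prior pulls the posterior mean from the MLE toward the prior mean a/(a+b), which is 0.5 for the two symmetric priors and 1/3 for (a=1,b=2).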

