
Statistical Inference II

(Lecture Material III)

Mohammad Samsul Alam


Assistant Professor of Applied Statistics
Institute of Statistical Research and Training (ISRT)
University of Dhaka

https://www.isrt.ac.bd/people/msalam



Maximum Likelihood Estimation I

Suppose our interest is to estimate σ² from the observed values x1, x2, . . . , xn of X1, X2, . . . , Xn, where

Xi ∼ N(µ, σ²),

and both µ and σ² are unknown.


Now the log-likelihood is
\[
l(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}. \tag{1}
\]
Setting the first derivatives of (1) w.r.t. µ and σ², separately, equal to 0 yields (verify)
\[
\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.
\]
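As a quick numerical check (a sketch, not part of the original slides; the simulated data and the scipy optimizer are illustrative assumptions), the closed-form MLEs can be compared with a direct numerical maximization of (1):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, mu_true, sigma2_true = 200, 5.0, 4.0
x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)

# Closed-form MLEs from the slide
mu_hat = x.mean()
sigma2_hat = np.mean((x - x.mean()) ** 2)      # divisor n, not n - 1

# Negative log-likelihood (1), parameterized with log(sigma^2) to keep sigma^2 > 0
def negloglik(par):
    mu, log_s2 = par
    s2 = np.exp(log_s2)
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * log_s2 + np.sum((x - mu) ** 2) / (2 * s2)

res = minimize(negloglik, x0=[0.0, 0.0], method="Nelder-Mead")
print(mu_hat, sigma2_hat)                      # closed form
print(res.x[0], np.exp(res.x[1]))              # numerical optimum; should agree closely
```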



Maximum Likelihood Estimation II

Now, from two-variable calculus, to verify that a function H(θ1, θ2) has a local maximum at (θ̂1, θ̂2), it must be shown that the following three conditions hold:
1. The first-order partial derivatives are 0,
\[
\frac{\partial}{\partial \theta_1} H(\theta_1, \theta_2)\Big|_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2} = 0
\quad \text{and} \quad
\frac{\partial}{\partial \theta_2} H(\theta_1, \theta_2)\Big|_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2} = 0.
\]
2. At least one second-order partial derivative is negative,
\[
\frac{\partial^2}{\partial \theta_1^2} H(\theta_1, \theta_2)\Big|_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2} < 0
\quad \text{or} \quad
\frac{\partial^2}{\partial \theta_2^2} H(\theta_1, \theta_2)\Big|_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2} < 0.
\]
3. The determinant of the matrix of second-order partial derivatives (the Hessian) is positive,
\[
\begin{vmatrix}
\frac{\partial^2}{\partial \theta_1^2} H(\theta_1, \theta_2) & \frac{\partial^2}{\partial \theta_1 \partial \theta_2} H(\theta_1, \theta_2)\\[4pt]
\frac{\partial^2}{\partial \theta_1 \partial \theta_2} H(\theta_1, \theta_2) & \frac{\partial^2}{\partial \theta_2^2} H(\theta_1, \theta_2)
\end{vmatrix}_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2}
= \left[\frac{\partial^2 H}{\partial \theta_1^2}\,\frac{\partial^2 H}{\partial \theta_2^2} - \left(\frac{\partial^2 H}{\partial \theta_1 \partial \theta_2}\right)^2\right]_{\theta_1 = \hat{\theta}_1, \theta_2 = \hat{\theta}_2} > 0.
\]
Maximum Likelihood Estimation III

For the Normal likelihood, since we obtained µ̂ and σ̂² by equating the first derivatives of l(µ, σ²) with respect to µ and σ², respectively, to 0, the first condition holds in this case.
It can easily be shown that (verify)
\[
\frac{\partial^2}{\partial \mu^2}\, l(\mu, \sigma^2)\Big|_{\mu = \hat{\mu}, \sigma^2 = \hat{\sigma}^2}
= \frac{-n}{\hat{\sigma}^2}
= \frac{-n^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} < 0
\]
\[
\frac{\partial^2}{\partial (\sigma^2)^2}\, l(\mu, \sigma^2)\Big|_{\mu = \hat{\mu}, \sigma^2 = \hat{\sigma}^2}
= \frac{n}{2\hat{\sigma}^4} - \frac{1}{\hat{\sigma}^6}\sum_{i=1}^{n}(x_i - \bar{x})^2
= \frac{-n}{2\hat{\sigma}^4} < 0
\]
Again, with
\[
\frac{\partial^2}{\partial \mu\, \partial \sigma^2}\, l(\mu, \sigma^2)\Big|_{\mu = \hat{\mu}, \sigma^2 = \hat{\sigma}^2}
= -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i - \mu)\Big|_{\mu = \hat{\mu}, \sigma^2 = \hat{\sigma}^2} = 0,
\]
it can be shown that property three also holds for µ̂ and σ̂².
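A small numerical sketch of this check (simulated data; not from the slides): evaluate the three second-order quantities at the MLE and confirm the sign conditions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, size=100)
n = len(x)

mu_hat = x.mean()
s2_hat = np.mean((x - mu_hat) ** 2)

# Second-order partial derivatives of the log-likelihood, evaluated at the MLE
d2_mu    = -n / s2_hat                                                  # should be < 0
d2_s2    = n / (2 * s2_hat**2) - np.sum((x - mu_hat) ** 2) / s2_hat**3  # = -n / (2 s2_hat^2) < 0
d2_cross = -np.sum(x - mu_hat) / s2_hat**2                              # = 0 at mu_hat = x-bar

hessian_det = d2_mu * d2_s2 - d2_cross**2
print(d2_mu < 0, d2_s2 < 0, hessian_det > 0)   # all True -> local maximum
```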



Maximum Likelihood Estimation IV

Now, after a little algebra (verify), it can be shown that
\[
E(\hat{\mu}) = \mu, \qquad E(\hat{\sigma}^2) = \frac{(n-1)}{n}\sigma^2.
\]
It can also be shown that
\[
E\left(\frac{n}{(n-1)}\,\hat{\sigma}^2\right) = \sigma^2,
\]
which leads to the conclusion that
\[
\frac{1}{(n-1)}\sum_{i=1}^{n}(x_i - \bar{x})^2
\]
is an unbiased estimator of the population variance σ².
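A quick Monte Carlo illustration (a sketch with arbitrary simulation settings, not from the slides) of the downward bias of the divisor-n estimator and the unbiasedness of the divisor-(n − 1) version:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 10, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2

sigma2_mle = dev2.sum(axis=1) / n          # MLE: divisor n (biased)
sigma2_unb = dev2.sum(axis=1) / (n - 1)    # divisor n - 1 (unbiased)

print(sigma2_mle.mean())   # close to (n - 1)/n * sigma2 = 3.6
print(sigma2_unb.mean())   # close to sigma2 = 4.0
```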



Maximum Likelihood Estimation V

Now let us do the following algebra,
\[
E(\hat{\sigma}^2) = E\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right)
= \frac{1}{n} E\left[(\mathbf{X} - \mathbf{A}\mathbf{X})^T (\mathbf{X} - \mathbf{A}\mathbf{X})\right],
\]
where
X is an n × 1 vector of the random variables X1, X2, . . . , Xn;
A is an n × n matrix defined as A = (1/n)Jn, where Jn is an n × n matrix of 1's.



Maximum Likelihood Estimation VI

Then we can write,
\[
\begin{aligned}
E(\hat{\sigma}^2) &= \frac{1}{n} E\left[(\mathbf{X} - \mathbf{A}\mathbf{X})^T (\mathbf{X} - \mathbf{A}\mathbf{X})\right]\\
&= \frac{1}{n} E\left[\mathbf{X}^T\mathbf{X} - \mathbf{X}^T\mathbf{A}\mathbf{X}\right]
\qquad (\text{A is a Hermitian}^1 \text{ and idempotent matrix})\\
&= \frac{1}{n} E\left[\mathbf{X}^T\mathbf{X}\right] - \frac{1}{n} E\left[\mathbf{X}^T\mathbf{A}\mathbf{X}\right]\\
&= \frac{1}{n}\sigma^2 E\left[\chi^2_{(n)}\right] + \mu^2 - \frac{1}{n}\sigma^2 E\left[\chi^2_{(1)}\right] - \mu^2 \qquad (\text{Why?})\\
&= \sigma^2 - \frac{1}{n}\sigma^2\\
&= \frac{n-1}{n}\sigma^2.
\end{aligned}
\]
The second term makes the estimator biased. Moreover, the bias is downward, and rank(A) = 1 is responsible for this.



Maximum Likelihood Estimation VII

In general, it can be shown that when rank(A) = p, we will have
\[
E(\hat{\sigma}^2) = \frac{n - p}{n}\sigma^2.
\]

Theorem
If y ∼ N(0, I_N), and A is an orthogonal projection, then yᵀAy ∼ χ²(k) with k = rank(A).

If A is idempotent, its eigenvalues satisfy λ² = λ. If A is also Hermitian, its eigenvalues are real. Hence, in the eigendecomposition A = QΛQ⁻¹ = QΛQᵀ, Λ is a diagonal matrix containing only 0's and 1's.
(1) When A is full rank, Λ = I_N, so A has to be I_N and yᵀAy = yᵀy. The result then follows immediately from the definition of the chi-square distribution.



Maximum Likelihood Estimation VIII

(2) When A is of rank k < N,
\[
\mathbf{y}^T\mathbf{A}\mathbf{y} = \mathbf{y}^T\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T\mathbf{y}
= \mathbf{W}^T \begin{pmatrix} \mathbf{I}_k & \mathbf{0}\\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{W}
= \mathbf{W}_k^T \mathbf{W}_k,
\]
where W = Qᵀy, and W_k denotes the vector containing the first k elements of W. Since Q is orthogonal, W ∼ N(0, I_N), so W_k ∼ N(0, I_k). The result again follows directly from the definition of the chi-square distribution. The following two properties of a Hermitian matrix have been used in the above proof.
(a) A Hermitian matrix has real eigenvalues.
(b) The eigenvectors of a real symmetric matrix are orthogonal, which implies that Q⁻¹ = Qᵀ.
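A brief simulation sketch of this theorem (the centering projection and the simulation sizes are illustrative assumptions): for the rank-(N − 1) projection I − (1/N)J, the quadratic form should behave like a χ²(N − 1) variable.

```python
import numpy as np

rng = np.random.default_rng(3)
N, reps = 8, 100_000

# Orthogonal projection of rank N - 1: the centering matrix I - (1/N) J
A = np.eye(N) - np.ones((N, N)) / N

y = rng.standard_normal((reps, N))
q = np.einsum("ri,ij,rj->r", y, A, y)      # y^T A y for each replicate

# chi-square(k) has mean k and variance 2k
k = np.linalg.matrix_rank(A)
print(k, q.mean(), q.var())                # ~ 7, ~7, ~14
```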
The inferential problem we had can also be restated, in terms of a regression model, as
\[
\mathbf{Y} = \mathbf{X}\mu + \boldsymbol{\epsilon}; \qquad \boldsymbol{\epsilon} \sim \mathrm{MVN}(\mathbf{0}, \sigma^2\mathbf{I}), \tag{2}
\]
where X is an n × 1 matrix of 1's. (Note that we have changed the notation here to keep consistency with the usual notation of regression analysis.)
Maximum Likelihood Estimation IX

Now consider H = X(XᵀX)⁻¹Xᵀ, popularly known as the hat matrix in regression analysis.
This matrix is a Hermitian matrix; as a consequence, I − H is also a Hermitian matrix, which has rank equal to n − p.
For a model specified as in equation (2), the MLE of µ can be obtained as
\[
\hat{\mu} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. \tag{3}
\]
It can be shown that the estimator in (3) is unbiased for the population mean µ.
The hat matrix has great uses; one of them is to estimate the residuals as
\[
\hat{\boldsymbol{\epsilon}} = \mathbf{Y} - \mathbf{X}\hat{\mu} = \mathbf{Y} - \bar{Y}\mathbf{1} = (\mathbf{I} - \mathbf{H})\mathbf{Y}.
\]

¹A matrix whose conjugate transpose yields that matrix.
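A small sketch (with illustrative simulated data, not from the slides) of the hat matrix, the estimator (3), and the residuals for the intercept-only design:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
Y = rng.normal(10.0, 2.0, size=(n, 1))
X = np.ones((n, 1))                          # intercept-only design, as in (2)

H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix; here simply (1/n) J_n
mu_hat = np.linalg.inv(X.T @ X) @ X.T @ Y    # equation (3); equals the sample mean
resid = (np.eye(n) - H) @ Y                  # residuals Y - X mu_hat

print(mu_hat.item(), Y.mean())               # identical
print(np.allclose(resid, Y - Y.mean()))      # True
print(np.linalg.matrix_rank(np.eye(n) - H))  # rank of I - H is n - 1
```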
Restricted Maximum Likelihood Estimation (REML) I

Restricted maximum likelihood estimation is also known as residual maximum likelihood estimation and reduced maximum likelihood estimation.
REML was developed to achieve an unbiased estimate of the scale parameters under study.
The idea is to partition the likelihood into two components: one involves the location parameters (µ) (as well as the scale parameter σ²), and the other involves only the scale parameter σ².
The former is known as the likelihood and the latter is termed the residual likelihood.
The estimation is carried out by optimizing the likelihood for the location parameters and the residual likelihood for the scale parameters.
Patterson and Thompson (1971) introduced REML estimation as an alternative to ML estimation for variance components.
This approach is based on the idea of optimizing the likelihood of the error contrasts for estimating the parameters involved in the variance components.



Restricted Maximum Likelihood Estimation (REML) II

In words, error contrasts are linear combinations of the observations whose mean is zero, i.e., Z = AY such that E(Z) = 0.
Since E(Z) = 0, the distribution of the error contrasts (Z) does not depend on the parameters β.
It is always possible to find a suitable matrix A for forming the error contrasts without knowing the true values of β and θ.
For example, in the case of the ordinary least squares residuals,
\[
\mathbf{A} = \mathbf{I} - \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T
\]
has the required properties.
Because Z is a linear transformation of Y, it retains a multivariate Gaussian distribution.
However, the constraint imposed by the requirement that the distribution of Z must not depend on β reduces the effective dimensionality of Z from n to n − p, where p is the number of elements of β.
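As a sketch (an arbitrary full-rank design is assumed), one can verify numerically that this A satisfies AX = 0, so that Z = AY has E(Z) = AXβ = 0 for every β, and that its rank is n − p:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # any full-rank n x p design

H = X @ np.linalg.inv(X.T @ X) @ X.T
A = np.eye(n) - H                        # error-contrast matrix

print(np.allclose(A @ X, 0))             # True: E(AY) = A X beta = 0 whatever beta is
print(np.linalg.matrix_rank(A))          # n - p = 6, the effective dimension of Z
```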



Restricted Maximum Likelihood Estimation (REML) III

Therefore, the REML estimator for θ is obtained by maximizing the profile likelihood for θ based on the transformed data Z.
REML estimation is proposed mainly to estimate the variance parameters.
Suppose our interest is to estimate σ² from the observed values y1, y2, . . . , yn of Y1, Y2, . . . , Yn, where

Yi ∼ N(µ, σ²),

and both µ and σ² are unknown.
The above scenario can be presented in matrix notation in the way described in equation (2).
With the specification in equation (2), we can compute the error contrasts as
\[
\mathbf{Z} = \left[\mathbf{I} - \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\right]\mathbf{Y}. \tag{4}
\]
Therefore, as a linear combination of the Normally distributed random vector Y, the random variable Z will have a Normal distribution with mean 0 and covariance matrix (I − H)Cov(Y)(I − H)ᵀ, where H = X(XᵀX)⁻¹Xᵀ.
Restricted Maximum Likelihood Estimation (REML) IV

As the rank of (I − H) is (n − 1), we will have to use only (n − 1) elements of Z.
With X an (n × 1) column matrix of 1's, we have
\[
\mathbf{I} - \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T =
\begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n)\\
\vdots & \vdots & \ddots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n)
\end{pmatrix}.
\]
Now, let us define a new matrix A by taking the first (n − 1) columns of (I − H), and set Z = AᵀY.
As Y follows a multivariate normal distribution, the distribution of Z is also multivariate normal, with mean vector µ_Z = 0 and covariance matrix Σ_Z = AᵀΣ_Y A, where Σ_Y = Cov(Y) = σ²I.
Then



Restricted Maximum Likelihood Estimation (REML) V

\[
\begin{aligned}
\boldsymbol{\Sigma}_Z &= \mathbf{A}^T \boldsymbol{\Sigma}_Y \mathbf{A}\\
&= \sigma^2 \mathbf{A}^T \mathbf{I} \mathbf{A}\\
&= \sigma^2
\begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n) & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n) & -(1/n)\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n) & -(1/n)
\end{pmatrix}
\begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n)\\
\vdots & \vdots & \ddots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n)\\
-(1/n) & -(1/n) & \cdots & -(1/n)
\end{pmatrix}\\
&= \sigma^2
\begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n)\\
\vdots & \vdots & \ddots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n)
\end{pmatrix}_{(n-1)\times(n-1)}\\
&= \sigma^2\left[\mathbf{I}_{(n-1)} - (1/n)\mathbf{J}_{n-1}\right]\\
&= \sigma^2\left[\mathbf{I}_{(n-1)} + (-1/n)\mathbf{J}_{n-1}\right],
\end{aligned}
\]
where J_{n−1} is an (n − 1) × (n − 1) matrix of 1's.
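A quick numeric check of this covariance (a sketch with a small, arbitrary n): taking the first n − 1 columns of I − H gives AᵀA = I_{n−1} − (1/n)J_{n−1}.

```python
import numpy as np

n = 5
A = (np.eye(n) - np.ones((n, n)) / n)[:, : n - 1]   # first n - 1 columns of I - H

lhs = A.T @ A                                        # covariance factor of Z = A^T Y (up to sigma^2)
rhs = np.eye(n - 1) - np.ones((n - 1, n - 1)) / n    # I_{n-1} - (1/n) J_{n-1}
print(np.allclose(lhs, rhs))                         # True
```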



Restricted Maximum Likelihood Estimation (REML) VI

The log-likelihood function for Z is
\[
l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{1}{2}\log|\boldsymbol{\Sigma}_Z| - \frac{1}{2}\mathbf{Z}^T\boldsymbol{\Sigma}_Z^{-1}\mathbf{Z}. \tag{5}
\]
Here Σ_Z⁻¹ = (1/σ²)[I_{(n−1)} + (−1/n)J_{n−1}]⁻¹ is computed using the following lemma.

Lemma
If A and A + B are invertible, and B has rank 1, then for g = trace(BA⁻¹) ≠ −1 we can write
\[
(\mathbf{A} + \mathbf{B})^{-1} = \mathbf{A}^{-1} - \frac{1}{1 + g}\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}.
\]
For our case, g = trace((−1/n)J_{n−1} I⁻¹) = −(n − 1)/n; therefore, g + 1 = 1 − (n − 1)/n = 1/n.
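A small numerical verification of the lemma for this particular rank-one perturbation (a sketch; the dimension is an arbitrary choice):

```python
import numpy as np

m = 4                                      # m = n - 1, so n = 5
n = m + 1
A = np.eye(m)
B = -np.ones((m, m)) / n                   # rank-1 perturbation (-1/n) J_{n-1}

g = np.trace(B @ np.linalg.inv(A))         # = -(n - 1)/n
lemma_inv = np.linalg.inv(A) - (np.linalg.inv(A) @ B @ np.linalg.inv(A)) / (1 + g)

print(np.isclose(g, -(n - 1) / n))                    # True
print(np.allclose(lemma_inv, np.linalg.inv(A + B)))   # True
print(lemma_inv)                           # I + J: 2's on the diagonal, 1's elsewhere
```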



Restricted Maximum Likelihood Estimation (REML) VII

So we can write,
\[
\begin{aligned}
\boldsymbol{\Sigma}_Z^{-1} &= (1/\sigma^2)\left[\mathbf{I}_{(n-1)} + (-1/n)\mathbf{J}_{n-1}\right]^{-1}\\
&= (1/\sigma^2)\left[\mathbf{I}_{(n-1)} - \frac{1}{1+g}\,\mathbf{I}_{(n-1)}(-1/n)\mathbf{J}_{n-1}\mathbf{I}_{(n-1)}\right]\\
&= (1/\sigma^2)\left[\mathbf{I}_{(n-1)} - n(-1/n)\mathbf{J}_{n-1}\right]\\
&= (1/\sigma^2)\left[\mathbf{I}_{(n-1)} - (-1)\mathbf{J}_{n-1}\right]\\
&= (1/\sigma^2)\left[
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}
-
\begin{pmatrix}
-1 & -1 & \cdots & -1\\
-1 & -1 & \cdots & -1\\
\vdots & \vdots & \ddots & \vdots\\
-1 & -1 & \cdots & -1
\end{pmatrix}
\right]\\
&= (1/\sigma^2)
\begin{pmatrix}
2 & 1 & \cdots & 1\\
1 & 2 & \cdots & 1\\
\vdots & \vdots & \ddots & \vdots\\
1 & 1 & \cdots & 2
\end{pmatrix}
\end{aligned}
\]



Restricted Maximum Likelihood Estimation (REML) VIII

In this case, our transformed data look like
\[
\begin{aligned}
\mathbf{Z} &= \mathbf{A}^T\mathbf{Y}\\
&= \begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n)\\
\vdots & \vdots & \ddots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n)\\
-(1/n) & -(1/n) & \cdots & -(1/n)
\end{pmatrix}^T
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}\\
&= \begin{pmatrix}
1 - (1/n) & -(1/n) & \cdots & -(1/n) & -(1/n)\\
-(1/n) & 1 - (1/n) & \cdots & -(1/n) & -(1/n)\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
-(1/n) & -(1/n) & \cdots & 1 - (1/n) & -(1/n)
\end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}\\
&= \begin{pmatrix}
y_1 - \bar{y}\\ y_2 - \bar{y}\\ \vdots\\ y_{n-1} - \bar{y}
\end{pmatrix}
\end{aligned}
\]
Restricted Maximum Likelihood Estimation (REML) IX

so,
\[
\begin{aligned}
\mathbf{Z}^T\boldsymbol{\Sigma}_Z^{-1}\mathbf{Z}
&= \begin{pmatrix} y_1 - \bar{y}\\ y_2 - \bar{y}\\ \vdots\\ y_{n-1} - \bar{y} \end{pmatrix}^T
(1/\sigma^2)
\begin{pmatrix}
2 & 1 & \cdots & 1\\
1 & 2 & \cdots & 1\\
\vdots & \vdots & \ddots & \vdots\\
1 & 1 & \cdots & 2
\end{pmatrix}
\begin{pmatrix} y_1 - \bar{y}\\ y_2 - \bar{y}\\ \vdots\\ y_{n-1} - \bar{y} \end{pmatrix}\\
&= (1/\sigma^2)
\begin{pmatrix} y_1 - \bar{y}\\ y_2 - \bar{y}\\ \vdots\\ y_{n-1} - \bar{y} \end{pmatrix}^T
\begin{pmatrix}
2(y_1 - \bar{y}) + (y_2 - \bar{y}) + \ldots + (y_{n-1} - \bar{y})\\
(y_1 - \bar{y}) + 2(y_2 - \bar{y}) + \ldots + (y_{n-1} - \bar{y})\\
\vdots\\
(y_1 - \bar{y}) + (y_2 - \bar{y}) + \ldots + 2(y_{n-1} - \bar{y})
\end{pmatrix}\\
&= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i - \bar{y})\left\{2(y_i - \bar{y}) + \sum_{j=1}^{n}(y_j - \bar{y}) - (y_i - \bar{y}) - (y_n - \bar{y})\right\}\right]\\
&= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i - \bar{y})^2 - (y_n - \bar{y})\sum_{i=1}^{n-1}(y_i - \bar{y})\right]
\end{aligned}
\]



Restricted Maximum Likelihood Estimation (REML) X

\[
\begin{aligned}
&= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i - \bar{y})^2 - (y_n - \bar{y})\left\{\sum_{i=1}^{n} y_i - y_n - (n-1)\bar{y}\right\}\right]\\
&= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i - \bar{y})^2 - (y_n - \bar{y})\left\{n\bar{y} - y_n - n\bar{y} + \bar{y}\right\}\right]\\
&= (1/\sigma^2)\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]
\end{aligned}
\]
and
\[
|\boldsymbol{\Sigma}_Z| = \left|\sigma^2\left[\mathbf{I}_{(n-1)} + (-1/n)\mathbf{J}_{n-1}\right]\right|.
\]
From the properties of the determinant, we know that, for any constant k and an m × m square matrix A, det(kA) = kᵐ det(A). Hence, we can write
\[
|\boldsymbol{\Sigma}_Z| = (\sigma^2)^{n-1}\left|\mathbf{I}_{(n-1)} + (-1/n)\mathbf{J}_{n-1}\right|.
\]



Restricted Maximum Likelihood Estimation (REML) XI

Finally, the log-likelihood in equation (5) can be written in simplified form as
\[
l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{n-1}{2}\log(\sigma^2)
- \frac{1}{2}\log\left|\mathbf{I}_{(n-1)} + (-1/n)\mathbf{J}_{n-1}\right|
- \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \bar{y})^2.
\]
Now, differentiating with respect to σ², we find
\[
\frac{d}{d\sigma^2}\, l(\sigma^2) = -\frac{n-1}{2\sigma^2} + \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{2(\sigma^2)^2}.
\]



Restricted Maximum Likelihood Estimation (REML) XII

Equating the right-hand side to zero yields
\[
\begin{aligned}
-\frac{n-1}{2\hat{\sigma}^2_{\text{reml}}} + \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{2\left(\hat{\sigma}^2_{\text{reml}}\right)^2} &= 0\\
\Rightarrow \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{2\left(\hat{\sigma}^2_{\text{reml}}\right)^2} &= \frac{n-1}{2\hat{\sigma}^2_{\text{reml}}}\\
\Rightarrow \hat{\sigma}^2_{\text{reml}} &= \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}
\end{aligned}
\]
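A sketch (simulated data and scipy's bounded scalar optimizer are illustrative assumptions) that maximizes the restricted log-likelihood (5) numerically and compares the result with the closed form above and with the ML estimate:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
n = 12
y = rng.normal(3.0, 2.0, size=n)
z = (y - y.mean())[: n - 1]                        # error contrasts Z = A^T y

M = np.eye(n - 1) - np.ones((n - 1, n - 1)) / n    # I_{n-1} - (1/n) J_{n-1}
_, logdet_M = np.linalg.slogdet(M)

def neg_restricted_loglik(s2):
    # negative of equation (5) with Sigma_Z = s2 * M
    return 0.5 * (n - 1) * np.log(2 * np.pi * s2) + 0.5 * logdet_M \
           + z @ np.linalg.inv(s2 * M) @ z / 2

res = minimize_scalar(neg_restricted_loglik, bounds=(1e-6, 50), method="bounded")
print(res.x)                                 # numerical REML estimate
print(np.sum((y - y.mean())**2) / (n - 1))   # closed form: divisor n - 1
print(np.sum((y - y.mean())**2) / n)         # ML estimate: divisor n
```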



Restricted Maximum Likelihood Estimation (REML) XIII

Reduced Maximum Likelihood Estimation (REML)
Now the log-likelihood is
\[
\begin{aligned}
l(\mu, \sigma^2) &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\\
&= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - \mu)^2}{2\sigma^2}\\
&= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\\
&= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{n(\bar{x} - \mu)^2}{2\sigma^2} - \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{2\sigma^2}
\end{aligned}
\]



Restricted Maximum Likelihood Estimation (REML) XIV

Therefore, we can write
\[
\begin{aligned}
l(\mu; \sigma^2) ={}& -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\\
& - \frac{n-1}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{2\sigma^2}\\
={}& -\left[\text{constant} + \frac{1}{2}\log\frac{\sigma^2}{n} + \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right]\\
& - \left[\text{constant} + \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{n}{\sigma^2} + \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{2\sigma^2}\right]
\end{aligned}
\]
So, the first part of the likelihood function l(µ; σ²) can be maximized w.r.t. µ, and the second part can be maximized w.r.t. σ².
Further, the second part is independent of µ, and its maximum occurs at
\[
\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}.
\]



Restricted Maximum Likelihood Estimation (REML) XV

However, the former part of the likelihood function l(µ; σ²) is, asymptotically, the likelihood for the sample mean x̄.
In addition, x̄ is the ordinary least squares (OLS) estimator as well as the MLE (in the Gaussian case) of µ.
The extra term (1/2) log(n/σ²) in the latter part of l(µ; σ²) is related to the variance of x̄.
Moreover, this part distinguishes the likelihood used in REML estimation from the likelihood used in ML estimation.



Profile Likelihood I

Let us suppose that the unknown parameters θ can be partitioned as θ = {ψ, λ}, where ψ is a p-dimensional vector of parameters of interest and λ is a q-dimensional vector of nuisance parameters.
Although estimation involves finding the values {ψ̂, λ̂} at which the log-likelihood function l(ψ, λ) is maximized, our main focus is on ψ.
To find {ψ̂, λ̂} one can easily use the following fact,
\[
\arg\max_{\psi, \lambda}\, l(\psi, \lambda) = \arg\max_{\psi, \lambda} \sum_{i=1}^{n} \log f(x_i; \psi, \lambda),
\]
where x1, x2, . . . , xn are the observed values of the random sample X1, X2, . . . , Xn from f_X(x; θ).
However, the above procedure becomes difficult in some situations. As an alternative, a solution is the use of the profile likelihood.



Profile Likelihood II

The profile likelihood starts with the assumption that ψ is known and considers the following maximization,
\[
\hat{\lambda}_\psi = \arg\max_{\lambda}\, l_\psi(\lambda),
\]
where l_ψ(λ) is l(ψ, λ) at a given value of ψ.
This method then finds ψ̂ as
\[
\hat{\psi} = \arg\max_{\psi}\, l_\psi(\hat{\lambda}_\psi) = \arg\max_{\psi}\, l(\psi, \hat{\lambda}_\psi)
\]
to estimate the parameter of interest ψ.
For example, consider the estimation of {µ, σ²} based on a random sample X1, X2, . . . , Xn from N(µ, σ²).
The profile likelihood method first assumes that µ is known and maximizes the following log-likelihood,
\[
l_\mu(\sigma^2) = -(n/2)\log(2\pi) - (n/2)\log\sigma^2 - (2\sigma^2)^{-1}\sum_{i=1}^{n}(x_i - \mu)^2. \tag{6}
\]
Profile Likelihood III

Maximization of (6) w.r.t. σ², assuming µ is known, yields
\[
\hat{\sigma}^2_\mu = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.
\]
Therefore, the profile log-likelihood of µ is
\[
\begin{aligned}
l_\mu(\hat{\sigma}^2_\mu) &= l(\mu, \hat{\sigma}^2_\mu)\\
&= -(n/2)\log(2\pi) - (n/2)\log\hat{\sigma}^2_\mu - (2\hat{\sigma}^2_\mu)^{-1}\sum_{i=1}^{n}(x_i - \mu)^2\\
&= -(n/2)\log(2\pi) - (n/2)\log\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right) - (2/n)^{-1},
\end{aligned}
\]



Profile Likelihood IV

which yields (verify)
\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i
\]
as the profile likelihood estimator of µ.
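A brief sketch (simulated data assumed) that evaluates this profile log-likelihood on a grid and confirms that it peaks at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(2.0, 1.5, size=50)
n = len(x)

def profile_loglik_mu(mu):
    # l_mu(sigma2_hat_mu): plug the maximizing sigma^2 for fixed mu back into (6)
    s2_mu = np.mean((x - mu) ** 2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(s2_mu) - 0.5 * n

grid = np.linspace(x.mean() - 1, x.mean() + 1, 2001)
values = np.array([profile_loglik_mu(m) for m in grid])
print(grid[values.argmax()], x.mean())   # maximizer of the profile likelihood ~ sample mean
```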


The main drawback of the profile likelihood is that
\[
E\left[\frac{\partial l_\psi(\hat{\lambda}_\psi)}{\partial \psi}\right] \neq 0 = E\left[\frac{\partial l(\psi, \lambda)}{\partial \psi}\right].
\]
Verify this property in the case of the previous example related to the Normal distribution.
Note that when the dimension of λ is a substantial function of n, the mean of ∂l_ψ(λ̂_ψ)/∂ψ is not negligible and the profile likelihood function can be misleading.



Profile Likelihood V

Assume that X1, X2, . . . , Xn is a random sample in which each Xi has a Weibull distribution with density
\[
f(x; \alpha, \theta) = \frac{\alpha x^{(\alpha - 1)}}{\theta^\alpha} e^{-(x/\theta)^\alpha}; \qquad x \in [0, \infty).
\]
Derive the profile likelihood for the shape parameter α, and find the estimators of {α, θ}.
In the profile likelihood method, we first derive an estimate of the scale parameter θ by assuming that α is known, as
\[
\begin{aligned}
\hat{\theta}_\alpha &= \arg\max_{\theta}\, l_\alpha(\theta)\\
&= \arg\max_{\theta}\left[n\log\alpha + (\alpha - 1)\sum_{i=1}^{n}\log x_i - n\alpha\log\theta - \sum_{i=1}^{n}\left(\frac{x_i}{\theta}\right)^{\alpha}\right]
\end{aligned}
\]



Profile Likelihood VI

Taking the first derivative w.r.t. θ and equating it to 0, we have
\[
\begin{aligned}
-\frac{n\alpha}{\theta} + \alpha\frac{\sum_{i=1}^{n} x_i^{\alpha}}{\theta^{(\alpha + 1)}} &= 0\\
\Rightarrow n\theta^{\alpha} &= \sum_{i=1}^{n} x_i^{\alpha}\\
\Rightarrow \hat{\theta}_\alpha &= \left(\frac{\sum_{i=1}^{n} x_i^{\alpha}}{n}\right)^{1/\alpha}
\end{aligned}
\]
Then an estimate of α can be obtained as
\[
\begin{aligned}
\hat{\alpha} &= \arg\max_{\alpha}\, l(\alpha, \hat{\theta}_\alpha)\\
&= \arg\max_{\alpha}\left[n\log\alpha + (\alpha - 1)\sum_{i=1}^{n}\log x_i - n\log\left(\frac{\sum_{i=1}^{n} x_i^{\alpha}}{n}\right) - n\right]
\end{aligned}
\]
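As a closing sketch (simulated Weibull data and scipy's bounded optimizer are assumptions for illustration), the scale can be profiled out in closed form and the resulting profile log-likelihood maximized over α:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
alpha_true, theta_true = 2.5, 3.0
x = theta_true * rng.weibull(alpha_true, size=500)   # Weibull(alpha, theta) draws

n, sum_log_x = len(x), np.sum(np.log(x))

def neg_profile_loglik(alpha):
    # profile log-likelihood of alpha, with theta profiled out via theta_hat^alpha = mean(x^alpha)
    return -(n * np.log(alpha) + (alpha - 1) * sum_log_x
             - n * np.log(np.mean(x ** alpha)) - n)

res = minimize_scalar(neg_profile_loglik, bounds=(0.1, 20), method="bounded")
alpha_hat = res.x
theta_hat = np.mean(x ** alpha_hat) ** (1 / alpha_hat)
print(alpha_hat, theta_hat)   # should be close to (alpha_true, theta_true)
```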

