Statistical Inference II: Mohammad Samsul Alam
https://www.isrt.ac.bd/people/msalam
If $X_i \sim N(\mu, \sigma^2)$, the maximum likelihood estimators are
$$\hat\mu = \bar{x}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
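A minimal numerical sketch of these closed forms (the data are simulated for illustration and are not from the notes; NumPy is assumed to be available):

import numpy as np

# Check the closed-form MLEs against NumPy's divisor-n variance.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)

mu_hat = x.mean()                          # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat)**2).mean()      # MLE of sigma^2: divisor n, not n - 1

assert np.isclose(sigma2_hat, np.var(x, ddof=0))   # np.var(ddof=0) is the same formula
print(mu_hat, sigma2_hat)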
Now, from two-variable calculus, to verify that a function $H(\theta_1, \theta_2)$ has a local maximum at $(\hat\theta_1, \hat\theta_2)$, it must be shown that the following three conditions hold:
1. The first-order partial derivatives are 0,
$$\frac{\partial}{\partial\theta_1} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = 0 \quad\text{and}\quad \frac{\partial}{\partial\theta_2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = 0$$
2. At least one second-order partial derivative is negative,
$$\frac{\partial^2}{\partial\theta_1^2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} < 0 \quad\text{or}\quad \frac{\partial^2}{\partial\theta_2^2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} < 0$$
3. The determinant of the Hessian is positive,
$$\begin{vmatrix} \dfrac{\partial^2}{\partial\theta_1^2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2) \\[2mm] \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_2^2}H(\theta_1,\theta_2) \end{vmatrix}_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = \left[\frac{\partial^2 H}{\partial\theta_1^2}\frac{\partial^2 H}{\partial\theta_2^2} - \left(\frac{\partial^2 H}{\partial\theta_1\partial\theta_2}\right)^2\right]\Bigg|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} > 0$$
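These three conditions can also be checked numerically. The sketch below (simulated data; the central finite-difference Hessian is an assumption of the sketch, not part of the notes) evaluates conditions 2 and 3 for the Normal log-likelihood at $(\hat\mu, \hat\sigma^2)$; condition 1 holds analytically at the sample mean and the divisor-$n$ variance.

import numpy as np

# Finite-difference Hessian of the Normal log-likelihood at the MLE.
rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.5, size=500)
n = x.size

def loglik(theta):
    mu, s2 = theta
    return -0.5*n*np.log(2*np.pi*s2) - ((x - mu)**2).sum()/(2*s2)

theta_hat = np.array([x.mean(), ((x - x.mean())**2).mean()])
h = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i]*h, np.eye(2)[j]*h
        H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                   - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej))/(4*h*h)

print(H[0, 0] < 0, H[1, 1] < 0, np.linalg.det(H) > 0)   # all True at a local maximum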
Maximum Likelihood Estimation III
$$\frac{\partial^2}{\partial\mu^2}\, l(\mu,\sigma^2)\Big|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = \frac{-n}{\hat\sigma^2} = \frac{-n^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} < 0$$
$$\frac{\partial^2}{\partial(\sigma^2)^2}\, l(\mu,\sigma^2)\Big|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = \frac{n}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{-n}{2\hat\sigma^4} < 0$$
Again, with
$$\frac{\partial^2}{\partial\mu\,\partial\sigma^2}\, l(\mu,\sigma^2) = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu)\bigg|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = 0,$$
it can be shown that condition three also holds for $\hat\mu$ and $\hat\sigma^2$.
$$E(\hat\mu) = \mu, \qquad E(\hat\sigma^2) = \frac{(n-1)}{n}\,\sigma^2$$
It can also be shown that
$$E\left(\frac{n}{(n-1)}\,\hat\sigma^2\right) = \sigma^2,$$
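A Monte Carlo illustration of this bias factor (the setup below is an arbitrary choice for the sketch, not taken from the notes):

import numpy as np

# The divisor-n MLE underestimates sigma^2 on average by the factor (n-1)/n.
rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_mle = ((x - x.mean(axis=1, keepdims=True))**2).mean(axis=1)

print(s2_mle.mean())                    # close to (n-1)/n * sigma^2 = 3.6
print((n/(n - 1)) * s2_mle.mean())      # close to sigma^2 = 4.0 after correction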
Theorem
If $y \sim N(0, I_N)$, and $A$ is an orthogonal projection, then $y^T A y \sim \chi^2_{(k)}$ with $k = \operatorname{rank}(A)$.
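A sketch of the theorem for one concrete projection, the centering matrix $A = I - (1/n)J$ (my choice for illustration; $y$ is simulated): $A$ is symmetric and idempotent with rank $n-1$, so $y^T A y$ should have the moments of a $\chi^2_{(n-1)}$ variable.

import numpy as np

# Monte Carlo check: quadratic form in a rank-(N-1) orthogonal projection.
rng = np.random.default_rng(4)
N = 8
A = np.eye(N) - np.ones((N, N))/N
assert np.allclose(A @ A, A) and np.allclose(A, A.T)   # idempotent and symmetric
k = round(np.trace(A))                                 # rank of a projection = its trace

y = rng.normal(size=(100_000, N))
q = np.einsum('ri,ij,rj->r', y, A, y)                  # y^T A y, one value per replicate
print(q.mean(), q.var())                               # approx k = 7 and 2k = 14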
It can be shown that the estimator in (3) is unbiased for the population mean $\mu$.
This matrix has many uses; one of them is to compute the residuals as
$$\hat\varepsilon = Y - X\hat\mu = Y - \bar{Y} = (I - H)Y.$$
In words, error contrasts are linear combinations of the observations whose mean is zero, i.e., $Z = AY$ such that $E(Z) = 0$.
Since $E(Z) = 0$, the distribution of the error contrasts $Z$ does not depend on the parameters $\beta$.
It is always possible to find a suitable matrix $A$ for constructing the error contrasts without knowing the true values of $\beta$ and $\theta$.
For example, in the case of ordinary least squares residuals,
$$A = I - X\left(X^T X\right)^{-1} X^T$$
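A quick check that this $A$ works for any $\beta$ (the design matrix below is made up for the sketch):

import numpy as np

# A = I - X(X^T X)^{-1} X^T annihilates X, so E(Z) = A X beta = 0 for every beta.
rng = np.random.default_rng(5)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
A = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(A @ X, 0))   # True, hence E(Z) = 0 without knowing beta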
$Y_i \sim N(\mu, \sigma^2)$,
$$\Sigma_Z = A\,\Sigma_Y\,A^T = \sigma^2 A I A^T$$
$$= \sigma^2 \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} & -\tfrac{1}{n} \end{pmatrix} \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n}\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \end{pmatrix}$$
$$= \sigma^2 \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} \end{pmatrix}_{(n-1)\times(n-1)}$$
$$= \sigma^2\left[I_{(n-1)} - (1/n)J_{n-1}\right] = \sigma^2\left[I_{(n-1)} + (-1/n)J_{n-1}\right]$$
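A numerical check of this product ($n$ and $\sigma^2$ below are arbitrary choices for the sketch): taking the contrast matrix to be the first $n-1$ rows of $I - (1/n)J$ reproduces $\sigma^2[I_{(n-1)} - (1/n)J_{n-1}]$.

import numpy as np

# Sigma_Z = sigma^2 * M M^T for the (n-1) x n contrast matrix M.
n, sigma2 = 6, 2.5
M = (np.eye(n) - np.ones((n, n))/n)[:n - 1, :]

Sigma_Z = sigma2 * M @ M.T
target = sigma2 * (np.eye(n - 1) - np.ones((n - 1, n - 1))/n)
print(np.allclose(Sigma_Z, target))   # True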
$$l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_Z| - \frac{1}{2}Z^T\Sigma_Z^{-1}Z \qquad (5)$$
Here $\Sigma_Z^{-1} = (1/\sigma^2)\left[I_{(n-1)} + (-1/n)J_{n-1}\right]^{-1}$ is computed using the following lemma,
Lemma
If $A$ and $A + B$ are invertible, and $B$ has rank 1, then for $g = \operatorname{trace}(BA^{-1}) \neq -1$ we can write,
$$(A + B)^{-1} = A^{-1} - \frac{1}{1+g}\,A^{-1} B A^{-1}$$
So we can write,
$$\begin{aligned} \Sigma_Z^{-1} &= (1/\sigma^2)\left[I_{(n-1)} + (-1/n)J_{n-1}\right]^{-1}\\ &= (1/\sigma^2)\left[I_{(n-1)} - \frac{1}{1+g}\, I_{(n-1)}(-1/n)J_{n-1} I_{(n-1)}\right]\\ &= (1/\sigma^2)\left[I_{(n-1)} - n(-1/n)J_{n-1}\right]\\ &= (1/\sigma^2)\left[I_{(n-1)} + J_{n-1}\right], \end{aligned}$$
since $g = \operatorname{trace}\{(-1/n)J_{n-1}\} = -(n-1)/n$, so that $1/(1+g) = n$.
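A one-line numerical check of this lemma-based inverse ($n$ is an arbitrary choice):

import numpy as np

# [I_{(n-1)} - (1/n) J_{n-1}]^{-1} = I_{(n-1)} + J_{n-1}
n = 6
I, J = np.eye(n - 1), np.ones((n - 1, n - 1))
print(np.allclose(np.linalg.inv(I - J/n), I + J))   # True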
$$Z = A^T Y = \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} & -\tfrac{1}{n} \end{pmatrix} \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} = \begin{pmatrix} y_1 - \bar y\\ y_2 - \bar y\\ \vdots\\ y_{n-1} - \bar y \end{pmatrix}$$
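A quick check of this contrast vector ($y$ simulated for the sketch):

import numpy as np

# Z = A^T Y returns the first n-1 deviations from the sample mean.
rng = np.random.default_rng(8)
n = 6
y = rng.normal(size=n)
At = (np.eye(n) - np.ones((n, n))/n)[:n - 1, :]   # rows 1..n-1 of the centering matrix

print(np.allclose(At @ y, y[:n - 1] - y.mean()))  # True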
Restricted Maximum Likelihood Estimation (REML) IX
so,
$$Z^T\Sigma_Z^{-1}Z = (1/\sigma^2)\begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}^{T} \begin{pmatrix} 2 & 1 & \cdots & 1\\ 1 & 2 & \cdots & 1\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & \cdots & 2 \end{pmatrix} \begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}$$
$$= (1/\sigma^2)\begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}^{T} \begin{pmatrix} 2(y_1-\bar y) + (y_2-\bar y) + \dots + (y_{n-1}-\bar y)\\ (y_1-\bar y) + 2(y_2-\bar y) + \dots + (y_{n-1}-\bar y)\\ \vdots\\ (y_1-\bar y) + (y_2-\bar y) + \dots + 2(y_{n-1}-\bar y) \end{pmatrix}$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)\left\{2(y_i-\bar y) + \sum_{j=1}^{n}(y_j-\bar y) - (y_i-\bar y) - (y_n-\bar y)\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\sum_{i=1}^{n-1}(y_i-\bar y)\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\left\{\sum_{i=1}^{n} y_i - y_n - (n-1)\bar y\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\left\{n\bar y - y_n - n\bar y + \bar y\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n}(y_i-\bar y)^2\right]$$
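A numerical check of the whole simplification ($n$, $\sigma^2$, and $y$ below are arbitrary, simulated choices):

import numpy as np

# Z^T Sigma_Z^{-1} Z should equal (1/sigma^2) * sum of all n squared deviations.
rng = np.random.default_rng(9)
n, sigma2 = 7, 3.0
y = rng.normal(size=n)
z = (y - y.mean())[:n - 1]
Sigma_Z_inv = (np.eye(n - 1) + np.ones((n - 1, n - 1)))/sigma2   # (1/sigma^2)(I + J)

print(np.allclose(z @ Sigma_Z_inv @ z, ((y - y.mean())**2).sum()/sigma2))   # True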
Also, from the properties of the determinant, we know that, for any constant $k$ and an $n \times n$ square matrix $A$, $\det(kA) = k^n \det(A)$. Hence, we can write,
$$l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{(n-1)}{2}\log(\sigma^2) - \frac{1}{2}\log\left|I_{(n-1)} + (-1/n)J_{n-1}\right| - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\bar y)^2$$
Note that $\left|I_{(n-1)} - (1/n)J_{n-1}\right| = 1/n$ does not involve $\sigma^2$, so maximizing $l(\sigma^2)$ gives the REML estimator $\hat\sigma^2 = \sum_{i=1}^{n}(y_i-\bar y)^2/(n-1)$.
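A sketch maximizing this REML log-likelihood numerically and comparing with the closed form (simulated $y$; SciPy is assumed to be available):

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(10)
n = 25
y = rng.normal(2.0, 1.3, size=n)
ss = ((y - y.mean())**2).sum()

def neg_reml(s2):
    # log|I - (1/n)J| = -log(n) is constant in sigma^2, so it is dropped here
    return 0.5*(n - 1)*np.log(2*np.pi*s2) + ss/(2*s2)

res = minimize_scalar(neg_reml, bounds=(1e-6, 50.0), method='bounded')
print(res.x, ss/(n - 1))   # the REML estimator uses divisor n - 1, not n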
$$\begin{aligned} l(\mu;\sigma^2) ={}& \left\{-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}\\ &+ \left\{-\frac{(n-1)}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{2\sigma^2}\right\}\\ ={}& -\left\{\text{constant} + \frac{1}{2}\log\frac{\sigma^2}{n} + \frac{n(\bar x-\mu)^2}{2\sigma^2}\right\}\\ &- \left\{\text{constant} + \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{n}{\sigma^2} + \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{2\sigma^2}\right\} \end{aligned}$$
So, the first part of the likelihood function $l(\mu;\sigma^2)$ can be maximized w.r.t. $\mu$, and the second part can be maximized w.r.t. $\sigma^2$.
Further, the second part is independent of $\mu$ and its maximum occurs at
$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{n-1}.$$
However, the former part of the likelihood function $l(\mu;\sigma^2)$ is precisely the likelihood for the sample mean $\bar x$, since $\bar x \sim N(\mu, \sigma^2/n)$.
In addition, $\bar x$ is the ordinary least squares (OLS) estimator as well as the MLE (in the Gaussian case) of $\mu$.
The extra term $\frac{1}{2}\log\frac{n}{\sigma^2}$ in the latter part of $l(\mu;\sigma^2)$ is related to the variance of $\bar x$.
Moreover, this part distinguishes the likelihood used in REML estimation from the likelihood used in MLE.
$$\operatorname*{arg\,max}_{\psi,\lambda}\; l(\psi,\lambda) = \operatorname*{arg\,max}_{\psi,\lambda}\; \sum_{i=1}^{n}\log f(x_i;\psi,\lambda),$$
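A minimal sketch of this joint maximization (the choice $\psi = \sigma^2$, $\lambda = \mu$ for Normal data is illustrative, not fixed by the notes; SciPy assumed):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
x = rng.normal(1.0, 2.0, size=100)

def negloglik(theta):
    mu, log_s2 = theta               # parameterize by log(sigma^2) to keep it positive
    s2 = np.exp(log_s2)
    return 0.5*x.size*np.log(2*np.pi*s2) + ((x - mu)**2).sum()/(2*s2)

res = minimize(negloglik, x0=np.array([0.0, 0.0]))
print(res.x[0], np.exp(res.x[1]))    # approx x.mean() and the divisor-n variance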
The profile likelihood starts with the assumption that $\psi$ is known and considers the maximization over $\lambda$ alone, $\hat\lambda_\psi = \operatorname{arg\,max}_\lambda\, l(\psi,\lambda)$, giving the profile likelihood $l_\psi(\hat\lambda_\psi)$.
Verify this property in the case of the previous example related to the Normal distribution.
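A sketch of this verification (simulated data; with $\psi = \sigma^2$ and $\lambda = \mu$, the inner maximizer is $\hat\mu = \bar x$ for every $\sigma^2$, so the profile likelihood peaks at the divisor-$n$ MLE):

import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(0.0, 1.7, size=80)
n = x.size

def profile_loglik(s2):
    # l(x_bar, sigma^2): mu has already been profiled out at its maximizer
    return -0.5*n*np.log(2*np.pi*s2) - ((x - x.mean())**2).sum()/(2*s2)

grid = np.linspace(0.5, 6.0, 2000)
s2_profile = grid[np.argmax([profile_loglik(s2) for s2 in grid])]
print(s2_profile, ((x - x.mean())**2).mean())   # agree up to the grid spacing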
Note that, when the dimension of $\lambda$ is a substantial function of $n$, the mean of $\partial l_\psi(\hat\lambda_\psi)/\partial\psi$ is not negligible, and the profile likelihood function can be misleading.