Statistical Inference II: Mohammad Samsul Alam
https://www.isrt.ac.bd/people/msalam
If $X_i \sim N(\mu, \sigma^2)$, the maximum likelihood estimators are
$$\hat\mu = \bar{x}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
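A minimal numerical sketch of these closed forms (the data are simulated for illustration and are not from the notes; NumPy is assumed to be available):

import numpy as np

# Check the closed-form MLEs against NumPy's divisor-n variance.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)

mu_hat = x.mean()                          # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat)**2).mean()      # MLE of sigma^2: divisor n, not n - 1

assert np.isclose(sigma2_hat, np.var(x, ddof=0))   # np.var(ddof=0) is the same formula
print(mu_hat, sigma2_hat)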
Now, from two-variable calculus, to verify that a function $H(\theta_1, \theta_2)$ has a local maximum at $(\hat\theta_1, \hat\theta_2)$, it must be shown that the following three conditions hold:
1. The first-order partial derivatives are 0,
$$\frac{\partial}{\partial\theta_1} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = 0 \quad\text{and}\quad \frac{\partial}{\partial\theta_2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = 0$$
2. At least one second-order partial derivative is negative,
$$\frac{\partial^2}{\partial\theta_1^2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} < 0 \quad\text{or}\quad \frac{\partial^2}{\partial\theta_2^2} H(\theta_1,\theta_2)\Big|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} < 0$$
3. The determinant of the Hessian is positive,
$$\begin{vmatrix} \dfrac{\partial^2}{\partial\theta_1^2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2) \\[2mm] \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_2^2}H(\theta_1,\theta_2) \end{vmatrix}_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} = \left[\frac{\partial^2 H}{\partial\theta_1^2}\frac{\partial^2 H}{\partial\theta_2^2} - \left(\frac{\partial^2 H}{\partial\theta_1\partial\theta_2}\right)^2\right]\Bigg|_{\theta_1=\hat\theta_1,\,\theta_2=\hat\theta_2} > 0$$
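These three conditions can also be checked numerically. The sketch below (simulated data; the central finite-difference Hessian is an assumption of the sketch, not part of the notes) evaluates conditions 2 and 3 for the Normal log-likelihood at $(\hat\mu, \hat\sigma^2)$; condition 1 holds analytically at the sample mean and the divisor-$n$ variance.

import numpy as np

# Finite-difference Hessian of the Normal log-likelihood at the MLE.
rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.5, size=500)
n = x.size

def loglik(theta):
    mu, s2 = theta
    return -0.5*n*np.log(2*np.pi*s2) - ((x - mu)**2).sum()/(2*s2)

theta_hat = np.array([x.mean(), ((x - x.mean())**2).mean()])
h = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i]*h, np.eye(2)[j]*h
        H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                   - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej))/(4*h*h)

print(H[0, 0] < 0, H[1, 1] < 0, np.linalg.det(H) > 0)   # all True at a local maximum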
Maximum Likelihood Estimation III
$$\frac{\partial^2}{\partial\mu^2}\, l(\mu,\sigma^2)\Big|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = \frac{-n}{\hat\sigma^2} = \frac{-n^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} < 0$$
$$\frac{\partial^2}{\partial(\sigma^2)^2}\, l(\mu,\sigma^2)\Big|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = \frac{n}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{-n}{2\hat\sigma^4} < 0$$
Again, with
$$\frac{\partial^2}{\partial\mu\,\partial\sigma^2}\, l(\mu,\sigma^2) = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu)\bigg|_{\mu=\hat\mu,\,\sigma^2=\hat\sigma^2} = 0,$$
it can be shown that condition three also holds for $\hat\mu$ and $\hat\sigma^2$.
$$E(\hat\mu) = \mu, \qquad E(\hat\sigma^2) = \frac{(n-1)}{n}\,\sigma^2$$
It can also be shown that
$$E\left(\frac{n}{(n-1)}\,\hat\sigma^2\right) = \sigma^2,$$
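A Monte Carlo illustration of this bias factor (the setup below is an arbitrary choice for the sketch, not taken from the notes):

import numpy as np

# The divisor-n MLE underestimates sigma^2 on average by the factor (n-1)/n.
rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_mle = ((x - x.mean(axis=1, keepdims=True))**2).mean(axis=1)

print(s2_mle.mean())                    # close to (n-1)/n * sigma^2 = 3.6
print((n/(n - 1)) * s2_mle.mean())      # close to sigma^2 = 4.0 after correction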
Theorem
If $y \sim N(0, I_N)$, and $A$ is an orthogonal projection, then $y^T A y \sim \chi^2_{(k)}$ with $k = \operatorname{rank}(A)$.
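A sketch of the theorem for one concrete projection, the centering matrix $A = I - (1/n)J$ (my choice for illustration; $y$ is simulated): $A$ is symmetric and idempotent with rank $n-1$, so $y^T A y$ should have the moments of a $\chi^2_{(n-1)}$ variable.

import numpy as np

# Monte Carlo check: quadratic form in a rank-(N-1) orthogonal projection.
rng = np.random.default_rng(4)
N = 8
A = np.eye(N) - np.ones((N, N))/N
assert np.allclose(A @ A, A) and np.allclose(A, A.T)   # idempotent and symmetric
k = round(np.trace(A))                                 # rank of a projection = its trace

y = rng.normal(size=(100_000, N))
q = np.einsum('ri,ij,rj->r', y, A, y)                  # y^T A y, one value per replicate
print(q.mean(), q.var())                               # approx k = 7 and 2k = 14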
It can be shown that the estimator in (3) is unbiased for the population mean $\mu$.
This matrix has many uses; one of them is to compute the residuals as
$$\hat\varepsilon = Y - X\hat\mu = Y - \bar{Y} = (I - H)Y.$$
In words, error contrasts are linear combinations of the observations whose mean is zero, i.e., $Z = AY$ such that $E(Z) = 0$.
Since $E(Z) = 0$, the distribution of the error contrasts $Z$ does not depend on the parameters $\beta$.
It is always possible to find a suitable matrix $A$ for constructing the error contrasts without knowing the true values of $\beta$ and $\theta$.
For example, in the case of ordinary least squares residuals,
$$A = I - X\left(X^T X\right)^{-1} X^T$$
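A quick check that this $A$ works for any $\beta$ (the design matrix below is made up for the sketch):

import numpy as np

# A = I - X(X^T X)^{-1} X^T annihilates X, so E(Z) = A X beta = 0 for every beta.
rng = np.random.default_rng(5)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
A = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(A @ X, 0))   # True, hence E(Z) = 0 without knowing beta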
$Y_i \sim N(\mu, \sigma^2)$,
$$\Sigma_Z = A\,\Sigma_Y\,A^T = \sigma^2 A I A^T$$
$$= \sigma^2 \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} & -\tfrac{1}{n} \end{pmatrix} \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n}\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \end{pmatrix}$$
$$= \sigma^2 \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} \end{pmatrix}_{(n-1)\times(n-1)}$$
$$= \sigma^2\left[I_{(n-1)} - (1/n)J_{n-1}\right] = \sigma^2\left[I_{(n-1)} + (-1/n)J_{n-1}\right]$$
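A numerical check of this product ($n$ and $\sigma^2$ below are arbitrary choices for the sketch): taking the contrast matrix to be the first $n-1$ rows of $I - (1/n)J$ reproduces $\sigma^2[I_{(n-1)} - (1/n)J_{n-1}]$.

import numpy as np

# Sigma_Z = sigma^2 * M M^T for the (n-1) x n contrast matrix M.
n, sigma2 = 6, 2.5
M = (np.eye(n) - np.ones((n, n))/n)[:n - 1, :]

Sigma_Z = sigma2 * M @ M.T
target = sigma2 * (np.eye(n - 1) - np.ones((n - 1, n - 1))/n)
print(np.allclose(Sigma_Z, target))   # True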
$$l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_Z| - \frac{1}{2}Z^T\Sigma_Z^{-1}Z \qquad (5)$$
Here $\Sigma_Z^{-1} = (1/\sigma^2)\left[I_{(n-1)} + (-1/n)J_{n-1}\right]^{-1}$ is computed using the following lemma,
Lemma
If $A$ and $A + B$ are invertible, and $B$ has rank 1, then for $g = \operatorname{trace}(BA^{-1}) \neq -1$ we can write,
$$(A + B)^{-1} = A^{-1} - \frac{1}{1+g}\,A^{-1} B A^{-1}$$
So we can write,
$$\begin{aligned} \Sigma_Z^{-1} &= (1/\sigma^2)\left[I_{(n-1)} + (-1/n)J_{n-1}\right]^{-1}\\ &= (1/\sigma^2)\left[I_{(n-1)} - \frac{1}{1+g}\, I_{(n-1)}(-1/n)J_{n-1} I_{(n-1)}\right]\\ &= (1/\sigma^2)\left[I_{(n-1)} - n(-1/n)J_{n-1}\right]\\ &= (1/\sigma^2)\left[I_{(n-1)} + J_{n-1}\right], \end{aligned}$$
since $g = \operatorname{trace}\{(-1/n)J_{n-1}\} = -(n-1)/n$, so that $1/(1+g) = n$.
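A one-line numerical check of this lemma-based inverse ($n$ is an arbitrary choice):

import numpy as np

# [I_{(n-1)} - (1/n) J_{n-1}]^{-1} = I_{(n-1)} + J_{n-1}
n = 6
I, J = np.eye(n - 1), np.ones((n - 1, n - 1))
print(np.allclose(np.linalg.inv(I - J/n), I + J))   # True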
$$Z = A^T Y = \begin{pmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & -\tfrac{1}{n}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ -\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1-\tfrac{1}{n} & -\tfrac{1}{n} \end{pmatrix} \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} = \begin{pmatrix} y_1 - \bar y\\ y_2 - \bar y\\ \vdots\\ y_{n-1} - \bar y \end{pmatrix}$$
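A quick check of this contrast vector ($y$ simulated for the sketch):

import numpy as np

# Z = A^T Y returns the first n-1 deviations from the sample mean.
rng = np.random.default_rng(8)
n = 6
y = rng.normal(size=n)
At = (np.eye(n) - np.ones((n, n))/n)[:n - 1, :]   # rows 1..n-1 of the centering matrix

print(np.allclose(At @ y, y[:n - 1] - y.mean()))  # True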
Restricted Maximum Likelihood Estimation (REML) IX
so,
$$Z^T\Sigma_Z^{-1}Z = (1/\sigma^2)\begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}^{T} \begin{pmatrix} 2 & 1 & \cdots & 1\\ 1 & 2 & \cdots & 1\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & \cdots & 2 \end{pmatrix} \begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}$$
$$= (1/\sigma^2)\begin{pmatrix} y_1-\bar y\\ y_2-\bar y\\ \vdots\\ y_{n-1}-\bar y \end{pmatrix}^{T} \begin{pmatrix} 2(y_1-\bar y) + (y_2-\bar y) + \dots + (y_{n-1}-\bar y)\\ (y_1-\bar y) + 2(y_2-\bar y) + \dots + (y_{n-1}-\bar y)\\ \vdots\\ (y_1-\bar y) + (y_2-\bar y) + \dots + 2(y_{n-1}-\bar y) \end{pmatrix}$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)\left\{2(y_i-\bar y) + \sum_{j=1}^{n}(y_j-\bar y) - (y_i-\bar y) - (y_n-\bar y)\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\sum_{i=1}^{n-1}(y_i-\bar y)\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\left\{\sum_{i=1}^{n} y_i - y_n - (n-1)\bar y\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n-1}(y_i-\bar y)^2 - (y_n-\bar y)\left\{n\bar y - y_n - n\bar y + \bar y\right\}\right]$$
$$= (1/\sigma^2)\left[\sum_{i=1}^{n}(y_i-\bar y)^2\right]$$
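A numerical check of the whole simplification ($n$, $\sigma^2$, and $y$ below are arbitrary, simulated choices):

import numpy as np

# Z^T Sigma_Z^{-1} Z should equal (1/sigma^2) * sum of all n squared deviations.
rng = np.random.default_rng(9)
n, sigma2 = 7, 3.0
y = rng.normal(size=n)
z = (y - y.mean())[:n - 1]
Sigma_Z_inv = (np.eye(n - 1) + np.ones((n - 1, n - 1)))/sigma2   # (1/sigma^2)(I + J)

print(np.allclose(z @ Sigma_Z_inv @ z, ((y - y.mean())**2).sum()/sigma2))   # True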
Also, from the properties of the determinant, we know that, for any constant $k$ and an $n \times n$ square matrix $A$, $\det(kA) = k^n \det(A)$. Hence, we can write,
$$l(\sigma^2) = -\frac{(n-1)}{2}\log(2\pi) - \frac{(n-1)}{2}\log(\sigma^2) - \frac{1}{2}\log\left|I_{(n-1)} + (-1/n)J_{n-1}\right| - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\bar y)^2$$
Note that $\left|I_{(n-1)} - (1/n)J_{n-1}\right| = 1/n$ does not involve $\sigma^2$, so maximizing $l(\sigma^2)$ gives the REML estimator $\hat\sigma^2 = \sum_{i=1}^{n}(y_i-\bar y)^2/(n-1)$.
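A sketch maximizing this REML log-likelihood numerically and comparing with the closed form (simulated $y$; SciPy is assumed to be available):

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(10)
n = 25
y = rng.normal(2.0, 1.3, size=n)
ss = ((y - y.mean())**2).sum()

def neg_reml(s2):
    # log|I - (1/n)J| = -log(n) is constant in sigma^2, so it is dropped here
    return 0.5*(n - 1)*np.log(2*np.pi*s2) + ss/(2*s2)

res = minimize_scalar(neg_reml, bounds=(1e-6, 50.0), method='bounded')
print(res.x, ss/(n - 1))   # the REML estimator uses divisor n - 1, not n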
$$\begin{aligned} l(\mu;\sigma^2) ={}& \left\{-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}\\ &+ \left\{-\frac{(n-1)}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{\sigma^2}{n} - \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{2\sigma^2}\right\}\\ ={}& -\left\{\text{constant} + \frac{1}{2}\log\frac{\sigma^2}{n} + \frac{n(\bar x-\mu)^2}{2\sigma^2}\right\}\\ &- \left\{\text{constant} + \frac{n}{2}\log\sigma^2 + \frac{1}{2}\log\frac{n}{\sigma^2} + \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{2\sigma^2}\right\} \end{aligned}$$
So, the first part of the likelihood function $l(\mu;\sigma^2)$ can be maximized w.r.t. $\mu$, and the second part can be maximized w.r.t. $\sigma^2$.
Further, the second part is independent of $\mu$ and its maximum occurs at
$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{n-1}.$$
However, the former part of the likelihood function $l(\mu;\sigma^2)$ is precisely the likelihood for the sample mean $\bar x$, since $\bar x \sim N(\mu, \sigma^2/n)$.
In addition, $\bar x$ is the ordinary least squares (OLS) estimator as well as the MLE (in the Gaussian case) of $\mu$.
The extra term $\frac{1}{2}\log\frac{n}{\sigma^2}$ in the latter part of $l(\mu;\sigma^2)$ is related to the variance of $\bar x$.
Moreover, this part distinguishes the likelihood used in REML estimation from the likelihood used in MLE.
$$\operatorname*{arg\,max}_{\psi,\lambda}\; l(\psi,\lambda) = \operatorname*{arg\,max}_{\psi,\lambda}\; \sum_{i=1}^{n}\log f(x_i;\psi,\lambda),$$
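A minimal sketch of this joint maximization (the choice $\psi = \sigma^2$, $\lambda = \mu$ for Normal data is illustrative, not fixed by the notes; SciPy assumed):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
x = rng.normal(1.0, 2.0, size=100)

def negloglik(theta):
    mu, log_s2 = theta               # parameterize by log(sigma^2) to keep it positive
    s2 = np.exp(log_s2)
    return 0.5*x.size*np.log(2*np.pi*s2) + ((x - mu)**2).sum()/(2*s2)

res = minimize(negloglik, x0=np.array([0.0, 0.0]))
print(res.x[0], np.exp(res.x[1]))    # approx x.mean() and the divisor-n variance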
The profile likelihood starts with the assumption that $\psi$ is known and considers the maximization over $\lambda$ alone, $\hat\lambda_\psi = \operatorname{arg\,max}_\lambda\, l(\psi,\lambda)$, giving the profile likelihood $l_\psi(\hat\lambda_\psi)$.
Verify this property in the case of the previous example related to the Normal distribution.
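A sketch of this verification (simulated data; with $\psi = \sigma^2$ and $\lambda = \mu$, the inner maximizer is $\hat\mu = \bar x$ for every $\sigma^2$, so the profile likelihood peaks at the divisor-$n$ MLE):

import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(0.0, 1.7, size=80)
n = x.size

def profile_loglik(s2):
    # l(x_bar, sigma^2): mu has already been profiled out at its maximizer
    return -0.5*n*np.log(2*np.pi*s2) - ((x - x.mean())**2).sum()/(2*s2)

grid = np.linspace(0.5, 6.0, 2000)
s2_profile = grid[np.argmax([profile_loglik(s2) for s2 in grid])]
print(s2_profile, ((x - x.mean())**2).mean())   # agree up to the grid spacing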
Note that, when the dimension of $\lambda$ is a substantial function of $n$, the mean of $\partial l_\psi(\hat\lambda_\psi)/\partial\psi$ is not negligible, and the profile likelihood function can be misleading.