MIXED
Overview
This document summarizes the computational algorithms discussed in Wolfinger,
Tobias and Sall (1994).
Notation
The following notation is used throughout this chapter unless otherwise stated:
$\dot{V}_s(\theta)$ — First derivative of $V(\theta)$ with respect to the $s$th parameter in $\theta$.
$\ddot{V}_{st}(\theta)$ — Second derivative of $V(\theta)$ with respect to the $s$th and $t$th parameters in $\theta$.
$R(\theta_R)$ — The $n \times n$ covariance matrix of $\varepsilon$.
$\dot{R}_s(\theta_R)$ — First derivative of $R(\theta_R)$ with respect to the $s$th parameter in $\theta_R$.
$\ddot{R}_{st}(\theta_R)$ — Second derivative of $R(\theta_R)$ with respect to the $s$th and $t$th parameters in $\theta_R$.
$G(\theta_G)$ — The covariance matrix of the random effects.
$\dot{G}_s(\theta_G)$ — First derivative of $G(\theta_G)$ with respect to the $s$th parameter in $\theta_G$.
$\ddot{G}_{st}(\theta_G)$ — Second derivative of $G(\theta_G)$ with respect to the $s$th and $t$th parameters in $\theta_G$.
$V_k(\theta_k)$ — The covariance matrix of the $k$th random effect for one random subject.
$\dot{V}_{k,s}(\theta_k)$ — First derivative of $V_k(\theta_k)$ with respect to the $s$th parameter in $\theta_k$.
$\ddot{V}_{k,st}(\theta_k)$ — Second derivative of $V_k(\theta_k)$ with respect to the $s$th and $t$th parameters in $\theta_k$.
$y$ — $n \times 1$ vector of observed values of the dependent variable.
$X$ — $n \times p$ design matrix of fixed effects.
$Z$ — $n \times q$ design matrix of random effects.
$r$ — $n \times 1$ vector of residuals.
$\beta$ — $p \times 1$ vector of fixed-effects parameters.
$\gamma$ — $q \times 1$ vector of random-effects parameters.
$\varepsilon$ — $n \times 1$ vector of residual error terms.
$W_c$ — $n \times n$ diagonal matrix of case weights.
$W_{rw}$ — $n \times n$ diagonal matrix of regression weights.
Model
$$y = X\beta + Z\gamma + \varepsilon \qquad (1)$$
If there are regression weights, the covariance matrix $R(\theta_R)$ is replaced by $R^*(\theta_R) = W_{rw}^{-1/2}\, R(\theta_R)\, W_{rw}^{-1/2}$. To simplify the notation, we assume that the weights are already included in the matrix $R(\theta_R)$, and they are not displayed in the remainder of this document. When case weights are specified, they are rounded to the nearest integer, and each case enters the analysis as many times as its rounded case weight. Since replicating a case produces duplicate repeated measures (repeated measures are unique within a repeated subject), non-unity case weights are allowed only when $R(\theta_R)$ has scaled-identity structure. In MIXED, only cases with positive case weight and positive regression weight are included in the analysis.
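The two weighting rules above can be sketched in a few lines of Python. This is an illustration, not the MIXED implementation; `apply_regression_weights` and `replicate_cases` are hypothetical helper names, and the data are made up.

```python
import numpy as np

def apply_regression_weights(R, w_rw):
    """R* = W_rw^{-1/2} R W_rw^{-1/2} for diagonal regression weights w_rw."""
    d = 1.0 / np.sqrt(w_rw)
    return R * np.outer(d, d)

def replicate_cases(X, y, w_case):
    """Repeat each case by its rounded case weight; drop non-positive weights."""
    reps = np.rint(w_case).astype(int)
    keep = reps > 0                      # only positive case weights enter the analysis
    reps = reps[keep]
    return np.repeat(X[keep], reps, axis=0), np.repeat(y[keep], reps)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
# weight 2.0 -> case entered twice; 0.4 rounds to 0 -> dropped; 1.0 -> once
Xr, yr = replicate_cases(X, y, np.array([2.0, 0.4, 1.0]))
print(Xr.shape[0])   # 3
```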
If there are $K$ random effects and $S_k$ random subjects in the $k$th random effect, the design matrix $Z$ is partitioned as
$$Z = [\,Z_1 \;\; Z_2 \;\; \cdots \;\; Z_K\,],$$
$$Z_k = [\,Z_{k1} \;\; Z_{k2} \;\; \cdots \;\; Z_{kS_k}\,], \quad k = 1, \ldots, K.$$
The number of columns in the design matrix $Z_{kj}$ (the $j$th random subject of the $k$th random effect) is equal to the number of levels of the $k$th random effect variable.
Under this partition, $G(\theta_G)$ is a block diagonal matrix, which can be expressed as
$$G(\theta_G) = \bigoplus_{k=1}^{K} I_{S_k} \otimes V_k(\theta_k).$$
It should also be noted that each random effect has its own parameter vector $\theta_k$, $k = 1, \ldots, K$, and there are no functional constraints between elements of these parameter vectors. Thus $\theta_G = (\theta_1^T, \ldots, \theta_K^T)^T$.
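The direct-sum/Kronecker structure of $G(\theta_G)$ can be illustrated numerically: each random effect $k$ contributes $S_k$ identical diagonal blocks $V_k(\theta_k)$. A minimal sketch, where `build_G` is a hypothetical helper and the block values are invented:

```python
import numpy as np

def build_G(V_blocks, subjects):
    """Direct sum over k of I_{S_k} (x) V_k: block-diagonal G."""
    blocks = [np.kron(np.eye(S_k), V_k) for V_k, S_k in zip(V_blocks, subjects)]
    total = sum(b.shape[0] for b in blocks)
    G = np.zeros((total, total))
    i = 0
    for b in blocks:
        n = b.shape[0]
        G[i:i + n, i:i + n] = b        # place each I_{S_k} (x) V_k on the diagonal
        i += n
    return G

V1 = np.array([[2.0]])                      # 1x1 block for random effect 1
V2 = np.array([[1.0, 0.5], [0.5, 1.0]])     # 2x2 block for random effect 2
G = build_G([V1, V2], subjects=[3, 2])      # S_1 = 3, S_2 = 2
print(G.shape)   # (7, 7): 3*1 + 2*2 columns of Z
```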
Repeated Subjects
When the REPEATED subcommand is used, $R(\theta_R)$ is a block diagonal matrix whose $i$th block is $R_i(\theta_R)$, $i = 1, \ldots, S_R$. That is,
$$R(\theta_R) = \bigoplus_{i=1}^{S_R} R_i(\theta_R).$$
Likelihood Functions
$$-2\ell_{\mathrm{ML}}(\beta, \theta) = \log|V(\theta)| + r(\theta)^T V(\theta)^{-1} r(\theta) + n \log 2\pi \qquad (2)$$
$$-2\ell_{\mathrm{REML}}(\theta) = \log|V(\theta)| + r(\theta)^T V(\theta)^{-1} r(\theta) + \log|X^T V(\theta)^{-1} X| + (n - p)\log 2\pi \qquad (3)$$
where $n$ is the number of observations and $p$ is the rank of the fixed-effects design matrix. From (2) and (3), we can see that the key components of the likelihood functions are
$$\ell_1(\theta) = \log|V(\theta)|,$$
$$\ell_2(\theta) = r(\theta)^T V(\theta)^{-1} r(\theta), \qquad (4)$$
$$\ell_3(\theta) = \log|X^T V(\theta)^{-1} X|.$$
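The three components in (4) are straightforward to evaluate once $V(\theta)$ is formed. A small numerical sketch with synthetic data (the particular $V$ below is arbitrary, chosen only to be positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)          # a positive-definite stand-in for V(theta)

Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS estimate
r = y - X @ beta                                          # r(theta)

l1 = np.linalg.slogdet(V)[1]                 # log|V|
l2 = r @ Vinv @ r                            # r' V^{-1} r
l3 = np.linalg.slogdet(X.T @ Vinv @ X)[1]    # log|X' V^{-1} X|

neg2_ml   = l1 + l2 + n * np.log(2 * np.pi)            # eq. (2)
neg2_reml = l1 + l2 + l3 + (n - p) * np.log(2 * np.pi)  # eq. (3)
```

Note that (2) and (3) differ only by $\ell_3$ and the $\log 2\pi$ multiplier, which the variables above make easy to verify.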
Newton’s algorithm performs well if the starting value is close to the solution. To improve the algorithm’s robustness to poor starting values, the scoring algorithm is used in the first few iterations; this is done simply by applying a different formula for the Hessian matrix in those iterations. Besides the improved robustness, the scoring algorithm is faster because its Hessian matrix has a simpler form.
Convergence Criteria
There are three types of convergence criteria: parameter convergence, log-likelihood convergence, and Hessian convergence. Parameter and log-likelihood convergence are further subdivided into absolute and relative forms. If we let $\varepsilon$ be some given tolerance level and …
$$\hat\sigma^2_{G_k} = \frac{\hat\sigma^2}{K+1} \quad\text{and}\quad \hat\sigma^2_R = \frac{\hat\sigma^2}{K+1}.$$
$$\left[\log(\hat\sigma^2_n) - z_{1-\alpha/2}\,\hat\sigma^{-2}_n\sqrt{\mathrm{Var}(\hat\sigma^2_n)},\;\; \log(\hat\sigma^2_n) + z_{1-\alpha/2}\,\hat\sigma^{-2}_n\sqrt{\mathrm{Var}(\hat\sigma^2_n)}\,\right]$$
$$\left[\exp\!\left(\log(\hat\sigma^2_n) - z_{1-\alpha/2}\,\hat\sigma^{-2}_n\sqrt{\mathrm{Var}(\hat\sigma^2_n)}\right),\;\; \exp\!\left(\log(\hat\sigma^2_n) + z_{1-\alpha/2}\,\hat\sigma^{-2}_n\sqrt{\mathrm{Var}(\hat\sigma^2_n)}\right)\right]$$
$$\left[\tanh\!\left(\operatorname{arctanh}(\hat\rho) - z_{1-\alpha/2}(1-\hat\rho^2)^{-1}\sqrt{\mathrm{Var}(\hat\rho)}\right),\;\; \tanh\!\left(\operatorname{arctanh}(\hat\rho) + z_{1-\alpha/2}(1-\hat\rho^2)^{-1}\sqrt{\mathrm{Var}(\hat\rho)}\right)\right]$$
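These intervals are ordinary Wald intervals built on a transformed scale (log for variances, Fisher's arctanh for correlations) and mapped back, which keeps the endpoints inside the parameter space. A minimal sketch with illustrative numbers; `variance_ci` and `correlation_ci` are hypothetical names:

```python
import numpy as np

z = 1.959963984540054          # z_{1 - 0.05/2} for a 95% interval

def variance_ci(s2, var_s2):
    half = z * np.sqrt(var_s2) / s2           # delta-method SE of log(s2)
    return np.exp(np.log(s2) - half), np.exp(np.log(s2) + half)

def correlation_ci(rho, var_rho):
    half = z * np.sqrt(var_rho) / (1 - rho**2)  # delta-method SE of arctanh(rho)
    return np.tanh(np.arctanh(rho) - half), np.tanh(np.arctanh(rho) + half)

lo, hi = variance_ci(2.0, 0.5)
print(lo < 2.0 < hi)   # True: the point estimate lies inside the interval
```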
$$\begin{bmatrix} X^T \hat R^{-1} X & X^T \hat R^{-1} Z \\ Z^T \hat R^{-1} X & Z^T \hat R^{-1} Z + \hat G^{-1} \end{bmatrix} \begin{bmatrix} \beta \\ \gamma \end{bmatrix} = \begin{bmatrix} X^T \hat R^{-1} y \\ Z^T \hat R^{-1} y \end{bmatrix} \qquad (5)$$
$$\hat\beta = (X^T \hat V^{-1} X)^{-}\, X^T \hat V^{-1} y$$
$$\hat\gamma = \hat G Z^T \hat V^{-1} (y - X\hat\beta) = \hat G\,[\,Z^T \hat V^{-1} y - Z^T \hat V^{-1} X \hat\beta\,] \qquad (6)$$
$$C = \mathrm{Cov}(\hat\beta, \hat\gamma) = \begin{bmatrix} X^T \hat R^{-1} X & X^T \hat R^{-1} Z \\ Z^T \hat R^{-1} X & Z^T \hat R^{-1} Z + \hat G^{-1} \end{bmatrix}^{-} = \begin{bmatrix} \hat C_{11} & \hat C_{21}^T \\ \hat C_{21} & \hat C_{22} \end{bmatrix} \qquad (7)$$
where
$$\hat C_{11} = (X^T \hat V^{-1} X)^{-},$$
$$\hat C_{21} = -\hat G Z^T \hat V^{-1} X \hat C_{11},$$
$$\hat C_{22} = (Z^T \hat R^{-1} Z + \hat G^{-1})^{-1} - \hat C_{21} X^T \hat V^{-1} Z \hat G.$$
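The equivalence between the mixed model equations (5) and the closed forms (6) can be checked numerically. The sketch below uses small synthetic matrices (all values invented) and verifies that solving (5) directly reproduces the GLS $\hat\beta$ and $\hat\gamma = \hat G Z^T \hat V^{-1}(y - X\hat\beta)$ with $V = ZGZ^T + R$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 8, 2, 3
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, q))
y = rng.standard_normal(n)
G = np.diag([1.0, 2.0, 0.5])
R = 0.8 * np.eye(n)

# Assemble and solve the mixed model equations (5).
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
lhs = np.vstack([np.hstack([X.T @ Rinv @ X, X.T @ Rinv @ Z]),
                 np.hstack([Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv])])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(lhs, rhs)
beta_mme, gamma_mme = sol[:p], sol[p:]

# Closed forms (6) via V = Z G Z' + R.
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
gamma = G @ Z.T @ Vinv @ (y - X @ beta)

print(np.allclose(beta_mme, beta), np.allclose(gamma_mme, gamma))  # True True
```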
Custom Hypotheses
$$L = [\,L_0 \;\; L_1\,] \qquad (8)$$
$$F = \frac{(L\hat b - a)^T (L\hat C L^T)^{-1} (L\hat b - a)}{q} \qquad (9)$$
where $q$ is the rank of the matrix $L$. The statistic in (9) has an approximate $F$ distribution. The numerator degrees of freedom is $q$, and the denominator degrees of freedom can be obtained by the Satterthwaite (1946) approximation. The method outlined below is similar to Giesbrecht and Burns (1985), McLean and Sanders (1988), and Fai and Cornelius (1996).
Satterthwaite’s Approximation
To find the denominator degrees of freedom of (9), we first perform the spectral decomposition $L\hat C L^T = \Gamma^T D \Gamma$, where $\Gamma$ is an orthogonal matrix of eigenvectors and $D$ is a diagonal matrix of eigenvalues. If we let $l_m$ be the $m$th row of $\Gamma L$, $d_m$ be the $m$th eigenvalue, and
$$\nu_m = \frac{2 d_m^2}{g_m^T\, \Sigma(\hat\theta)^{-1} g_m}$$
where
$$g_m = \left.\frac{\partial\, l_m C l_m^T}{\partial \theta}\right|_{\theta = \hat\theta}$$
and $\Sigma(\hat\theta)^{-1}$ is the covariance matrix of the estimated covariance parameters. If we let
$$E = \sum_{m=1}^{q} \frac{\nu_m}{\nu_m - 2}\, I(\nu_m > 2),$$
then the denominator degrees of freedom are given by
$$\nu = \frac{2E}{E - q}.$$
Note that the degrees of freedom can only be computed when $E > q$.
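The final two steps of the approximation (forming $E$ from the per-row $\nu_m$ and then $\nu = 2E/(E-q)$) reduce to a few lines. A sketch with illustrative $\nu_m$ values; `satterthwaite_df` is a hypothetical name:

```python
import numpy as np

def satterthwaite_df(nu_m, q):
    """Denominator df: E = sum nu_m/(nu_m - 2) over nu_m > 2, then 2E/(E - q)."""
    nu_m = np.asarray(nu_m, dtype=float)
    mask = nu_m > 2                      # indicator I(nu_m > 2)
    E = (nu_m[mask] / (nu_m[mask] - 2)).sum()
    if E <= q:
        return float("nan")              # df undefined when E <= q
    return 2 * E / (E - q)

print(satterthwaite_df([10.0, 6.0], q=2))   # E = 1.25 + 1.5 = 2.75, df = 5.5/0.75
```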
Type I or III test statistics are special cases of custom hypothesis tests.
Saved Values
$$\hat y = X\hat\beta + Z\hat\gamma \qquad (10)$$
$$\hat y_F = X\hat\beta \qquad (11)$$
$$r = y - \hat y \qquad (12)$$
Information Criteria
Information criteria are used for model comparison; the following criteria are given in “smaller is better” form. If we let $\ell$ be the log-likelihood (REML or ML), $n$ be the total number of cases (or the total of the case weights, if used), and $d$ be the number of model parameters, the formulae for the criteria are given below:
• Akaike information criterion (AIC), Akaike (1974):
$$-2\ell + 2d$$
• Finite-sample corrected AIC (AICC), Hurvich and Tsai (1989):
$$-2\ell + \frac{2dn}{n - d - 1}$$
For REML, the value of $n$ is chosen to be the total number of cases minus the number of fixed-effect parameters, and $d$ is the number of covariance parameters. For ML, $n$ is the total number of cases, and $d$ is the number of fixed-effect parameters plus the number of covariance parameters.
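The two criteria are one-liners given $\ell$, $d$, and $n$; the values below are illustrative:

```python
# "Smaller is better" criteria from the formulas above. Remember that for
# REML, n is the number of cases minus the number of fixed-effect
# parameters and d counts covariance parameters only.

def aic(loglik, d):
    return -2 * loglik + 2 * d

def aicc(loglik, d, n):
    return -2 * loglik + 2 * d * n / (n - d - 1)

ll, d, n = -120.5, 3, 40        # made-up log-likelihood and counts
print(aic(ll, d), aicc(ll, d, n))   # 247.0 and 241 + 240/36
```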
$$[g_1]_s = \mathrm{tr}(V^{-1}\dot V_s),$$
and the second derivatives with respect to the $s$th and $t$th parameters are given by
$$[H_1]_{st} = -\mathrm{tr}(V^{-1}\dot V_s V^{-1}\dot V_t) + \mathrm{tr}(V^{-1}\ddot V_{st}),$$
$$[H_2]_{st} = 2\, r^T V^{-1}\dot V_s V^{-1}\dot V_t V^{-1} r - 2\, r^T V^{-1}\dot V_s \tilde X \tilde X^T V^{-1}\dot V_t V^{-1} r - r^T V^{-1}\ddot V_{st} V^{-1} r, \qquad (14)$$
$$[H_3]_{st} = 2\,\mathrm{tr}(\tilde X^T V^{-1}\dot V_s V^{-1}\dot V_t V^{-1}\tilde X) - \mathrm{tr}(\tilde X^T V^{-1}\dot V_s \tilde X \tilde X^T V^{-1}\dot V_t V^{-1}\tilde X) - \mathrm{tr}(\tilde X^T V^{-1}\ddot V_{st} V^{-1}\tilde X),$$
where $\tilde X = XC$ for a matrix $C$ satisfying $CC^T = (X^T V^{-1} X)^{-} = P$, and
$$r = [\,I - X(X^T V^{-1} X)^{-} X^T V^{-1}\,]y = y - X\hat\beta.$$
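The first-derivative formula $[g_1]_s = \mathrm{tr}(V^{-1}\dot V_s)$ can be verified against a finite difference of $\log|V(\theta)|$. A sketch using the simple structure $V(\theta) = \theta_1 I + \theta_2 J$ (so $\dot V_1 = I$), with made-up parameter values:

```python
import numpy as np

n = 5
I, J = np.eye(n), np.ones((n, n))

def V(t1, t2):
    return t1 * I + t2 * J          # a simple two-parameter covariance structure

t1, t2, h = 2.0, 0.3, 1e-6
Vinv = np.linalg.inv(V(t1, t2))
g1_s = np.trace(Vinv @ I)           # [g1]_s with dV/dtheta_1 = I

# Central finite difference of log|V| in theta_1.
num = (np.linalg.slogdet(V(t1 + h, t2))[1]
       - np.linalg.slogdet(V(t1 - h, t2))[1]) / (2 * h)
print(abs(g1_s - num) < 1e-6)   # True
```

For this structure the eigenvalues of $V$ are $\theta_1 + n\theta_2$ (once) and $\theta_1$ ($n-1$ times), so $\mathrm{tr}(V^{-1})$ can also be checked in closed form.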
$$W_1(X; r; Z) = \begin{bmatrix} W_1(X,X) & W_1(X,Z) & W_1(X,r) \\ W_1(Z,X) & W_1(Z,Z) & W_1(Z,r) \\ W_1(r,X) & W_1(r,Z) & W_1(r,r) \end{bmatrix} = \begin{bmatrix} X^T V^{-1} X & X^T V^{-1} Z & X^T V^{-1} r \\ Z^T V^{-1} X & Z^T V^{-1} Z & Z^T V^{-1} r \\ r^T V^{-1} X & r^T V^{-1} Z & r^T V^{-1} r \end{bmatrix} \qquad (15)$$
where $r = y - Xb_0$.
$$[g_1]_{G,s} = \mathrm{tr}(W_1(Z,Z)\,\dot G_s),$$
$$[g_2]_{G,s} = -W_1(Z,r)^T\, \dot G_s\, W_1(Z,r), \qquad (19)$$
For the second derivatives, we first define the following simplification factors:
$$H^{G2}_{st} = 2\,W_1(r,Z)\dot G_s W_1(Z,Z)\dot G_t W_1(Z,r) - W_1(r,Z)\ddot G_{st} W_1(Z,r),$$
$$H^{G3}_{s} = W_1(X,Z)\dot G_s W_1(Z,X),$$
$$H^{G3}_{st} = 2\,W_1(X,Z)\dot G_s W_1(Z,Z)\dot G_t W_1(Z,X) - W_1(X,Z)\ddot G_{st} W_1(Z,X).$$
$$W_0^{(1)s} = -\frac{\partial W_0}{\partial \theta_s}$$
and
$$W_0^{(2)st} = -\frac{\partial^2 W_0}{\partial \theta_s \partial \theta_t},$$
where
$$W_0(A, B) = A^T R^{-1} B,$$
$$W_0^{(1)s}(A, B) = A^T R^{-1} \dot R_s R^{-1} B = -A^T \left[\tfrac{\partial}{\partial \theta_s} R^{-1}(\theta)\right] B,$$
$$W_0^{(2)st}(A, B) = A^T \left[\, R^{-1}\ddot R_{st} R^{-1} - R^{-1}\dot R_s R^{-1}\dot R_t R^{-1} - R^{-1}\dot R_t R^{-1}\dot R_s R^{-1} \,\right] B = -A^T \left[\tfrac{\partial^2}{\partial \theta_s \partial \theta_t} R^{-1}(\theta)\right] B.$$
The matrices $A$ and $B$ can be $X$, $Z$, $\tilde Z$, or $r$, where
$$\tilde Z = ZM = Z(G^{-1} + Z^T R^{-1} Z)^{-1}, \quad\text{and}$$
$$r = [\,I - X(X^T V^{-1} X)^{-} X^T V^{-1}\,]y = y - Xb_0.$$
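The identity $W_0^{(1)s}(A,B) = A^T R^{-1}\dot R_s R^{-1} B = -A^T[\partial R^{-1}/\partial\theta_s]B$ can be checked numerically. A sketch using the scaled-identity structure $R(\theta) = \theta I$ (so $\dot R = I$), with synthetic $A$ and $B$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, 2))
B = rng.standard_normal((n, 3))
theta, h = 1.5, 1e-6

def Rinv(t):
    return np.eye(n) / t            # R(theta) = theta * I, so R^{-1} = I/theta

# Analytic form: A' R^{-1} (dR/dtheta) R^{-1} B with dR/dtheta = I.
W0_1s = A.T @ Rinv(theta) @ np.eye(n) @ Rinv(theta) @ B

# Finite-difference form: -d/dtheta of W0(A, B) = A' R^{-1} B.
fd = -(A.T @ Rinv(theta + h) @ B - A.T @ Rinv(theta - h) @ B) / (2 * h)
print(np.allclose(W0_1s, fd, atol=1e-6))   # True
```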
Remark: The matrix $(G^{-1} + Z^T R^{-1} Z)^{-1}$ involved in $\tilde Z$ can be obtained by …
Using these notations, the first derivatives of $\ell_k(\theta)$ with respect to a parameter in $\theta_R$ are as follows:
$$[g_2]_{R,s} = -W_0^{(1)s}(r,r) + 2\,W_0(r,\tilde Z)\,W_0^{(1)s}(Z,r) - W_0(r,\tilde Z)\,W_0^{(1)s}(Z,\tilde Z)\,W_0(Z,r), \qquad (21)$$
$$[g_3]_{R,s} = -\mathrm{tr}(H^{R3}_s).$$
To compute the second derivatives with respect to $\theta_s$ and $\theta_t$ (of $\theta_R$), we need to consider the following simplification factors:
$$H^{R1}_{st} = -R^{-1}\dot R_s R^{-1}\dot R_t + R^{-1}\ddot R_{st},$$
$$H^{R2}_{s} = W_0^{(1)s}(X,r) + W_0(X,\tilde Z)\,W_0^{(1)s}(Z,\tilde Z)\,W_0(Z,r) - W_0(X,\tilde Z)\,W_0^{(1)s}(Z,r) - W_0^{(1)s}(X,\tilde Z)\,W_0(Z,r),$$
$$H^{R3}_{s} = W_0^{(1)s}(X,X) - W_0(X,\tilde Z)\,W_0^{(1)s}(Z,X) - W_0^{(1)s}(X,\tilde Z)\,W_0(Z,X) + W_0(X,\tilde Z)\,W_0^{(1)s}(Z,\tilde Z)\,W_0(Z,X).$$
Based on these simplification terms, the entries of the Hessian matrices are given by
$$H^{GR1}_{st} = -W_0^{(1)s}(Z,Z)\,\dot G_t + 2\,W_0(Z,Z)\,\dot G_t\,W_0^{(1)s}(Z,\tilde Z) - W_0^{(1)s}(Z,\tilde Z)\,W_0(Z,Z)\,\dot G_t\,W_0(Z,\tilde Z),$$
$$[H_3]_{GR,st} = \mathrm{tr}\!\left(H^{GR3}_{st} P - H^{G3}_s P\, H^{R3}_t P\right).$$
The $s$th element of the gradient is given by
$$[g]_s = [g_1]_s + [g_2]_s + [g_3]_s$$
for REML, and
$$[g]_s = [g_1]_s + [g_2]_s$$
for ML, and the $(s,t)$th element of the Hessian matrix is given by the corresponding sums of $[H_1]_{st}$, $[H_2]_{st}$, and $[H_3]_{st}$.
It should be noted that the Hessian matrices for the scoring algorithm in both ML and REML are not “exact”: to speed up the calculation, some second-derivative terms are dropped. They are therefore used only in intermediate steps of the optimization, not for standard error calculations.
During the estimation we need to construct several cross-product matrices in each iteration, namely $W_0(X; y; Z)$, $W_1(X; y; Z)$, $W_0^A(X; y; Z)$, $W_1^A(X; y; Z)$, $W_{b0}(X; y)$, and $W_{b1}(X; y)$. The SWEEP operator (see, for example, Goodnight (1979)) is used in constructing these matrices. Basically, the SWEEP operator performs the following transformation:
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} A^{-} & A^{-}B \\ -B^T A^{-} & C - B^T A^{-} B \end{bmatrix}.$$
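The transformation above can be sketched directly (here the swept block $A$ is taken nonsingular for simplicity, so $A^{-}$ is the ordinary inverse; `sweep_block` is a hypothetical name). Sweeping $X^TX$ out of an OLS cross-product matrix illustrates why this is useful: the swept matrix then holds the coefficients and the residual sum of squares.

```python
import numpy as np

def sweep_block(M, k):
    """Sweep the leading k x k block: [[A,B],[B',C]] -> [[A^-, A^-B],
    [-B'A^-, C - B'A^-B]]."""
    A, B, C = M[:k, :k], M[:k, k:], M[k:, k:]
    Ainv = np.linalg.inv(A)
    return np.vstack([np.hstack([Ainv, Ainv @ B]),
                      np.hstack([-B.T @ Ainv, C - B.T @ Ainv @ B])])

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)
M = np.block([[X.T @ X, (X.T @ y)[:, None]],
              [(X.T @ y)[None, :], np.array([[y @ y]])]])
S = sweep_block(M, 2)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(S[:2, 2], b_ols))   # True: swept off-diagonal holds the coefficients
```

The bottom-right entry of `S` is $y^Ty - y^TX(X^TX)^{-1}X^Ty$, the residual sum of squares, mirroring how $\ell_2(\theta)$ appears in $W_{b1}(X; y)$ below.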
STEP 1:
Construct
$$W_0(X; y; Z) = \begin{bmatrix} X^T R^{-1} X & X^T R^{-1} y & X^T R^{-1} Z \\ y^T R^{-1} X & y^T R^{-1} y & y^T R^{-1} Z \\ Z^T R^{-1} X & Z^T R^{-1} y & Z^T R^{-1} Z \end{bmatrix}. \qquad (24)$$
STEP 2:
$$W_0^A(X; y; Z) = \begin{bmatrix} I + L^T Z^T R^{-1} Z L & L^T W_0(Z, \cdot) \\ W_0(\cdot, Z) L & W_0 \end{bmatrix} \qquad (25)$$
where $L$ is the lower-triangular Cholesky root of $G$, i.e. $G = LL^T$, and $W_0(Z, \cdot)$ denotes the rows of $W_0$ corresponding to $Z$.
STEP 3:
Sweep $W_0^A(X; y; Z)$ to obtain $W_1^A(X; y; Z)$. During the sweeping, if we accumulate the log of the $i$th diagonal element just before the $i$th sweep, we will obtain $\log|I + L^T Z^T R^{-1} Z L|$, which together with $\log|R|$ gives $\ell_1(\theta) = \log|V(\theta)|$, since $|V| = |R|\,|I + L^T Z^T R^{-1} Z L|$.
STEP 4:
$$W_{b0}(X; y) = \begin{bmatrix} X^T V^{-1} X & X^T V^{-1} y \\ y^T V^{-1} X & y^T V^{-1} y \end{bmatrix} \qquad (28)$$
$$W_{b1}(X; y) = \begin{bmatrix} (X^T V^{-1} X)^{-} & b_0 \\ b_0^T & \ell_2(\theta) \end{bmatrix} \qquad (29)$$
References
Akaike, H. (1974), A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, AC-19, 716-723.
Fai, A. H. T., and Cornelius, P. L. (1996), Approximate F-tests of Multiple Degree of Freedom Hypotheses in Generalized Least Squares Analyses of Unbalanced Split-plot Experiments, Journal of Statistical Computation and Simulation, 54, 363-378.
Giesbrecht, F. G., and Burns, J. C. (1985), Two-Stage Analysis Based on a Mixed Model: Large-Sample Asymptotic Theory and Small-Sample Simulation Results, Biometrics, 41, 477-486.
Goodnight, J. H. (1979), A Tutorial on the SWEEP Operator, The American Statistician, 33, 149-158.
Hurvich, C. M., and Tsai, C.-L. (1989), Regression and Time Series Model Selection in Small Samples, Biometrika, 76, 297-307.
McLean, R. A., and Sanders, W. L. (1988), Approximating Degrees of Freedom for Standard Errors in Mixed Linear Models, Proceedings of the Statistical Computing Section, American Statistical Association, New Orleans, 50-59.
Satterthwaite, F. E. (1946), An Approximate Distribution of Estimates of Variance Components, Biometrics Bulletin, 2, 110-114.
Wolfinger, R., Tobias, R., and Sall, J. (1994), Computing Gaussian Likelihoods and Their Derivatives for General Linear Mixed Models, SIAM Journal on Scientific Computing, 15(6), 1294-1310.