D.G. Bonett (6/2018)

Matrix Notation and Operations

Matrix Notation
An r × c matrix is a rectangular array of elements with r rows and c columns. An r × c
matrix is said to be of order r × c. A matrix is usually denoted by a capital letter printed in
a boldface font (e.g., A, B, X). The elements of the matrix are represented by lower case
letters with a double subscript (e.g., 𝑎𝑗𝑘 , 𝑏𝑗𝑘 , 𝑥𝑗𝑘 ). For instance, in the matrix X, 𝑥13 is the
element in the first row and the third column. A 4 × 3 matrix X is shown below.

\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix}

A matrix with a single row is called a row vector and a matrix with a single column is
called a column vector. Vectors are usually represented by lower case letters printed in a
boldface font (e.g., a, b, x). The elements of the vector are represented by lower case letters
with a single subscript (e.g., 𝑎𝑗 , 𝑏𝑗 , 𝑥𝑗 ). A 3 × 1 column vector y and a 1 × 4 row vector h
are shown below.

\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}

\mathbf{h} = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \end{bmatrix}.

It is sometimes necessary to refer to a particular row or column of a matrix. These row or column vectors are represented by a subscripted lower case letter in a boldface font. For instance, the jth row vector in the above 4 × 3 matrix X would be denoted as 𝐱𝑗.

A square matrix has the same number of rows as columns. A square matrix whose off-diagonal elements are all zero is called a diagonal matrix (the diagonal elements are the elements whose two subscripts are equal). A 3 × 3 diagonal matrix D is shown
below.

\mathbf{D} = \begin{bmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{bmatrix}

The identity matrix is a special type of diagonal matrix where all diagonal elements are
equal to 1. The identity matrix is usually represented as I or In where n is the order of the
identity matrix.

A square matrix where the jkth element is equal to the kjth element is called a symmetric
matrix. A symmetric 3 × 3 matrix is shown below.

\mathbf{S} = \begin{bmatrix} 14 & 5 & 2 \\ 5 & 20 & 8 \\ 2 & 8 & 11 \end{bmatrix}

A one vector is a row or column vector in which every element is equal to 1 and is
represented as the number one printed in a boldface font. A 1 × 3 one vector is shown
below.

1 = [1 1 1]

Matrix Operations
The transpose of a matrix X is represented as X' (or XT). The transpose of a matrix is
obtained by interchanging the rows and columns of the matrix. For instance, if

\mathbf{X} = \begin{bmatrix} 4 & 6 \\ 7 & 1 \\ 3 & 9 \end{bmatrix}

then

\mathbf{X}' = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix}.

Note that the jkth element in X is equal to the kjth element in X'. Most vectors in statistical
formulas are assumed to be column vectors. Row vectors, when needed, are obtained by
taking the transpose of a column vector.

If two matrices A and B are of the same order, the two matrices are then conformable for
addition or subtraction, and A + B is a matrix with element 𝑎𝑗𝑘 + 𝑏𝑗𝑘 in the jth row and the
kth column, as illustrated below for the sum of two 2 × 3 matrices.

\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix}

= \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \end{bmatrix}

Likewise, A – B is a matrix with element 𝑎𝑗𝑘 – 𝑏𝑗𝑘 in the jth row and the kth column, as
illustrated below for the difference of two 2 × 3 matrices.

\mathbf{A} - \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} - \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix}

= \begin{bmatrix} a_{11} - b_{11} & a_{12} - b_{12} & a_{13} - b_{13} \\ a_{21} - b_{21} & a_{22} - b_{22} & a_{23} - b_{23} \end{bmatrix}

To multiply a matrix by a scalar (i.e., a single number), simply multiply each element in
the matrix by the scalar. Scalars are represented by italicized lower case letters in a non-
boldface font. To illustrate, if b = 2 and

\mathbf{A} = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix}

then

b\mathbf{A} = \begin{bmatrix} 8 & 14 & 6 \\ 12 & 2 & 18 \end{bmatrix}.

Some statistical formulas involve the subtraction of a scalar from an n × 1 vector. The
result is obtained by first multiplying the scalar by an n × 1 one vector and then taking
the difference of the two vectors. For instance, if y is a 3 × 1 vector and m is a scalar, then
y – m is

\mathbf{y} - m\mathbf{1} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} - m\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} y_1 - m \\ y_2 - m \\ y_3 - m \end{bmatrix}

The dot product of an n × 1 vector a with an n × 1 vector b is

a'b = 𝑎1 𝑏1 + 𝑎2 𝑏2 + … + 𝑎𝑛 𝑏𝑛

Note that a'b = b'a. For instance, if a' = [4 3 2] and b' = [6 1 4], then a'b = 4(6) + 3(1) + 2(4)
= 35.

Two n × 1 vectors, a and b, are said to be orthogonal if a'b = 0. For instance, if a' = [.5 .5 -1]
and b' = [1 -1 0], then a and b are orthogonal because 𝐚'b = (.5)(1) + (.5)(-1) + (-1)(0) = 0.
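These dot-product and orthogonality calculations are easy to check numerically; a minimal NumPy sketch using the vectors from the two examples above:

```python
import numpy as np

a = np.array([4, 3, 2])
b = np.array([6, 1, 4])
print(a @ b)           # 35, matching 4(6) + 3(1) + 2(4)
print(a @ b == b @ a)  # True: a'b = b'a

# the orthogonal pair from the second example
u = np.array([0.5, 0.5, -1.0])
v = np.array([1.0, -1.0, 0.0])
print(u @ v)           # 0.0, so u and v are orthogonal
```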

Two matrices A and B can be multiplied if they are conformable for multiplication. To
compute the matrix product AB, the number of columns of A must equal the number of
rows of B. In general, if A is r × n and B is n × c, then the matrix product AB is an r × c
matrix. The jkth element in the r × c product matrix is equal to the dot product 𝐚𝑗 𝐛𝑘 where
𝐚𝑗 is the jth row vector of matrix A and 𝐛𝑘 is the kth column vector of matrix B. For instance,
the matrices A and B shown below are conformable for computing the product AB
because A is 2 × 3 and B is 3 × 4 so that the product will be a 2 × 4 matrix.

\mathbf{A} = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 1 & 4 \\ 5 & 4 & 3 & 1 \\ 4 & 2 & 3 & 2 \end{bmatrix}

Each of the 2 × 4 = 8 elements of the AB matrix is a dot product. For instance, the element
in row 1 and column 1 of the product AB is

\begin{bmatrix} 4 & 7 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix} = 4(1) + 7(5) + 3(4) = 51
and the element in row 2 and column 3 of AB is

\begin{bmatrix} 6 & 1 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix} = 6(1) + 1(3) + 9(3) = 36.

After computing all 8 dot products, the following result is obtained.

\mathbf{AB} = \begin{bmatrix} 51 & 42 & 34 & 29 \\ 47 & 34 & 36 & 43 \end{bmatrix}
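The full product can be verified with a few lines of NumPy, using the A and B defined above:

```python
import numpy as np

A = np.array([[4, 7, 3],
              [6, 1, 9]])
B = np.array([[1, 2, 1, 4],
              [5, 4, 3, 1],
              [4, 2, 3, 2]])

AB = A @ B            # 2 x 4 product; each entry is a row-by-column dot product
print(AB)
# [[51 42 34 29]
#  [47 34 36 43]]
print(AB[0, 0] == A[0, :] @ B[:, 0])  # True: element (1,1) is a dot product
```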

Unlike scalar multiplication where ab = ba, the matrix product AB does not in general
equal BA. Regarding the matrix product AB, we can say that B is pre-multiplied by A or
that A is post-multiplied by B. The product of matrix A with itself is denoted as A2.

The transpose of a matrix product is equal to the product of the transposed matrices in
reverse order. Specifically, (AB)' = B'A'.

The product of three matrices ABC requires A and B to be conformable for multiplication
and also requires B and C to be conformable for multiplication. The product ABC can be
obtained by first computing AB and then post-multiplying the result by C, or by first
computing BC and then pre-multiplying the result by A.

If A is a square matrix, then the matrix inverse of A is represented as A-1. If the inverse of
A exists, then AA-1 = I. This result is a generalization of scalar arithmetic where x(1/x) = 1,
assuming x ≠ 0 so that the inverse of x exists. Computing a matrix inverse is tedious, and
the amount of computational effort increases as the size of the matrix increases, but
inverting a 2 × 2 matrix is not difficult. The inverse of a 2 × 2 matrix A is

\mathbf{A}^{-1} = (1/d)\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}

where d = 𝑎11 𝑎22 − 𝑎12 𝑎21 is called the determinant of A. The matrix inverse does not exist
unless the determinant is nonzero.
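A minimal NumPy sketch of the 2 × 2 inverse formula; the numeric matrix is an arbitrary illustration rather than one from the text:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

d = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # determinant a11*a22 - a12*a21
A_inv = (1.0 / d) * np.array([[ A[1, 1], -A[0, 1]],
                              [-A[1, 0],  A[0, 0]]])

print(np.allclose(A_inv, np.linalg.inv(A)))  # True: matches the built-in inverse
print(np.allclose(A @ A_inv, np.eye(2)))     # True: A times its inverse is I
```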

Inverting a diagonal matrix D of any order is simple. The inverse of D is equal to a diagonal matrix where the jth diagonal element is equal to 1/𝑑𝑗, assuming every diagonal element of D is nonzero.

The trace of a square n × n matrix A, denoted as tr(A), is defined as the sum of its n
diagonal elements.

tr(A) = 𝑎11 + 𝑎22 + … + 𝑎𝑛𝑛

For instance, if

\mathbf{V} = \begin{bmatrix} 14 & 5 & 2 \\ 9 & 20 & 8 \\ 7 & 8 & 11 \end{bmatrix}

then tr(V) = 14 + 20 + 11 = 45.

The Kronecker product of two matrices, an m × n matrix A and a p × q matrix B, is defined to be the mp × nq matrix

\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1n}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2n}\mathbf{B} \\ \vdots & \vdots & & \vdots \\ a_{m1}\mathbf{B} & a_{m2}\mathbf{B} & \cdots & a_{mn}\mathbf{B} \end{bmatrix}

which is obtained by replacing each element 𝑎𝑗𝑘 with the p × q matrix 𝑎𝑗𝑘 B. For example,
if

\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}

then

\mathbf{A} \otimes \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 & 2 & 4 & 6 \\ 3 & 6 & 9 & -1 & -2 & -3 \end{bmatrix}.
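NumPy computes Kronecker products directly with np.kron; a minimal sketch reproducing the A ⊗ b example above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, -1]])
b = np.array([[1, 2, 3]])   # 1 x 3 row vector

# Kronecker product: each element a_jk of A is replaced by the block a_jk * b
print(np.kron(A, b))
# [[ 1  2  3  2  4  6]
#  [ 3  6  9 -1 -2 -3]]
```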

The Kronecker product of an identity matrix and another matrix has the following simple
form

\mathbf{I} \otimes \mathbf{B} = \begin{bmatrix} \mathbf{B} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{B} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{B} \end{bmatrix}

where 𝟎 is a matrix of zeros that has the same order as B.

The transpose of a Kronecker product of two matrices is

(A ⨂ B)' = A' ⨂ B'

and the inverse of a Kronecker product of two matrices is

(A ⨂ B)-1 = A-1 ⨂ B-1.

The product of A ⨂ B and C ⨂ D is equal to

(A ⨂ B)(C ⨂ D) = AC ⨂ BD

assuming A and C are conformable for multiplication and B and D are conformable for
multiplication.

In some statistical formulas, it is convenient to rearrange the elements of an r × c matrix A into an rc × 1 column vector a. This is done by stacking the c column vectors (which are each r × 1) of A (𝐚1 , 𝐚2 , … , 𝐚𝑐 ) one under the other as shown below

\mathbf{a} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_c \end{bmatrix}

The conversion of a matrix A into a vector is denoted as vec(A). To illustrate, consider the
following 4 × 2 matrix Y and vec(Y). The mat(y) function converts the vector y back into its
original matrix form.

\mathbf{Y} = \begin{bmatrix} 14 & 2 \\ 9 & 8 \\ 7 & 11 \\ 16 & 34 \end{bmatrix} \qquad \text{vec}(\mathbf{Y}) = \begin{bmatrix} 14 \\ 9 \\ 7 \\ 16 \\ 2 \\ 8 \\ 11 \\ 34 \end{bmatrix} = \mathbf{y}

The vectorization of a matrix triple product can be expressed as

vec(ABC) = (C′ ⨂ A)vec(B)

and the vectorization of a matrix product AB follows from this result by setting C = I

vec(AB) = (Ip ⨂ A)vec(B)

where p is the number of columns of B.
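The vec operator corresponds to column-major (Fortran-order) stacking, so it can be written with NumPy's reshape. A short sketch checking the identity vec(ABC) = (C′ ⊗ A)vec(B) and its special case on randomly generated matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

def vec(M):
    # stack the columns of M into one column vector (column-major order)
    return M.reshape(-1, 1, order="F")

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))   # True: vec(ABC) = (C' ⊗ A) vec(B)

# special case vec(AB) = (I_p ⊗ A) vec(B), where p is the number of columns of B
p = B.shape[1]
print(np.allclose(vec(A @ B), np.kron(np.eye(p), A) @ vec(B)))  # True
```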

For two matrices A and B of the same order, matrix equality A = B indicates that 𝑎𝑗𝑘 = 𝑏𝑗𝑘
for every value of j and k. Matrix inequality A ≠ B indicates that there is at least one
element in A that is not equal to its corresponding element in B.

Covariance Matrices
A covariance matrix is a symmetric matrix with variances in the diagonal elements and
covariances in the off-diagonal elements. If r response variables y′ = [𝑦1 𝑦2 … 𝑦𝑟 ] are
measured for each person in a random sample of n people, the estimated variance for the
jth response variable is

\hat{\sigma}_j^2 = \sum_{i=1}^{n} (y_{ij} - \hat{\mu}_j)^2 / (n - 1)

and the estimated covariance between the jth and kth response variables is

\hat{\sigma}_{jk} = \sum_{i=1}^{n} (y_{ij} - \hat{\mu}_j)(y_{ik} - \hat{\mu}_k) / (n - 1).

The estimated covariance between the jth and kth measurements is also equal to 𝜌̂𝑗𝑘 𝜎̂𝑗 𝜎̂𝑘
where 𝜌̂𝑗𝑘 is the estimated Pearson correlation between the two response variables. Note
that 𝜎̂𝑗𝑘 = 𝜎̂𝑘𝑗 .

The r variances and the r(r – 1)/2 covariances of y′ = [𝑦1 𝑦2 … 𝑦𝑟 ] can be summarized in an
r × r covariance matrix denoted as cov(y). For instance, with r = 3 the covariance matrix is

\text{cov}(\mathbf{y}) = \begin{bmatrix} \hat{\sigma}_1^2 & \hat{\sigma}_{12} & \hat{\sigma}_{13} \\ \hat{\sigma}_{12} & \hat{\sigma}_2^2 & \hat{\sigma}_{23} \\ \hat{\sigma}_{13} & \hat{\sigma}_{23} & \hat{\sigma}_3^2 \end{bmatrix}.
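These variance and covariance estimates are what np.cov computes. A small sketch with made-up scores for n = 5 people on r = 3 response variables:

```python
import numpy as np

# rows = people, columns = response variables (illustrative numbers only)
Y = np.array([[12.0, 3.0, 7.0],
              [15.0, 4.0, 6.0],
              [11.0, 2.0, 9.0],
              [14.0, 5.0, 8.0],
              [13.0, 3.0, 7.0]])
n = Y.shape[0]
mu_hat = Y.mean(axis=0)

# element (j, k) is sum_i (y_ij - mu_j)(y_ik - mu_k) / (n - 1)
S = (Y - mu_hat).T @ (Y - mu_hat) / (n - 1)

print(np.allclose(S, np.cov(Y, rowvar=False)))  # True: matches NumPy's estimator
print(np.allclose(S, S.T))                      # True: a covariance matrix is symmetric
```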

If there are n sets of response variables where the ith set has a covariance matrix Si, and
response variables from different sets are assumed to be uncorrelated, the covariance
matrix for all response variables has the following form

\begin{bmatrix} \mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_n \end{bmatrix}

where each 0 represents a matrix of zeros. If all of the n covariance matrices are equal to
S, the above covariance matrix can then be expressed as I ⨂ S.

Variance of a Linear Function of Variables


A linear function of r variables can be expressed as ∑_{j=1}^{r} ℎ𝑗 𝑦𝑗 where the ℎ𝑗 coefficients are
known numbers. This linear function can be expressed in matrix notation as h′y where
h′ = [ℎ1 ℎ2 … ℎ𝑟 ] and y′ = [𝑦1 𝑦2 … 𝑦𝑟 ]. The variance of h′y can be expressed in matrix
notation as h′cov(y)h. For instance, the variance of 𝑦1 + 𝑦2 + 𝑦3 is

\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \hat{\sigma}_1^2 & \hat{\sigma}_{12} & \hat{\sigma}_{13} \\ \hat{\sigma}_{12} & \hat{\sigma}_2^2 & \hat{\sigma}_{23} \\ \hat{\sigma}_{13} & \hat{\sigma}_{23} & \hat{\sigma}_3^2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \hat{\sigma}_1^2 + \hat{\sigma}_{12} + \hat{\sigma}_{13} & \hat{\sigma}_{12} + \hat{\sigma}_2^2 + \hat{\sigma}_{23} & \hat{\sigma}_{13} + \hat{\sigma}_{23} + \hat{\sigma}_3^2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

= \hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \hat{\sigma}_3^2 + 2\hat{\sigma}_{12} + 2\hat{\sigma}_{13} + 2\hat{\sigma}_{23}.
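The quadratic form h′cov(y)h is a one-line computation. A small sketch with an illustrative 3 × 3 covariance matrix (the numbers are made up) confirming that it equals the sum of the variances plus twice each covariance:

```python
import numpy as np

# an estimated 3 x 3 covariance matrix (illustrative numbers only)
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 9.0, 2.0],
              [0.5, 2.0, 16.0]])

h = np.array([1.0, 1.0, 1.0])        # coefficients of the linear function y1 + y2 + y3
var_sum = h @ S @ h                  # h' cov(y) h

# same quantity written out: variances plus twice each covariance
check = S[0, 0] + S[1, 1] + S[2, 2] + 2 * (S[0, 1] + S[0, 2] + S[1, 2])
print(var_sum, check)                # both 36.0
```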

The variance of a linear function of parameter estimates (i.e., 𝐜′𝜷̂) is similarly defined as 𝐜′cov(𝜷̂)𝐜. For instance, the variance of 𝛽̂1 − 𝛽̂2 is

\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \hat{\sigma}_{\hat{\beta}_1}^2 & \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} \\ \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} & \hat{\sigma}_{\hat{\beta}_2}^2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} \hat{\sigma}_{\hat{\beta}_1}^2 - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} & \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} - \hat{\sigma}_{\hat{\beta}_2}^2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}

= \hat{\sigma}_{\hat{\beta}_1}^2 - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} + \hat{\sigma}_{\hat{\beta}_2}^2

= \hat{\sigma}_{\hat{\beta}_1}^2 + \hat{\sigma}_{\hat{\beta}_2}^2 - 2\hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2}.

General Linear Model in Matrix Form


A general linear model (GLM) for a random sample of n participants can be expressed in matrix notation as follows

𝐲 = 𝐗𝜷 + 𝐞 (1.1)

where y is an n × 1 vector of response variable scores, 𝐞 is an n × 1 vector of prediction errors with cov(e) = 𝜎𝑒²𝐈𝑛, 𝜷 is a (q + 1) × 1 vector of unknown population parameters, and X is an n × (q + 1) design matrix. The first column of X is often an n × 1 vector of ones to code the y-intercept and the other q columns contain the values of the q predictor variables.

To illustrate the structure of Equation 1.1, consider a model with q = 1 predictor variable
and a random sample of n = 7. The elements of 𝐲 = 𝐗𝜷 + 𝐞 for this example are shown
below.

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{12} \\ 1 & x_{13} \\ 1 & x_{14} \\ 1 & x_{15} \\ 1 & x_{16} \\ 1 & x_{17} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \end{bmatrix}

Using matrix notation, the ordinary least squares (OLS) estimate of 𝜷 is

\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}     (1.2)

and the estimated residuals are

\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}     (1.3)

where 𝐲̂ = 𝐗𝜷̂ is a vector of predicted y scores. An estimate of the residual variance (𝜎𝑒²) is

MS_E = \hat{\mathbf{e}}'\hat{\mathbf{e}}/(n - q - 1).     (1.4)

The jth element of 𝜷̂ is 𝛽̂𝑗. The standard error of 𝛽̂𝑗 is equal to the square root of the jth diagonal element of the following covariance matrix of 𝜷̂

\text{cov}(\hat{\boldsymbol{\beta}}) = MS_E(\mathbf{X}'\mathbf{X})^{-1}.     (1.5)
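Equations 1.2 through 1.5 translate directly into a few lines of NumPy. The sketch below uses simulated data with one predictor and n = 7, echoing the design above; the data-generating values (intercept 2, slope 0.5) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 7, 1
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # first column of ones codes the y-intercept
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # Equation 1.2: (X'X)^-1 X'y
e_hat = y - X @ beta_hat                      # Equation 1.3: residuals
MSE = e_hat @ e_hat / (n - q - 1)             # Equation 1.4: residual variance estimate
cov_beta = MSE * np.linalg.inv(X.T @ X)       # Equation 1.5: covariance matrix of beta-hat
se_beta = np.sqrt(np.diag(cov_beta))          # standard errors of the estimates

print(beta_hat, se_beta)
```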

A linear function of the elements of 𝜷 will be of interest in some applications. A linear function of 𝜷 can be expressed as 𝐜′𝜷 where c is a (q + 1) × 1 vector of specified numbers.

For instance, in a GLM with q = 2 predictor variables the linear function 𝛽1 + 5𝛽2 can be expressed as 𝐜′𝜷 where 𝐜′ = [0 1 5]. The estimated standard error of 𝐜′𝜷̂ is

SE_{\mathbf{c}'\hat{\boldsymbol{\beta}}} = \sqrt{\mathbf{c}'\,\text{cov}(\hat{\boldsymbol{\beta}})\,\mathbf{c}}     (1.6)

and a 100(1 − 𝛼)% confidence interval for 𝐜′𝜷 is

\mathbf{c}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2;df_E}\,SE_{\mathbf{c}'\hat{\boldsymbol{\beta}}}     (1.7)

where 𝑑𝑓𝐸 = n – q – 1. To obtain simultaneous Bonferroni confidence intervals for v different linear functions of 𝜷, replace 𝛼 with 𝛼* = 𝛼/v in Equation 1.7.
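A sketch of Equations 1.6 and 1.7. The names beta_hat, cov_beta, n, and q are assumed to come from an OLS fit like the one in the previous sketch, and SciPy is assumed to be available for the t critical value:

```python
import numpy as np
from scipy import stats

def ci_linear_function(c, beta_hat, cov_beta, n, q, alpha=0.05):
    """100(1 - alpha)% confidence interval for c'beta (Equations 1.6 and 1.7)."""
    c = np.asarray(c, dtype=float)
    est = c @ beta_hat
    se = np.sqrt(c @ cov_beta @ c)                   # Equation 1.6
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - q - 1)
    return est - t_crit * se, est + t_crit * se

# example: 95% CI for the slope in a one-predictor model (c selects beta_1)
# lower, upper = ci_linear_function([0, 1], beta_hat, cov_beta, n, q)
```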

A special case of Equation 1.7 is the following confidence interval for the jth element in 𝜷

\hat{\beta}_j \pm t_{\alpha/2;df_E}\,SE_{\hat{\beta}_j}     (1.8)

where

SE_{\hat{\beta}_j} = \sqrt{\frac{MS_E}{(1 - \hat{\rho}_{x_j \cdot \mathbf{x}}^2)\,\hat{\sigma}_{x_j}^2\,(n - 1)}}     (1.9)

and 𝜌̂²_{x_j·𝐱} is the estimated squared multiple correlation between predictor variable j and all other predictor variables.

Another special case of Equation 1.7 is the following confidence interval for the difference between any two elements in 𝜷, such as 𝛽1 and 𝛽2

\hat{\beta}_1 - \hat{\beta}_2 \pm t_{\alpha/2;df_E}\sqrt{SE_{\hat{\beta}_1}^2 + SE_{\hat{\beta}_2}^2 - 2\,\text{cov}(\hat{\beta}_1, \hat{\beta}_2)}     (1.10)

where cov(𝛽̂1, 𝛽̂2) is obtained from Equation 1.5.

In a random-x GLM (where all q predictor variables are random), the squared multiple correlation (denoted as 𝜌²_{y·𝐱}) describes the proportion of variance in the y scores that is predictable from all q predictor variables. An estimate of 𝜌²_{y·𝐱} (reported as "R-squared" in most statistical packages) is

\hat{\rho}_{y \cdot \mathbf{x}}^2 = 1 - SS_E/SS_T     (1.11)

where SS_E = 𝐞̂′𝐞̂ and SS_T = (n – 1)𝜎̂²_y. The fact that the multiple correlation (𝜌_{y·𝐱}) is equal to the Pearson correlation between y and the predicted y scores is helpful in interpreting the multiple correlation and the squared multiple correlation. A confidence interval for 𝜌²_{y·𝐱} does not have a simple formula, but it can be computed using SAS or R.
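Equation 1.11 is a one-line computation once the residuals are available; a small sketch (the arguments y and e_hat are assumed to be the response vector and residual vector from an OLS fit such as the earlier sketch):

```python
import numpy as np

def r_squared(y, e_hat):
    """Equation 1.11: 1 - SS_E/SS_T, with SS_T = (n - 1) * var(y)."""
    ss_e = e_hat @ e_hat
    ss_t = (len(y) - 1) * np.var(y, ddof=1)   # equals sum((y - mean(y))**2)
    return 1.0 - ss_e / ss_t
```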

In a fixed-x GLM (where all the predictor variables are fixed), the following estimate of
the coefficient of multiple determination

𝜂̂ 2 = 1 − 𝑆𝑆E /𝑆𝑆T (1.12)

is an estimate of 𝜂² and is equal to 𝜌̂²_{y·𝐱}. Like 𝜌̂²_{y·𝐱}, 𝜂̂² has a positive bias, and the bias can be substantial when n – q is small. A less biased estimate of 𝜂² (confusingly called omega-squared) is available in SAS. Although 𝜂² = 𝜌²_{y·𝐱} and 𝜂̂² = 𝜌̂²_{y·𝐱}, different symbols are used in the random-x and fixed-x models because 𝜂̂² and 𝜌̂²_{y·𝐱} have different sampling distributions, and a confidence interval for 𝜂² in the fixed-x model will be different than a confidence interval for 𝜌²_{y·𝐱} in the random-x model. The confidence interval for 𝜂² is complicated but it can be obtained in SAS or R.

Multivariate General Linear Model

Some studies will involve r ≥ 2 response variables. A GLM for a random sample of n
participants, with the same set of q predictor variables for each response variable, can be
specified for each of the r response variables

𝐲1 = 𝐗𝜷𝟏 + 𝐞1 𝐲2 = 𝐗𝜷𝟐 + 𝐞2 … 𝐲𝑟 = 𝐗𝜷𝒓 + 𝐞𝑟

and these r models can be combined into the following multivariate general linear model
(MGLM)

Y = XB + E

where Y = [𝐲1 𝐲2 … 𝐲𝑟] is an n × r matrix of response variables, X is an n × (q + 1) design matrix, B = [𝜷𝟏 𝜷𝟐 … 𝜷𝒓] is a (q + 1) × r matrix of unknown parameters, and E = [𝐞1 𝐞2 … 𝐞𝑟] is an n × r matrix of prediction errors. Note that the design matrix in the MGLM is the same as the design matrix in each GLM. The r columns of E are assumed to be correlated and the r × r covariance matrix of the r columns of E will be denoted as S.

The independence assumption of the GLM is also required in the MGLM and this
assumption implies that the rows of E are uncorrelated.

The OLS estimate of B in the MGLM is

\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},     (1.17)

and the matrix of predicted scores is

\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}.     (1.18)

The matrix of estimated prediction errors (residuals) is

\hat{\mathbf{E}} = \mathbf{Y} - \hat{\mathbf{Y}},     (1.19)

and an estimate of the covariance matrix S is

\hat{\mathbf{S}} = \hat{\mathbf{E}}'\hat{\mathbf{E}}/(n - q - 1).     (1.20)

Let 𝜷̂* = vec(𝐁̂). The covariance matrix of 𝜷̂* is

\text{cov}(\hat{\boldsymbol{\beta}}^*) = [(\mathbf{I}_r \otimes \mathbf{X})'(\hat{\mathbf{S}} \otimes \mathbf{I}_n)^{-1}(\mathbf{I}_r \otimes \mathbf{X})]^{-1}

= \hat{\mathbf{S}} \otimes (\mathbf{X}'\mathbf{X})^{-1}.     (1.21)
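Equations 1.17 through 1.21 in NumPy; a minimal sketch with simulated data (n = 20, q = 2 predictors, r = 3 response variables, all values made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, r = 20, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
Y = rng.normal(size=(n, r))                       # illustrative response matrix

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)         # Equation 1.17, one column per response
E_hat = Y - X @ B_hat                             # Equations 1.18 and 1.19
S_hat = E_hat.T @ E_hat / (n - q - 1)             # Equation 1.20

# Equation 1.21: covariance matrix of vec(B_hat)
cov_vecB = np.kron(S_hat, np.linalg.inv(X.T @ X))
print(B_hat.shape, S_hat.shape, cov_vecB.shape)   # (3, 3) (3, 3) (9, 9)
```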

A multivariate linear contrast of the B parameter matrix in the MGLM can be expressed as
c′Bh where c is a (q + 1) × 1 vector of researcher-specified coefficients and h is an r × 1
vector of researcher-specified coefficients. To illustrate the specification of multivariate
linear contrasts, consider the parameter matrix of the MGLM given above with q = 3
predictor variables and r = 2 response variables. In this model, B has four rows and two
columns. Each column of B contains the y-intercept and the three slope coefficients for a
single response variable. The 4 × 2 parameter matrix for this model is given below.

\mathbf{B} = \begin{bmatrix} \beta_{01} & \beta_{02} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \end{bmatrix}

Some examples will illustrate the specification of c and h. To estimate 𝛽11 − 𝛽21, set c′ = [0 1 -1 0] and h′ = [1 0]. To estimate 𝛽11 − 𝛽12, set c′ = [0 1 0 0] and h′ = [1 -1]. To estimate (𝛽11 − 𝛽21) − (𝛽12 − 𝛽22), set c′ = [0 1 -1 0] and h′ = [1 -1].

A 100(1 – 𝛼)% confidence interval for c′Bh is

\mathbf{c}'\hat{\mathbf{B}}\mathbf{h} \pm t_{\alpha/2;df}\,\sqrt{(\mathbf{h}'\hat{\mathbf{S}}\mathbf{h})\,\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}}     (1.22)

where df = n – q – 1 and the square-root term is the estimated standard error of c′𝐁̂h. To obtain simultaneous Bonferroni confidence intervals for v different multivariate linear contrasts of B, replace 𝛼 with 𝛼* = 𝛼/v in Equation 1.22. A confidence interval for c′Bh can be used to test H0: c′Bh = 0 against H1: c′Bh > 0 and H2: c′Bh < 0 using the following three-decision rule:

• If the upper limit of the confidence interval for c′Bh is less than 0, then H0 is rejected
and H2: c′Bh < 0 is accepted.

• If the lower limit of the confidence interval for c′Bh is greater than 0, then H0 is rejected
and H1: c′Bh > 0 is accepted.

• If the confidence interval for c′Bh includes 0, then H0 cannot be rejected and the results
are inconclusive.

Some computer programs will compute the test statistic c′𝐁̂h divided by its estimated standard error (the square-root term in Equation 1.22) along with its associated p-value. The p-value is used to decide whether H0 can be rejected.

Although Equations 1.17 - 1.22 are elegant in their generality, all of these results can be obtained from GLM results. Specifically, the jth column of 𝐁̂ in Equation 1.17 can be computed using 𝜷̂𝑗 = (X′X)⁻¹X′𝐲𝑗 (Equation 1.2), and for each column of 𝐁̂, the standard errors can be computed from MS_{Ej}(X′X)⁻¹ (Equation 1.5), where MS_{Ej} is the jth diagonal element of 𝐒̂. Thus, the least squares estimates of B and their standard errors can be obtained by simply analyzing each of the r response variables separately using a GLM. To obtain the results for Equation 1.22 using GLM results, it can be shown that the confidence interval for c′𝜷 (Equation 1.7) is identical to Equation 1.22 if the response variable in the GLM is defined as y = ℎ1 𝑦1 + ℎ2 𝑦2 + … + ℎ𝑟 𝑦𝑟.
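This equivalence, and Equation 1.22 itself, can be checked numerically. The sketch below is one possible implementation under the assumptions stated in its docstring; the function name and arguments are illustrative, not from the text:

```python
import numpy as np
from scipy import stats

def ci_multivariate_contrast(c, h, X, Y, alpha=0.05):
    """100(1 - alpha)% confidence interval for c'Bh (Equation 1.22).

    Assumes X is the n x (q + 1) design matrix (including the column of ones)
    and Y is the n x r response matrix.
    """
    n, p = X.shape                                  # p = q + 1
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # Equation 1.17
    E_hat = Y - X @ B_hat
    S_hat = E_hat.T @ E_hat / (n - p)               # Equation 1.20 with df = n - q - 1
    c, h = np.asarray(c, float), np.asarray(h, float)
    est = c @ B_hat @ h
    se = np.sqrt((h @ S_hat @ h) * (c @ np.linalg.inv(X.T @ X) @ c))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
    return est - t_crit * se, est + t_crit * se

# the same interval is obtained by regressing the combined response Y @ h on X
# and applying Equation 1.7 with the contrast vector c
```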

Constrained MGLM

The MGLM can be expressed in the following vector form

y* = X*𝜷* + e*

where y*= vec(Y), X* = (𝐈𝑟 ⊗ X), 𝜷*= vec(B), e* = vec(E), and cov(e*) = 𝐒 ⊗ 𝐈𝑛 . Note that the
order of 𝜷* is r(q + 1) × 1 and X* has the following block-diagonal form

\mathbf{X}^* = \begin{bmatrix} \mathbf{X} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X} \end{bmatrix}.

This design matrix for the vector form of the MGLM shows that all q predictor variables
are related to all r response variables. Now suppose the predictor variables are not the
same for each response variable. A MGLM with one or more slope coefficients
constrained to equal 0 is referred to in the statistics and economics literature as a seemingly
unrelated regression (SUR) model.

With response variable 𝑦𝑗 having its own set of 𝑞𝑗 predictor variables, the SUR model can be expressed as y* = X*𝜷* + e* where 𝜷* is a (𝑞1 + 𝑞2 + … + 𝑞𝑟 + r) × 1 vector and X* has the following block-diagonal form
following block structure form

\mathbf{X}^* = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_r \end{bmatrix}.

The prediction errors in the SUR model have the covariance structure cov(e*) = 𝐒 ⊗ 𝐈𝑛 and the following generalized least squares (GLS) estimate of 𝜷* is typically used

\hat{\boldsymbol{\beta}}^*_{GLS} = [\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{X}^*]^{-1}\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{y}^*     (1.23)

where 𝐒̂ is an estimate of S. The traditional method of estimating S is to first compute an OLS estimate of 𝜷* (𝜷̂* = (X*′X*)⁻¹X*′y*), compute the vector of estimated prediction errors (𝐞̂ = y* − X*𝜷̂*), convert the vector 𝐞̂ into its matrix form (𝐄̂ = mat(𝐞̂)), and then estimate S as

𝐒̂ = 𝐃𝐄̂′𝐄̂𝐃 (1.24)

where D is a diagonal matrix with √(1/(𝑛 − 𝑞𝑗 − 1)) in the jth diagonal position and 𝑞𝑗 is the number of predictors of response variable j. Equations 1.20 and 1.24 define unstructured covariance matrix estimates with r estimated variances and r(r – 1)/2 estimated covariances.

The estimated covariance matrix of 𝜷̂*_GLS is

\text{cov}(\hat{\boldsymbol{\beta}}^*_{GLS}) = [\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{X}^*]^{-1}     (1.25)

and the square roots of the diagonal elements of cov(𝜷̂*_GLS) are the standard errors of the GLS estimates. An approximate 100(1 – 𝛼)% confidence interval for a′𝜷* is

\mathbf{a}'\hat{\boldsymbol{\beta}}^*_{GLS} \pm z_{\alpha/2}\sqrt{\mathbf{a}'\,\text{cov}(\hat{\boldsymbol{\beta}}^*_{GLS})\,\mathbf{a}}     (1.26)

and an approximate confidence interval for 𝛽*_j (the jth element in 𝜷*) is obtained by specifying the jth element of a to equal 1 and all other elements to equal 0. Equation 1.26 is approximate because it uses a GLS estimate of 𝜷* that is biased in small samples and also because the sampling distribution of a′𝜷̂*_GLS is not accurately approximated by a normal distribution in small samples.
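A sketch of the GLS computations in Equations 1.23 and 1.25, assuming an estimate S_hat is already available (for example from Equation 1.24); the function and argument names are illustrative only:

```python
import numpy as np
from scipy.linalg import block_diag

def sur_gls(X_blocks, Y, S_hat):
    """GLS estimate of beta* and its covariance matrix (Equations 1.23 and 1.25).

    X_blocks : list of r design matrices, one per response variable
    Y        : n x r matrix of responses
    S_hat    : r x r estimate of the error covariance matrix S
    """
    n = Y.shape[0]
    X_star = block_diag(*X_blocks)                   # block-diagonal design matrix X*
    y_star = Y.reshape(-1, 1, order="F")             # y* = vec(Y): stack the columns
    W = np.kron(np.linalg.inv(S_hat), np.eye(n))     # (S-hat)^-1 ⊗ I_n
    cov_beta = np.linalg.inv(X_star.T @ W @ X_star)  # Equation 1.25
    beta_gls = cov_beta @ X_star.T @ W @ y_star      # Equation 1.23
    return beta_gls, cov_beta

# note: this forms the rn x rn weight matrix directly, which is fine for a
# small illustration but not efficient for large n
```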
