Chap5 - Multivariate Regression and Linear Model
5. Multivariate Regression
individual    Y     X1     X2    ···    Xk
1             y1    x11    x12   ···    x1k
2             y2    x21    x22   ···    x2k
⋮             ⋮     ⋮      ⋮           ⋮
n             yn    xn1    xn2   ···    xnk
Assume that the model is linear in β1, . . . , βk:

$$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$

Then, in matrix form,

$$y = X\beta + \varepsilon,$$

where y is n × 1, X is n × k, β is k × 1 and ε is n × 1, and the LS estimator is

$$\hat{\beta} = (X'X)^{-1}X'y.$$
Remarks
1. Y is continuous and the Xi's are assumed to be non-random; when the Xi's are also random, we treat it as a conditional model given the values of the Xi's. Indeed, if (Y, X1, . . . , Xk) is (k + 1)-variate normal, Property 7(d) in Section 2.1 shows that the conditional mean E(Y |X1, . . . , Xk) is indeed a linear function of the given values of X1, . . . , Xk.
3. When all Xi 's are dummy variables for discrete variables, it is called (one-way,
multi-way) ANOVA.
4. When some are continuous and some are dummy, it is called ANCOVA.
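As a quick numerical illustration of the estimator above (a sketch with simulated data; all variable names are invented, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Simulated design matrix; in the model the X's are treated as non-random
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])

# Y = β1 X1 + ... + βk Xk + ε with ε ~ N(0, σ²), σ = 0.3
y = X @ beta + rng.normal(scale=0.3, size=n)

# LS estimator β̂ = (X'X)⁻¹X'y; solving the normal equations avoids
# forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With n = 50 and small noise, β̂ should land close to the true β.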
HKU STAT7005 Multivariate Methods
The p responses of the n observations satisfy

$$\begin{pmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{np} \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_{11} & \cdots & \beta_{1p} \\ \vdots & & \vdots \\ \beta_{k1} & \cdots & \beta_{kp} \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1p} \\ \vdots & & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{np} \end{pmatrix},$$

or, written row by row,

$$\begin{pmatrix} y_1' \\ \vdots \\ y_n' \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_1' \\ \vdots \\ \beta_k' \end{pmatrix} + \begin{pmatrix} \varepsilon_1' \\ \vdots \\ \varepsilon_n' \end{pmatrix},$$

that is,

$$\underset{n\times p}{Y} = \underset{n\times k}{X}\;\underset{k\times p}{B} + \underset{n\times p}{U},$$

where $\varepsilon_j' = (\varepsilon_{j1}\ \varepsilon_{j2}\ \cdots\ \varepsilon_{jp})$ denotes the vector of errors of observation $j$ for $j = 1, \dots, n$, and $\varepsilon_1, \dots, \varepsilon_n \overset{\mathrm{iid}}{\sim} N_p(0, \Sigma)$.
Remarks
1. The Yi's are continuous and the Xi's are assumed to be non-random; when the Xi's are also random, we treat it as a conditional problem given the values of the Xi's.
3. When all Xi 's are dummy variables for discrete variables, it is called (one-way,
multi-way) MANOVA.
4. When some are continuous and some are dummy, it is called MANCOVA.
Notation
Consider the response matrix $Y = (y_{ij})$, where $y_{ij}$ is the $j$th response from the $i$th observational unit ($i = 1, \dots, n$; $j = 1, \dots, p$),

$$\underset{n\times p}{Y} = \Big(\underset{n\times 1}{y^{(1)}}, \dots, \underset{n\times 1}{y^{(p)}}\Big) = \Big(\underset{p\times 1}{y_1}, \dots, \underset{p\times 1}{y_n}\Big)',$$

and a $k \times p$ matrix $B = (\beta_{\ell j})$, where $\beta_{\ell j}$ is the $\ell$th regression coefficient for the $j$th response ($\ell = 1, \dots, k$; $j = 1, \dots, p$). According to this partition of the matrix $Y$, $y^{(j)}$ contains the observations of the $j$th response ($j = 1, \dots, p$), while $y_i'$ contains the $i$th observation of all $p$ responses ($i = 1, \dots, n$). Simply speaking, $y^{(j)}$ and $y_i$ correspond to partitioning the matrix $Y$ into columns and rows respectively. The general multivariate linear model can be written as

$$\underset{n\times p}{Y} = \underset{n\times k}{X}\;\underset{k\times p}{B} + \underset{n\times p}{U}, \qquad (5.2.1)$$

where $U = (\varepsilon^{(1)}, \dots, \varepsilon^{(p)}) = (\varepsilon_1, \dots, \varepsilon_n)'$ and $\varepsilon_1, \dots, \varepsilon_n \overset{\mathrm{iid}}{\sim} N_p(0, \Sigma)$. Based on the definition of $U$,

$$\mathrm{Vec}(U) = \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \\ \vdots \\ \varepsilon^{(p)} \end{pmatrix}.$$

Clearly,

$$E(\mathrm{Vec}(U)) = \underset{np\times 1}{0}.$$
The model combines p univariate linear models into a single multivariate linear model. If the n × k matrix X has full rank and we apply the Least Squares (LS) principle separately to the jth column of Y and the jth column of B, noting that the rows are independent, we obtain the LS estimators, which are unbiased, for the columns of B. Putting these columns together, we have

$$\hat{B} = (X'X)^{-1}X'Y \qquad (5.3.1)$$

and the estimator of $\Sigma$,

$$S = \frac{1}{n-k}\hat{U}'\hat{U}, \qquad \hat{U} = Y - X\hat{B}.$$
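A minimal numpy sketch of (5.3.1) and of S (simulated data; names invented). Because B̂ is just the univariate LS estimator applied column by column, it can be computed in one solve:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 60, 2, 3

X = rng.normal(size=(n, k))
B = rng.normal(size=(k, p))
# Rows of U are iid N_p(0, Σ); Σ = I here for simplicity
Y = X @ B + rng.normal(size=(n, p))

# B̂ = (X'X)⁻¹X'Y — all p columns fitted at once
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# S = Û'Û / (n − k) with Û = Y − XB̂
U_hat = Y - X @ B_hat
S = U_hat.T @ U_hat / (n - k)
```

Column j of B̂ coincides with the univariate LS fit of column j of Y on X, which is the point of the derivation above.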
The log-likelihood is

$$\ell(B, \Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu_i)'\Sigma^{-1}(y_i - \mu_i)$$
$$= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}[(Y - XB)\Sigma^{-1}(Y - XB)']$$
$$= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}[\Sigma^{-1}(Y - XB)'(Y - XB)].$$

By the method of completing squares, we have

$$\ell(B, \Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}[\Sigma^{-1}(Y - X\hat{B})'(Y - X\hat{B})] - \frac{1}{2}\mathrm{tr}[X(\hat{B} - B)\Sigma^{-1}(\hat{B} - B)'X']$$
$$= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\mathrm{tr}[\Sigma^{-1}(Y - X\hat{B})'(Y - X\hat{B})] - \frac{1}{2}\sum_{i=1}^{n} a_i'\Sigma^{-1}a_i,$$

where $a_i'$ is the $i$th row of $X(\hat{B} - B)$. Since $a_i'\Sigma^{-1}a_i \ge 0$ for any $B$, we have

$$\ell(B, \Sigma) \le \ell(\hat{B}, \Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{n}{2}\mathrm{tr}[\Sigma^{-1}\hat{U}'\hat{U}/n],$$

and, maximizing over $\Sigma$ as well,

$$\ell(B, \Sigma) \le \ell(\hat{B}, \Sigma) \le \ell\!\left(\hat{B}, \frac{\hat{U}'\hat{U}}{n}\right).$$

Hence $\hat{B}$ and $\hat{\Sigma} = \hat{U}'\hat{U}/n$ are the MLEs.
Furthermore,

$$E(\hat{B}) = B \quad\text{and}\quad \mathrm{Cov}(\hat{\beta}^{(i)}, \hat{\beta}^{(j)}) = \sigma_{ij}(X'X)^{-1}.$$
Proof: Since $y^{(i)} = X\beta^{(i)} + \varepsilon^{(i)}$ and $\hat{\beta}^{(i)}$ can be obtained by the regression of the response $Y_i$ on $X_1, \dots, X_k$, we have

$$\hat{\beta}^{(i)} = (X'X)^{-1}X'y^{(i)} = (X'X)^{-1}X'(X\beta^{(i)} + \varepsilon^{(i)}) = \beta^{(i)} + (X'X)^{-1}X'\varepsilon^{(i)},$$

so that $E(\hat{\beta}^{(i)}) = \beta^{(i)}$ for $i = 1, \dots, p$. Hence,

$$E(\mathrm{Vec}(\hat{B})) = E\begin{pmatrix} \hat{\beta}^{(1)} \\ \hat{\beta}^{(2)} \\ \vdots \\ \hat{\beta}^{(p)} \end{pmatrix} = \begin{pmatrix} \beta^{(1)} \\ \beta^{(2)} \\ \vdots \\ \beta^{(p)} \end{pmatrix} = \mathrm{Vec}(B).$$
Moreover,

$$\mathrm{Cov}(\hat{\beta}^{(i)}, \hat{\beta}^{(j)}) = (X'X)^{-1}X'\,\mathrm{Cov}(\varepsilon^{(i)}, \varepsilon^{(j)})\,X(X'X)^{-1} = (X'X)^{-1}X'(\sigma_{ij}I)X(X'X)^{-1} = \sigma_{ij}(X'X)^{-1} \quad\text{for } i, j = 1, \dots, p.$$
Define the residual sum of squares and cross-products matrix

$$E = \hat{U}'\hat{U} = (Y - X\hat{B})'(Y - X\hat{B}) = Y'(I - X(X'X)^{-1}X')Y \sim W_p(n - k, \Sigma), \qquad (5.4.1)$$

so that

$$S = \frac{E}{n-k} = \frac{1}{n-k}\big(Y'Y - Y'X(X'X)^{-1}X'Y\big). \qquad (5.4.2)$$

Proof: Use Property 6(a) of the Wishart distribution in Section 2.3, with $A = I - X(X'X)^{-1}X'$ and $M = E(Y) = XB$.
It follows that the estimated covariance matrix of $\mathrm{Vec}(\hat{B})$ is

$$\widehat{\mathrm{Var}}(\mathrm{Vec}(\hat{B})) = S \otimes (X'X)^{-1}.$$
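In practice the Kronecker form is how one reads off standard errors for the individual coefficients; a sketch continuing the same kind of simulated setup (all names invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 80, 3, 2
X = rng.normal(size=(n, k))
Y = X @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
U_hat = Y - X @ B_hat
S = U_hat.T @ U_hat / (n - k)
XtX_inv = np.linalg.inv(X.T @ X)

# Estimated Var(Vec(B̂)) = S ⊗ (X'X)⁻¹, a kp × kp matrix
var_vec = np.kron(S, XtX_inv)

# The standard error of β̂_{ℓj} is the square root of the matching
# diagonal entry; the reshape recovers the k × p layout of B̂
se = np.sqrt(np.diag(var_vec)).reshape(p, k).T
```

Block (i, j) of the Kronecker product is exactly σ̂ij (X'X)⁻¹, matching the covariance result proved above.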
where the rank condition for the matrix L ensures no redundant hypotheses. Then we can obtain the following results. Therefore, the hypothesis (5.5.1) can be tested on the basis of the eigenvalues of $E^{-1}H$, where $E$ is given by (5.4.1) and $H$ by (5.5.2) (see Section 4.2 for the four commonly used statistics, in which $\Lambda$ and $R$ are used for the LRT and the UIT respectively).

More generally, consider the hypothesis

$$H_0: LBM = 0. \qquad (5.5.3)$$
(a)
$$H = M'\hat{B}'L'\{L(X'X)^{-1}L'\}^{-1}L\hat{B}M \sim W_m(c, M'\Sigma M) \text{ under } H_0, \qquad (5.5.4)$$

(b)
$$E = (\hat{U}M)'(\hat{U}M) \sim W_m(n - k, M'\Sigma M), \qquad (5.5.5)$$
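The matrices H in (5.5.4) and E in (5.5.5) can be formed directly from B̂ and Û, after which Wilks' Λ = |E|/|E + H| (one of the four statistics of Section 4.2) follows. A sketch with simulated data satisfying H0 (all names invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, p = 100, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
B_true = np.zeros((k, p))
B_true[0] = [1.0, 2.0]           # only the intercept row is nonzero, so H0 holds
Y = X @ B_true + rng.normal(size=(n, p))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
U_hat = Y - X @ B_hat

L = np.eye(k)[1:]                # selects the non-intercept rows of B
M = np.eye(p)                    # keeps both responses

# H = M'B̂'L'{L(X'X)⁻¹L'}⁻¹LB̂M  and  E = (ÛM)'(ÛM)
LBM = L @ B_hat @ M
inner = L @ np.linalg.inv(X.T @ X) @ L.T
H = M.T @ B_hat.T @ L.T @ np.linalg.solve(inner, LBM)
E = (U_hat @ M).T @ (U_hat @ M)

wilks = np.linalg.det(E) / np.linalg.det(E + H)
```

Since H is positive semi-definite and E positive definite, Λ always lies in (0, 1], with small values evidence against H0.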
Consider prediction of the response $y_0$ at a new design point $x_0$, where

$$E(y_0) = B'x_0.$$

Its natural estimate is

$$\widehat{E}(y_0) = \hat{B}'x_0,$$

with

$$\mathrm{Var}(\widehat{E}(y_0)) = \mathrm{Var}(\hat{B}'x_0) = \mathrm{Var}\begin{pmatrix} x_0'\hat{\beta}^{(1)} \\ x_0'\hat{\beta}^{(2)} \\ \vdots \\ x_0'\hat{\beta}^{(p)} \end{pmatrix},$$

where $\hat{B} = (\hat{\beta}^{(1)}, \dots, \hat{\beta}^{(p)})$. Consider the $(i, j)$th element of the above covariance matrix.
The $(1 - \alpha)100\%$ Scheffé's simultaneous C.I.s for $E(y_{0i}) = x_0'\beta^{(i)}$ are

$$x_0'\hat{\beta}^{(i)} \pm T_\alpha(p, n - k)\sqrt{s_{ii}\,x_0'(X'X)^{-1}x_0}.$$

For a new observation $y_0$,

$$T^2 = \left(\frac{y_0 - \hat{B}'x_0}{\sqrt{1 + x_0'(X'X)^{-1}x_0}}\right)' S^{-1} \left(\frac{y_0 - \hat{B}'x_0}{\sqrt{1 + x_0'(X'X)^{-1}x_0}}\right) \sim T^2(p, n - k).$$

Consequently, the $(1 - \alpha)100\%$ confidence region for $y_0$ is a $p$-dimensional ellipsoid,

$$\left\{ y_0 : \left(\frac{\hat{B}'x_0 - y_0}{\sqrt{1 + x_0'(X'X)^{-1}x_0}}\right)' S^{-1} \left(\frac{\hat{B}'x_0 - y_0}{\sqrt{1 + x_0'(X'X)^{-1}x_0}}\right) \le T_\alpha^2(p, n - k) \right\}.$$
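Membership in the ellipsoid is a single quadratic-form comparison; a sketch with simulated data, using the standard relation $T_\alpha^2(p, m) = \frac{pm}{m - p + 1}F_\alpha(p, m - p + 1)$ with $m = n - k$ (scipy assumed available; all names invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k, p = 60, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([[1.0, 0.5], [2.0, -1.0]]) + rng.normal(size=(n, p))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
U_hat = Y - X @ B_hat
S = U_hat.T @ U_hat / (n - k)

x0 = np.array([1.0, 0.3])                        # new design point (with intercept)
h = 1.0 + x0 @ np.linalg.solve(X.T @ X, x0)      # 1 + x0'(X'X)⁻¹x0

def in_region(y0, alpha=0.05):
    """True iff y0 lies inside the (1 − α) confidence ellipsoid."""
    d = y0 - B_hat.T @ x0
    t2 = d @ np.linalg.solve(S, d) / h
    m = n - k
    # T²_α(p, m) critical value via the F distribution
    crit = p * m / (m - p + 1) * stats.f.ppf(1 - alpha, p, m - p + 1)
    return t2 <= crit
```

The fitted point B̂'x0 is the ellipsoid's centre (quadratic form zero), so it always lies inside; points far from the centre fall outside.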
5.7 Examples
Example 5.1
Paper qualities were measured for 41 specimens on their density (X), strength in the machine direction (Y1) and strength in the cross direction (Y2). The data set can be found in the Moodle, named paper_quality.dat.

(a) With this data set, fit the multivariate regression for the two types of strength on the density.

(b) Test whether the density X is significant for the two types of strength.

(c) Does the independent variable X have the same effect on the two types of strength?
Solution
(a)
Consider the following regression model:

Y1 = β01 + β11 X + ε1
Y2 = β02 + β12 X + ε2

The fitted equations are obtained from the software output, where the numbers in parentheses are the standard errors of the corresponding parameter estimates.

(b) We test the hypothesis H0: β11 = β12 = 0. From the results of the four multivariate statistics, their p-values are less than 5%. Therefore, we conclude that the independent variable X is significant at the 5% level.
(c) We test H0: β11 = β12. From the results of the four multivariate statistics, their p-values are greater than 5%. Therefore, we conclude that the independent variable X has the same effect on the two types of strength.
Example 5.2

The data file BANK.DAT contains observations on 100 bank employees on each of six variables: LCURRENT, LSTART, EDUC, EXPER, AGE and SENIOR.

(a) Fit the multivariate regression for LCURRENT and LSTART on EDUC, EXPER, AGE and SENIOR.

(b) Test whether all four independent variables are not useful. If not, test individually for each independent variable.

(c) Test whether each of the four independent variables has the same effect on LCURRENT and LSTART. If not, test individually for each independent variable.

(d) Plot the residuals against the predicted values of each dependent variable. Any observations?
Solution
(a) The fitted equations, with standard errors in parentheses, are

LCURRENT^ = 8.6988 + 0.0832 EDUC + 0.0161 EXPER − 0.0151 AGE + 0.00212 SENIOR
           (0.3012)  (0.0097)      (0.0045)       (0.0038)     (0.0030)

LSTART^ = 8.2848 + 0.0814 EDUC + 0.0160 EXPER − 0.0105 AGE − 0.0035 SENIOR
         (0.2679)  (0.0087)      (0.0040)      (0.0034)     (0.0027)
(b) To test the joint effect of all variables, the linear hypothesis is H0: LBM = 0, where

$$L = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} \alpha_0 & \beta_0 \\ \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \\ \alpha_3 & \beta_3 \\ \alpha_4 & \beta_4 \end{pmatrix}, \qquad M = I_2.$$

All multivariate tests reject the null hypothesis. Therefore, the hypothesis that all four independent variables are not useful is rejected.

The multivariate tests for the effect of each independent variable (write down L and M) are given in the order of EDUC, EXPER, AGE and SENIOR below:
(c) For the hypothesis that all independent variables have the same effect on LCURRENT and LSTART,

$$L = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad\text{and}\qquad M = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

It is rejected at the 5% level because the p-values of all four multivariate tests are less than 5%.
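The "same effect" hypotheses all reuse the H/E machinery of Section 5.5 with the contrast M = (1, −1)'; a sketch with simulated data standing in for BANK.DAT (which is not reproduced here; all names invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])  # intercept + 4 predictors

# Simulated coefficients loosely mimicking the signs of the fitted equations above
B = np.array([[8.7,   8.3],
              [0.08,  0.08],
              [0.016, 0.016],
              [-0.015, -0.010],
              [0.002, -0.003]])
Y = X @ B + 0.3 * rng.normal(size=(n, 2))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
U_hat = Y - X @ B_hat

# H0: each predictor has the same effect on both responses, i.e. LBM = 0
L = np.eye(5)[1:]                       # drop the intercept row
M = np.array([[1.0], [-1.0]])           # contrast between the two responses

LBM = L @ B_hat @ M
inner = L @ np.linalg.inv(X.T @ X) @ L.T
H = M.T @ B_hat.T @ L.T @ np.linalg.solve(inner, LBM)
E = (U_hat @ M).T @ (U_hat @ M)
wilks = float(np.linalg.det(E) / np.linalg.det(E + H))
```

With the 2 × 1 contrast M, both H and E collapse to scalars, so all four multivariate statistics become equivalent one-dimensional tests.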
For the hypothesis that AGE has the same effect on LCURRENT and LSTART,

$$L = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \end{pmatrix} \qquad\text{and}\qquad M = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

It is rejected at the 5% level because all p-values of the multivariate tests are less than 5%.

For the hypothesis that SENIOR has the same effect on LCURRENT and LSTART,

$$L = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad\text{and}\qquad M = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

It is rejected at the 5% level because all p-values of the multivariate tests are less than 5%.

For the hypothesis that EDUC and EXPER have the same effect on LCURRENT and LSTART,

$$L = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad M = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

For the hypothesis that EDUC, EXPER and AGE have the same effect on LCURRENT and LSTART,

$$L = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \qquad\text{and}\qquad M = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

It is rejected at the 5% level because all p-values of the multivariate tests are less than 5%.
(d) The plots (ε̂1 vs predicted LCURRENT) and (ε̂2 vs predicted LSTART) are given below. No specific pattern is observed in either plot, which suggests the residuals are close to random.

(e) To check the univariate normality of the residuals ε̂1 and ε̂2, the Q-Q plots for the residuals from fitting LCURRENT and LSTART are given below. Since the dots in both diagrams lie approximately along a straight line, the residuals are close to a univariate normal distribution.
However, from the chi-square plot, the dots are not close to the reference line. Therefore, the multivariate normality assumption may not be valid.
(f)
rep.meas = lcurrent:
educ exper age senior lsmean SE df lower.CL upper.CL
12 4 28 77 9.50 0.0543 95 9.39 9.61
rep.meas = lstart:
educ exper age senior lsmean SE df lower.CL upper.CL
12 4 28 77 8.76 0.0483 95 8.67 8.86
Confidence level used: 0.95
> sci95
[,1] [,2]
lcurrent 9.364883 9.636202
lstart 8.643349 8.884700