Multiple Linear Regression
Multiple linear regression allows us to investigate the joint effect of several predictors on the response by relating a single outcome variable to two or more predictors simultaneously.
If we want to use k predictor variables X1, X2, ..., Xk to explain a response variable y, then the model is of the form:
y = β0 + β1X1 + ... + βkXk + ϵ
Given a random sample of size n selected from the population, the model can be written in matrix form as
y_{n×1} = X_{n×(k+1)} β_{(k+1)×1} + ϵ_{n×1}
The matrix X is known as the design matrix and β is a column vector of regression coefficients. y is a column vector of response values for individuals in the sample. Individual observations of the outcome yi are modeled as varying by an error term ϵi about an average determined by their predictor values:
• Each error term in the error vector has mean zero; E[ϵ] = 0
• The error terms are normally distributed with mean 0 and finite variance σ 2 .
• There is no correlation between the error terms and the predictor variables.
The aims of fitting a regression model are:
(i) Identify predictors that are associated with the response variable in order to promote
understanding of the underlying process.
(ii) Determine the extent to which one or more of the predictors is/are linearly related
to the dependent variable after adjusting for other variables that may be related to
it.
(iii) Predict the value of the dependent variable as accurately as possible from the predictor values.
The coefficients β̂0, β̂1, ..., β̂k are chosen so that the sum of the squared distances between the observed values and the values predicted by the regression line is smallest (the method of least squares).
The vector of least-squares estimates is β̂ = (X′X)⁻¹X′y. It is a (k+1)×1 random vector which follows a multivariate normal distribution with mean vector β and variance-covariance matrix σ²(X′X)⁻¹.
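As a computational aside (not part of the original notes), the estimate β̂ = (X′X)⁻¹X′y is easy to obtain numerically; a minimal Python/NumPy sketch, where the design matrix is assumed to already contain a leading column of ones and the data are purely hypothetical:

import numpy as np

def ols_fit(X, y):
    # Least-squares estimate beta_hat = (X'X)^{-1} X'y.
    # Solving the normal equations is numerically preferable to forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: n = 5 observations, k = 2 predictors plus an intercept column.
X = np.column_stack([np.ones(5),
                     [1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0]])
y = np.array([3.0, 4.1, 6.8, 7.2, 9.5])
beta_hat = ols_fit(X, y)        # (beta0_hat, beta1_hat, beta2_hat)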
The fitted regression model is ŷ = β̂0 + β̂1X1 + ... + β̂kXk. The coefficients are interpreted as follows:
(i) If βj < 0, then for every unit increase in Xj, holding the other predictors constant, the average response value decreases by |βj| units.
(ii) If βj > 0, then for every unit increase in Xj, holding the other predictors constant, the average response value increases by βj units.
Significance test for overall model fit
In this case we wish to test the hypotheses:
H0: β1 = β2 = ... = βk = 0
vs
H1: at least one βj ≠ 0, j = 1, 2, ..., k.
The test statistic is F = MSR/MSE = (SSR/k)/(SSE/(n − k − 1)), which under H0 follows an F(k, n − k − 1) distribution; we reject H0 if the computed F exceeds the critical value F(k, n − k − 1, α).
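A sketch (not from the notes) of how this F statistic could be computed from the sums of squares, using a design matrix X and response y defined as above; SciPy is used only to turn the statistic into a p-value:

import numpy as np
from scipy import stats

def overall_f_test(X, y):
    # H0: beta_1 = ... = beta_k = 0 (all slopes zero).
    n, p = X.shape                                # p = k + 1, intercept included
    k = p - 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sse = y @ y - beta_hat @ (X.T @ y)            # SSE = y'y - beta_hat' X'y
    sst = y @ y - n * y.mean() ** 2               # SST = y'y - n * ybar^2
    ssr = sst - sse                               # SSR = SST - SSE
    f = (ssr / k) / (sse / (n - k - 1))           # F = MSR / MSE
    p_value = stats.f.sf(f, k, n - k - 1)         # reject H0 if p_value < alpha
    return f, p_value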
Example: For the data below:
Y       X1     X2        Y       X1     X2
174.4   68.5   16.7      191.1   72.8   17.1
164.4   45.2   16.8      232.0   88.4   17.4
244.2   91.3   18.2      145.3   42.9   15.8
154.6   47.8   16.3      161.1   52.5   17.8
181.7   46.9   17.3      209.7   85.7   18.4
207.5   66.1   18.2      146.4   41.3   16.5
152.8   49.5   15.9      144.0   51.7   16.3
163.2   52.0   17.2      232.6   89.6   18.1
145.4   48.9   16.6      224.1   82.7   19.1
137.2   38.4   16.0      166.5   52.3   16.0
241.9   87.9   18.3
(i) Fit a regression model to the data above and interpret the coefficient estimates.
Solution:
We have

(X′X)⁻¹ =
[  29.72892348    0.07218347     −1.99255319  ]
[   0.07218347    0.0003701761   −0.005549917 ]
[  −1.99255319   −0.005549917     0.136310637 ]

and

X′y =
[   3820.10  ]
[ 249648.04  ]
[  66074.48  ]
Thus the regression coefficient estimates are given by

β̂ = (X′X)⁻¹X′y =
[  29.72892348    0.07218347     −1.99255319  ] [   3820.10 ]   [ −68.992757 ]
[   0.07218347    0.0003701761   −0.005549917 ] [ 249648.04 ] = [   1.453913 ]
[  −1.99255319   −0.005549917     0.136310637 ] [  66074.48 ]   [   9.376033 ]

so the fitted model is ŷ = −68.993 + 1.454X1 + 9.376X2.

β̂′X′y is

                                     [   3820.10 ]
[ −68.992757   1.453913   9.376033 ] [ 249648.04 ] = 718923.8
                                     [  66074.48 ]
Thus SSE = y′y − β̂′X′y = 721108.7 − 718923.8 = 2184.9
ȳ = 181.91
SST = 721108.7 − (21 × 181.91²) = 26192.49
SSR = SST − SSE = 24007.59
The ANOVA table (with mean squares and F ratio following from the sums of squares above) is:

Source        df      SS          MS           F
Regression     2    24007.59    12003.795    98.89
Error         18     2184.90      121.383
Total         20    26192.49
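The whole calculation can be checked numerically; a NumPy sketch (not in the original notes) using the 21 observations from the table, whose output should agree, up to rounding, with β̂ ≈ (−68.99, 1.454, 9.376), SSE ≈ 2184.9, SST ≈ 26192.5 and SSR ≈ 24007.6:

import numpy as np

y  = np.array([174.4, 164.4, 244.2, 154.6, 181.7, 207.5, 152.8, 163.2, 145.4, 137.2, 241.9,
               191.1, 232.0, 145.3, 161.1, 209.7, 146.4, 144.0, 232.6, 224.1, 166.5])
x1 = np.array([68.5, 45.2, 91.3, 47.8, 46.9, 66.1, 49.5, 52.0, 48.9, 38.4, 87.9,
               72.8, 88.4, 42.9, 52.5, 85.7, 41.3, 51.7, 89.6, 82.7, 52.3])
x2 = np.array([16.7, 16.8, 18.2, 16.3, 17.3, 18.2, 15.9, 17.2, 16.6, 16.0, 18.3,
               17.1, 17.4, 15.8, 17.8, 18.4, 16.5, 16.3, 18.1, 19.1, 16.0])

X = np.column_stack([np.ones_like(y), x1, x2])          # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)            # (X'X)^{-1} X'y
sse = y @ y - beta_hat @ (X.T @ y)
sst = y @ y - len(y) * y.mean() ** 2
ssr = sst - sse
print(beta_hat, sse, sst, ssr)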
Adjusted R squared statistic
To counter the disadvantage that R² never decreases when additional predictors are added to the model, we can make use of the adjusted R-squared statistic. This statistic is given by

R̄² = 1 − (SSE/(n − k − 1)) / (SST/(n − 1)) = 1 − ((1 − R²)(n − 1))/(n − k − 1)
where k is the number of predictor variables used to explain the response. Unlike R², the adjusted R², R̄², increases only if the additional predictor variables being included in the model significantly improve the fit. The adjusted R² can be negative, and will always be less than or equal to R². The larger the value of this statistic, the better the model fit.
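For the worked example above (n = 21, k = 2, SSR = 24007.59, SST = 26192.49), this gives approximately
R² = SSR/SST = 24007.59/26192.49 ≈ 0.9166
R̄² = 1 − (1 − 0.9166)(21 − 1)/(21 − 2 − 1) ≈ 0.907,
so the two predictors explain roughly 91% of the variation in the response, and the adjustment for model size changes this only slightly.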
Significance test for individual predictors
To test H0: βj = 0 against H1: βj ≠ 0, the test statistic is

Tj = β̂j / s.e.(β̂j) ∼ t(n − k − 1),   j = 1, 2, ..., k.

If the absolute value of the computed Tj is greater than the critical t-value t(n − k − 1, α/2), then we conclude that Xj is a significant predictor, i.e. that there is a linear relationship between the response and Xj. On the other hand, if the absolute value of the computed Tj is less than the critical t-value t(n − k − 1, α/2), then we conclude that there is insufficient evidence to indicate that Xj is a significant predictor, i.e. that there is a linear relationship between the response and Xj.
Alternatively, using the p-value: if the p-value is less than α, Xj is a significant predictor.
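A sketch (not from the notes) of how the standard errors and t statistics could be obtained numerically, continuing the NumPy examples above:

import numpy as np

def coef_t_stats(X, y):
    # T_j = beta_hat_j / s.e.(beta_hat_j), with Var(beta_hat) = sigma^2 (X'X)^{-1}.
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    mse = resid @ resid / (n - p)                 # estimate of sigma^2, df = n - k - 1
    se = np.sqrt(mse * np.diag(xtx_inv))          # standard errors of the coefficients
    return beta_hat / se                          # compare |T_j| with t(n - k - 1, alpha/2)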
Example: A realtor seeks to determine if there is a relationship between the selling price of a house (in thousands of dollars) and: the listing price, LIST (in thousands of dollars); the age of the house, AGE (in years); the compound area, ACRES (in acres); the living room area, LIVRM (in tens of square feet); and the number of bedrooms, BED. He selected fifty houses in the market and proposed to use multiple linear regression analysis to assess the relationship. The results of the fit were as follows:
Parameter estimates:
(i) At the 0.05 level of significance, test if there is a linear relationship between selling price and the five predictor variables.
(ii) Test for the significance of each of the five predictor variables at the 0.05 level. Interpret the results.
(iii) Interpret the estimated regression coefficients.
(iv) Calculate the coefficient of multiple determination. What does it indicate here?
(v) Estimate the average selling price for a house whose listing price is $242000, which has a living room area of 375 sq.ft, five bedrooms, sits in a compound of 1.82 acres and was built ten years ago.
Solution:
(i) The critical value is F(5, 44, 0.05) = 2.427; the computed F value is greater than the critical value, hence the fitted model is statistically significant.
(ii) The critical value for the significance tests for the five individual predictors is t(44, 0.025) = 2.015; using the t-values:
Compound area: the t-value is 0.326; since 0.326 < 2.015, it is not a significant predictor. After adjusting for listing price, age, number of bedrooms and living room area, there is insufficient evidence to indicate a significant linear relationship between selling price and compound area.
Age: the t-value is −0.62; since |−0.62| < 2.015, it is not a significant predictor. After adjusting for listing price, compound area, number of bedrooms and living room area, there is insufficient evidence to indicate a significant linear relationship between selling price and age of house.
(iii) Listing price: After adjusting for living room area, compound area, number of bedrooms and age, the average selling price increases by $992 for every additional $1000 in listing price.
Number of bedrooms: After adjusting for living room area, compound area, listing price and age, the average selling price increases by $2367 for every additional bedroom.
Living room area: After adjusting for listing price, compound area, number of bedrooms and age, the average selling price increases by $597 for every additional ten sq.ft of living room area.
Compound area: After adjusting for living room area, listing price, number of bedrooms and age, the average selling price increases by $214 for every additional acre of compound area.
Age: After adjusting for living room area, compound area, number of bedrooms and listing price, the average selling price decreases by $15 for every additional year in age of house.
(v) The specific values of the predictor variables to be used in the fitted model are:
LIST = 242000/1000 = 242, ACRES = 1.82, BED = 5, LIVRM = 375/10 = 37.5, AGE = 10.
REGRESSION WITH DUMMY (INDICATOR) VARIABLES
Regression analysis usually treats the predictor variables as numerical values or quantitative variables. However, in some cases one may want to include an attribute or nominal variable (collectively known as categorical variables) to explain a response variable. Categorical variables are represented in the model using dummy variables.
Definition: A dummy variable is a variable that takes on the value 0 or 1. It is also
referred to as an indicator variable. It represents a categorical variable with two distinct
categories/levels.
Regression with one continuous and one categorical variable with two categories:
Suppose a response variable y is related to one continuous variable X1 and one categorical variable X2 which has two categories; the model will be of the form:
y = β0 + β1X1 + β2X2 + ϵ
where X2 = 1 if the response is in the 1st category, and X2 = 0 if the response is in the 2nd category.
The selection of which category should be assigned the value 0 is arbitrary. The average value of the response for individuals in the 2nd category is
E[Y | X1, X2 = 0] = β0 + β1X1,
while for individuals in the 1st category it is
E[Y | X1, X2 = 1] = (β0 + β2) + β1X1.
The two mean response lines have the same slope β1 but different intercepts: the intercept shifts by β2 when X2 = 1. β2 is known as the differential intercept coefficient and it measures the differential effect of the categorical variable. In general, β2 shows how much higher (or lower) the mean response line for the 1st category is compared to the line for the 2nd category (the base category) for any given value of X1.
In general the category for which the categorical variable takes the value 0 is known as
the base category or reference category.
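As an illustration (hypothetical data, not from the notes), coding the two-level categorical variable as 0/1 and fitting by least squares gives the two parallel mean lines described above:

import numpy as np

x1 = np.array([10.0, 12.0, 15.0, 11.0, 14.0, 16.0])      # continuous predictor
group = np.array(["A", "A", "A", "B", "B", "B"])          # categorical, two levels
y = np.array([25.1, 27.9, 33.2, 22.0, 27.4, 30.8])

x2 = (group == "A").astype(float)        # dummy: 1 for category A, 0 for base category B
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.solve(X.T @ X, X.T @ y)

# Base category B has intercept b0; category A has intercept b0 + b2; common slope b1.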
Estimation of regression coefficients:
Taking a sample of size n, the model becomes
yi = β0 + β1Xi1 + β2Xi2 + ϵi,   i = 1, 2, ..., n,
where Xi2 = 1 if the i-th response is in the 1st category and Xi2 = 0 if it is in the 2nd category.
In matrix notation it is of the form
y = XB + ϵ
where
B = (β0, β1, β2)′.
Interpretation of regression coefficients
• If β1 < 0, then for every unit increase in X1, holding the category fixed, the average response value decreases by |β1| units.
• If β1 > 0, then for every unit increase in X1, holding the category fixed, the average response value increases by β1 units.
• If β2 < 0, then the average response value for the first category is |β2| units lower than the average response value for the second category.
• If β2 > 0, then the average response value for the first category is β2 units higher than the average response value for the second category.
SST = y′y − n ȳ²
SSE = y′y − B̂′X′y
SSR = SST − SSE
Tj = β̂j / s.e.(β̂j) ∼ t(n − 3),   j = 1, 2
Example: An economist wanted to see the relationship between the speed with which a particular insurance innovation is adopted (Y) (measured in months) and the worth of the insurance firm (X1) (in million dollars) as well as the type of firm (X2). He studied 6 mutual firms and 6 stock firms and the information obtained is presented below:
(i) Fit a regression model to the data.
(ii) Interpret the coefficient estimates.
(iii) Is the overall regression fit significant at the 0.05 level of significance?
(iv) Are the two predictors individually associated with the response variable?
(v) Estimate the time that lapses before a mutual firm worth $173 million adopts an insurance innovation.
(i) The response vector, parameter vector and design matrix are

y = (17, 26, 21, 22, 12, 4, 28, 15, 11, 31, 20, 30)′,   B = (β0, β1, β2)′,

X =
[ 1   151   1 ]
[ 1    92   1 ]
[ 1   175   1 ]
[ 1   104   1 ]
[ 1   210   1 ]
[ 1   290   1 ]
[ 1   164   0 ]
[ 1   272   0 ]
[ 1   295   0 ]
[ 1    85   0 ]
[ 1   166   0 ]
[ 1   124   0 ]

B̂ = (X′X)⁻¹X′y
B̂ = (β̂0, β̂1, β̂2)′ = (40.823, −0.099, −6.892)′
(ii) While holding the type of firm constant, a unit increase in the worth of the firm decreases the time elapsed by 0.099 months; i.e. for every additional $1 million in worth, the time a firm takes to adopt an insurance innovation reduces by about 0.1 months. Also, while holding the worth of the firm constant, the time mutual firms take to adopt an innovation is on average 6.89 months shorter than the time taken by stock firms.
(iii) At the 0.05 level of significance, F(2, 9) = 4.26; the computed F-ratio is greater than this critical value, hence we reject H0 and conclude that the regression fit is significant.
(iv) This table shows that both the worth of the firm and the type of firm are significant in explaining the response variable. Note that none of the confidence intervals contains the value 0, hence the rejection of the null hypothesis H0: βi = 0, i = 0, 1, 2.
(v) The fitted model is ŷ = 40.823 − 0.099X1 − 6.892X2. The specific values of the predictors are x1 = 173 and x2 = 1; hence ŷ = 40.823 − 0.099(173) − 6.892(1) = 16.8 ≈ 17. This firm would take about 17 months before adopting an innovation.
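As a numerical check (not part of the original notes), refitting the model from the y vector and design matrix given in part (i) should reproduce, up to rounding, the estimates (40.823, −0.099, −6.892) and the prediction of about 16.8 months:

import numpy as np

y      = np.array([17, 26, 21, 22, 12, 4, 28, 15, 11, 31, 20, 30], dtype=float)
worth  = np.array([151, 92, 175, 104, 210, 290, 164, 272, 295, 85, 166, 124], dtype=float)
mutual = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)   # X2 = 1 for mutual firms

X = np.column_stack([np.ones_like(y), worth, mutual])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # approx. (40.823, -0.099, -6.892)
y_pred = b_hat @ np.array([1.0, 173.0, 1.0])   # mutual firm worth $173 million, approx. 16.8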
Regression with one continuous and one categorical variable with more than two categories:
Suppose that the qualitative variable has k categories, k > 2. In this case we represent the variable using k − 1 dummy variables, each taking the values 0 or 1. The model will be of the form:
Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + ϵ
where X2 = 1 if the response is in the 1st category and 0 otherwise; X3 = 1 if the response is in the 2nd category and 0 otherwise; and so on until Xk = 1 if the response is in the (k − 1)th category and 0 otherwise.
The k-th category serves as the base category. The mean response lines are:
E[Y] = (β0 + β2) + β1X1,  when X2 = 1, X3 = 0, ..., Xk = 0
E[Y] = (β0 + β3) + β1X1,  when X2 = 0, X3 = 1, ..., Xk = 0
⋮
E[Y] = (β0 + βk) + β1X1,  when X2 = 0, X3 = 0, ..., Xk = 1
E[Y] = β0 + β1X1,         when X2 = 0, X3 = 0, ..., Xk = 0
The regression coefficients with respect to the dummy variables are interpreted in relation
to the base category. They indicate respectively how much higher (lower) the average
response value is for the respective category compared to the base category.
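A sketch (hypothetical data and names, not from the notes) of how the k − 1 dummy columns can be built in practice; pandas' get_dummies with drop_first=True drops one level, which then plays the role of the base category:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1":   [100.0, 125.0, 220.0, 205.0, 105.0, 115.0],   # continuous predictor
    "line": ["I", "I", "II", "II", "III", "III"],          # categorical, 3 levels
    "y":    [218.0, 248.0, 395.0, 372.0, 238.0, 241.0],
})

# One dummy per non-base level; the dropped level ("I" here) is the base category.
dummies = pd.get_dummies(df["line"], prefix="line", drop_first=True).astype(float)
X = np.column_stack([np.ones(len(df)), df["x1"].to_numpy(), dummies.to_numpy()])
beta_hat = np.linalg.solve(X.T @ X, X.T @ df["y"].to_numpy())
# Coefficients: intercept, slope for x1, then the differential intercepts
# for "line_II" and "line_III" relative to the base category "I".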
Model analysis:
For a sample of size n the model takes the form
yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + ... + βk Xik + ϵi , i = 1, 2, .., n
and in matrix notation it is of the form
y = XB + ϵ
where X is the matrix

X =
[ 1   x11   1   0   ...   0 ]
[ 1   x21   1   0   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xj1   0   1   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xk1   0   0   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xn1   0   0   ...   1 ]
and B is
B = (β0, β1, β2, ..., βk)′.
The ANOVA table is of the form:
H0: β2 = β3 = ... = βk = 0
vs.
H1: not all βj, j = 2, 3, ..., k, are equal to zero
in the full model Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + ϵ.
Reject the null hypothesis if the value of the test statistic is greater than the critical value. If the null hypothesis is rejected, we conclude that the qualitative predictor is significant after controlling for the quantitative predictor. On the other hand, if we fail to reject the null hypothesis, then the qualitative predictor is not significant.
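One standard way to carry out this test numerically (a sketch, not the notes' own derivation) is the extra-sum-of-squares comparison of the full model with the reduced model that drops all the dummy columns; X_full and X_reduced below are assumed to be design matrices built as described above:

import numpy as np
from scipy import stats

def sse(X, y):
    # Residual sum of squares of a least-squares fit of y on X.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    return resid @ resid

def dummy_variable_f_test(X_full, X_reduced, y):
    # H0: all dummy coefficients are zero (qualitative predictor has no effect).
    n = len(y)
    df_num = X_full.shape[1] - X_reduced.shape[1]       # number of dummy columns dropped
    df_den = n - X_full.shape[1]
    f = ((sse(X_reduced, y) - sse(X_full, y)) / df_num) / (sse(X_full, y) / df_den)
    return f, stats.f.sf(f, df_num, df_den)             # reject H0 if p-value < alpha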
Prediction of the response variable: prediction proceeds as in the multiple linear model with continuous predictors.
Example: The data below gives the amount of scrap (Y), the line speed (X1) and a categorical variable, soap production line, having three categories (I, II and III):
(i) Fit a regression line to the data and interpret the coefficient estimates.
(ii) Is the overall regression significant at the 0.05 level of significance?
(iii) Test the significance of the individual predictors at the 0.05 level.
(iv) Estimate the amount of scrap for soap production line III with a line speed of 202.
Solution:
(i) The response vector, parameter vector and design matrix are (only the first and last few rows are shown):

y = (218, 248, ..., 321, 214)′,   B = (β0, β1, β2, β3)′,

X =
[ 1   100   1   0 ]
[ 1   125   1   0 ]
[ ⋮     ⋮   ⋮   ⋮ ]
[ 1   105   0   1 ]
[ ⋮     ⋮   ⋮   ⋮ ]
[ 1   115   0   0 ]

(X′X)⁻¹ =
[  0.423345728    −0.001787603    −0.025306178     0.005678937  ]
[ −0.001787603     0.00001023437  −0.0004912496   −0.0006686453 ]
[ −0.025306178    −0.0004912496    0.2458022038    0.143206086  ]
[  0.005678937    −0.0006686453    0.143206086     0.2659070492 ]
and

X′y = (8766, 2006513, 3262, 2952)′.
B̂ = (X′X)⁻¹X′y, which gives

B̂ = (β̂0, β̂1, β̂2, β̂3)′ = (58.416, 1.289, 17.018, −39.768)′.
(ii) From the F-tables, at the 0.05 level of significance, F(3, 23) = 3.03; the computed F-ratio is greater than this critical value, hence we reject H0 and conclude that the regression is significant.
(iii) A summary of the coefficients gives us:

             Estimated regression coefficient   Standard error   t-value    p-value
Intercept              58.416
Line speed              1.289                        0.076        16.961    < 0.0001
Line II               −39.768                       12.221        −3.254      0.004
Line I                 17.018                       11.75          1.448      0.161

Line speed is a significant predictor; production line II also differs significantly from the base line III (p = 0.004), while line I does not (p = 0.161).
(iv) The fitted model is ŷ = 58.416 + 1.289X1 + 17.018X2 − 39.768X3. The specific values of the predictors are x1 = 202, x2 = 0 and x3 = 0; hence ŷ = 58.416 + 1.289(202) + 17.018(0) − 39.768(0) = 318.794; the estimated amount of scrap is approximately 319.