Multiple Linear Regression
Multiple linear regression allows us to investigate the joint effect of several predictors on the response by relating a single outcome variable to two or more predictors simultaneously.
If we want to use k predictor variables X1, X2, ..., Xk to explain a response variable y, then the model is of the form:
y = β0 + β1X1 + ... + βkXk + ϵ
Given a random sample of size n selected from the population, the model can be written in matrix form as
y_{n×1} = X_{n×(k+1)} β_{(k+1)×1} + ϵ_{n×1}
The matrix X is known as the design matrix and β is a column vector of regression coefficients. y is a column vector of response values for individuals in the sample. Individual observations of the outcome yi are modeled as varying by an error term ϵi about an average determined by their predictor values:
• Each error term in the error vector has mean zero; E[ϵ] = 0
• The error terms are normally distributed with mean 0 and finite variance σ 2 .
• There is no correlation between the error terms and the predictor variables.
The aims of fitting a regression model are:
(i) Identify predictors that are associated with the response variable in order to promote
understanding of the underlying process.
(ii) Determine the extent to which one or more of the predictors is/are linearly related
to the dependent variable after adjusting for other variables that may be related to
it.
(iii) Predict the value of the dependent variable as accurately as possible from the predictor values.
The coefficients β̂0, β̂1, ..., β̂k are chosen so that the sum of the squared distances between the observed values and the values predicted by the regression line is smallest (the method of least squares).
The vector of least-squares estimates is β̂ = (X′X)⁻¹X′y. It is a (k+1)×1 random vector which follows a multivariate normal distribution with mean vector β and variance-covariance matrix σ²(X′X)⁻¹.
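As a computational aside (not part of the original notes), the estimate β̂ = (X′X)⁻¹X′y is easy to obtain numerically; a minimal Python/NumPy sketch, where the design matrix is assumed to already contain a leading column of ones and the data are purely hypothetical:

import numpy as np

def ols_fit(X, y):
    # Least-squares estimate beta_hat = (X'X)^{-1} X'y.
    # Solving the normal equations is numerically preferable to forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: n = 5 observations, k = 2 predictors plus an intercept column.
X = np.column_stack([np.ones(5),
                     [1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0]])
y = np.array([3.0, 4.1, 6.8, 7.2, 9.5])
beta_hat = ols_fit(X, y)        # (beta0_hat, beta1_hat, beta2_hat)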
The fitted regression model is ŷ = β̂0 + β̂1X1 + ... + β̂kXk. The coefficients are interpreted as follows:
(i) If βj < 0, then for every unit increase in Xj, holding the other predictors constant, the average response value decreases by |βj| units.
(ii) If βj > 0, then for every unit increase in Xj, holding the other predictors constant, the average response value increases by βj units.
Significance test for overall model fit
In this case we wish to test the hypotheses:
H0: β1 = β2 = ... = βk = 0
vs
H1: at least one βj ≠ 0, j = 1, 2, ..., k.
The test statistic is F = MSR/MSE = (SSR/k)/(SSE/(n − k − 1)), which under H0 follows an F(k, n − k − 1) distribution; we reject H0 if the computed F exceeds the critical value F(k, n − k − 1, α).
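A sketch (not from the notes) of how this F statistic could be computed from the sums of squares, using a design matrix X and response y defined as above; SciPy is used only to turn the statistic into a p-value:

import numpy as np
from scipy import stats

def overall_f_test(X, y):
    # H0: beta_1 = ... = beta_k = 0 (all slopes zero).
    n, p = X.shape                                # p = k + 1, intercept included
    k = p - 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sse = y @ y - beta_hat @ (X.T @ y)            # SSE = y'y - beta_hat' X'y
    sst = y @ y - n * y.mean() ** 2               # SST = y'y - n * ybar^2
    ssr = sst - sse                               # SSR = SST - SSE
    f = (ssr / k) / (sse / (n - k - 1))           # F = MSR / MSE
    p_value = stats.f.sf(f, k, n - k - 1)         # reject H0 if p_value < alpha
    return f, p_value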
Example: For the data below:
Y       X1     X2        Y       X1     X2
174.4   68.5   16.7      191.1   72.8   17.1
164.4   45.2   16.8      232.0   88.4   17.4
244.2   91.3   18.2      145.3   42.9   15.8
154.6   47.8   16.3      161.1   52.5   17.8
181.7   46.9   17.3      209.7   85.7   18.4
207.5   66.1   18.2      146.4   41.3   16.5
152.8   49.5   15.9      144.0   51.7   16.3
163.2   52.0   17.2      232.6   89.6   18.1
145.4   48.9   16.6      224.1   82.7   19.1
137.2   38.4   16.0      166.5   52.3   16.0
241.9   87.9   18.3
(i) Fit a regression model to the data above and interpret the coefficient estimates.
Solution:
We have

(X′X)⁻¹ =
[  29.72892348    0.07218347     −1.99255319  ]
[   0.07218347    0.0003701761   −0.005549917 ]
[  −1.99255319   −0.005549917     0.136310637 ]

and

X′y =
[   3820.10  ]
[ 249648.04  ]
[  66074.48  ]
Thus the regression coefficient estimates are given by

β̂ = (X′X)⁻¹X′y =
[  29.72892348    0.07218347     −1.99255319  ] [   3820.10 ]   [ −68.992757 ]
[   0.07218347    0.0003701761   −0.005549917 ] [ 249648.04 ] = [   1.453913 ]
[  −1.99255319   −0.005549917     0.136310637 ] [  66074.48 ]   [   9.376033 ]

so the fitted model is ŷ = −68.993 + 1.454X1 + 9.376X2.

β̂′X′y is

                                     [   3820.10 ]
[ −68.992757   1.453913   9.376033 ] [ 249648.04 ] = 718923.8
                                     [  66074.48 ]
Thus SSE = y′y − β̂′X′y = 721108.7 − 718923.8 = 2184.9
ȳ = 181.91
SST = 721108.7 − (21 × 181.91²) = 26192.49
SSR = SST − SSE = 24007.59
The ANOVA table (with mean squares and F ratio following from the sums of squares above) is:

Source        df      SS          MS           F
Regression     2    24007.59    12003.795    98.89
Error         18     2184.90      121.383
Total         20    26192.49
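The whole calculation can be checked numerically; a NumPy sketch (not in the original notes) using the 21 observations from the table, whose output should agree, up to rounding, with β̂ ≈ (−68.99, 1.454, 9.376), SSE ≈ 2184.9, SST ≈ 26192.5 and SSR ≈ 24007.6:

import numpy as np

y  = np.array([174.4, 164.4, 244.2, 154.6, 181.7, 207.5, 152.8, 163.2, 145.4, 137.2, 241.9,
               191.1, 232.0, 145.3, 161.1, 209.7, 146.4, 144.0, 232.6, 224.1, 166.5])
x1 = np.array([68.5, 45.2, 91.3, 47.8, 46.9, 66.1, 49.5, 52.0, 48.9, 38.4, 87.9,
               72.8, 88.4, 42.9, 52.5, 85.7, 41.3, 51.7, 89.6, 82.7, 52.3])
x2 = np.array([16.7, 16.8, 18.2, 16.3, 17.3, 18.2, 15.9, 17.2, 16.6, 16.0, 18.3,
               17.1, 17.4, 15.8, 17.8, 18.4, 16.5, 16.3, 18.1, 19.1, 16.0])

X = np.column_stack([np.ones_like(y), x1, x2])          # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)            # (X'X)^{-1} X'y
sse = y @ y - beta_hat @ (X.T @ y)
sst = y @ y - len(y) * y.mean() ** 2
ssr = sst - sse
print(beta_hat, sse, sst, ssr)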
Adjusted R squared statistic
To counter the disadvantage that R² never decreases when additional predictors are added to the model, we can make use of the adjusted R-squared statistic. This statistic is given by

R̄² = 1 − (SSE/(n − k − 1)) / (SST/(n − 1)) = 1 − ((1 − R²)(n − 1))/(n − k − 1)
where k is the number of predictor variables used to explain the response. Unlike R², the adjusted R², R̄², increases only if the additional predictor variables being included in the model significantly improve the fit. The adjusted R² can be negative, and will always be less than or equal to R². The larger the value of this statistic, the better the model fit.
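For the worked example above (n = 21, k = 2, SSR = 24007.59, SST = 26192.49), this gives approximately
R² = SSR/SST = 24007.59/26192.49 ≈ 0.9166
R̄² = 1 − (1 − 0.9166)(21 − 1)/(21 − 2 − 1) ≈ 0.907,
so the two predictors explain roughly 91% of the variation in the response, and the adjustment for model size changes this only slightly.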
Significance test for individual predictors
To test H0: βj = 0 against H1: βj ≠ 0, the test statistic is

Tj = β̂j / s.e.(β̂j) ∼ t(n − k − 1),   j = 1, 2, ..., k.

If the absolute value of the computed Tj is greater than the critical t-value t(n − k − 1, α/2), then we conclude that Xj is a significant predictor, i.e. that there is a linear relationship between the response and Xj. On the other hand, if the absolute value of the computed Tj is less than the critical t-value t(n − k − 1, α/2), then we conclude that there is insufficient evidence to indicate that Xj is a significant predictor, i.e. that there is a linear relationship between the response and Xj.
Alternatively, using the p-value: if the p-value is less than α, Xj is a significant predictor.
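A sketch (not from the notes) of how the standard errors and t statistics could be obtained numerically, continuing the NumPy examples above:

import numpy as np

def coef_t_stats(X, y):
    # T_j = beta_hat_j / s.e.(beta_hat_j), with Var(beta_hat) = sigma^2 (X'X)^{-1}.
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    mse = resid @ resid / (n - p)                 # estimate of sigma^2, df = n - k - 1
    se = np.sqrt(mse * np.diag(xtx_inv))          # standard errors of the coefficients
    return beta_hat / se                          # compare |T_j| with t(n - k - 1, alpha/2)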
Example: A realtor seeks to determine if there is a relationship between the selling price of a house (in thousands of dollars) and: the listing price, LIST (in thousands of dollars); the age of the house, AGE (in years); the compound area, ACRES (in acres); the living room area, LIVRM (in tens of square feet); and the number of bedrooms, BED. He selected fifty houses in the market and proposed to use multiple linear regression analysis to assess the relationship. The results of the fit were as follows:
Parameter estimates:
(i) At the 0.05 level of significance, test if there is a linear relationship between selling price and the five predictor variables.
(ii) Test for the significance of each of the five predictor variables at the 0.05 level. Interpret the results.
(iii) Interpret the estimated regression coefficients.
(iv) Calculate the coefficient of multiple determination. What does it indicate here?
(v) Estimate the average selling price for a house whose listing price is $242000, which has a living room area of 375 sq.ft, five bedrooms, sits in a compound of 1.82 acres and was built ten years ago.
Solution:
(i) The critical value is F(5, 44, 0.05) = 2.427; the computed F value is greater than the critical value, hence the fitted model is statistically significant.
(ii) The critical value for the significance tests for the five individual predictors is t(44, 0.025) = 2.015; using the t-values:
Compound area: the t-value is 0.326; since 0.326 < 2.015, it is not a significant predictor. After adjusting for listing price, age, number of bedrooms and living room area, there is insufficient evidence to indicate a significant linear relationship between selling price and compound area.
Age: the t-value is −0.62; since |−0.62| < 2.015, it is not a significant predictor. After adjusting for listing price, compound area, number of bedrooms and living room area, there is insufficient evidence to indicate a significant linear relationship between selling price and age of house.
(iii) Listing price: After adjusting for living room area, compound area, number of bedrooms and age, the average selling price increases by $992 for every additional $1000 in listing price.
Number of bedrooms: After adjusting for living room area, compound area, listing price and age, the average selling price increases by $2367 for every additional bedroom.
Living room area: After adjusting for listing price, compound area, number of bedrooms and age, the average selling price increases by $597 for every additional ten sq.ft of living room area.
Compound area: After adjusting for living room area, listing price, number of bedrooms and age, the average selling price increases by $214 for every additional acre of compound area.
Age: After adjusting for living room area, compound area, number of bedrooms and listing price, the average selling price decreases by $15 for every additional year in age of house.
(v) The specific values of the predictor variables to be used in the fitted model are:
LIST = 242000/1000 = 242, ACRES = 1.82, BED = 5, LIVRM = 375/10 = 37.5, AGE = 10.
REGRESSION WITH DUMMY (INDICATOR) VARIABLES
Regression analysis usually treats the predictor variables as numerical values or quantitative variables. However, in some cases one may want to include an attribute or nominal variable (collectively known as categorical variables) to explain a response variable. Categorical variables are represented in the model using dummy variables.
Definition: A dummy variable is a variable that takes on the value 0 or 1. It is also
referred to as an indicator variable. It represents a categorical variable with two distinct
categories/levels.
Regression with one continuous and one categorical variable with two categories:
Suppose a response variable y is related to one continuous variable X1 and one categorical variable X2 which has two categories; the model will be of the form:
y = β0 + β1X1 + β2X2 + ϵ
where X2 = 1 if the response is in the 1st category, and X2 = 0 if the response is in the 2nd category.
The selection of which category should be assigned the value 0 is arbitrary. The average value of the response for individuals in the 2nd category is
E[Y | X1, X2 = 0] = β0 + β1X1,
while for individuals in the 1st category it is
E[Y | X1, X2 = 1] = (β0 + β2) + β1X1.
The two mean response lines have the same slope β1 but different intercepts: the intercept shifts by β2 when X2 = 1. β2 is known as the differential intercept coefficient and it measures the differential effect of the categorical variable. In general, β2 shows how much higher (or lower) the mean response line for the 1st category is compared to the line for the 2nd category (the base category) for any given value of X1.
In general the category for which the categorical variable takes the value 0 is known as
the base category or reference category.
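As an illustration (hypothetical data, not from the notes), coding the two-level categorical variable as 0/1 and fitting by least squares gives the two parallel mean lines described above:

import numpy as np

x1 = np.array([10.0, 12.0, 15.0, 11.0, 14.0, 16.0])      # continuous predictor
group = np.array(["A", "A", "A", "B", "B", "B"])          # categorical, two levels
y = np.array([25.1, 27.9, 33.2, 22.0, 27.4, 30.8])

x2 = (group == "A").astype(float)        # dummy: 1 for category A, 0 for base category B
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.solve(X.T @ X, X.T @ y)

# Base category B has intercept b0; category A has intercept b0 + b2; common slope b1.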
Estimation of regression coefficients:
Taking a sample of size n, the model becomes
yi = β0 + β1Xi1 + β2Xi2 + ϵi,   i = 1, 2, ..., n,
where Xi2 = 1 if the i-th response is in the 1st category and Xi2 = 0 if it is in the 2nd category.
In matrix notation it is of the form
y = XB + ϵ
where
B = (β0, β1, β2)′.
Interpretation of regression coefficients
• If β1 < 0, then for every unit increase in X1, holding the category fixed, the average response value decreases by |β1| units.
• If β1 > 0, then for every unit increase in X1, holding the category fixed, the average response value increases by β1 units.
• If β2 < 0, then the average response value for the first category is |β2| units lower than the average response value for the second category.
• If β2 > 0, then the average response value for the first category is β2 units higher than the average response value for the second category.
SST = y′y − n ȳ²
SSE = y′y − B̂′X′y
SSR = SST − SSE
Tj = β̂j / s.e.(β̂j) ∼ t(n − 3),   j = 1, 2
Example: An economist wanted to see the relationship between the speed with which a particular insurance innovation is adopted (Y) (measured in months) and the worth of the insurance firm (X1) (in million dollars) as well as the type of firm (X2). He studied 6 mutual firms and 6 stock firms and the information obtained is presented below:
(i) Fit a regression model to the data.
(ii) Interpret the coefficient estimates.
(iii) Is the overall regression fit significant at the 0.05 level of significance?
(iv) Are the two predictors individually associated with the response variable?
(v) Estimate the time that lapses before a mutual firm worth $173 million adopts an insurance innovation.
(i) The response vector, parameter vector and design matrix are

y = (17, 26, 21, 22, 12, 4, 28, 15, 11, 31, 20, 30)′,   B = (β0, β1, β2)′,

X =
[ 1   151   1 ]
[ 1    92   1 ]
[ 1   175   1 ]
[ 1   104   1 ]
[ 1   210   1 ]
[ 1   290   1 ]
[ 1   164   0 ]
[ 1   272   0 ]
[ 1   295   0 ]
[ 1    85   0 ]
[ 1   166   0 ]
[ 1   124   0 ]

B̂ = (X′X)⁻¹X′y
B̂ = (β̂0, β̂1, β̂2)′ = (40.823, −0.099, −6.892)′
(ii) While holding the type of firm constant, a unit increase in the worth of the firm decreases the time elapsed by 0.099 months; i.e. for every additional $1 million in worth, the time a firm takes to adopt an insurance innovation reduces by about 0.1 months. Also, while holding the worth of the firm constant, the time mutual firms take to adopt an innovation is on average 6.89 months shorter than the time taken by stock firms.
(iii) At the 0.05 level of significance, F(2, 9) = 4.26; the computed F-ratio is greater than this critical value, hence we reject H0 and conclude that the regression fit is significant.
(iv) This table shows that both the worth of the firm and the type of firm are significant in explaining the response variable. Note that none of the confidence intervals contains the value 0, hence the rejection of the null hypothesis H0: βi = 0, i = 0, 1, 2.
(v) The fitted model is ŷ = 40.823 − 0.099X1 − 6.892X2. The specific values of the predictors are x1 = 173 and x2 = 1; hence ŷ = 40.823 − 0.099(173) − 6.892(1) = 16.8 ≈ 17. This firm would take about 17 months before adopting an innovation.
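As a numerical check (not part of the original notes), refitting the model from the y vector and design matrix given in part (i) should reproduce, up to rounding, the estimates (40.823, −0.099, −6.892) and the prediction of about 16.8 months:

import numpy as np

y      = np.array([17, 26, 21, 22, 12, 4, 28, 15, 11, 31, 20, 30], dtype=float)
worth  = np.array([151, 92, 175, 104, 210, 290, 164, 272, 295, 85, 166, 124], dtype=float)
mutual = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)   # X2 = 1 for mutual firms

X = np.column_stack([np.ones_like(y), worth, mutual])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # approx. (40.823, -0.099, -6.892)
y_pred = b_hat @ np.array([1.0, 173.0, 1.0])   # mutual firm worth $173 million, approx. 16.8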
Regression with one continuous and one categorical variable with more than two categories:
Suppose that the qualitative variable has k categories, k > 2. In this case we represent the variable using k − 1 dummy variables, each taking the values 0 or 1. The model will be of the form:
Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + ϵ
where X2 = 1 if the response is in the 1st category and 0 otherwise; X3 = 1 if the response is in the 2nd category and 0 otherwise; and so on until Xk = 1 if the response is in the (k − 1)th category and 0 otherwise.
The k-th category serves as the base category. The mean response lines are:
E[Y] = (β0 + β2) + β1X1,  when X2 = 1, X3 = 0, ..., Xk = 0
E[Y] = (β0 + β3) + β1X1,  when X2 = 0, X3 = 1, ..., Xk = 0
⋮
E[Y] = (β0 + βk) + β1X1,  when X2 = 0, X3 = 0, ..., Xk = 1
E[Y] = β0 + β1X1,         when X2 = 0, X3 = 0, ..., Xk = 0
The regression coefficients with respect to the dummy variables are interpreted in relation
to the base category. They indicate respectively how much higher (lower) the average
response value is for the respective category compared to the base category.
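A sketch (hypothetical data and names, not from the notes) of how the k − 1 dummy columns can be built in practice; pandas' get_dummies with drop_first=True drops one level, which then plays the role of the base category:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1":   [100.0, 125.0, 220.0, 205.0, 105.0, 115.0],   # continuous predictor
    "line": ["I", "I", "II", "II", "III", "III"],          # categorical, 3 levels
    "y":    [218.0, 248.0, 395.0, 372.0, 238.0, 241.0],
})

# One dummy per non-base level; the dropped level ("I" here) is the base category.
dummies = pd.get_dummies(df["line"], prefix="line", drop_first=True).astype(float)
X = np.column_stack([np.ones(len(df)), df["x1"].to_numpy(), dummies.to_numpy()])
beta_hat = np.linalg.solve(X.T @ X, X.T @ df["y"].to_numpy())
# Coefficients: intercept, slope for x1, then the differential intercepts
# for "line_II" and "line_III" relative to the base category "I".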
Model analysis:
For a sample of size n the model takes the form
yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + ... + βk Xik + ϵi , i = 1, 2, .., n
and in matrix notation it is of the form
y = XB + ϵ
where X is the matrix

X =
[ 1   x11   1   0   ...   0 ]
[ 1   x21   1   0   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xj1   0   1   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xk1   0   0   ...   0 ]
[ ⋮    ⋮    ⋮   ⋮          ⋮ ]
[ 1   xn1   0   0   ...   1 ]
and B is
B = (β0, β1, β2, ..., βk)′.
The ANOVA table is of the form:
H0: β2 = β3 = ... = βk = 0
vs.
H1: not all βj, j = 2, 3, ..., k, are equal to zero
in the full model Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + ϵ.
Reject the null hypothesis if the value of the test statistic is greater than the critical value. If the null hypothesis is rejected, we conclude that the qualitative predictor is significant after controlling for the quantitative predictor. On the other hand, if we fail to reject the null hypothesis, then the qualitative predictor is not significant.
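One standard way to carry out this test numerically (a sketch, not the notes' own derivation) is the extra-sum-of-squares comparison of the full model with the reduced model that drops all the dummy columns; X_full and X_reduced below are assumed to be design matrices built as described above:

import numpy as np
from scipy import stats

def sse(X, y):
    # Residual sum of squares of a least-squares fit of y on X.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    return resid @ resid

def dummy_variable_f_test(X_full, X_reduced, y):
    # H0: all dummy coefficients are zero (qualitative predictor has no effect).
    n = len(y)
    df_num = X_full.shape[1] - X_reduced.shape[1]       # number of dummy columns dropped
    df_den = n - X_full.shape[1]
    f = ((sse(X_reduced, y) - sse(X_full, y)) / df_num) / (sse(X_full, y) / df_den)
    return f, stats.f.sf(f, df_num, df_den)             # reject H0 if p-value < alpha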
Prediction of the response variable: prediction proceeds as in the multiple linear model with continuous predictors.
Example: The data below gives the amount of scrap (Y), the line speed (X1) and a categorical variable, soap production line, having three categories (I, II and III):
(i) Fit a regression line to the data and interpret the coefficient estimates.
(ii) Is the overall regression significant at the 0.05 level of significance?
(iii) Test the significance of the individual predictors at the 0.05 level.
(iv) Estimate the amount of scrap for soap production line III with a line speed of 202.
Solution:
(i) The response vector, parameter vector and design matrix are (only the first and last few rows are shown):

y = (218, 248, ..., 321, 214)′,   B = (β0, β1, β2, β3)′,

X =
[ 1   100   1   0 ]
[ 1   125   1   0 ]
[ ⋮     ⋮   ⋮   ⋮ ]
[ 1   105   0   1 ]
[ ⋮     ⋮   ⋮   ⋮ ]
[ 1   115   0   0 ]

(X′X)⁻¹ =
[  0.423345728    −0.001787603    −0.025306178     0.005678937  ]
[ −0.001787603     0.00001023437  −0.0004912496   −0.0006686453 ]
[ −0.025306178    −0.0004912496    0.2458022038    0.143206086  ]
[  0.005678937    −0.0006686453    0.143206086     0.2659070492 ]
and

X′y = (8766, 2006513, 3262, 2952)′.
B̂ = (X′X)⁻¹X′y, which gives

B̂ = (β̂0, β̂1, β̂2, β̂3)′ = (58.416, 1.289, 17.018, −39.768)′.
(ii) From the F-tables, at the 0.05 level of significance, F(3, 23) = 3.03; the computed F-ratio is greater than this critical value, hence we reject H0 and conclude that the regression is significant.
(iii) A summary of the coefficients gives us:

             Estimated regression coefficient   Standard error   t-value    p-value
Intercept              58.416
Line speed              1.289                        0.076        16.961    < 0.0001
Line II               −39.768                       12.221        −3.254      0.004
Line I                 17.018                       11.75          1.448      0.161

Line speed is a significant predictor; production line II also differs significantly from the base line III (p = 0.004), while line I does not (p = 0.161).
(iv) The fitted model is ŷ = 58.416 + 1.289X1 + 17.018X2 − 39.768X3. The specific values of the predictors are x1 = 202, x2 = 0 and x3 = 0; hence ŷ = 58.416 + 1.289(202) + 17.018(0) − 39.768(0) = 318.794; the estimated amount of scrap is approximately 319.