
Learning Objectives:

At the end of the lesson, the student should be able to:


• Use a fitted multiple regression equation to make predictions;
• Use the ANOVA table to perform an F test for overall significance;
• Interpret the coefficient of multiple determination and adjusted R²;
• Use the Coefficients table to determine significance of predictors;
• Detect multicollinearity and assess its effects;
• Analyze residuals to check for violations of the regression assumptions.
Multiple Regression
▪ Multiple regression extends simple regression to include several
independent variables (called predictors or explanatory variables).
▪ It is required when a single-predictor model is inadequate to describe
the relationship between the response variable (Y) and its potential
predictors (X1, X2, X3, …).
▪ The interpretation is similar to simple regression since simple
regression is a special case of multiple regression.

Source: Statistical Analysis with Software Applications, McGraw Hill


Multiple Regression
▪ Calculations are done by computer.
▪ Using multiple predictors is more than a matter of improving the fit;
rather, it is a question of specifying a correct model.
▪ A low R2 in a simple regression model does not necessarily mean that
X and Y are unrelated but may simply indicate that the model is
incorrectly specified.
▪ Omission of relevant predictors (model misspecification) can cause
biased estimates and misleading results.



Limitations of Simple Regression
▪ Multiple relationships usually exist.
▪ The estimates are biased if relevant predictors are omitted.
▪ A lack of fit (low R-squared) does not show that X is unrelated to Y
if the true model is multivariate.
▪ Simple regression is used only when there is a compelling need for a
simple model, or when other predictors have only modest effects and a
single logical predictor "stands out" as doing a very good job all by
itself.


The population regression model

y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε

where y is the response variable, X1, X2, …, Xk are the predictor
variables, β0, β1, …, βk are the unknown regression coefficients, and
ε is the random error term.


The population regression model
In the population regression model,
▪ the random error ε represents everything that is not part of the model;
▪ the unknown regression coefficients, denoted by Greek letters, are
parameters;
▪ each coefficient βi shows the change in the expected value of y for a unit change
in Xi while holding all other predictors constant (ceteris paribus);
▪ the errors are assumed to be unobservable, independent random
disturbances that are normally distributed with zero mean and constant
variance. Under these assumptions, the ordinary least squares (OLS)
estimation method yields unbiased, consistent, efficient estimates of the
unknown parameters.
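To make these assumptions concrete, here is a minimal sketch that simulates data from such a population model in Python; the coefficients, sample size, and error variance are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative (assumed) parameters for a 2-predictor population model
beta0, beta1, beta2, sigma = 10.0, 2.0, -3.0, 1.5
n = 200

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
eps = rng.normal(0, sigma, n)          # iid N(0, sigma^2) disturbances

# Population regression model: y = beta0 + beta1*X1 + beta2*X2 + error
y = beta0 + beta1 * x1 + beta2 * x2 + eps
```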
The estimated regression equation

ŷ = b0 + b1X1 + b2X2 + ⋯ + bkXk

where ŷ is the predicted value of the response variable, b0 is the
estimated intercept (constant), and b1, b2, …, bk are the estimated
slope coefficients of the predictor variables.


Fitted regression: comparison of a one-predictor model versus a two-predictor model


Example 1.
A distributor of frozen dessert pies wants to evaluate factors
thought to influence demand.
- the dependent variable is pie sales (units per week)
- the independent variables are price (in USD) and
advertising cost (in hundreds of USD)

The data are collected for 15 weeks.


Example 1.

Multiple regression equation:

Sales = b0 + b1(Price) + b2(Ads cost)   or   Sales = b0 + b1X1 + b2X2

where X1 = Price and X2 = Ads cost.

Week   Pie Sales   Price, $   Advertising Costs ($100s)
 1       350         5.50       3.3
 2       460         7.50       3.3
 3       350         8.00       3.0
 4       430         8.00       4.5
 5       350         6.80       3.0
 6       380         7.50       4.0
 7       430         4.50       3.0
 8       470         6.40       3.7
 9       450         7.00       3.5
10       490         5.00       4.0
11       340         7.20       3.5
12       300         7.90       3.2
13       440         5.90       4.0
14       450         5.00       3.5
15       300         7.00       2.7
Use Excel to generate the output

Multiple regression equation:

Sales = 306.526 − 24.975(X1) + 74.131(X2)
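For readers without Excel, here is a minimal sketch of the same fit in Python using pandas and statsmodels; the column names are my own choices, not part of the original output.

```python
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "sales": [350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
              340, 300, 440, 450, 300],
    "price": [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00,
              5.00, 7.20, 7.90, 5.90, 5.00, 7.00],
    "ads":   [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
              3.5, 3.2, 4.0, 3.5, 2.7],
})

X = sm.add_constant(data[["price", "ads"]])   # adds the intercept column
model = sm.OLS(data["sales"], X).fit()
print(model.summary())   # coefficients, F statistic, R², t-tests
```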


Interpretation of the regression coefficients

Sales = 306.526 − 24.975(X1) + 74.131(X2)

b1 = −24.975: Sales will decrease, on average, by 24.975 pies per
week for each $1 increase in selling price, net of the effects of
changes due to advertising.

b2 = 74.131: Sales will increase, on average, by 74.131 pies per
week for each $100 increase in advertising cost, net of the effects
of changes due to price.
Predict sales for a week in which the selling price is $6.50 and
the advertising cost is $420:

Sales = 306.526 − 24.975(X1) + 74.131(X2)
      = 306.526 − 24.975(6.50) + 74.131(4.20)
      = 455.539

Note that advertising is in $100s, so $420 means that X2 = 4.20.
Predicted sales are about 455.54 pies.
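A short sketch of the same prediction in Python, assuming the `model` object from the earlier statsmodels fit:

```python
import pandas as pd

# New week: price = $6.50, advertising = $420 (i.e., 4.20 hundreds)
new = pd.DataFrame({"const": [1.0], "price": [6.50], "ads": [4.20]})
print(model.predict(new))                         # ≈ 455.5 pies

# Equivalent hand calculation from the rounded coefficients
print(306.526 - 24.975 * 6.50 + 74.131 * 4.20)    # 455.5387
```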


ASSESSING OVERALL FIT
As in simple regression, there is one residual for every observation
in a multiple regression:

ei = yi − ŷi   for i = 1, 2, …, n


ASSESSING OVERALL FIT: F-test for significance

Before determining which, if any, of the individual predictors are
significant, we perform a global test of overall fit using the F-test.


ASSESSING OVERALL FIT: F-test for significance
For a regression with k predictors, the hypotheses to be tested are:

H0: All the true coefficients are zero (β1 = β2 = ⋯ = βk = 0)
H1: At least one of the coefficients is nonzero.

ANOVA Table Format

Source                    df           SS    MS                      F             Significance F
Regression (explained)    k            SSR   MSR = SSR/k             F = MSR/MSE   p-value
Residual (unexplained)    n − k − 1    SSE   MSE = SSE/(n − k − 1)
Total                     n − 1        SST


ASSESSING OVERALL FIT: F-test for significance

F = MSR/MSE = 14730.013/2252.776 = 6.539

The p-value is 0.012, so reject the null hypothesis at α = 0.05.
There is sufficient evidence that at least one independent variable
affects Y.
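The F statistic and its p-value can be reproduced from the ANOVA quantities; a sketch with scipy, using df1 = k = 2 and df2 = n − k − 1 = 12:

```python
from scipy import stats

MSR, MSE = 14730.013, 2252.776   # from the ANOVA table
F = MSR / MSE                    # 6.539
p = stats.f.sf(F, 2, 12)         # upper-tail p-value ≈ 0.012
print(F, p)
```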


COEFFICIENT OF MULTIPLE DETERMINATION
▪ The coefficient of multiple determination reports the proportion of
the total variation in Y that is explained by the variation of all
predictor variables taken together.
▪ It is also called R-squared and is obtained by:

R² = SSR/SST = regression sum of squares / total sum of squares

0 ≤ R² ≤ 1
ASSESSING OVERALL FIT: Coefficient of Multiple Determination

R² = SSR/SST = 29460.027/56493.333 = 0.521

52.1% of the variation in pie sales is explained by the variation in
selling price and advertising cost.


ADJUSTED R²
▪ R-squared never decreases when a new predictor variable X is
added to the model.
▪ This can be a disadvantage when comparing models.
▪ What is the net effect of adding a new variable?
▪ We lose a degree of freedom when a new variable is added.
▪ Did the new X variable add enough explanatory power to offset the
loss of one degree of freedom?


ADJUSTED R²
▪ The adjusted R² shows the proportion of variation in Y explained by
all X variables, adjusted for the number of X variables used.
▪ It penalizes excessive use of unimportant predictor variables.
▪ It is never larger than R².
▪ It is useful when comparing models.
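For reference (the slide states the properties but not the formula), the standard definition with n observations and k predictors is:

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)

For the pie sales example, 1 − (1 − 0.521)(14/12) ≈ 0.442, which matches the value reported below.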


Adjusted R²

Adjusted R² = 0.442

44.2% of the variation in pie sales is explained by the variation in
selling price and advertising cost, taking into account the sample
size and the number of predictor variables.


How many predictors?
▪ One way to prevent overfitting the model is to limit the number of
predictors based on the sample size; common rules of thumb call for
roughly 5 to 10 observations per predictor.
▪ These rules are merely suggestions.


SIGNIFICANCE OF PREDICTORS
▪ We are usually interested in testing each estimated coefficient to
see whether it is significantly different from zero, that is, whether
a predictor variable helps explain the variation in Y.
▪ Use t-tests of the individual variable slopes.
▪ These show whether there is a linear relationship between Y and Xi.
▪ Hypotheses: H0: βj = 0 versus H1: βj ≠ 0, tested with

t = (bj − 0) / s(bj)

where s(bj) is the standard error of bj, with n − k − 1 degrees of
freedom.
Significance of Price as a predictor

For price: t = (−24.975 − 0)/10.832 = −2.306, p = 0.040 < α = 0.05


Significance of advertising cost as a predictor

For Ads cost: t = (74.131 − 0)/25.967 = 2.855, p = 0.014 < α = 0.05

Reject the null hypothesis for both variables. There is sufficient
evidence that both price and advertising cost affect pie sales at the
0.05 level of significance.
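A sketch that reproduces both t statistics and p-values from the reported estimates and standard errors, using df = n − k − 1 = 12:

```python
from scipy import stats

for name, b, se in [("price", -24.975, 10.832), ("ads", 74.131, 25.967)]:
    t = (b - 0) / se                       # t statistic for H0: beta_j = 0
    p = 2 * stats.t.sf(abs(t), df=12)      # two-tailed p-value
    print(name, round(t, 3), round(p, 3))  # price: -2.306, 0.040; ads: 2.855, 0.014
```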




Detecting MULTICOLLINEARITY
▪ When the predictor variables are related to each other instead of
being independent, we have a condition known as multicollinearity.
▪ Multicollinearity inflates the variances of the coefficient
estimates and makes the t statistics less reliable.
▪ Least squares estimation fails outright only when multicollinearity
is perfect, that is, when one predictor is an exact linear function of
the others.


Detecting MULTICOLLINEARITY
Ways of detecting multicollinearity (see the sketch after this list):
▪ To check whether two predictors are correlated, compute their
correlation coefficients. Suspect multicollinearity if two predictors
are highly correlated (|r| ≥ 0.80) or if the correlation coefficient
exceeds the multiple R.
▪ Multicollinearity is present if the variance inflation factor (VIF)
is at least 10. The VIF is provided in the regression output of JASP.
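A sketch of both checks in Python, assuming the `data` frame built earlier; `variance_inflation_factor` is part of statsmodels:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Pairwise correlations between the predictors
print(data[["price", "ads"]].corr())

# VIF for each predictor (computed against the other columns of X)
X = sm.add_constant(data[["price", "ads"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```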


REGRESSION DIAGNOSTICS
▪ Independence of errors – the error values (differences between
observed and estimated values) are statistically independent, i.e.,
non-autocorrelated (relevant for time-series and panel data).
▪ Normality of errors – the error values are normally distributed for
any given value of X.
▪ Equal variance (homoskedasticity) – the probability distribution of
the errors has constant variance.


Checking the assumptions by examining the residuals

Residual analysis for equal variance: plot the predicted values
against the residuals.
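A minimal sketch of this plot with matplotlib, assuming the statsmodels `model` fitted earlier:

```python
import matplotlib.pyplot as plt

# Residuals vs. fitted values: look for a random band with constant spread
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted sales")
plt.ylabel("Residual")
plt.show()
```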


Checking the assumptions by examining the residuals
Residual Analysis for Normality:
1. Examine the Stem-and-Leaf Display of the Residuals
2. Examine the Box-and-Whisker Plot of the Residuals
3. Examine the Histogram of the Residuals
4. Construct a normal probability plot.
5. Construct a Q-Q plot.



Checking the assumptions by examining the residuals

If the residuals are normal, the normal probability plot and the Q-Q
plot should be approximately linear.
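A sketch of the Q-Q plot with scipy and matplotlib, again assuming the fitted `model`:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Points close to the reference line suggest approximately normal residuals
stats.probplot(model.resid, dist="norm", plot=plt)
plt.show()
```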


Checking the assumptions by examining the residuals

Residual analysis for independence of errors: plot the residuals
against time (the order of observation).

Independence of errors means that the distribution of errors is
random and is not influenced by or correlated with the errors in
prior observations.


Clearly, independence can be checked only when we know the order in
which the observations were made. The opposite of independence is
autocorrelation.


Measuring Autocorrelation
▪ Another way of checking for independence of errors is to test the
significance of the Durbin-Watson statistic.
▪ The Durbin-Watson statistic is used when data are collected over
time to detect the presence of autocorrelation.
▪ Autocorrelation exists if residuals in one time period are related
to residuals in another period.


Measuring Autocorrelation
▪ The presence of autocorrelation of errors (or residuals)
violates the regression assumption that residuals are
statistically independent.



The Durbin-Watson (DW) Statistic
▪ The DW statistic is used to test for autocorrelation:

H0: residuals are not correlated
H1: autocorrelation is present

D = Σ(ei − ei−1)² / Σei²

where the numerator sums over i = 2, …, n and the denominator over
i = 1, …, n.

▪ The possible range is 0 ≤ D ≤ 4.
▪ D should be close to 2 if H0 is true.
▪ D less than 2 may signal positive autocorrelation; D greater than 2
may signal negative autocorrelation.
▪ The value of DW can be obtained from software such as SPSS, Gretl,
and JASP.
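A sketch of the same statistic in Python; statsmodels provides `durbin_watson` directly (assuming the fitted `model` from earlier):

```python
from statsmodels.stats.stattools import durbin_watson

D = durbin_watson(model.resid)
print(D)   # values near 2 suggest no first-order autocorrelation
```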
Sample output from JASP

