
Learning Objectives:

At the end of the lesson, the student should be able to:


• Use a fitted multiple regression equation to make predictions;
• Use the ANOVA table to perform an F test for overall significance;
• Interpret the coefficient of multiple determination and adjusted R²;
• Use the Coefficients table to determine significance of predictors;
• Detect multicollinearity and assess its effects;
• Analyze residuals to check for violations of the regression assumptions.
Multiple Regression
▪ Multiple regression extends simple regression to include several
independent variables (called predictors or explanatory variables).
▪ It is required when a single-predictor model is inadequate to describe
the relationship between the response variable (Y) and its potential
predictors (X1, X2, X3, …).
▪ The interpretation is similar to simple regression since simple
regression is a special case of multiple regression.

Source: Statistical Analysis with Software Applications, McGraw Hill


Multiple Regression
▪ Calculations are done by computer.
▪ Using multiple predictors is more than a matter of improving the fit;
rather, it is a question of specifying a correct model.
▪ A low R2 in a simple regression model does not necessarily mean that
X and Y are unrelated but may simply indicate that the model is
incorrectly specified.
▪ Omission of relevant predictors (model misspecification) can cause
biased estimates and misleading results.



Limitations of Simple Regression
▪ Multiple relationships usually exist.
▪ The estimates are biased if relevant predictors are omitted.
▪ A lack of fit (low R-squared) does not show that X is unrelated to Y
if the true model is multivariate.
▪ Simple regression is used only when there is a compelling need for a
simple model, or when other predictors have only modest effects and a
single logical predictor "stands out" as doing a very good job all by
itself.


The population regression model

y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε

where y is the response variable, X1, X2, …, Xk are the predictor
variables, β0, β1, …, βk are the unknown regression coefficients, and
ε is the random error term.


The population regression model
In the population regression model,
▪ the random error ε represents everything that is not part of the model;
▪ the unknown regression coefficients, denoted by Greek letters, are
parameters;
▪ each coefficient βi shows the change in the expected value of y for a unit change
in Xi while holding all other predictors constant (ceteris paribus);
▪ the errors are assumed to be unobservable, independent random
disturbances that are normally distributed with zero mean and constant
variance. Under these assumptions, the ordinary least squares (OLS)
estimation method yields unbiased, consistent, efficient estimates of the
unknown parameters.
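To make these assumptions concrete, here is a minimal sketch that simulates data from such a population model in Python; the coefficients, sample size, and error variance are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative (assumed) parameters for a 2-predictor population model
beta0, beta1, beta2, sigma = 10.0, 2.0, -3.0, 1.5
n = 200

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
eps = rng.normal(0, sigma, n)          # iid N(0, sigma^2) disturbances

# Population regression model: y = beta0 + beta1*X1 + beta2*X2 + error
y = beta0 + beta1 * x1 + beta2 * x2 + eps
```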
The estimated regression equation

ŷ = b0 + b1X1 + b2X2 + ⋯ + bkXk

where ŷ is the predicted value of the response variable, b0 is the
estimated intercept (constant), and b1, b2, …, bk are the estimated
slope coefficients of the predictor variables.


Fitted regression: comparison of a one-predictor model versus a two-predictor model


Example 1.
A distributor of frozen dessert pies wants to evaluate factors
thought to influence demand.
- the dependent variable is pie sales (units per week)
- the independent variables are price (in USD) and
advertising cost (in hundreds of USD)

The data are collected for 15 weeks.


Example 1.

Multiple regression equation:

Sales = b0 + b1(Price) + b2(Ads cost)   or   Sales = b0 + b1X1 + b2X2

where X1 = Price and X2 = Ads cost.

Week   Pie Sales   Price, $   Advertising Costs ($100s)
 1       350         5.50       3.3
 2       460         7.50       3.3
 3       350         8.00       3.0
 4       430         8.00       4.5
 5       350         6.80       3.0
 6       380         7.50       4.0
 7       430         4.50       3.0
 8       470         6.40       3.7
 9       450         7.00       3.5
10       490         5.00       4.0
11       340         7.20       3.5
12       300         7.90       3.2
13       440         5.90       4.0
14       450         5.00       3.5
15       300         7.00       2.7
Use Excel to generate the output

Multiple regression equation:

Sales = 306.526 − 24.975(X1) + 74.131(X2)
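For readers without Excel, here is a minimal sketch of the same fit in Python using pandas and statsmodels; the column names are my own choices, not part of the original output.

```python
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "sales": [350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
              340, 300, 440, 450, 300],
    "price": [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00,
              5.00, 7.20, 7.90, 5.90, 5.00, 7.00],
    "ads":   [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
              3.5, 3.2, 4.0, 3.5, 2.7],
})

X = sm.add_constant(data[["price", "ads"]])   # adds the intercept column
model = sm.OLS(data["sales"], X).fit()
print(model.summary())   # coefficients, F statistic, R², t-tests
```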


Interpretation of the regression coefficients

Sales = 306.526 − 24.975(X1) + 74.131(X2)

b1 = −24.975: Sales will decrease, on average, by 24.975 pies per
week for each $1 increase in selling price, net of the effects of
changes due to advertising.

b2 = 74.131: Sales will increase, on average, by 74.131 pies per
week for each $100 increase in advertising cost, net of the effects
of changes due to price.
Predict sales for a week in which the selling price is $6.50 and
the advertising cost is $420:

Sales = 306.526 − 24.975(X1) + 74.131(X2)
      = 306.526 − 24.975(6.50) + 74.131(4.20)
      = 455.539

Note that advertising is in $100s, so $420 means that X2 = 4.20.
Predicted sales are about 455.54 pies.
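A short sketch of the same prediction in Python, assuming the `model` object from the earlier statsmodels fit:

```python
import pandas as pd

# New week: price = $6.50, advertising = $420 (i.e., 4.20 hundreds)
new = pd.DataFrame({"const": [1.0], "price": [6.50], "ads": [4.20]})
print(model.predict(new))                         # ≈ 455.5 pies

# Equivalent hand calculation from the rounded coefficients
print(306.526 - 24.975 * 6.50 + 74.131 * 4.20)    # 455.5387
```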


ASSESSING OVERALL FIT
As in simple regression, there is one residual for every observation
in a multiple regression:

ei = yi − ŷi   for i = 1, 2, …, n


ASSESSING OVERALL FIT: F-test for significance

Before determining which, if any, of the individual predictors are
significant, we perform a global test of overall fit using the F-test.


ASSESSING OVERALL FIT: F-test for significance
For a regression with k predictors, the hypotheses to be tested are:

H0: All the true coefficients are zero (β1 = β2 = ⋯ = βk = 0)
H1: At least one of the coefficients is nonzero.

ANOVA Table Format

Source                    df           SS    MS                      F             Significance F
Regression (explained)    k            SSR   MSR = SSR/k             F = MSR/MSE   p-value
Residual (unexplained)    n − k − 1    SSE   MSE = SSE/(n − k − 1)
Total                     n − 1        SST


ASSESSING OVERALL FIT: F-test for significance

F = MSR/MSE = 14730.013/2252.776 = 6.539

The p-value is 0.012, so reject the null hypothesis at α = 0.05.
There is sufficient evidence that at least one independent variable
affects Y.
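The F statistic and its p-value can be reproduced from the ANOVA quantities; a sketch with scipy, using df1 = k = 2 and df2 = n − k − 1 = 12:

```python
from scipy import stats

MSR, MSE = 14730.013, 2252.776   # from the ANOVA table
F = MSR / MSE                    # 6.539
p = stats.f.sf(F, 2, 12)         # upper-tail p-value ≈ 0.012
print(F, p)
```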


COEFFICIENT OF MULTIPLE DETERMINATION
▪ The coefficient of multiple determination reports the proportion of
the total variation in Y that is explained by the variation of all
predictor variables taken together.
▪ It is also called R-squared and is obtained by:

R² = SSR/SST = regression sum of squares / total sum of squares

0 ≤ R² ≤ 1
ASSESSING OVERALL FIT: Coefficient of Multiple Determination

R² = SSR/SST = 29460.027/56493.333 = 0.521

52.1% of the variation in pie sales is explained by the variation in
selling price and advertising cost.


ADJUSTED R²
▪ R-squared never decreases when a new predictor variable X is
added to the model.
▪ This can be a disadvantage when comparing models.
▪ What is the net effect of adding a new variable?
▪ We lose a degree of freedom when a new variable is added.
▪ Did the new X variable add enough explanatory power to offset the
loss of one degree of freedom?


ADJUSTED R²
▪ The adjusted R² shows the proportion of variation in Y explained by
all X variables, adjusted for the number of X variables used.
▪ It penalizes excessive use of unimportant predictor variables.
▪ It is never larger than R².
▪ It is useful when comparing models.
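For reference (the slide states the properties but not the formula), the standard definition with n observations and k predictors is:

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)

For the pie sales example, 1 − (1 − 0.521)(14/12) ≈ 0.442, which matches the value reported below.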


Adjusted R²

Adjusted R² = 0.442

44.2% of the variation in pie sales is explained by the variation in
selling price and advertising cost, taking into account the sample
size and the number of predictor variables.


How many predictors?
▪ One way to prevent overfitting the model is to limit the number of
predictors based on the sample size; common rules of thumb call for
roughly 5 to 10 observations per predictor.
▪ These rules are merely suggestions.


SIGNIFICANCE OF PREDICTORS
▪ We are usually interested in testing each estimated coefficient to
see whether it is significantly different from zero, that is, whether
a predictor variable helps explain the variation in Y.
▪ Use t-tests of the individual variable slopes.
▪ These show whether there is a linear relationship between Y and Xi.
▪ Hypotheses: H0: βj = 0 versus H1: βj ≠ 0, tested with

t = (bj − 0) / s(bj)

where s(bj) is the standard error of bj, with n − k − 1 degrees of
freedom.
Significance of Price as a predictor

For price: t = (−24.975 − 0)/10.832 = −2.306, p = 0.040 < α = 0.05


Significance of advertising cost as a predictor

For Ads cost: t = (74.131 − 0)/25.967 = 2.855, p = 0.014 < α = 0.05

Reject the null hypothesis for both variables. There is sufficient
evidence that both price and advertising cost affect pie sales at the
0.05 level of significance.
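A sketch that reproduces both t statistics and p-values from the reported estimates and standard errors, using df = n − k − 1 = 12:

```python
from scipy import stats

for name, b, se in [("price", -24.975, 10.832), ("ads", 74.131, 25.967)]:
    t = (b - 0) / se                       # t statistic for H0: beta_j = 0
    p = 2 * stats.t.sf(abs(t), df=12)      # two-tailed p-value
    print(name, round(t, 3), round(p, 3))  # price: -2.306, 0.040; ads: 2.855, 0.014
```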




Detecting MULTICOLLINEARITY
▪ When the predictor variables are related to each other instead of
being independent, we have a condition known as multicollinearity.
▪ Multicollinearity inflates the variances of the coefficient
estimates and makes the t statistics less reliable.
▪ Least squares estimation fails outright only when multicollinearity
is perfect, that is, when one predictor is an exact linear function of
the others.


Detecting MULTICOLLINEARITY
Ways of detecting multicollinearity (see the sketch after this list):
▪ To check whether two predictors are correlated, compute their
correlation coefficients. Suspect multicollinearity if two predictors
are highly correlated (|r| ≥ 0.80) or if the correlation coefficient
exceeds the multiple R.
▪ Multicollinearity is present if the variance inflation factor (VIF)
is at least 10. The VIF is provided in the regression output of JASP.
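A sketch of both checks in Python, assuming the `data` frame built earlier; `variance_inflation_factor` is part of statsmodels:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Pairwise correlations between the predictors
print(data[["price", "ads"]].corr())

# VIF for each predictor (computed against the other columns of X)
X = sm.add_constant(data[["price", "ads"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```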


REGRESSION DIAGNOSTICS
▪ Independence of errors – the error values (differences between
observed and estimated values) are statistically independent, i.e.,
non-autocorrelated (relevant for time-series and panel data).
▪ Normality of errors – the error values are normally distributed for
any given value of X.
▪ Equal variance (homoskedasticity) – the probability distribution of
the errors has constant variance.


Checking the assumptions by examining the residuals

Residual analysis for equal variance: plot the predicted values
against the residuals.
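A minimal sketch of this plot with matplotlib, assuming the statsmodels `model` fitted earlier:

```python
import matplotlib.pyplot as plt

# Residuals vs. fitted values: look for a random band with constant spread
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted sales")
plt.ylabel("Residual")
plt.show()
```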


Checking the assumptions by examining the residuals
Residual Analysis for Normality:
1. Examine the Stem-and-Leaf Display of the Residuals
2. Examine the Box-and-Whisker Plot of the Residuals
3. Examine the Histogram of the Residuals
4. Construct a normal probability plot.
5. Construct a Q-Q plot.



Checking the assumptions by examining the residuals

If the residuals are normal, the normal probability plot and the Q-Q
plot should be approximately linear.
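A sketch of the Q-Q plot with scipy and matplotlib, again assuming the fitted `model`:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Points close to the reference line suggest approximately normal residuals
stats.probplot(model.resid, dist="norm", plot=plt)
plt.show()
```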


Checking the assumptions by examining the residuals

Residual analysis for independence of errors: plot the residuals
against time (the order of observation).

Independence of errors means that the distribution of errors is
random and is not influenced by or correlated with the errors in
prior observations.


Clearly, independence can be checked only when we know the order in
which the observations were made. The opposite of independence is
autocorrelation.


Measuring Autocorrelation
▪ Another way of checking for independence of errors is to test the
significance of the Durbin-Watson statistic.
▪ The Durbin-Watson statistic is used when data are collected over
time to detect the presence of autocorrelation.
▪ Autocorrelation exists if residuals in one time period are related
to residuals in another period.


Measuring Autocorrelation
▪ The presence of autocorrelation of errors (or residuals)
violates the regression assumption that residuals are
statistically independent.



The Durbin-Watson (DW) Statistic
▪ The DW statistic is used to test for autocorrelation:

H0: residuals are not correlated
H1: autocorrelation is present

D = Σ(ei − ei−1)² / Σei²

where the numerator sums over i = 2, …, n and the denominator over
i = 1, …, n.

▪ The possible range is 0 ≤ D ≤ 4.
▪ D should be close to 2 if H0 is true.
▪ D less than 2 may signal positive autocorrelation; D greater than 2
may signal negative autocorrelation.
▪ The value of DW can be obtained from software such as SPSS, Gretl,
and JASP.
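A sketch of the same statistic in Python; statsmodels provides `durbin_watson` directly (assuming the fitted `model` from earlier):

```python
from statsmodels.stats.stattools import durbin_watson

D = durbin_watson(model.resid)
print(D)   # values near 2 suggest no first-order autocorrelation
```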
Sample output from JASP

