
ADVANCED MANAGERIAL STATISTICS
Multiple Linear Regression
Objectives
apply multiple regression analysis to business decision-making situations
analyze and interpret the computer output for a multiple regression model
test the significance of the multiple regression model
test the significance of the independent variables in a multiple regression model
Recap: Simple Linear Regression
What is regression analysis?
What is meant by a linear relationship?
What are dependent and independent (predictor) variables?

The Multiple Regression Model
Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xᵢ).

Multiple Regression Model with k Independent Variables:

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ... + βₖXₖᵢ + εᵢ

where β₀ is the Y-intercept, β₁, ..., βₖ are the population slopes, and εᵢ is the random error.
Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

Ŷᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ + ... + bₖXₖᵢ

where Ŷᵢ is the estimated (or predicted) value of Y, b₀ is the estimated intercept, and b₁, ..., bₖ are the estimated slope coefficients.
In this chapter, we will always use Excel to obtain the
regression slope coefficients and other regression
summary measures.
Example:
2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.

Dependent variable: Pie sales (units per week)
Independent variables: Price (in $), Advertising (in $100s)

Data are collected for 15 weeks.
Pie Sales Example
Multiple regression equation:

Sales = b₀ + b₁(Price) + b₂(Advertising)

Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50             3.3
  2       460        7.50             3.3
  3       350        8.00             3.0
  4       430        8.00             4.5
  5       350        6.80             3.0
  6       380        7.50             4.0
  7       430        4.50             3.0
  8       470        6.40             3.7
  9       450        7.00             3.5
 10       490        5.00             4.0
 11       340        7.20             3.5
 12       300        7.90             3.2
 13       440        5.90             4.0
 14       450        5.00             3.5
 15       300        7.00             2.7
Estimating a Multiple Linear
Regression Equation
Excel will be used to generate the
coefficients and measures of goodness of
fit for multiple regression
Excel:
Data / Data Analysis... / Regression
Instructions are attached here.
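For readers without Excel at hand, a minimal sketch of the same fit in Python follows; statsmodels is an assumption here (any OLS routine gives the same numbers), and the data are the 15 weeks from the table above.

# Sketch: fit Sales = b0 + b1*Price + b2*Advertising by ordinary least
# squares. statsmodels is assumed; it is not part of the course materials.
import statsmodels.api as sm

price       = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
               7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
               3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]
sales       = [350, 460, 350, 430, 350, 380, 430, 470,
               450, 490, 340, 300, 440, 450, 300]

X = sm.add_constant(list(zip(price, advertising)))  # prepend intercept column
model = sm.OLS(sales, X).fit()
print(model.params)     # expect roughly 306.526, -24.975, 74.131
print(model.summary())  # R Square, ANOVA F, t stats -- same numbers as Excel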
Multiple Linear Regression: Excel
Summary Output
Multiple Regression Excel Output
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
                    df        SS          MS          F       Significance F
Regression           2   29460.027   14730.013    6.53861    0.01201
Residual (Error)    12   27033.306    2252.776
Total               14   56493.333

               Coefficients   Standard Error    t Stat    P-value   Lower 95%   Upper 95%
Intercept       306.52619       114.25389      2.68285    0.01993    57.58835   555.46404
Price           -24.97509        10.83213     -2.30565    0.03979   -48.57626    -1.37392
Advertising      74.13096        25.96732      2.85478    0.01449    17.55303   130.70888
The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
b₁ = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, holding advertising constant.

b₂ = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, holding price constant.

where
Sales is in number of pies per week
Price is in $
Advertising is in $100s
Using The Equation to Make
Predictions
Predict sales for a week in which the selling
price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X₂ = 3.5.
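As a quick arithmetic check, the same prediction in a few lines of Python; the coefficient values are taken from the Excel output, and the variable names are ours.

# Plug the estimated coefficients into the regression equation.
b0, b1, b2 = 306.526, -24.975, 74.131
price = 5.50        # selling price in $
advertising = 3.5   # $350 of advertising, expressed in $100s
predicted_sales = b0 + b1 * price + b2 * advertising
print(round(predicted_sales, 2))  # -> 428.62 pies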
Measures of Variation
Total variation is made up of two parts:
SST = SSR + SSE

Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares

SST = Σ(Yᵢ - Ȳ)²
SSR = Σ(Ŷᵢ - Ȳ)²
SSE = Σ(Yᵢ - Ŷᵢ)²
where:
Ȳ = average value of the dependent variable
Yᵢ = observed values of the dependent variable
Ŷᵢ = predicted value of Y for the given Xᵢ value
SST = total sum of squares
Measures the variation of the Yᵢ values around their mean Ȳ

SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y

SSE = error sum of squares
Variation attributable to factors other than the relationship between X and Y
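A short sketch of the decomposition, assuming y and y_hat are NumPy arrays holding the observed and predicted values (e.g., sales and the fitted values from the earlier fit):

import numpy as np

def variation(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and predicted y_hat."""
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)      # total variation around the mean
    ssr = np.sum((y_hat - y_bar) ** 2)  # variation explained by the regression
    sse = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation
    return sst, ssr, sse

# For the pie data these come out near SST = 56493.3, SSR = 29460.0,
# and SSE = 27033.3, matching the ANOVA table in the Excel output.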
Measures of Variation

[Figure: scatter plot with the regression line, showing for one point Xᵢ how the total deviation of Yᵢ from Ȳ (SST) splits into the deviation of Ŷᵢ from Ȳ (SSR) and the residual deviation of Yᵢ from Ŷᵢ (SSE).]
Coefficient of Determination, R²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable(s). It is also called R-squared and is denoted R².

R² = SSR / SST = (regression sum of squares) / (total sum of squares)

note: 0 ≤ R² ≤ 1
Composition of Total Variation

Total Variation = Explained Variation + Unexplained Variation
SST = SSRegression (SSR) + SSResidual/SSError (SSE)
How Strong is The Model?
R² = 1

[Figure: scatter plots of Y against X in which every point lies exactly on the regression line.]

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X.
0 < R² < 1

[Figure: scatter plots of Y against X with points spread around the regression line.]

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X.
How Strong is The Model?
R² = 0

[Figure: scatter plot of Y against X showing no linear pattern.]

No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
How Strong is The Model?
The Excel output is repeated here (see the Summary Output above); the highlighted values are R Square = 0.52148, SSR = 29460.0, and SST = 56493.3.
Coefficient of Determination

R² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising; the remaining 47.9% is unexplained variation.
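The same ratio in one line of Python, using the sums of squares from the ANOVA table:

r_squared = 29460.027 / 56493.333  # SSR / SST from the ANOVA table
print(round(r_squared, 5))         # -> 0.52148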
Adjusted Coefficient of Determination (R²adj)

R² never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.

What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R²

Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

R²adj = 1 - (1 - R²) × (n - 1) / (n - p - 1)

(where n = sample size, p = number of independent variables)

Penalizes excessive use of unimportant independent variables
Smaller than R²
Useful in comparing among models
Adjusted R² (computation)

Using the pie sales example (n = 15, p = 2):

R²adj = 1 - (1 - R²) × (n - 1) / (n - p - 1)
      = 1 - (1 - 0.5215) × (15 - 1) / (15 - 2 - 1)
      = 0.4417
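A small helper makes the adjustment easy to reuse; the sketch below is ours (the name adjusted_r2 is not Excel's), using the formula above.

def adjusted_r2(r2, n, p):
    """Adjusted R-square: penalizes R-square for the number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.52148, 15, 2), 4))  # -> 0.4417, as in the Excel output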
The Excel output is repeated here (see the Summary Output above); the highlighted value is Adjusted R Square = 0.44172.
Adjusted R²

R²adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
Is the Model Significant?
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the
X variables considered together and Y
Use F-test statistic
Hypotheses:

H₀: β₁ = β₂ = ... = βₖ = 0 (no linear relationship)
H₁: at least one βᵢ ≠ 0 (at least one independent variable affects Y)
F-Test for Overall Significance
Test statistic:

F = MSR / MSE = (SSR / p) / (SSE / (n - p - 1))

with df₁ = p and df₂ = n - p - 1 degrees of freedom.

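A sketch of the computation in Python, using the sums of squares from the ANOVA table; scipy is an assumption here, used only to turn F into a p-value.

from scipy import stats

ssr, sse = 29460.027, 27033.306  # from the ANOVA table
p, n = 2, 15                     # number of predictors, sample size

msr = ssr / p                    # mean square regression -> 14730.0
mse = sse / (n - p - 1)          # mean square error -> 2252.8
F = msr / mse                    # -> about 6.5386
p_value = stats.f.sf(F, p, n - p - 1)  # upper-tail area -> about 0.012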
F-distribution

Like the t-distribution, the shape of the F-distribution curve depends on the degrees of freedom (df). It has two degrees of freedom (the numerator df and the denominator df). It is right-skewed, but the skewness decreases as the df increase.

Characteristics:
The F-distribution is continuous and skewed to the right.
The values of an F-distribution are nonnegative.
Critical Value: F-distribution

[Figure: right-skewed F-distribution with α = 0.05; the rejection region is the upper tail beyond F(α, df₁, df₂), and the non-rejection region lies below it.]

Degrees of freedom: df₁ = p and df₂ = n - p - 1, where p = number of predictors.

Decision Rule:
If the F test statistic > F(α, df₁, df₂), or the p-value < α = 0.05, reject H₀; otherwise, do not reject H₀.
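Rather than reading F(α, df₁, df₂) from a table, it can be looked up with scipy (an assumption, as before):

from scipy import stats

alpha, df1, df2 = 0.05, 2, 12
f_crit = stats.f.ppf(1 - alpha, df1, df2)  # upper critical value
print(round(f_crit, 3))                    # -> 3.885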

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386
The Excel output is repeated here (see the Summary Output above); the highlighted values are F = 6.53861 and Significance F = 1.20E-02 (= 0.012).
F-Test for Overall Significance

With 2 and 12 degrees of freedom, the p-value for the F-test = 0.012.
H₀: β₁ = β₂ = 0
H₁: β₁ and β₂ not both zero
α = .05, df₁ = 2, df₂ = 12

Test Statistic:
F = MSR / MSE = 6.5386

Critical Value:
F(.05, 2, 12) = 3.885

[Figure: F-distribution with the rejection region in the upper tail beyond F.05 = 3.885.]

Decision:
Since the F test statistic is in the rejection region (6.5386 > 3.885; p-value = 0.012 < .05), reject H₀.

Conclusion:
There is evidence that at least one independent variable affects Y.
F-Test for Overall Significance

The hypotheses may also be written as follows:

H₀: The overall model is not valid (not significant)
H₁: The overall model is valid (significant)

The rest of the hypothesis-testing steps remain the same.
Are Individual Variables Significant?
Use t-tests of individual variable slopes
Shows if there is a linear relationship between the variable Xᵢ and Y

Hypotheses:
H₀: βᵢ = 0 (no linear relationship)
H₁: βᵢ ≠ 0 (a linear relationship does exist between Xᵢ and Y)
Are Individual Variables Significant?
H₀: βᵢ = 0 (no linear relationship)
H₁: βᵢ ≠ 0 (significant linear relationship)
where i = 1, ..., p

Test Statistic:

t = bᵢ / S(bᵢ)    (df = n - p - 1)
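The t statistic and its two-sided p-value for Price, computed from the output values (scipy assumed, as before):

from scipy import stats

b_price, se_price, df = -24.97509, 10.83213, 12
t = b_price / se_price                # -> about -2.3057
p_value = 2 * stats.t.sf(abs(t), df)  # two-sided p-value -> about 0.0398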
From the Excel output (see the Summary Output above):

              Coefficients   Standard Error    t Stat    P-value
Price          -24.97509       10.83213      -2.30565    0.03979
Advertising     74.13096       25.96732       2.85478    0.01449
t-value for Price is t = -2.306, with p-value .0398
t-value for Advertising is t = 2.855, with p-value .0145
Inferences about the Slope: t-test Example

H₀: βᵢ = 0
H₁: βᵢ ≠ 0

d.f. = 15 - 2 - 1 = 12
α = .05
t(α/2) = 2.179

From the Excel output:

              Coefficients   Standard Error    t Stat    P-value
Price          -24.97509       10.83213      -2.30565    0.03979
Advertising     74.13096       25.96732       2.85478    0.01449

[Figure: t-distribution with two-tailed rejection regions beyond ±2.179 (α/2 = .025 in each tail).]

Decision:
The test statistic for each variable falls in the rejection region (p-values < .05), so reject H₀ for each variable.

Conclusion:
There is evidence that both Price and Advertising affect pie sales at α = .05.
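The critical value ±2.179 and the 95% confidence limits in the Excel output follow the same recipe; a sketch with scipy (assumed), using the Price coefficient:

from scipy import stats

df = 12
t_crit = stats.t.ppf(1 - 0.025, df)      # two-tailed critical value -> about 2.179

b, se = -24.97509, 10.83213              # Price coefficient and its standard error
lower, upper = b - t_crit * se, b + t_crit * se
print(round(lower, 3), round(upper, 3))  # -> about -48.576, -1.374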
Summary
Developed the multiple regression model
Tested the significance of the multiple regression model
Discussed R² and adjusted R²
Interpreted the regression coefficients
Tested individual regression coefficients
