
Quantitative Methods for Business
COMM5005
Lecture 8
Statistics Flow Chart

[Flow chart mapping the course's statistics topics to lectures:]
• Descriptive statistics (Lecture 5): moments of variables (first moment: mean; second moment: variance; third moment: skewness) and the bi-variate distribution
• Probability and distributions of variables (Lecture 6): binomial distribution, uniform distribution, normal distribution
• Inferential statistics: sampling distributions (Lecture 6); confidence interval estimation and hypothesis testing (Lecture 7); simple linear regression and multiple linear regression (Lecture 8)

Lecture 8 topics

In this lecture we will cover:


• Simple linear regression
• Introduction to multiple regression
Objectives

• Conduct a simple regression and interpret the meaning of the regression coefficients
• Use regression analysis to make predictions and inferences
• Evaluate the assumptions of regression analysis
• Construct a multiple regression model and analyse model output
• Incorporate categorical variables into a regression model
Readings

Sections of Berenson, M. et al., 5th ed., will help you to understand this week's topics more clearly.

Chapter          Name                                  Pages
12.1–12.5        Simple linear regression              455–476
13.1–13.4, 13.6  Introduction to multiple regression   504–519, 525–534
1. Simple linear regression
Motivating example:
Does the number of COVID-19 cases in Sydney affect consumer demand in Sydney? If so, how large is the impact? What is the impact of an increase of 100 cases in a week?

What do you need to answer these questions?

• Data on weekly consumer expenditure and the number of COVID-19 cases for the last 6 months in Sydney.
• A regression model to conduct regression analysis.
Regression analysis
• Regression analysis is used to:
• predict the value of a dependent variable (Y) based on the
value of at least one independent variable (X)
• explain the impact of changes in an independent variable on
the dependent variable
• Dependent variable (Y): the variable we wish to predict or
explain (response variable)
• Independent variable (X): the variable used to explain the
dependent variable (explanatory variable)
Introduction to regression analysis

Simple linear regression:


• Only one independent variable (X)
• Relationship between X and Y is described by a linear
function
• Changes in Y are assumed to be caused by changes in X
Types of linear relationships

Examples of types of relationships found in scatter diagrams


Simple linear regression model

Population model: Yi = β0 + β1Xi + εi, where β0 is the population Y-intercept, β1 is the population slope coefficient and εi is the random error term.
Prediction line

• The simple linear regression equation, Ŷi = b0 + b1Xi, provides an estimate of the population regression line.
• The individual random error terms ei have a mean of zero.


Estimation: Least squares method
• b0 and b1 are the values that minimise the sum of the squared differences between the actual values (Yi) and the predicted values (Ŷi):

min Σ(Yi - Ŷi)² = min Σ(Yi - (b0 + b1Xi))²

• b0 is the estimated average value of Y when the value of X is zero
• b1 is the estimated change in the average value of Y as a result of a one-unit change in X
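
This minimisation has a closed-form solution: b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and b0 = Ȳ - b1X̄. As a minimal sketch beyond the slides (assuming numpy is available):

```python
import numpy as np

def least_squares(x, y):
    """Closed-form least-squares estimates (b0, b1) for y ≈ b0 + b1*x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```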
Simple linear regression example (1 of 6)

• A manager of a local computer games store wishes to:


• examine the relationship between weekly sales (Y) and the number of
customers making purchases (X) over a 10 week period; and
• use the results of that examination to predict future weekly sales
Simple linear regression example (2 of 6)

Weekly sales in $1,000s (Y)   Number of customers (X)
245                           1400
312                           1600
279                           1700
308                           1875
199                           1100
219                           1550
405                           2350
324                           2450
319                           1425
255                           1700

[Scatter diagram: weekly sales ($1,000s) against number of customers (0–3,000)]
Simple linear regression example (3 of 6)
Simple linear regression example (4 of 6)
Simple linear regression example (5 of 6)

Weekly sales = 98.24833 + 0.10977 (customers)

b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
• Here, for no customers, b0 = 98.2483, which appears nonsensical. However, the intercept simply indicates that, over the sample selected, the portion of weekly sales not explained by the number of customers is $98,248.33. Also note that X = 0 is outside the range of observed values.
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X.
• Here, b1 = 0.10977 tells us that weekly sales increase, on average, by 0.10977 × $1,000 = $109.77 for each additional customer.
Simple linear regression example (6 of 6)

• Predict the weekly sales for the local store for 2,000 customers:

Weekly sales = 98.25 + 0.1098 × 2000
             = 98.25 + 219.60
             = 317.85

• The predicted weekly sales for the local computer games store for 2,000 customers is 317.85 ($1,000s) = $317,850.
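
As an illustrative check (not part of the slides; assumes numpy is available), fitting the line to the data from slide 2 of 6 reproduces these figures:

```python
import numpy as np

customers = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
sales = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])  # $1,000s

b1, b0 = np.polyfit(customers, sales, deg=1)  # returns slope first, then intercept
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")        # b0 = 98.24833, b1 = 0.10977

y_hat = b0 + b1 * 2000                        # prediction for 2,000 customers
print(f"{y_hat:.2f}")                         # ≈ 317.78 ($1,000s); the slide's 317.85
                                              # uses the rounded coefficients 98.25 and 0.1098
```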
Measures of variation
• Total variation is made up of two parts:

SST = SSR + SSE

SST = Σ(Yi - Ȳ)²   (total sum of squares): measures the variation of the Yi values around their mean Ȳ
SSR = Σ(Ŷi - Ȳ)²   (regression sum of squares): explained variation attributable to the relationship between X and Y
SSE = Σ(Yi - Ŷi)²  (error sum of squares): variation attributable to factors other than the relationship between X and Y
Coefficient of determination, r2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called r-squared and is denoted r²:

r² = SSR / SST = regression sum of squares / total sum of squares

Note: 0 ≤ r² ≤ 1
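
As an illustrative sketch (not in the slides; assumes numpy), the decomposition and r² for the games-store example:

```python
import numpy as np

customers = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
sales = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

b1, b0 = np.polyfit(customers, sales, deg=1)
fitted = b0 + b1 * customers

sst = np.sum((sales - sales.mean()) ** 2)  # total sum of squares
sse = np.sum((sales - fitted) ** 2)        # error sum of squares
ssr = sst - sse                            # regression sum of squares
print(f"r2 = {ssr / sst:.3f}")             # ≈ 0.581: about 58% of the variation in
                                           # weekly sales is explained by customers
```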
Assumptions
Use the acronym LINE:
Linearity
• The underlying relationship between X and Y is linear

Independence of errors
• Error values are statistically independent

Normality of error
• Error values (ε) are normally distributed for any given value of X

Equal variance (homoscedasticity)


• The probability distribution of the errors has constant variance
Residual analysis for linearity
Residual analysis for independence
Residual analysis for normality
A normal probability plot of the residuals can be used to check for normality.
Residual analysis for equal variance (homoscedasticity)
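
The residual plots from these slides are not reproduced in this text; a minimal sketch of how such plots can be produced (assuming numpy, scipy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

customers = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
sales = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

b1, b0 = np.polyfit(customers, sales, deg=1)
residuals = sales - (b0 + b1 * customers)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(customers, residuals)    # no pattern => linearity and equal variance look OK
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Number of customers (X)", ylabel="Residual")
stats.probplot(residuals, plot=ax2)  # roughly straight line => normality looks OK
plt.show()
```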
Now, please click the link in Moodle to complete the MyExperience survey.



2. Multiple regression model

Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1 … βk are the population slopes and εi is the random error.


Multiple Regression Equation (1 of 2)
The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept and b1 … bk are the estimated slope coefficients.


We will always use Excel to obtain the regression slope coefficients and other
regression summary measures
Multiple Regression Equation (2 of 2)
Two-variable model: Sales = b0 + b1 (Price) + b2 (Advertising)

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
• Dependent variable: pie sales (units per week)
• Independent variables: price (in $) and advertising ($100s)
• Data are collected for 15 weeks

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7
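
We will use Excel in this course, but as an illustrative cross-check (assuming numpy is available) the same least-squares estimates can be computed directly; this should reproduce the output on the next slide:

```python
import numpy as np

price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                        3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300])

X = np.column_stack([np.ones_like(price), price, advertising])  # intercept column first
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # ≈ [306.526, -24.975, 74.131]
```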
Multiple Regression Output

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS          F         Significance F
Regression   2    29460.027   14730.010   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888


The multiple regression equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

• b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, keeping all other factors fixed.
• b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, keeping all other factors fixed.
Using the equation to make predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies. Note: advertising is in $100s, so $350 means that X2 = 3.5.
Coefficient of multiple determination
Reports the proportion of total variation in Y explained by all X variables taken together:

r² = SSR / SST = regression sum of squares / total sum of squares
Coefficient of multiple determination

From the regression output above, R Square = 0.52148: about 52.1% of the variation in pie sales is explained by the variation in price and advertising.
Adjusted r2 (1 of 3)
• r2 never decreases when a new X variable is added to the
model
• this can be a disadvantage when comparing models

• What is the net effect of adding a new variable?


• we lose a degree of freedom when a new X variable is
added
• did the new X variable add enough explanatory power to
offset the loss of one degree of freedom?
Adjusted r2 (2 of 3)
Adjusted r² shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

r²adj = 1 - (1 - r²) × [(n - 1) / (n - k - 1)]

(where n = sample size, k = number of independent variables)
• Penalises excessive use of unimportant independent variables
• Smaller than r²
• Useful for comparing models
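
A quick arithmetic check of this formula against the pie-sales output (plain Python; values taken from the Excel output):

```python
n, k, r2 = 15, 2, 0.52148          # from the regression output
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 5))            # 0.44173, matching Excel's 0.44172 up to rounding
```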
Adjusted r2 (3 of 3)

From the regression output above, Adjusted R Square = 0.44172: about 44.2% of the variation in pie sales is explained by the model, after adjusting for the number of X variables and the sample size.
Is the model significant?
• F Test for Overall Significance of the Model
• Shows if there is a linear relationship between all of the X variables
considered together and Y

Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable
affects Y)
F test for overall significance (1 of 3)
Test statistic:

F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))

where MSR is the mean square of the regression and MSE is the mean square of the error, and F has:
• k degrees of freedom in the numerator, and
• (n - k - 1) degrees of freedom in the denominator
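
An illustrative recomputation of the F test for the pie-sales model from the ANOVA sums of squares (assumes scipy is available):

```python
from scipy import stats

ssr, sse, n, k = 29460.027, 27033.306, 15, 2   # from the ANOVA table
msr, mse = ssr / k, sse / (n - k - 1)
F = msr / mse
p = stats.f.sf(F, k, n - k - 1)                # upper-tail probability
print(f"F = {F:.4f}, p = {p:.5f}")             # F ≈ 6.5386, p ≈ 0.01201
```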
F test for overall significance (2 of 3)

From the ANOVA table in the regression output: F = MSR / MSE = 14730.010 / 2252.776 = 6.53861, with Significance F (the p-value) = 0.01201.
F test for overall significance (3 of 3)

Since the p-value (0.01201) is less than α = 0.05, we reject H0 and conclude that at least one independent variable affects Y: the model is significant.
Are individual variables significant?

We want to examine if there is a linear relationship between the


variable Xj and Y.

Hypotheses:
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist)

Use t tests of individual variable slopes (between Xj and Y)


Are individual variables significant?
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Test statistic:

t = (bj - 0) / Sbj

where the degrees of freedom for this statistic are n - k - 1, and Sbj is the standard error of the estimate bj.
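
For example, the t statistic and p-value for Price can be recomputed from the reported coefficient and standard error (an illustrative sketch; assumes scipy):

```python
from scipy import stats

b_price, se_price, df = -24.97509, 10.83213, 12   # from the output; df = n - k - 1
t = b_price / se_price
p = 2 * stats.t.sf(abs(t), df)                    # two-tailed p-value
print(f"t = {t:.4f}, p = {p:.4f}")                # t ≈ -2.3057, p ≈ 0.0398
```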
Are individual variables significant?

From the regression output above:
• t-value for Price: t = -2.306, with p-value 0.0398
• t-value for Advertising: t = 2.855, with p-value 0.0145
Are individual variables significant?
From the Excel output:

H0: βj = 0 versus H1: βj ≠ 0

              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449

d.f. = 15 - 2 - 1 = 12; with α = 0.05, tα/2 = 2.1788.

Check: the t statistic for each variable falls in the rejection region (|t| > 2.1788; both p-values < 0.05).
Conclusion: reject H0 for each variable. There is evidence that both Price and Advertising affect pie sales at α = 0.05.
Confidence Interval Estimate for the Slope
Confidence interval for the population slope βj:

bj ± t(n-k-1) × Sbj

where t(n-k-1) is the critical t value with (n - k - 1) d.f.; here, t has (15 - 2 - 1) = 12 d.f.

              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509      10.83213
Advertising   74.13096       25.96732

Example: form a 95% confidence interval for the effect of changes in price (X1) on pie sales:
-24.975 ± (2.1788)(10.832), so the interval is (-48.576, -1.374).
(This interval does not contain zero, so price has a significant effect on sales.)
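
The same interval can be reproduced directly (an illustrative sketch; assumes scipy):

```python
from scipy import stats

b, se, df = -24.97509, 10.83213, 12       # Price coefficient, its SE, n - k - 1
t_crit = stats.t.ppf(0.975, df)           # ≈ 2.1788 for a 95% interval
print(b - t_crit * se, b + t_crit * se)   # ≈ (-48.576, -1.374)
```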
Confidence Interval Estimate for the Slope
Confidence interval for the population slope βj: the Excel output also reports the interval endpoints:

              Coefficients   Standard Error   …   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        …   57.58835    555.46404
Price         -24.97509      10.83213         …   -48.57626   -1.37392
Advertising   74.13096       25.96732         …   17.55303    130.70888

Weekly sales are estimated to fall by between 1.37 and 48.58 pies for each $1 increase in the selling price.
Using dummy variables (1 of 3)
A dummy variable is a categorical explanatory variable with two levels:
• yes or no, on or off, male or female
• coded as 0 or 1
• if there are more than two levels, the number of dummy variables needed is the number of levels minus 1
Using dummy variables (2 of 3)
Example (with 2 levels):

Ŷ = b0 + b1X1 + b2X2

Let: Y = pie sales
X1 = price
X2 = holiday (dummy variable)
(X2 = 1 if a holiday occurred during the week)
(X2 = 0 if there was no holiday that week)
Using dummy variables (3 of 3)

Ŷ = b0 + b1X1 + b2(1) = (b0 + b2) + b1X1   (Holiday)
Ŷ = b0 + b1X1 + b2(0) = b0 + b1X1          (No holiday)

The two prediction lines have different intercepts but the same slope: b0 + b2 for holiday weeks (X2 = 1) and b0 for non-holiday weeks (X2 = 0). If H0: β2 = 0 is rejected, then 'Holiday' has a significant effect on pie sales.
[Chart: Y (sales) against X1 (price), showing the two parallel lines]
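
A minimal sketch of fitting this dummy-variable model (assuming numpy; the holiday indicator below is hypothetical, invented purely for illustration):

```python
import numpy as np

price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300])
holiday = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0])  # hypothetical 0/1 coding

X = np.column_stack([np.ones_like(price), price, holiday])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b[0]        : intercept for non-holiday weeks
# b[0] + b[2] : intercept for holiday weeks (same slope b[1] on price)
```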
Notices
• The final online test will be held on 5 December 2022 from 9.00 am to 1.00 pm. See the Moodle site for details.
• Topics examined in the final exam will be mostly statistics (80% of questions), as well as financial mathematics and calculus.
• This is an open-book test, so the textbook and notes are allowed. You can also use Excel and any calculator you want.
• The last eLearning tutorial and online quiz #3 will be released this week.
• The last seminar will be next week (in week 10).
• The group assignment is due at 6.00 pm on Friday 18 November.
• Finally, I am available for consultation until the final exam.
