
BOPR5204

Business Statistics

Week 3: Correlation and Simple Linear Regression

 Hongyanto Setio, MBA


Introduction
• Decisions depend on estimates and predictions (demand, price, etc.). The
accuracy of those estimates strongly influences the outcome of the decision:
poor estimates lead to over- or under-capacity, overstock or stock-outs, and
therefore to inefficiency and lost opportunities.
• When we have a cross-sectional (not time-series) dataset with just a single
variable observed on a sample, a good way to estimate/predict a value is to
use the mean. Why?
• For data concentrated around the mean, values near the mean are more
probable than values far from the mean.
→ Using the mean gives more accurate estimates.

• Now we will learn techniques to estimate/predict the value of one variable
from the known value of another variable. These techniques also help us
understand the relationship between the two variables: how a change in one
variable is associated with a change in the other.
Scatter Diagram
A scatter diagram is a chart that portrays the relationship between two
variables.
For example:

Sales Rep   Sales Calls   Copiers Sold
BV           96           41
CR           40           41
CS          104           51
GF          128           60
JH          164           61
MREY         76           29
MRUM         72           39
MK           80           50
RS           36           28
RN           84           43
RB          180           70
SS          132           56
SJ          120           45
SW           44           31
TK           84           30

[Figure: scatter diagram of copiers sold (dependent variable) against sales
calls (independent variable), showing a positive relationship.]
Correlation Coefficient
• The coefficient of correlation (ρ for the population, r for the sample)
measures the strength and direction of the linear relationship between two
variables.
• It can range from -1.00 to +1.00.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values
indicate a direct relationship.
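
As an illustration of how r is computed, here is a minimal Python sketch using the copier sales data from the scatter diagram example. The function name `pearson_r` and the variable names are illustrative assumptions, not part of the original slides; only the standard library is used.

```python
import math

# Copier sales data from the scatter diagram example.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def pearson_r(x, y):
    """Sample correlation: summed cross-deviations divided by the square
    root of the product of the summed squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(calls, sold)
print(round(r, 4))  # 0.8646
```

The result is positive and close to +1.00, which corresponds to a strong direct relationship between sales calls and copiers sold.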
Pearson's product-moment correlation

data: CopiersSold and SalesCalls
t = 6.2051, df = 13, p-value = 3.193e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6325270 0.9542427
sample estimates:
      cor
0.8646318

What does a correlation of 0.8646 mean?
It is positive, so we see there is a direct relationship between the number of
sales calls and the number of copiers sold. The value of 0.8646 is fairly close
to 1.00, so we conclude that the association is strong.
Testing the Significance of
the Correlation Coefficient

H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
Testing the Significance of the Correlation Coefficient –
Copier Sales Example
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
$t > t_{0.025,\,13}$ or $t < -t_{0.025,\,13}$
$t > 2.160$ or $t < -2.160$

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.865\sqrt{15-2}}{\sqrt{1-0.865^2}} = 6.216$$

The computed t (6.216) is within the rejection region; therefore we reject H0.
This means the correlation in the population is not zero: there is a
correlation between the number of sales calls made and the number of copiers
sold in the population of salespeople.
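
The test statistic can be checked with a short Python sketch. The function name `correlation_t` is an illustrative assumption. The critical value 2.160 is copied from the example rather than computed, because the Python standard library has no t-distribution.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Critical value t_{0.025, 13} as given in the example (not computed here).
T_CRITICAL = 2.160

t_rounded = correlation_t(0.865, 15)     # r rounded to three decimals
t_full = correlation_t(0.8646318, 15)    # r as reported by the R output

reject_h0 = abs(t_rounded) > T_CRITICAL
print(round(t_rounded, 3), round(t_full, 3), reject_h0)
```

The small gap between 6.216 (hand calculation) and 6.2051 (R output) comes only from rounding r to 0.865 before computing t; the decision to reject H0 is the same either way.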
Regression Analysis
o Regression analysis is a statistical technique that uses observed data to
relate the dependent variable to one or more independent variables.
o Dependent (or response) variable: the variable we wish to predict. This
variable should be quantitative.
o Independent (or predictor) variable: the variable we use to predict the
dependent variable.
o Regression equation: a mathematical model that expresses the relationship
between a dependent variable and the independent variables.

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

o Least squares criterion: determine the regression coefficients (a, b1, b2, …)
so as to minimize the sum of squared errors (SSE).

$$SSE = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
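
To make the least squares criterion concrete, the following Python sketch computes the SSE for the copier data under a fitted line and under two slightly perturbed lines. The helper names `sse` and `least_squares` are illustrative assumptions, and the closed-form slope and intercept used here are the standard single-predictor least squares formulas, stated as an assumption at this point in the material.

```python
# Copier sales data: x = sales calls, y = copiers sold.
x = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
y = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def sse(a, b, xs, ys):
    """Sum of squared errors of the line y_hat = a + b * x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


def least_squares(xs, ys):
    """Intercept and slope that minimize SSE for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares(x, y)
best = sse(a, b, x, y)

# Any other line gives a larger SSE; two perturbed lines as a spot check.
shifted = sse(a + 1.0, b, x, y)
steeper = sse(a, b + 0.01, x, y)
print(round(a, 2), round(b, 4), round(best, 2))
```

Moving either coefficient away from the fitted values increases the SSE, which is what "minimizing the sum of squared errors" means in practice.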
Simple Linear Regression Equation
$$\hat{y} = a + bx$$

The coefficients a and b are chosen to minimize SSE (MIN SSE).

Sales Rep   Sales Calls (x)   Copiers Sold (y)
BV           96               41
CR           40               41
CS          104               51
GF          128               60
JH          164               61
MREY         76               29
MRUM         72               39
MK           80               50
RS           36               28
RN           84               43
RB          180               70
SS          132               56
SJ          120               45
SW           44               31
TK           84               30
Call:
lm(formula = y ~ x, data = Dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-11.873  -2.861   0.255   3.511  10.595

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  19.9800     4.3897   4.552 0.000544 ***
x             0.2606     0.0420   6.205 3.19e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.72 on 13 degrees of freedom
Multiple R-squared: 0.7476, Adjusted R-squared: 0.7282
F-statistic: 38.5 on 1 and 13 DF, p-value: 3.193e-05
Problem Example:
a) Determine the regression equation.
b) Interpret the values of a and b.
c) Estimate the number of copiers sold by a representative who made 84 calls.
d) Comment on the sales of RN and TK.

Solution:
a) $\hat{y} = 19.98 + 0.2606x$
b) The b value of 0.2606 indicates that each additional sales call is
associated with an average increase of 0.2606 copiers sold, other factors
remaining the same.
The a value of 19.98 should not be interpreted, because x = 0 is outside the
range of values included in the sample (36 to 180 calls).
c) When x = 84, the expected number of copiers sold is
19.98 + 0.2606 * 84 = 41.87.
d) RN and TK each made 84 calls. However, RN sold 43 units (slightly above
the expected number) and TK sold 30 units (below the expected number). The
manager may want to investigate factors other than the number of sales calls.
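
Parts c) and d) can be reproduced with a few lines of Python. This is a sketch that takes the coefficients from the R output as given; the function name `predict_copiers` and the dictionary layout are illustrative assumptions.

```python
# Coefficients taken from the R output (rounded as reported there).
A = 19.98    # intercept
B = 0.2606   # slope: expected extra copiers sold per additional call


def predict_copiers(calls):
    """Expected copiers sold for a given number of sales calls."""
    return A + B * calls


expected = predict_copiers(84)

# Actual sales of the two representatives who each made 84 calls.
actual = {"RN": 43, "TK": 30}
residuals = {rep: sold - expected for rep, sold in actual.items()}

print(round(expected, 2))  # 41.87
for rep, res in residuals.items():
    print(rep, round(res, 2))
```

The residual (actual minus expected) is about +1.13 for RN and about -11.87 for TK, which matches the smallest residual of -11.873 in the R output up to rounding of the coefficients.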
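Simple Linear Regression Equation
$$\hat{y} = a + bx$$
where:

$$b = r\left(\frac{s_y}{s_x}\right) \qquad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\,s_x s_y}$$

$$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \qquad s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}$$

$$a = \bar{y} - b\bar{x}$$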
Example: Simple Linear Regression Equation
Given a data set:
x: 4 5 3 6 10
y: 4 6 5 7 7
Find ŷ when x is 7.
  x    y    x - x̄    y - ȳ    (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
  4    4    -1.6     -1.8      2.56       3.24        2.88
  5    6    -0.6      0.2      0.36       0.04       -0.12
  3    5    -2.6     -0.8      6.76       0.64        2.08
  6    7     0.4      1.2      0.16       1.44        0.48
 10    7     4.4      1.2     19.36       1.44        5.28
 28   29                      29.2        6.8        10.6    (column totals)

$\bar{x} = 28/5 = 5.6$, $s_x = \sqrt{29.2/(5-1)} = 2.702$
$\bar{y} = 29/5 = 5.8$, $s_y = \sqrt{6.8/(5-1)} = 1.304$

$$r = \frac{10.6}{(5-1)(2.702)(1.304)} = 0.7522$$
$\bar{x} = 5.6$, $\bar{y} = 5.8$, $s_x = 2.702$, $s_y = 1.304$, $r = 0.7522$

$$b = r\left(\frac{s_y}{s_x}\right) = 0.7522 \times \frac{1.304}{2.702} = 0.363$$

$$a = \bar{y} - b\bar{x} = 5.8 - 0.363 \times 5.6 = 3.767$$

When x = 7, $\hat{y} = 3.767 + 0.363 \times 7 = 6.308$
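
The hand calculation above can be verified with a short Python sketch that follows the same route: standard deviations, then r, then b and a. Variable names are illustrative assumptions. The code keeps full precision throughout, so intermediate values differ from the rounded hand values only in later decimals.

```python
import math

# Data set from the example.
x = [4, 5, 3, 6, 10]
y = [4, 6, 5, 7, 7]
n = len(x)

mean_x = sum(x) / n   # 5.6
mean_y = sum(y) / n   # 5.8

# Sample standard deviations (divisor n - 1).
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Correlation coefficient from the summed cross-deviations.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
r = sxy / ((n - 1) * s_x * s_y)

# Slope and intercept of the regression line y_hat = a + b * x.
b = r * s_y / s_x
a = mean_y - b * mean_x

y_hat = a + b * 7
print(round(r, 4), round(b, 3), round(a, 3), round(y_hat, 3))
```

Rounded to three decimals, the prediction at x = 7 is 6.308, matching the hand calculation.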
Exercises
a) Page 450, Self Review 13-2

b) Page 450, no 9

c) Page 450, no 10
d) Page 458, no 16 and no 17
