Chapter 8

CHAPTER 8: SIMPLE LINEAR REGRESSION

LEARNING OBJECTIVES:

At the end of this chapter, the student will be able to:

• Draw a regression line.
• Find the equation of the regression line using the conventional method.
• Fit the simple linear regression equation using the least squares method.
• Fit the multiple linear regression equation for two predictor variables.
• Draw an ANOVA table.
• Test the significance of the regression.
• Interpret the regression coefficient b, which is also the slope of the regression line.
• Estimate the dependent variable using the regression line.

INTRODUCTION

Definition
Independent variable, x – The variable used to predict or model y. When there is more than one, they are denoted by the symbols $x_1, x_2, x_3$, etc.

Dependent variable, y – The variable to be predicted or modelled.


Regression analysis – A statistical technique for investigating and modelling the relationship
between variables.

THE GRAPHICAL METHOD

Scatter Diagram

A scatter diagram is simply a two-dimensional Cartesian plot of the paired values $(x_i, y_i)$, where $i = 1, 2, 3, \ldots, n$. From the diagram, we can get an idea of the kind of relationship between the two variables.

Example 8.1
The data below were obtained from a study of the age and systolic blood pressure of six randomly selected patients. Draw the scatter plot to view the relationship between age and pressure.

Patient   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152

Solution:

[Figure: scatter plot of Pressure (vertical axis, 110 to 160) against Age (horizontal axis, 40 to 70).]
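
A scatter diagram would normally be drawn on graph paper or with a plotting package. As a dependency-free sketch, the Example 8.1 data can also be plotted as text. The function name `ascii_scatter` and the grid size are assumptions made for this illustration, and the code assumes every point lies inside the chosen axis ranges.

```python
def ascii_scatter(xs, ys, x_range, y_range, width=31, height=11):
    """Return a text scatter plot as a list of strings, top row first."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (left to right) and a row (top to bottom).
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y_hi - y) / (y_hi - y_lo) * (height - 1))
        grid[row][col] = "*"
    return ["".join(line) for line in grid]


# Data from Example 8.1.
age = [43, 48, 56, 61, 67, 70]
pressure = [128, 120, 135, 143, 141, 152]

rows = ascii_scatter(age, pressure, x_range=(40, 70), y_range=(110, 160))
for i, line in enumerate(rows):
    label = 160 - 5 * i  # pressure value represented by this row
    print(f"{label:>3} |{line}")
print("    +" + "-" * 31)
print("     40        50        60        70   (Age)")
```

With these settings each column corresponds to one year of age and each row to 5 units of pressure, so the points rise from lower left to upper right, which suggests a positive relationship.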

THE COEFFICIENT OF DETERMINATION

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the regression line and the independent variable.
Formula

The coefficient of determination, $R^2$:

$$R^2 = \frac{SSR}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}$$

where

$SSR = S_{yy} - SSE$ = regression sum of squares,

$SSE = S_{yy} - \hat{\beta}_1 S_{xy}$ = residual sum of squares,

$S_{yy}$ = corrected sum of squares of the observations.

Since $0 \le SSE \le S_{yy}$, it follows that $0 \le R^2 \le 1$.

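The sample correlation coefficient, $r$:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

$$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{\hat{\beta}_1 S_{xy}}{S_{yy}} = \frac{SSR}{S_{yy}} = R^2$$

where

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$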
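
The formulas above can be checked numerically. The following is a minimal sketch, using the Example 8.1 data and taking the slope estimate as S_xy / S_xx (the least squares formula given later in this chapter). The variable names are choices made for this illustration.

```python
from math import sqrt

# Age and pressure for the six patients of Example 8.1.
xs = [43, 48, 56, 61, 67, 70]
ys = [128, 120, 135, 143, 141, 152]
n = len(xs)

# Corrected sums of squares and cross-products (computational formulas).
s_xx = sum(x**2 for x in xs) - sum(xs) ** 2 / n
s_yy = sum(y**2 for y in ys) - sum(ys) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

# Sample correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Coefficient of determination via the sums of squares.
beta1_hat = s_xy / s_xx            # slope estimate
sse = s_yy - beta1_hat * s_xy      # residual sum of squares
ssr = s_yy - sse                   # regression sum of squares
r_squared = ssr / s_yy

print(f"S_xx = {s_xx}, S_yy = {s_yy}, S_xy = {s_xy}")
print(f"r = {r:.4f}, r^2 = {r**2:.4f}, R^2 = {r_squared:.4f}")
```

For these data the two routes agree: squaring r gives the same value as SSR / S_yy, as the identity r² = R² requires.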

The table below shows how the value of r corresponds to the relationship between the variables:

Value of r            Relationship between variables
r = 1.00              Perfect positive linear relationship.
r = −1.00             Perfect negative linear relationship.
0.50 ≤ r < 1.00       Strong positive linear relationship.
−1.00 < r ≤ −0.50     Strong negative linear relationship.
0 < r < 0.50          Weak positive linear relationship.
−0.50 < r < 0         Weak negative linear relationship.
r = 0                 No linear relationship.
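
The interpretation table can be expressed as a small lookup. This is only a sketch: the function name `describe_r` is hypothetical, and it assumes that a value of exactly 0.50 or −0.50 counts as "strong".

```python
def describe_r(r):
    """Map a sample correlation coefficient to the wording used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No linear relationship."
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "Perfect"
    elif size >= 0.50:   # assumption: the boundary value belongs to "strong"
        strength = "Strong"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear relationship."


print(describe_r(0.90))   # a value close to 1
print(describe_r(-0.30))  # a small negative value
```

These cut-offs are a rule of thumb for describing r. They do not replace looking at the scatter diagram.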

SIMPLE LINEAR REGRESSION MODEL

The simple linear regression model is defined as follows:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where
$x$ = independent variable or predictor
$y$ = dependent variable or response variable
$\beta_0$ = the y-intercept of the line
$\beta_1$ = the slope of the line
$\varepsilon$ = a statistical error, that is, a random variable that accounts for the failure of the model to fit the data exactly.

The Least Squares Method

Using the least squares method, we can estimate the unknown parameters $\beta_0$ and $\beta_1$ to obtain the best-fitting line for a set of data. The method chooses the estimates that minimize the sum of the squared vertical deviations of the observations from the line. The estimated, or fitted, regression line is given by

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
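
The chapter states these estimates without deriving them. As a brief sketch of where they come from, write the quantity being minimized as $Q$ (a symbol introduced here, not used elsewhere in the chapter), set its partial derivatives to zero, and solve the two resulting equations.

```latex
\begin{align*}
Q(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \\[4pt]
\frac{\partial Q}{\partial \beta_0} = 0
  &\;\Longrightarrow\; n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\
\frac{\partial Q}{\partial \beta_1} = 0
  &\;\Longrightarrow\; \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \\[4pt]
\text{From the first equation:}\quad \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\
\text{Substituting into the second:}\quad
\hat{\beta}_1 \left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
  &= \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
  \;\Longrightarrow\; \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
\end{align*}
```

The bracketed term on the left of the last line is $S_{xx}$ and the right-hand side is $S_{xy}$, which is why the slope estimate reduces to their ratio.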

Example 8.2
The data obtained in a study of age and blood pressure are as follows:

Age, x   Pressure, y
43       128
48       120
56       135
58       137
61       143
67       141
70       152

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
d) Find the estimated regression line, $\hat{y}$.
e) Estimate the value of $\hat{y}$ when $x = 65$.

Solution:
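
The handout leaves this solution to be worked by hand. As one possible check of the arithmetic, the sketch below applies the chapter's formulas to the Example 8.2 data. The variable names and the helper `predict` are choices made for this illustration.

```python
from math import sqrt

# Data from Example 8.2.
age = [43, 48, 56, 58, 61, 67, 70]
pressure = [128, 120, 135, 137, 143, 141, 152]
n = len(age)

# (a) Corrected sums of squares and cross-products.
s_xx = sum(x * x for x in age) - sum(age) ** 2 / n
s_yy = sum(y * y for y in pressure) - sum(pressure) ** 2 / n
s_xy = sum(x * y for x, y in zip(age, pressure)) - sum(age) * sum(pressure) / n

# (b) Sample correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# (c) Least squares estimates of the slope and the intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum(pressure) / n - beta1_hat * sum(age) / n


# (d) Estimated regression line.
def predict(x):
    """Return the fitted value y-hat for a given age x."""
    return beta0_hat + beta1_hat * x


# (e) Estimated pressure at age 65.
y_hat_65 = predict(65)

print(f"S_xx = {s_xx:.2f}, S_yy = {s_yy:.2f}, S_xy = {s_xy:.2f}")
print(f"r = {r:.3f}")
print(f"y-hat = {beta0_hat:.2f} + {beta1_hat:.3f} x")
print(f"y-hat at x = 65: {y_hat_65:.1f}")
```

Run as written, this should give approximately S_xx ≈ 561.71, S_yy ≈ 649.71, S_xy ≈ 541.71 and r ≈ 0.897, a strong positive linear relationship. The fitted line is roughly ŷ = 81.05 + 0.964x, so the estimated pressure at age 65 is about 143.7.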

Example 8.3
A study was made by a businesswoman to determine the relationship between daily advertising costs and sales closed. The data are as follows:

Advertising Costs (RM)   Sales (RM)
40                       385
25                       395
30                       475
40                       490
50                       560
25                       480

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
d) Find the estimated regression line, $\hat{y}$, and sketch this line.
e) Estimate the weekly sales when advertising costs are RM35.

Solution:

x     y      x²     xy     y²
40    385
25    395
30    475
40    490
50    560
25    480

Σx = 210    Σy = 2785    Σx² = 7850    Σxy = 99125    Σy² = 1313975

EXERCISE

1) A study is done to investigate whether Statistics scores have some effect on students' CPA scores. The data below are the Statistics final examination scores of 10 randomly selected students and their corresponding CPA scores.

Scores, x   87     69     75     56     63     90     71     74     80     78
CPA, y      3.41   3.15   3.28   2.46   2.89   3.73   3.11   3.23   3.50   3.34

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient, $r$.
c) Find $\hat{\beta}_0$ and $\hat{\beta}_1$.
d) Find the estimated regression line, $\hat{y}$, and sketch the graph.
e) Predict the CPA score of a student who gets 65 in Statistics.

2) A supervisor wants to determine the relationship between the ages of her employees and the number of sick days they take each year. The data are as follows:

Age, x    18   21   25   36   48   53
Days, y   16   12   9    5    6    2

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
d) Find the estimated regression line, $\hat{y}$.
e) Estimate the value of $\hat{y}$ when $x = 32$.

3) A researcher wishes to study the relationship between monthly e-commerce sales and online advertising cost. Survey results for 7 online stores for the last year were recorded as follows:

Online cost, x   Sales, y
1.7              368
1.5              340
2.8              665
5                954
1.3              331
2.2              556
1.3              376

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
d) Find the estimated regression line, $\hat{y}$.
e) Estimate the value of $\hat{y}$ when $x = 2.7$.

4) A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows.

Temperature, x   Converted sugar, y
1.0              8.1
1.1              7.8
1.2              8.5
1.3              9.8
1.4              9.5
1.5              8.9
1.6              8.6
1.7              10.2
1.8              9.3
1.9              9.2
2.0              10.2

a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient, $r$.
c) Find $\hat{\beta}_0$ and $\hat{\beta}_1$.
d) Find the estimated regression line, $\hat{y}$, and sketch the graph.
e) Estimate the amount of converted sugar produced when the coded temperature is 1.75.
