Scatter Plot/Diagram Simple Linear Regression Model

CHAPTER 3: INTRODUCTORY LINEAR REGRESSION
Chapter Outline
3.1 Simple Linear Regression
•Scatter Plot/Diagram
•Simple Linear Regression Model
3.2 Curve Fitting
3.3 Inferences About Estimated Parameters
3.4 Adequacy of the Model: Coefficient of Determination
3.5 Pearson Product Moment Correlation Coefficient
3.6 Test for Linearity of Regression
3.7 ANOVA Approach for Testing Linearity of Regression
INTRODUCTION TO LINEAR REGRESSION
• Regression is a statistical procedure for establishing the relationship between two or more variables.
• This is done by fitting a linear equation to the observed data.
• The researcher uses the regression line to see the trend in the data and to predict values.
• There are two types of regression:
Simple (two variables)
Multiple (more than two variables)
• Many problems in science and engineering involve exploring the relationship between two or more variables.
• Two statistical techniques are used:
(1) Regression analysis
(2) Computing the correlation coefficient (r)
• Linear regression is the study of the linear relationship between two or more variables. The fitted linear equation is then used to predict values.
• In simple linear regression only two variables are involved:
i. X is the independent variable.
ii. Y is the dependent variable.
• The correlation coefficient (r) tells us how strongly two variables are linearly related.
Example 3.1:
1) A nutritionist studying weight loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging a new product will have an effect on the sales volume.
a) X is the packaging cost (independent variable).
b) Y is the sales volume (dependent variable).
3.1 SIMPLE LINEAR REGRESSION MODEL
• A linear regression model expresses the linear relationship between two variables.
• The simple linear regression model is written as:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where
$\beta_0$ = intercept of the line with the Y-axis
$\beta_1$ = slope of the line
$\varepsilon$ = random error
The random error is the difference between an observed data point and its deterministic value $\beta_0 + \beta_1 X$.
• The regression line is estimated from the collected data by fitting a straight line to the data set, which gives the equation
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
3.2 CURVE FITTING (SCATTER PLOT)
SCATTER PLOT
• Scatter plots show the relationship between two variables by displaying data points on a two-dimensional graph.
• The variable that might be considered an explanatory variable is plotted on the x-axis, and the response variable is plotted on the y-axis.
• Scatter plots are especially useful when there is a large number of data points.
• They provide the following information about the relationship between two variables:
(1) Strength
(2) Shape - linear, curved, etc.
(3) Direction - positive or negative
(4) Presence of outliers
EXAMPLES:
PLOTTING LINEAR REGRESSION MODEL
A linear regression line can be developed from a freehand plot of the data.
Example 3.2:
The given table contains values for two variables, X and Y. Plot the given data and draw a freehand estimated regression line.
X -3 -2 -1 0 1 2 3
Y 1 2 3 5 8 11 12
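Before drawing a freehand line it helps to see the points. The sketch below is only an illustration: `text_scatter` is a hypothetical helper (not part of the slides) that prints a crude character-grid scatter plot, and it assumes integer-valued data such as the table in Example 3.2.

```python
def text_scatter(xs, ys):
    """Draw integer-valued (x, y) points on a character grid, largest y on top."""
    points = set(zip(xs, ys))
    lines = []
    for y in range(max(ys), min(ys) - 1, -1):
        # One cell per integer x value; '*' marks an observed point.
        row = ["*" if (x, y) in points else "." for x in range(min(xs), max(xs) + 1)]
        lines.append(f"{y:>3} | " + " ".join(row))
    return "\n".join(lines)


# Data from Example 3.2
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [1, 2, 3, 5, 8, 11, 12]

plot = text_scatter(X, Y)
print(plot)
```

The points rise from the lower left to the upper right, which suggests a positive, roughly linear relationship.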
3.3 INFERENCES ABOUT ESTIMATED PARAMETERS
LEAST SQUARES METHOD
• The least squares method is the method most commonly used for estimating the regression coefficients $\beta_0$ and $\beta_1$.
• The straight line fitted to the data set is the line
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
where $\hat{Y}$ is the estimated value of Y for a given value of X.
i) y-intercept for the estimated regression equation, $\hat{\beta}_0$:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively.
ii) Slope for the estimated regression equation, $\hat{\beta}_1$:
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$
where
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
EXAMPLE 3.3: STUDENTS' SCORES IN HISTORY
The data below represent the scores obtained by ten primary school students before and after they were taken on a tour of the museum (which is supposed to increase their interest in history).
Before, x 65 63 76 46 68 72 68 57 36 96
After, y 68 66 86 48 65 66 71 57 42 87
a) Develop a linear regression model with "before" as the independent variable and "after" as the dependent variable.
b) Predict the score a student would obtain "after" if he scored 60 marks "before".
Solution
$n = 10$, $\sum x = 647$, $\sum y = 656$, $\sum x^2 = 44279$, $\sum y^2 = 44884$, $\sum xy = 44435$
$\bar{x} = 64.7$, $\bar{y} = 65.6$
$$S_{xy} = 44435 - \frac{(647)(656)}{10} = 1991.8$$
$$S_{xx} = 44279 - \frac{647^2}{10} = 2418.1$$
$$S_{yy} = 44884 - \frac{656^2}{10} = 1850.4$$
a)
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{1991.8}{2418.1} = 0.8237$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 65.6 - (0.8237)(64.7) = 12.3063$$
$$\hat{Y} = 12.3063 + 0.8237X$$
b) For $X = 60$:
$$\hat{Y} = 12.3063 + 0.8237(60) = 61.7283$$
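The hand calculation above can be checked with a few lines of Python. This is a minimal sketch, not the author's method: `least_squares_fit` is a hypothetical helper name, and it simply applies the $S_{xy}/S_{xx}$ formulas to the Example 3.3 data. Because the code does not round intermediate values, the prediction may differ from the slide's 61.7283 in the last decimal places.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) from the S_xy / S_xx least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n  # y_bar - slope * x_bar
    return slope, intercept


# Data from Example 3.3
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]

slope, intercept = least_squares_fit(before, after)
predicted = intercept + slope * 60  # part (b): score "before" is 60

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted score after = {predicted:.4f}")
```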
EXERCISE 3.1:
INCOME, x FOOD EXPENDITURE, y
55 14
83 24
38 13
61 16
33 9
49 15
67 17
a) Fit a linear regression model with income as the independent variable and food expenditure as the dependent variable.
b) Predict the food expenditure if income is 50.
Answer: a) $\hat{Y} = 1.505 + 0.2525X$ b) $\hat{Y} = 14.13$

EXERCISE 3.2:
3.4 ADEQUACY OF THE MODEL: COEFFICIENT OF DETERMINATION ($R^2$)
• The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).
• The symbol for the coefficient of determination is $r^2$ or $R^2$.
• If $r = 0.90$, then $r^2 = 0.81$. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
• The rest of the variation, 0.19 or 19%, is unexplained. This proportion is called the coefficient of non-determination.
• The formula for the coefficient of non-determination is $1.00 - r^2$.
• Relationship among SST, SSR and SSE:
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
• The coefficient of determination is:
$$r^2 = \frac{SSR}{SST} = \frac{(S_{xy})^2}{S_{xx} S_{yy}}$$
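The decomposition SST = SSR + SSE and the two expressions for $r^2$ can be checked numerically. The sketch below reuses the Example 3.3 scores as illustration data. The variable names are assumptions made for this example, and the sums of squares are computed directly from their definitions rather than from any shortcut.

```python
import math

# Data from Example 3.3 (scores before and after the museum tour)
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]

n = len(before)
x_bar = sum(before) / n
y_bar = sum(after) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(before, after))
s_xx = sum((x - x_bar) ** 2 for x in before)
s_yy = sum((y - y_bar) ** 2 for y in after)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in before]

sst = sum((y - y_bar) ** 2 for y in after)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(after, fitted))   # left unexplained

r_squared = ssr / sst

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"r^2 = {r_squared:.4f}, 1 - r^2 = {1 - r_squared:.4f}")
print("SST == SSR + SSE:", math.isclose(sst, ssr + sse))
```

For these data roughly 89% of the variation in the "after" scores is explained by the "before" scores.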
3.5 PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT (r)
• Correlation measures the strength of the linear relationship between two variables.
• The measure is also known as Pearson's product moment coefficient of correlation.
• The symbol for the sample coefficient of correlation is r.
• Formula:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
or
$$r = (\text{sign of } \hat{\beta}_1)\sqrt{r^2}$$
Properties of r:
• $-1 \le r \le 1$
• Values of r close to 1 imply a strong positive linear relationship between x and y.
• Values of r close to -1 imply a strong negative linear relationship between x and y.
• Values of r close to 0 imply little or no linear relationship between x and y.
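The properties of r can be illustrated with small made-up data sets. The values below are hypothetical toy data chosen for this sketch, not data from the slides, and `pearson_r` is an illustrative helper that applies the $S_{xy}/\sqrt{S_{xx} S_{yy}}$ formula.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [3, 5, 7, 9, 11])      # points on the line y = 2x + 1
r_down = pearson_r(xs, [7, 4, 1, -2, -5])   # points on the line y = 10 - 3x
r_curved = pearson_r(xs, [1, 2, 3, 2, 1])   # symmetric hump, no linear trend

print(r_up, r_down, r_curved)
```

The third data set shows why the wording "linear relationship" matters: y clearly depends on x, yet r is zero because the pattern is not linear.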
ASSUMPTIONS ABOUT THE ERROR TERM $\varepsilon$
1. The error $\varepsilon$ is a random variable with mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.
EXAMPLE 3.4: REFER TO PREVIOUS EXAMPLE 3.3, STUDENTS' SCORES IN HISTORY
Calculate the value of r and interpret its meaning.
SOLUTION:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{1991.8}{\sqrt{(2418.1)(1850.4)}} = 0.9416$$
Thus, there is a strong positive linear relationship between the score obtained before (x) and the score obtained after (y).
EXERCISE 3.3:
Refer to the previous Exercise 3.1 and Exercise 3.2. Calculate the correlation coefficient and interpret the results.
3.6 TEST FOR LINEARITY OF REGRESSION
• To test for the existence of a linear relationship between two variables x and y, we carry out a hypothesis test.
• Two tests are commonly used:
(i) t-Test
(ii) F-Test
(i) t-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. Specify the level of significance $\alpha$ and find the critical value $t_{\alpha/2,\,n-2}$, or use the p-value.
3. Compute the test statistic.
$$t = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}$$
$$Var(\hat{\beta}_1) = \left(\frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2}\right)\frac{1}{S_{xx}}$$
4. Determine the rejection rule.
Reject $H_0$ if $t < -t_{\alpha/2,\,n-2}$ or $t > t_{\alpha/2,\,n-2}$, or if p-value < $\alpha$.
5. Conclusion.
If $H_0$ is rejected, there is a significant linear relationship between variables X and Y.
EXAMPLE 3.5: REFER TO PREVIOUS EXAMPLE 3.3, STUDENTS' SCORES IN HISTORY
Test whether the students' scores before and after the trip are related. Use $\alpha = 0.05$.
SOLUTION:
1) $H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2) $\alpha = 0.05$
$t_{0.05/2,\,8} = t_{0.025,\,8} = 2.306$
3)
$$Var(\hat{\beta}_1) = \left(\frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2}\right)\frac{1}{S_{xx}} = \left(\frac{1850.4 - (0.8237)(1991.8)}{8}\right)\frac{1}{2418.1} = 0.0108$$
$$t_{test} = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}} = \frac{0.8237}{\sqrt{0.0108}} = 7.926$$
4) Rejection rule:
$t_{test} > t_{0.025,\,8}$, since $7.926 > 2.306$.
5) Conclusion:
Thus, we reject $H_0$. There is a linear relationship between the score before (x) and the score after (y) the trip.
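The five steps of the t-test can be traced in code. The sketch below is illustrative only: `slope_t_statistic` is a hypothetical helper, and the critical value 2.306 is copied from the example rather than computed, because the Python standard library has no t-distribution. With unrounded arithmetic the statistic comes out at roughly 7.91. The 7.926 in the example results from rounding the variance to 0.0108 before taking the square root, and the conclusion is the same either way.

```python
import math


def slope_t_statistic(xs, ys):
    """t = slope / sqrt(Var(slope)) for testing H0: beta_1 = 0."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    slope = s_xy / s_xx
    var_slope = (s_yy - slope * s_xy) / (n - 2) / s_xx
    return slope / math.sqrt(var_slope)


# Data from Example 3.3
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]

t_stat = slope_t_statistic(before, after)
t_critical = 2.306  # t_{0.025, 8}, taken from the example (t table)
reject_h0 = abs(t_stat) > t_critical  # two-sided rejection rule

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```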
EXERCISE 3.4:
EXERCISE 3.5:
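(ii) F-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. Specify the level of significance $\alpha$ and find the critical value $F_{\alpha,\,1,\,n-2}$, or use the p-value.
3. Compute the test statistic.
F = MSR/MSE (this value can be obtained from the ANOVA table)
4. Determine the rejection rule.
Reject $H_0$ if $F_{test} > F_{\alpha,\,1,\,n-2}$ or if p-value < $\alpha$.
5. Conclusion.
If $H_0$ is rejected, there is a significant linear relationship between variables X and Y.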
3.7 ANOVA APPROACH FOR TESTING LINEARITY OF REGRESSION
• The analysis of variance (ANOVA) method is an approach for testing the significance of the regression.
• The test procedure can be arranged in an ANOVA table as shown below:
Source DF SS MS F
Regression 1 SSR MSR = SSR/1 F = MSR/MSE
Error n - 2 SSE MSE = SSE/(n - 2)
Total n - 1 SST
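As an illustration of how the ANOVA table is filled in, the sketch below builds one for the history-score data of Example 3.3. It assumes the identities SSR = $\hat{\beta}_1 S_{xy}$ and SST = $S_{yy}$, which match the numerator $S_{yy} - \hat{\beta}_1 S_{xy}$ in the variance formula of the t-test. The critical value $F_{0.05,1,8} \approx 5.32$ is taken from a standard F table and does not appear in the slides.

```python
# Data from Example 3.3
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]

n = len(before)
s_xy = sum(x * y for x, y in zip(before, after)) - sum(before) * sum(after) / n
s_xx = sum(x * x for x in before) - sum(before) ** 2 / n
s_yy = sum(y * y for y in after) - sum(after) ** 2 / n
slope = s_xy / s_xx

sst = s_yy            # total sum of squares
ssr = slope * s_xy    # sum of squares due to regression
sse = sst - ssr       # sum of squares due to error

msr = ssr / 1         # regression has 1 degree of freedom
mse = sse / (n - 2)   # error has n - 2 degrees of freedom
f_stat = msr / mse

f_critical = 5.32     # F_{0.05, 1, 8} from a standard F table (assumption)
reject_h0 = f_stat > f_critical

print(f"{'Source':<12}{'DF':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{1:>4}{ssr:>12.3f}{msr:>12.3f}{f_stat:>10.3f}")
print(f"{'Error':<12}{n - 2:>4}{sse:>12.3f}{mse:>12.3f}")
print(f"{'Total':<12}{n - 1:>4}{sst:>12.3f}")
print("Reject H0:", reject_h0)
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so both tests lead to the same decision.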
EXAMPLE 3.6:
• The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time (hours) the equipment was used last week.
At $\alpha = 0.01$, test whether there is a linear relationship between the variables.
Solution:
1) Hypotheses:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
2) F-distribution table: $F_{0.01,\,1,\,8} = 11.26$
3) Test statistic:
F = MSR/MSE = 17.303
or, using the p-value approach: significance value = 0.003
4) Rejection region:
Since F statistic > F table (17.303 > 11.26), we reject $H_0$. Equivalently, since the p-value is below $\alpha$ (0.003 < 0.01), we reject $H_0$.
5) Thus, there is a linear relationship between the variables (months X and hours Y).
EXERCISE 3.6:
An agricultural scientist planted alfalfa on several plots of
land, identical except for the soil pH. Following are the dry
matter yields (in pounds per acre) for each plot.
pH Yield
4.6 1056
4.8 1833
5.2 1629
5.4 1852
5.6 1783
5.8 2647
6.0 2131
a) Construct a scatter plot of yield (y) versus pH (x). Verify that a linear model is appropriate.
b) Compute the estimated regression line for predicting yield from pH.
c) If the pH is increased by 0.1, by how much would you predict the yield to increase or decrease?
d) For what pH would you predict a yield of 1500 pounds per acre?
e) Calculate the correlation coefficient and interpret the results.
Answer: b) $\hat{y} = -2090.9 + 737.1x$
c) The predicted yield increases by 73.71 pounds per acre.
d) pH = 4.872
EXERCISE 3.7
A regression analysis relating the current market value in dollars to the size in square feet of homes in Greeny County, Tennessee, follows. A portion of the regression software output is shown below:
Predictor Coef SE Coef T P
Constant 12.726 8.115 1.57 0.134
Size 0.00011386 0.00002896 3.93 0.001
Analysis of Variance
Source DF SS MS F P
Regression 1 10354 10354 15.46 0.001
Error 18 12054 670
Total 19 22408
a) Determine how many homes are in the sample.
b) Determine the regression equation.
c) Can you conclude that there is a linear relationship between the variables at $\alpha = 0.05$?
