
Chapter 9

Simple Linear Regression and Correlation Analysis

Regression Analysis

• Regression analysis (RA) is a statistical method that helps to formulate a functional relationship between two or more variables.
• It can be used for assessment of association, estimation and prediction.
– Variables are
• Independent Variable: influences the values or is used for prediction
• Dependent variable: influenced or to be predicted by another
– Relationship
• Linear: curve is a straight line
• Nonlinear: curve is not a straight line
– Classification by the number of independent variables
• Simple: only single predictor
• Multiple : more than one predictor

Simple Regression Analysis

• The simple linear regression of Y on X can be expressed with respect to the population parameters 𝜶 and 𝝱 as

Y = 𝜶 + 𝝱X + 𝝴

where 𝜶 = y-intercept that represents the mean value of the dependent variable Y
when the independent variable X is zero;

𝝱 = slope of the regression line that represents the change in the mean of Y for a
unit change in the value of X ;

𝝴 = error term

• Basic Assumptions
– There is a linear relationship between the dependent variable Y and the explanatory variable X.
– The expected value of the error term is zero and its variance is constant (σ²).
– The error term is approximately normally distributed with mean zero and constant variance (σ²).
– The dependent variable has a normal distribution with mean 𝜶 + 𝝱X and variance σ².
– Values of the independent variable are fixed numbers.

Parameter estimation
• The population parameters $\alpha$ and $\beta$ can be estimated from sample data using the least squares technique. The estimators of $\alpha$ and $\beta$ are usually denoted by $\hat{\alpha}$ and $\hat{\beta}$, respectively. The resulting regression line is $\hat{Y} = \hat{\alpha} + \hat{\beta}X$.
• The estimated values of Y are denoted by $\hat{Y}$. The observed values of Y are denoted by y.
• The difference between the observed and the estimated values, $Y - \hat{Y}$, is known as the error or residual, and is denoted by $e$.
• The residual can be positive, negative or zero.
• The best-fitting line is the one for which the sum of squares of the residuals, $\sum_{i=1}^{n} e_i^2$, has the minimum value. This is called the method of least squares.
• According to this method, one selects $\hat{\alpha}$ and $\hat{\beta}$ such that $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta}X_i)^2$ is minimum. The solution of this minimization problem using partial differentiation is as follows:
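$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{1}{n}\sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2} = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$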
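As a rough illustration of the least squares formulas above, the following sketch computes the slope and intercept from the raw sums. The function name `least_squares_line` and the test points are assumptions made for this example only; they are not part of the slides.

```python
def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the fitted line y = alpha_hat + beta_hat * x.

    Uses the raw-sum form of the least squares solution:
    beta_hat  = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    alpha_hat = mean(y) - beta_hat * mean(x)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta_hat = (n * sum_xy - sum_x * sum_y) / denominator
    alpha_hat = sum_y / n - beta_hat * sum_x / n
    return alpha_hat, beta_hat


if __name__ == "__main__":
    # Hypothetical points lying exactly on y = 1 + 2x
    alpha_hat, beta_hat = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(alpha_hat, beta_hat)
```

Because the hypothetical points lie exactly on a line, the fitted intercept and slope should recover that line up to floating-point error.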
Example 9.1: A computer manager needs to know how the efficiency of her new computer program depends on the size of incoming data. Efficiency is measured by the number of processed requests per hour. Applying the program to data sets of different sizes, she gets the following results:

Data size (gigabytes), x 6 7 7 8 10 10 15

Processed requests, y 40 55 50 41 17 26 16

– Determine the least squares regression equation of processed requests on data size.
– Estimate the number of processed requests per hour for a program whose data size is 9 gigabytes.

task       x     y     xy    x²
1          6    40    240    36
2          7    55    385    49
3          7    50    350    49
4          8    41    328    64
5         10    17    170   100
6         10    26    260   100
7         15    16    240   225
Total     63   245   1973   623
Average    9    35
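The column totals in the table can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration.

```python
# Data from Example 9.1: data size (gigabytes) and processed requests per hour
x = [6, 7, 7, 8, 10, 10, 15]
y = [40, 55, 50, 41, 17, 26, 16]

# Derived columns of the working table
xy = [a * b for a, b in zip(x, y)]
x2 = [a * a for a in x]

n = len(x)
totals = {"x": sum(x), "y": sum(y), "xy": sum(xy), "x2": sum(x2)}
means = {"x": sum(x) / n, "y": sum(y) / n}

print(totals)
print(means)
```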

• Parameter estimation…

$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} = \frac{1973 - 7(9)(35)}{623 - 7(9)^2} = -4.14$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 35 - (-4.14 \times 9) = 72.26$$

» The fitted regression line is $\hat{y} = 72.26 - 4.14x$
• The estimated number of processed requests per hour for a program whose data size is 9 gigabytes is

$$\hat{y} = 72.26 - 4.14 \times 9 = 35$$
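The worked calculation can be reproduced with the short sketch below, which uses the mean form of the formulas. The variable names are illustrative. Note that the code keeps the slope unrounded, so its intercept (about 72.29) differs slightly from the 72.26 obtained with the slope rounded to -4.14.

```python
# Data from Example 9.1
x = [6, 7, 7, 8, 10, 10, 15]
y = [40, 55, 50, 41, 17, 26, 16]

n = len(x)
x_bar = sum(x) / n  # 9.0
y_bar = sum(y) / n  # 35.0

# Mean form of the least squares estimators
s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar  # 1973 - 2205 = -232
s_xx = sum(a * a for a in x) - n * x_bar ** 2                # 623 - 567 = 56

beta_hat = s_xy / s_xx                 # about -4.14
alpha_hat = y_bar - beta_hat * x_bar   # about 72.29 with the unrounded slope

# Predicted processed requests per hour for a data size of 9 gigabytes
y_hat_9 = alpha_hat + beta_hat * 9

print(round(beta_hat, 2), round(alpha_hat, 2), round(y_hat_9, 2))
```

Since 9 gigabytes equals the sample mean of x, the prediction coincides with the sample mean of y, because a least squares line always passes through the point of means.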
Correlation Analysis

• Correlation analysis is concerned with measuring the strength (degree) of the relationship between two or more variables.
• Correlation can be:
– simple correlation,
– partial correlation,
– autocorrelation
• Correlation can be:
– Direct (positive)
– Indirect (negative)

Simple correlation coefficient

Properties of Simple Correlation Coefficient
• The coefficient of correlation satisfies –1 ≤ r ≤ 1.
• r = 0 indicates that there is no linear relationship between the two variables.
• r = -1 or r = +1 indicates that there is a perfect negative (inverse) or perfect positive (direct) linear relationship between the two variables, respectively.
• A coefficient of correlation (r) that is close to zero shows that the relationship is quite weak, whereas an r close to +1 or -1 shows that the relationship is strong.
Remark
– The strength of the correlation does not depend on the sign of r.
– The slope of the simple linear regression (coefficient of regression) and the correlation coefficient always have the same sign.
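The remark about matching signs can be made explicit with a short derivation. The shorthand $S_{xy}$, $S_{xx}$ and $S_{yy}$ is introduced here for convenience and is not used elsewhere in the slides; the derivation assumes $S_{xx} > 0$ and $S_{yy} > 0$.

```latex
S_{xy} = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}, \qquad
S_{xx} = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2, \qquad
S_{yy} = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2

\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
\quad\Longrightarrow\quad
r = \hat{\beta}\sqrt{\frac{S_{xx}}{S_{yy}}}
```

Since the square root factor is positive, $r$ and $\hat{\beta}$ share the sign of $S_{xy}$.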

Example 9.2: Calculate and interpret the simple correlation coefficient for Example 9.1.
task       x     y     xy    x²      y²
1          6    40    240    36    1600
2          7    55    385    49    3025
3          7    50    350    49    2500
4          8    41    328    64    1681
5         10    17    170   100     289
6         10    26    260   100     676
7         15    16    240   225     256
Total     63   245   1973   623   10027
Average    9    35

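$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left(n\sum X_i^2 - \left(\sum X_i\right)^2\right)\left(n\sum Y_i^2 - \left(\sum Y_i\right)^2\right)}}$$

$$= \frac{7 \times 1973 - 63 \times 245}{\sqrt{(7 \times 623 - 63^2)(7 \times 10027 - 245^2)}} = \frac{-1624}{1996.068} = -0.8136$$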
• There is a strong negative (indirect) linear relationship between data size and the number of processed requests.
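The same value of r can be obtained with the following sketch, which applies the raw-sum formula directly to the data of Example 9.1. The variable names are illustrative.

```python
import math

# Data from Example 9.1
x = [6, 7, 7, 8, 10, 10, 15]
y = [40, 55, 50, 41, 17, 26, 16]
n = len(x)

sum_x = sum(x)                              # 63
sum_y = sum(y)                              # 245
sum_xy = sum(a * b for a, b in zip(x, y))   # 1973
sum_x2 = sum(a * a for a in x)              # 623
sum_y2 = sum(b * b for b in y)              # 10027

numerator = n * sum_xy - sum_x * sum_y      # -1624
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)                                           # about 1996.068

r = numerator / denominator
print(round(r, 4))
```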
