
CHAPTER TWO - SIMPLE LINEAR REGRESSION

Introduction
The simplest regression structure is the Simple Linear Regression Model.
The term “simple” means that a single regressor/independent variable X is
involved. “Linear” means that the model is linear in the parameters: no
parameter appears as an exponent or is multiplied or divided by another
parameter. For example, Y = β0 + β1 X + ε is linear in the parameters,
whereas Y = β0 exp(β1 X) + ε is not.

Regression is concerned with modelling the relationships between two or
more variables; thus in simple linear regression we are investigating the
relationship between two variables, X and Y.

Examples 2.1

• blood levels and age.

• reading time before exam and exam mark.

• crop yield and amount of rainfall.

• height and weight of a person.

Uses of Regression
1. Prediction - using the data available in conjunction with the model to
predict/estimate future outcomes.

2. Variable screening - removing unnecessary variables from the model.

3. Explaining - system explanation: which variables contribute the most, and how they contribute, to the dependent variable.

4. Inference - estimation of the parameters in the model.

5. Planning and control - if we have an appropriate model, we can explain the physical system and thus plan ahead and control the system.

In regression analysis the 1st step is to determine whether it is reasonable to
assume that the curve of regression of Y on X is a straight line. For simple
linear regression, the model is given by

Yi = β0 + β1 Xi + εi

where Y is the response/dependent variable, β0 and β1 are the intercept and slope respectively, and ε is the error term.

We shall use the Least Squares (LS) procedure to estimate the parameters in the model given above. Although other procedures are available for estimating the parameters, we shall use only the LS procedure.
This procedure is based on the following assumptions:

1. There is a linear relationship between xi and yi.

2. The xi are non-random and are observed with negligible error.

3. The εi are random variables with mean zero and constant variance. This is called the homogeneity assumption. Mathematically, this assumption is E(εi) = 0 and Var(εi) = E(εi²) = σ².

4. The εi are uncorrelated, that is,

   E(εi εj) = 0 if i ≠ j, and E(εi εj) = σ² if i = j.

5. εi ∼ N(0, σ²)
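To make these assumptions concrete, the following minimal Python sketch simulates data from the model; the parameter values, sample size and seed are illustrative choices, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(seed=1)      # reproducible illustration

    n = 50
    beta0, beta1, sigma = 2.0, 0.5, 1.0      # illustrative 'true' parameters
    x = np.linspace(0.0, 10.0, n)            # fixed, non-random regressor (assumption 2)
    eps = rng.normal(0.0, sigma, size=n)     # i.i.d. N(0, sigma^2) errors (assumptions 3-5)
    y = beta0 + beta1 * x + eps              # linear relationship (assumption 1)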

Method of Least Squares


It is the most extensively used technique for estimating the parameters β0 and β1, giving the estimates β̂0 and β̂1 respectively. When we use these estimates in the model yi = β0 + β1 xi + εi, we get the fitted model given by

ŷi = β̂0 + β̂1 xi

We call this the fitted model because the model now has estimated parameters.

Defn: Residual
Let ri = yi − ŷi, the difference between the observed and fitted values; this difference is called a residual.

Note: The residuals, as we shall see later are very useful for assessing the
adequacy of a model.

The logic behind Least Squares Estimation


From the many straight lines that can be drawn through the scatter diagram, we wish to pick the one that ‘best fits’ the data, in the sense that the values of β̂0 and β̂1 chosen will be those that minimize the sum of the squares of the vertical distances between the data points and the estimated regression line. Thus we need estimates that minimize Σ ri², the Residual Sum of Squares (RSS), where here and below all sums run over i = 1, …, n.

To minimize the RSS, β̂0 and β̂1 must satisfy the conditions

∂(Σ ri²)/∂β̂0 = 0  and  ∂(Σ ri²)/∂β̂1 = 0

Thus

∂(Σ ri²)/∂β̂0 = ∂[Σ (yi − β̂0 − β̂1 xi)²]/∂β̂0 = −2 Σ (yi − β̂0 − β̂1 xi)

and setting this to zero gives

−2 Σ yi + 2nβ̂0 + 2β̂1 Σ xi = 0

Dividing the above by −2 we have

Σ yi − nβ̂0 − β̂1 Σ xi = 0   (*)

Dividing by n leads to

ȳ − β̂0 − β̂1 x̄ = 0

For β1 we have

∂(Σ ri²)/∂β̂1 = ∂[Σ (yi − β̂0 − β̂1 xi)²]/∂β̂1 = −2 Σ xi (yi − β̂0 − β̂1 xi)

and setting this to zero gives

−2 Σ xi yi + 2β̂0 Σ xi + 2β̂1 Σ xi² = 0

This simplifies to

Σ xi yi − β̂0 Σ xi − β̂1 Σ xi² = 0   (*)

The equations marked with (*) are called the normal equations. Solving for β̂0 from the first normal equation we have

β̂0 = (Σ yi)/n − β̂1 (Σ xi)/n

and we can write the above as

β̂0 = ȳ − β̂1 x̄

Substituting for β̂0 in the second normal equation gives

Σ xi yi − [(Σ yi)/n − β̂1 (Σ xi)/n] Σ xi − β̂1 Σ xi² = 0

Σ xi yi − (1/n)(Σ xi)(Σ yi) + β̂1 (1/n)(Σ xi)² − β̂1 Σ xi² = 0

β̂1 [Σ xi² − (1/n)(Σ xi)²] = Σ xi yi − (1/n)(Σ xi)(Σ yi)

Making β̂1 the subject of the formula we have

β̂1 = [Σ yi xi − (1/n)(Σ yi)(Σ xi)] / [Σ xi² − (1/n)(Σ xi)²]

   = (Σ yi xi − n x̄ ȳ) / (Σ xi² − n x̄²)

   = Σ (yi − ȳ)(xi − x̄) / Σ (xi − x̄)²

We often state the above as

β̂1 = Sxy / Sxx

where Sxy = Σ (yi − ȳ)(xi − x̄) and Sxx = Σ (xi − x̄)².

Note: We first find β̂1 and then find β̂0, using β̂0 = ȳ − β̂1 x̄.
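As a sanity check on these formulas, here is a minimal Python sketch that computes β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1 x̄ directly; the helper name least_squares is ours, not part of the notes.

    import numpy as np

    def least_squares(x, y):
        """Return (beta0_hat, beta1_hat) for simple linear regression."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        sxy = np.sum((x - x.mean()) * (y - y.mean()))   # Sxy
        sxx = np.sum((x - x.mean()) ** 2)               # Sxx
        beta1 = sxy / sxx                               # slope
        beta0 = y.mean() - beta1 * x.mean()             # intercept
        return beta0, beta1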

The fitted line is ŷi = β̂0 + β̂1 xi. This fitted line is referred to by many different names in Statistics. Some of the names are: the least squares line, the fitted regression line, the estimated regression line, or just the fitted model.

Meaning of Regression Parameters


Our model is yi = β0 + β1 xi + εi.
The fitted model is ŷi = β̂0 + β̂1 xi.
β1 is the slope. It indicates the change in the mean of the probability distribution of Y per unit increase in X.
β0 is the vertical-axis (Y) intercept.

Example 2.2
Consider the following data set, where Y is the dependent variable and X is
the regressor variable.

X 4 8 5 9 10
Y 2 3 4 6 8

Find the least squares estimates of β0 and β1.

Solution
β̂1 = Σ (yi − ȳ)(xi − x̄) / Σ (xi − x̄)²  and  β̂0 = (Σ yi)/n − β̂1 (Σ xi)/n

From the data set:

Σ xi = 36, Σ yi = 23, Σ xi yi = 186, Σ xi² = 286,

x̄ = (Σ xi)/n = 7.2 and ȳ = (Σ yi)/n = 4.6

Thus
β̂1 = [186 − 5(7.2)(4.6)] / [286 − 5(7.2)²] = 20.4/26.8 = 0.7612

Therefore

β̂0 = 4.6 − 0.7612(7.2) = −0.8806

Thus the fitted line is

ŷi = −0.8806 + 0.7612 xi

If X is increased by one unit, then the mean of Y increases by 0.7612.
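As a quick check, the illustrative least_squares helper sketched earlier reproduces these estimates:

    x = [4, 8, 5, 9, 10]
    y = [2, 3, 4, 6, 8]
    b0, b1 = least_squares(x, y)
    print(round(b0, 4), round(b1, 4))   # -0.8806 0.7612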

Prediction
The fitted regression line is used to predict the value of the dependent variable given the value of the independent variable.

Example 2.3

Find the value of y given that x = 7

Solution
ŷ = −0.8806 + 0.7612(7) = 4.4478
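Continuing the illustrative snippet above:

    x0 = 7
    print(round(b0 + b1 * x0, 4))   # 4.4478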

Residuals and Checking the Model Assumptions

Residuals
The residuals measure the deviation of yi from ŷi. Since εi is usually unknown, it is estimated by ri. The residuals are needed not only for estimating the magnitude of the random variation in the yi's, but also when assessing the appropriateness of the regression model employed.

Properties of Least Squares Residuals

(i) Σ ri = 0

(ii) Σ ri² is a minimum

(iii) Σ xi ri = 0, that is, the sum of the residuals weighted by the xi values is zero.

(iv) Σ ŷi ri = 0, that is, the sum of the residuals weighted by the fitted values is zero.
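These properties can be verified numerically; a minimal sketch, reusing the illustrative least_squares helper and the data of Example 2.2:

    import numpy as np

    xv = np.array([4, 8, 5, 9, 10], dtype=float)
    yv = np.array([2, 3, 4, 6, 8], dtype=float)
    b0, b1 = least_squares(xv, yv)
    fitted = b0 + b1 * xv
    r = yv - fitted                               # residuals
    print(np.isclose(r.sum(), 0.0))               # property (i)
    print(np.isclose((xv * r).sum(), 0.0))        # property (iii)
    print(np.isclose((fitted * r).sum(), 0.0))    # property (iv)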

Checking the Model Assumptions


Having fitted the model, we should check whether any of the assumptions have been violated. If any of the least squares assumptions are violated, we cannot rely on the estimates β̂0 and β̂1.

1. Linear Assumption

   We use a scatter plot or the correlation coefficient to check whether this assumption is valid or not.

2. Non-Stochastic Independent Variable Assumption

   The independent variable should be non-random. We shall assume that X is non-random. A whole area of Statistics called Experimental Design is devoted to tackling this problem.

3. Assumption that E(εi) = 0 and Var(εi) = σ²

   To validate this assumption, we plot the residuals against the time index; no pattern should be formed for the assumption to be valid. In fact, the residuals should be randomly scattered about zero.
   The plot mentioned above is also used to detect autocorrelation. Assumption 4 states that the errors should be uncorrelated, that is, Cov(εi, εj) = 0 for i ≠ j. Thus, if the errors are uncorrelated, the plot of residuals against the time index must not form any pattern. Detection of autocorrelation can also be accomplished by plotting residuals against fitted values; for the assumption to be valid, no pattern must be formed.

4. Checking that ε ∼ N(0, σ²)

   This assumption can also be checked using the residuals. To test for the normality assumption, that is, that the errors are normally distributed, we usually use what are referred to as Q–Q plots. These are plots of the ordered residuals versus the normal distribution quantiles q(j), which are defined by

   P(Z ≤ q(j)) = ∫ from −∞ to q(j) of (1/√(2π)) exp(−z²/2) dz = (j − ½)/n

   The resulting plot should be a straight line, or almost a straight line; that is, there should be a linear relationship between q(j) and the ordered rj. We usually construct Q–Q plots using computer programs or statistical packages, such as MINITAB, SPSS, SAS, etc.; a sketch in Python is given after this list. We may also use a histogram of the residuals, and for the assumption to be valid the histogram must resemble a bell/symmetric shape.

5. Checking for the Homogeneity of the Error Variances

   We expect the errors to be uncorrelated as well as to have the same variance σ², that is, the variances are homogeneous. When the variances of the errors are different, that is, Var(εi) = σi² with the σi² not necessarily equal, we say the variances are heterogeneous. For the homogeneity assumption to be valid, the plot of residuals against the time index must resemble a random pattern around zero. Violation of randomness implies heterogeneous variances.
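A minimal sketch of these residual diagnostics, assuming numpy, matplotlib and scipy are available; the simulated data are illustrative only.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(seed=2)
    x = np.linspace(0.0, 10.0, 40)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=40)   # simulated data

    # Fit by least squares using the Sxy/Sxx formulas.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    resid = y - fitted

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    axes[0].scatter(fitted, resid)                      # should show no pattern
    axes[0].axhline(0.0)
    axes[0].set(xlabel="fitted values", ylabel="residuals")
    stats.probplot(resid, dist="norm", plot=axes[1])    # Q-Q plot: roughly linear
    axes[2].hist(resid, bins=10)                        # roughly bell/symmetric
    axes[2].set(xlabel="residuals")
    plt.tight_layout()
    plt.show()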

Methods of Scaling Residuals


Deviations from assumptions are often best detected by working with residuals that have the same precision.

1. Standardized Residuals

   di = ei/√(σ̂²),  i = 1, 2, …, n,  with E(di) = 0 and Var(di) = 1.

   If the errors are normally distributed, then approximately 95% of the standardized residuals should fall in the interval (−2, 2). Residuals that are far outside this interval may indicate the presence of an outlier, that is, an observation that is not typical of the rest of the data. You may (or may not) discard the outlier.
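A minimal sketch, continuing the illustrative diagnostics example above and taking σ̂² to be the mean square error RSS/(n − 2), one common convention:

    sigma2_hat = np.sum(resid ** 2) / (len(resid) - 2)   # MSE = RSS/(n - 2)
    d = resid / np.sqrt(sigma2_hat)                      # standardized residuals
    print(np.mean(np.abs(d) < 2))                        # about 0.95 under normality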

2. Studentised Residuals

   ri = ei / √(MSE(1 − hii))

   where the hii are the diagonal elements of the hat matrix H.

   Var(ri) = 1 when the model is correct.

   ri is the residual most commonly used by statistical software (MINITAB, SPSS, SAS, etc.).
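A minimal sketch computing the hat matrix and the studentised residuals for simple linear regression, continuing the same illustrative data; the design matrix has a column of ones and a column of x.

    X = np.column_stack([np.ones_like(x), x])       # design matrix [1, x]
    H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix H = X(X'X)^{-1}X'
    h = np.diag(H)                                  # leverages h_ii
    mse = np.sum(resid ** 2) / (len(resid) - 2)     # MSE
    r_student = resid / np.sqrt(mse * (1.0 - h))    # studentised residuals
    print(r_student[:5])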

Remedies for Some Problems

Problem                               Solution
Variance increases with Y             Variance-stabilizing transformation, e.g. √y, ln y, 1/y
Variance shows a ‘double bow’ shape   Logistic regression
Curvature with xi                     Include xi² in the model
Increasing variance with xi           Use weighted least squares in the estimation of the parameters
Increasing variance with time         Use weighted least squares in the estimation of the parameters
Linear time trend                     Include time as a regressor
Seasonal effect                       Include a seasonal effect in the model
Autocorrelation                       Time series modelling
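As one illustration of the weighted least squares remedy, a minimal sketch; the data and the choice of weights (1/xi², assuming the error standard deviation grows with xi) are hypothetical.

    import numpy as np

    rng = np.random.default_rng(seed=3)
    x = np.linspace(1.0, 10.0, 40)                          # positive x so 1/x^2 is defined
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=40) * x   # error s.d. grows with x

    X = np.column_stack([np.ones_like(x), x])               # design matrix
    W = np.diag(1.0 / x ** 2)                               # weights ~ 1/Var(e_i)
    beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)    # (X'WX)^{-1} X'W y
    print(beta_wls)                                         # roughly [2.0, 0.5]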

Activity 2.1
1. The table below gives observations on systolic blood pressure and age
for a sample of eight individuals

bp (y)   116 138 132 162 154 220 128 142
age (x)   20  45  29  67  56  47  38  50

(Note: ȳ = 149.00, x̄ = 44, Sxx = 1556.00, Syy = 7224.00, and Sxy = 1692.)

(a) What assumptions must be met before carrying out a regression analysis?

(b) Find the least squares estimates of β0 and β1. Write down the equation of the regression line.

2. A researcher carries out an experiment to determine the relationship between chirps (Y) and temperature (X). The following results were obtained:

Chirps (Y) 24 21 19 19 19 15 12 11 10 22 18
Temp (X) 63 62 60 60 58 58 55 53 52 64 61

(a) Plot Y against X. Is there a relationship between Y and X?

(b) Fit the simple linear regression model to the data and write down the equation of the line.

3. Let yi = β0 + β1 xi + εi, i = 1, …, n, be a linear regression model where β0 and β1 are parameters and εi is the error term.

(a) Derive the normal equations for estimating β0 and β1. [Hint: Do not derive the estimates.]

(b) State the assumptions that are required to fit the linear regression model.

(c) Briefly describe how each of these model assumptions is checked.

4. (a) Outline what is involved in the analysis of residuals from a linear regression model.

(b) The table below gives observations on mathematics achievement test score (xi) and calculus grade (yi) for ten independently selected college freshmen.

xi 39 43 21 64 57 47 28 75 34 52
yi 65 78 52 82 92 89 73 98 56 75

Assuming that a simple linear regression model is appropriate, the model ŷi = 40.7842 + 0.7656 xi was fitted to the data. The fitted values for the 10 observations are as follows:

70.64 73.70 56.86 89.78 84.42 76.77 62.22 98.20 66.81 80.59

Plot the residuals against the fitted values and comment on the adequacy of the fitted model.
