
Simple Regression Analysis

Pravat Uprety
Simple Linear Regression Equation
(Prediction Line)
The simple linear regression equation provides an estimate of the
population regression line

$$\hat{Y}_i = b_0 + b_1 X_i$$

where
• Ŷi = estimated (or predicted) Y value for observation i
• b0 = estimate of the regression intercept
• b1 = estimate of the regression slope
• Xi = value of X for observation i

The individual random error terms ei have a mean of zero

06/09/2021 Prepared by Pravat Uprety


Computation of slope (b1 ) and Y-intercept (b0)

• Using the least squares method, the values of b0 and b1 are obtained as

$$b_1 = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

$$b_0 = \frac{\sum Y}{n} - b_1\,\frac{\sum X}{n}$$

The estimating equation then becomes

$$\hat{Y} = b_0 + b_1 X$$
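For reference, the formulas for b0 and b1 are the solution of the two normal equations obtained by minimising the sum of squared residuals. This step is not on the slide; it is sketched here as standard background, using the same symbols:

```latex
\begin{aligned}
\text{minimise}\quad & \sum (Y - b_0 - b_1 X)^2 \\
\sum Y  &= n\,b_0 + b_1 \sum X \\
\sum XY &= b_0 \sum X + b_1 \sum X^2
\end{aligned}
```

Solving this pair of equations for b1 and then for b0 gives the two expressions stated above.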
Example
Hhno    Income (in 000) (X)    Expenditure (in 000) (Y)    XY      X²      Y²
1       18                     10                          180     324     100
2       20                     12                          240     400     144
3       20                     15                          300     400     225
4       25                     15                          375     625     225
5       28                     17                          476     784     289
6       30                     20                          600     900     400
Total   141                    89                          2171    3433    1383
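The sums in the table can be checked, and the least squares formulas applied, with a short script. This is a minimal sketch, not part of the original slides; the function name `least_squares_line` and the variable names are illustrative assumptions.

```python
# Household data from the example table (values in thousands).
income = [18, 20, 20, 25, 28, 30]        # X
expenditure = [10, 12, 15, 15, 17, 20]   # Y


def least_squares_line(x, y):
    """Return (b0, b1) from the sum-based least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (n*sum XY - sum X * sum Y) / (n*sum X^2 - (sum X)^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of Y minus slope times mean of X
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_line(income, expenditure)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Carried at full precision, the intercept comes out near -0.80; the value -0.794 used in the slides appears to follow from rounding b1 to 0.665 before computing b0.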



The estimating equation is

$$\hat{Y} = -0.794 + 0.665X$$

The standard error of the estimate is

$$S_{YX} = \sqrt{\frac{\sum Y^2 - b_0\sum Y - b_1\sum XY}{n-2}} = \sqrt{\frac{1383 - (-0.794)(89) - (0.665)(2171)}{6-2}} = 1.57$$
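The standard error of the estimate can be reproduced from the table sums. The sketch below assumes the rounded coefficients b0 = -0.794 and b1 = 0.665 from the slides; the function name is an illustrative choice.

```python
import math

# Sums from the example table; b0 and b1 are the rounded slide values.
n = 6
sum_y = 89
sum_xy = 2171
sum_y2 = 1383
b0 = -0.794
b1 = 0.665


def standard_error_of_estimate(sum_y2, sum_y, sum_xy, b0, b1, n):
    """Shortcut formula: sqrt((sum Y^2 - b0*sum Y - b1*sum XY) / (n - 2))."""
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy
    return math.sqrt(sse / (n - 2))


s_yx = standard_error_of_estimate(sum_y2, sum_y, sum_xy, b0, b1, n)
print(f"S_YX = {s_yx:.4f}")
```

The result is roughly 1.577, which the slides report as 1.57.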



Coefficient of determination
• $$R^2 = \frac{b_0\sum Y + b_1\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$

For the previous example

• $$R^2 = \frac{(-0.794)(89) + (0.665)(2171) - 6(14.83)^2}{1383 - 6(14.83)^2} = \frac{53.47}{63.42} = 0.8431$$

84.31% of the variation in expenditure (Y) is explained by income (X).
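As a cross-check, R² can also be computed directly from the data as the ratio of explained to total variation. This sketch swaps in the deviation-form sums, which are algebraically equivalent to the shortcut formula on the slide; all variable names are illustrative assumptions.

```python
# Household data from the example table (values in thousands).
income = [18, 20, 20, 25, 28, 30]        # X
expenditure = [10, 12, 15, 15, 17, 20]   # Y

n = len(income)
mean_x = sum(income) / n
mean_y = sum(expenditure) / n

# Least squares fit in deviation form (no intermediate rounding).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, expenditure))
s_xx = sum((x - mean_x) ** 2 for x in income)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Total variation in Y and the part explained by the fitted line.
sst = sum((y - mean_y) ** 2 for y in expenditure)
ssr = sum((b0 + b1 * x - mean_y) ** 2 for x in income)
r_squared = ssr / sst
print(f"R^2 = {r_squared:.4f}")
```

Without intermediate rounding, R² comes out close to 0.842; the 0.8431 on the slide reflects the rounded inputs b0 = -0.794, b1 = 0.665 and Ȳ = 14.83.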



Inferences about population slope
• Population slope = β1 (parameter)
(Regression Coefficient)
Sample Slope = b1 (Statistic/estimator)
Standard error of estimate = Syx
Standard error of slope or regression coefficient:

$$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum (X - \bar{X})^2}} = \frac{S_{YX}}{\sqrt{\sum X^2 - n\bar{X}^2}}$$
Confidence interval estimate for the
population slope
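• The 100(1 - α)% confidence interval estimate for the population slope is

$$b_1 \pm t_{n-2,\,\alpha}\, S_{b_1}$$

where tn-2, α is the two-tailed tabulated t value with n - 2 degrees of freedom.
Taking the -ve sign gives the lower limit (LL).
Taking the +ve sign gives the upper limit (UL).

Prob (LL ≤ β1 ≤ UL) = (1 - α)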


Example
• For the previous example, we have
n = 6, b1 = 0.665, Syx = 1.57, ∑X = 141 and ∑X² = 3433
The standard error of the regression coefficient is

$$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum X^2 - n\bar{X}^2}} = \frac{1.57}{\sqrt{3433 - 6\left(\frac{141}{6}\right)^2}} = 0.1436$$

• Now the 95% confidence interval estimate is
b1 ± t6-2, 0.05 Sb1
= 0.665 ± 2.776 × 0.1436
= 0.665 ± 0.3986
Taking -ve sign, Lower limit = 0.2664
Taking +ve sign, Upper limit = 1.0636
The 95% confidence interval estimate is

Prob (0.2664 ≤ β1 ≤ 1.0636) = 95%
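The interval calculation above can be sketched in a few lines. The inputs are the rounded values from the example, and the critical value 2.776 is typed in from the t table as on the slide rather than computed; the function name is an illustrative assumption.

```python
import math


def slope_confidence_interval(b1, s_yx, n, sum_x, sum_x2, t_crit):
    """Return (S_b1, lower limit, upper limit) for the population slope."""
    mean_x = sum_x / n
    # S_b1 = S_YX / sqrt(sum X^2 - n * mean(X)^2)
    s_b1 = s_yx / math.sqrt(sum_x2 - n * mean_x ** 2)
    margin = t_crit * s_b1
    return s_b1, b1 - margin, b1 + margin


# Values from the worked example; t_crit is the tabulated t(4, 0.05).
s_b1, lower, upper = slope_confidence_interval(
    b1=0.665, s_yx=1.57, n=6, sum_x=141, sum_x2=3433, t_crit=2.776
)
print(f"S_b1 = {s_b1:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The limits may differ from the slide in the fourth decimal place because the slide rounds Sb1 to 0.1436 before multiplying.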


Hypothesis testing for population slope
(regression coefficient)
• Case (a) If no past value is given, or zero is given
Null Hypothesis (H0): β1 = 0 (i.e. there is no
significant relationship between Y and X)
Or
Null Hypothesis (H0): β1 ≥ 0 (i.e. there is no
significant –ve relationship between Y and X)
Or
Null Hypothesis (H0): β1 ≤ 0 (i.e. there is no
significant +ve relationship between Y and X)
Alternative Hypothesis (H1): β1 ≠ 0 (i.e. there is
significant relationship between Y and X)
Or
Alternative Hypothesis (H1): β1 < 0 (i.e. there is
significant –ve relationship between Y and X)
Or
Alternative Hypothesis (H1): β1 > 0 (i.e. there is
significant +ve relationship between Y and X)
• Test statistic

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{b_1}{S_{b_1}}$$

Calculated t = |t|
Tabulated t = tn-2, α

Decision
If Cal value ≤ Tab value, we do not reject H0.
If Cal value > Tab value, we reject H0.
Hypothesis testing for the previous example
(Test of significance)
• Null Hypothesis (H0): β1 = 0
There is no significant relationship between income (X) and expenditure (Y).

Alternative Hypothesis (H1): β1 ≠ 0
There is a significant relationship between income (X) and expenditure (Y).
• Test statistic

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{b_1}{S_{b_1}} = \frac{0.665}{0.1436} = 4.63$$

Calculated t = |t| = 4.63
Tabulated t = tn-2, α = t4, 0.05 = 2.776

Decision
Here, Cal value (4.63) > Tab value (2.776)
We reject H0.
There is a significant relationship between income (X) and expenditure (Y).
Case (b) If any past value is given

• Null Hypothesis (H0): β1 = given value (i.e. the population
slope has not significantly changed from its past value)
Or
Null Hypothesis (H0): β1 ≥ given value (i.e. the population
slope has not significantly decreased from its past value)
Or
Null Hypothesis (H0): β1 ≤ given value (i.e. the
population slope has not significantly increased from its
past value)
Alternative Hypothesis (H1): β1 ≠ given value
(i.e. the population slope has significantly changed
from its past value)
Or
Alternative Hypothesis (H1): β1 < given value
(i.e. the population slope has significantly decreased
from its past value)
Or
Alternative Hypothesis (H1): β1 > given value
(i.e. the population slope has significantly increased
from its past value)
• Test statistic

$$t = \frac{b_1 - \beta_1}{S_{b_1}}$$

Calculated t = |t|
Tabulated t = tn-2, α

Decision
If Cal value ≤ Tab value, we do not reject H0.
If Cal value > Tab value, we reject H0.
Hypothesis testing for the previous example (Test of
whether the slope has significantly changed from its past value of 0.85)

• Null Hypothesis (H0): β1 = 0.85
The population slope has not significantly changed from its past value of 0.85.

Alternative Hypothesis (H1): β1 ≠ 0.85
The population slope has significantly changed from its past value of 0.85.
• Test statistic

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.665 - 0.85}{0.1436} = -1.288$$

Calculated t = |t| = 1.288
Tabulated t = tn-2, α = t4, 0.05 = 2.776

Decision
Here, Cal value (1.288) < Tab value (2.776)
We do not reject H0.
The population slope has not significantly changed from its past value of 0.85.
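Both worked tests (against 0 and against the past value 0.85) follow the same two-tailed decision rule, which can be sketched as one small function. The function name and arguments are illustrative assumptions, and the tabulated value 2.776 is taken from the slides rather than computed.

```python
def slope_t_test(b1, s_b1, beta1_h0, t_crit):
    """Two-tailed test of H0: beta1 = beta1_h0.

    Returns the t statistic and True when H0 is rejected,
    that is, when |t| exceeds the tabulated value.
    """
    t_stat = (b1 - beta1_h0) / s_b1
    reject_h0 = abs(t_stat) > t_crit
    return t_stat, reject_h0


# Case (a): test of significance, hypothesized slope 0.
t_zero, reject_zero = slope_t_test(0.665, 0.1436, 0.0, 2.776)

# Case (b): past value of 0.85.
t_past, reject_past = slope_t_test(0.665, 0.1436, 0.85, 2.776)

print(f"H0: beta1 = 0     t = {t_zero:.2f}   reject H0: {reject_zero}")
print(f"H0: beta1 = 0.85  t = {t_past:.3f}  reject H0: {reject_past}")
```

Passing the hypothesized value as a parameter keeps case (a) and case (b) in one routine; a one-tailed version would need a different tabulated value and a signed comparison.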
Estimating Mean Values and
Predicting Individual Values
Goal: Form intervals around Y to express uncertainty about the
value of Y for a given Xi

[Figure: the fitted line Ŷ = b0 + b1Xi plotted against X, showing the confidence interval for the mean of Y, given Xi, and the prediction interval for an individual Y, given Xi.]
Example
Hhno    Income (in 000) (X)    Expenditure (in 000) (Y)
1       18                     10
2       20                     12
3       20                     15
4       25                     15
5       28                     17
6       30                     20

For X = 29, Y = ?
• Mean of Y: lower and upper limits (confidence interval)
• Individual Y: LL and UL (prediction interval)



Confidence Interval for
the Average/Mean Y, Given X
Confidence interval estimate for the
mean value of Y given a particular Xi

Confidence interval for μY|X=Xi:

$$\hat{Y} \pm t_{n-2,\,\alpha}\, S_{YX}\sqrt{h_i}$$

Size of interval varies according to distance away from mean, X̄

$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X - \bar{X})^2} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum X^2 - n\bar{X}^2}$$
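The slides pose X = 29 for the running example without showing the arithmetic. The sketch below applies the formula for the mean of Y under stated assumptions: the rounded values b0 = -0.794, b1 = 0.665, Syx = 1.57 and the tabulated t of 2.776 from the earlier slides. The function names are illustrative.

```python
import math

# Rounded values from the worked example; T_CRIT is the tabulated t(4, 0.05).
N, SUM_X, SUM_X2 = 6, 141, 3433
B0, B1, S_YX, T_CRIT = -0.794, 0.665, 1.57, 2.776


def h_value(x_i, n, sum_x, sum_x2):
    """h_i = 1/n + (X_i - mean)^2 / (sum X^2 - n * mean^2)."""
    mean_x = sum_x / n
    return 1 / n + (x_i - mean_x) ** 2 / (sum_x2 - n * mean_x ** 2)


def mean_confidence_interval(x_i):
    """Interval for the mean of Y at X = x_i: Y_hat +/- t * S_YX * sqrt(h_i)."""
    y_hat = B0 + B1 * x_i
    margin = T_CRIT * S_YX * math.sqrt(h_value(x_i, N, SUM_X, SUM_X2))
    return y_hat - margin, y_hat + margin


ci_lower, ci_upper = mean_confidence_interval(29)
print(f"Mean of Y at X = 29: {ci_lower:.2f} to {ci_upper:.2f}")
```

Because h_i grows with the squared distance of Xi from X̄, calling `mean_confidence_interval` at a value near the mean (23.5) gives a narrower interval than at 29.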
Prediction Interval for
an Individual Y, Given X
Prediction interval estimate for an
Individual value of Y given a particular Xi

Prediction interval for YX=Xi:

$$\hat{Y} \pm t_{n-2,\,\alpha}\, S_{YX}\sqrt{1 + h_i}$$

The extra term (the 1 under the square root) adds to the interval width to reflect
the added uncertainty for an individual case.
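To see the effect of the extra term, the two intervals can be computed side by side at X = 29. This is a sketch under the same assumptions as the worked example (rounded b0, b1 and Syx, tabulated t of 2.776); the function `interval_at` and its `individual` flag are hypothetical names introduced for illustration.

```python
import math

# Rounded values from the worked example; T_CRIT is the tabulated t(4, 0.05).
N, SUM_X, SUM_X2 = 6, 141, 3433
B0, B1, S_YX, T_CRIT = -0.794, 0.665, 1.57, 2.776


def interval_at(x_i, individual):
    """Return (lower, upper) at X = x_i.

    individual=True  -> prediction interval, uses sqrt(1 + h_i)
    individual=False -> confidence interval for the mean, uses sqrt(h_i)
    """
    mean_x = SUM_X / N
    h_i = 1 / N + (x_i - mean_x) ** 2 / (SUM_X2 - N * mean_x ** 2)
    extra = 1.0 if individual else 0.0
    margin = T_CRIT * S_YX * math.sqrt(extra + h_i)
    y_hat = B0 + B1 * x_i
    return y_hat - margin, y_hat + margin


pi_lower, pi_upper = interval_at(29, individual=True)
ci_lower, ci_upper = interval_at(29, individual=False)
print(f"Prediction interval: {pi_lower:.2f} to {pi_upper:.2f}")
print(f"Confidence interval: {ci_lower:.2f} to {ci_upper:.2f}")
```

Both intervals are centred on the same Ŷ; the prediction interval is always the wider of the two.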
