
CHAPTER 1
INTRODUCTION
&
CHAPTER 2
SIMPLE LINEAR REGRESSION MODEL
1.1 Objectives of Regression Analysis

The determination of the explicit form of the regression equation is the ultimate objective of regression analysis. The aim is a good and valid relationship between the study variable and the explanatory variables. Such a regression equation can be used for several purposes: for example, to determine the role of any explanatory variable in the joint relationship for policy formulation, or to forecast the values of the response variable for a given set of values of the explanatory variables. The regression equation also helps in understanding the interrelationships among the variables.

In regression analysis we have mainly two types of variables: predictor variables and response variables. By predictor variables we shall usually mean variables that can either be set to a desired value or else take values that can be observed but not controlled. As the predictor variables change, the response variable changes.
Throughout the course we shall most often be concerned with relationships of the form

$y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon,$

where $f$ is some well-defined function and $\beta_1, \beta_2, \ldots, \beta_k$ are the parameters which characterize the role and contribution of $X_1, X_2, \ldots, X_k$ respectively. The term $\varepsilon$ reflects the stochastic nature of the relationship between $y$ and $X_1, \ldots, X_k$ and indicates that such a relationship is not exact in nature. When $\varepsilon = 0$, the relationship is called a mathematical model; otherwise it is a statistical model.
 
A model or relationship is termed linear if it is linear in the parameters, and nonlinear if it is not linear in the parameters. In other words, if all the partial derivatives of $y$ with respect to each of the parameters $\beta_1, \ldots, \beta_k$ are independent of the parameters, then the model is called a linear model.

Example 1.1: The model $y = \beta_1 + \beta_2 x + \beta_3 x^2 + \varepsilon$ is linear, since it is linear in the parameters even though it is quadratic in $x$. The model $y = \beta_1 e^{\beta_2 x} + \varepsilon$ is nonlinear, since $\partial y / \partial \beta_1 = e^{\beta_2 x}$ depends on the parameter $\beta_2$.
Steps in regression analysis
Regression analysis includes the following steps:
- Statement of the problem under consideration
- Choice of relevant variables
- Collection of data on relevant variables
- Specification of the model
- Choice of method for fitting the data
- Fitting of the model
- Model validation and criticism
- Using the chosen model(s) for the solution of the posed problem

CHAPTER 2
SIMPLE LINEAR REGRESSION MODEL
2.1 The Simple Linear Regression Model

We consider the modeling between the dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model. When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.
The simple linear regression model is

$y = \beta_0 + \beta_1 x + \varepsilon,$

where
$y$ - response variable
$x$ - regressor (control, independent, explanatory) variable
$\beta_0$ - intercept
$\beta_1$ - slope
$\varepsilon$ - random error
Basic assumptions on the model:

- $\varepsilon$ is a random variable with zero mean and (unknown) variance $\sigma^2$, that is, $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$.
- $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated for $i \neq j$, so $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$.
- $\varepsilon_i$ is a normally distributed independent random variable with zero mean and variance $\sigma^2$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$.

Therefore $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
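As a quick illustration of these assumptions, the following Python sketch generates data from the model (numpy is assumed to be available; the parameter values are arbitrary choices for illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative parameter values
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0, 10, size=n)         # regressor values
eps = rng.normal(0.0, sigma, size=n)   # i.i.d. N(0, sigma^2) errors
y = beta0 + beta1 * x + eps            # y_i ~ N(beta0 + beta1*x_i, sigma^2)
```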

2.2 Least-Squares Estimation of the Parameters

Suppose we have a sample of $n$ pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$. These observations are assumed to satisfy the simple linear regression model, and so we can write

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

The parameters $\beta_0$ and $\beta_1$ are unknown and must be estimated using the sample data. We have to fit the straight line that best describes the given data. The line fitted by least squares is the one that makes the sum of squares of all vertical displacements as small as possible. The vertical displacement of the $i$th observation is

$e_i = y_i - \beta_0 - \beta_1 x_i.$
The least squares estimators minimize

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$

The least squares estimators $b_0$ and $b_1$ of $\beta_0$ and $\beta_1$ must satisfy

$\left.\frac{\partial S}{\partial \beta_0}\right|_{b_0, b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0,$

$\left.\frac{\partial S}{\partial \beta_1}\right|_{b_0, b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) x_i = 0.$

These equations are called the normal equations. By solving the two equations we can show that

$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x},$

where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of $\beta_0$ and $\beta_1$.
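A minimal sketch of these formulas in Python (numpy assumed; the data arrays are placeholders for illustration):

```python
import numpy as np

# Placeholder data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

b1 = S_xy / S_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate
print(b0, b1)
```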
Further, we have

$\frac{\partial^2 S}{\partial \beta_0^2} = 2n, \quad \frac{\partial^2 S}{\partial \beta_1^2} = 2\sum_{i=1}^{n} x_i^2, \quad \frac{\partial^2 S}{\partial \beta_0 \, \partial \beta_1} = 2\sum_{i=1}^{n} x_i.$

The Hessian matrix, which is the matrix of second-order partial derivatives, is in this case given as

$H = \begin{pmatrix} 2n & 2\sum x_i \\ 2\sum x_i & 2\sum x_i^2 \end{pmatrix}.$

Since $H$ is positive definite (provided the $x_i$ are not all equal), $S(\beta_0, \beta_1)$ has a global minimum at $(b_0, b_1)$.


2.3 Useful properties of the least squares estimators

(a) The sum of the residuals in any regression model that contains an intercept is always zero: $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - \hat{y}_i) = 0.$

(b) The sum of the observed values equals the sum of the fitted values: $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i.$

(c) The least squares regression line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.

(d) The sum of the residuals weighted by the regressor is zero, $\sum_{i=1}^{n} x_i e_i = 0$; likewise $\sum_{i=1}^{n} \hat{y}_i e_i = 0.$

(e) Both $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$ respectively.
(f) $\operatorname{Var}(b_1) = \frac{\sigma^2}{S_{xx}}$, $\operatorname{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$ and $\operatorname{Cov}(b_0, b_1) = -\frac{\sigma^2 \bar{x}}{S_{xx}}$.

Both the variances and the covariance of $b_0$ and $b_1$ involve $\sigma^2$, whose value is unknown; it is therefore replaced by an estimator of $\sigma^2$.

(g) The residual sum of squares is given as

$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = S_{yy} - b_1 S_{xy},$

where $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$. Therefore

$MS_{res} = \frac{SS_{res}}{n-2}$ (the residual mean square) is an unbiased estimator of $\sigma^2$.
2.4.1 Estimation of $\sigma^2$

The estimator of $\sigma^2$ is obtained from the residual sum of squares as follows. Assuming that $\varepsilon_i$ is normally distributed, it follows that $SS_{res}/\sigma^2$ has a $\chi^2$ distribution with $(n-2)$ degrees of freedom, so

$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}.$

Thus, using the result about the expectation of a chi-square random variable, we have

$E(SS_{res}) = (n-2)\,\sigma^2.$

Thus an unbiased estimator of $\sigma^2$ is

$s^2 = \frac{SS_{res}}{n-2}.$

Note that $SS_{res}$ has only $(n-2)$ degrees of freedom; the two degrees of freedom are lost due to the estimation of $b_0$ and $b_1$. Since $s^2$ depends on these estimates, it is a model-dependent estimate of $\sigma^2$.
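A sketch of this computation (numpy assumed, placeholder data as before):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
SS_res = np.sum(resid ** 2)   # residual sum of squares
s2 = SS_res / (n - 2)         # unbiased estimator of sigma^2 (residual mean square)
```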
2.4.2 Estimates of the variances of $b_0$ and $b_1$:

The estimators of the variances of $b_0$ and $b_1$ are obtained by replacing $\sigma^2$ by its estimate $s^2$ as follows:

$\widehat{\operatorname{Var}}(b_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$ and $\widehat{\operatorname{Var}}(b_1) = \frac{s^2}{S_{xx}}.$
2.5 Centered Model:

Sometimes it is useful to measure the independent variable around its mean. In such a case the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ has a centered version as follows:

$y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \beta_1 \bar{x} + \varepsilon_i = \beta_0^{*} + \beta_1 (x_i - \bar{x}) + \varepsilon_i,$

where $\beta_0^{*} = \beta_0 + \beta_1 \bar{x}$. The sum of squares due to error is given by

$S(\beta_0^{*}, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0^{*} - \beta_1 (x_i - \bar{x}) \right)^2.$

Solving

$\frac{\partial S}{\partial \beta_0^{*}} = 0 \quad \text{and} \quad \frac{\partial S}{\partial \beta_1} = 0,$

we can obtain

$b_0^{*} = \bar{y}$

and

$b_1 = \frac{S_{xy}}{S_{xx}},$

respectively.
Under the assumption that $E(\varepsilon_i) = 0$, it follows that

$E(b_0^{*}) = \beta_0^{*}$ and $E(b_1) = \beta_1.$

The fitted model is

$\hat{y} = \bar{y} + b_1 (x - \bar{x}),$

and the predicted values are

$\hat{y}_i = \bar{y} + b_1 (x_i - \bar{x}), \quad i = 1, 2, \ldots, n.$

Note that in the centered model the estimators $b_0^{*}$ and $b_1$ are uncorrelated.
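A small sketch (numpy assumed, placeholder data) showing that the centered fit reproduces the same fitted line as the uncentered one:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xc = x - x.mean()                                  # centered regressor
b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
b0_star = y.mean()                                 # intercept of the centered model

y_hat = b0_star + b1 * xc  # identical to b0 + b1*x from the uncentered fit
```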


2.6 No-intercept model:

Sometimes in practice a model without an intercept term is used, in those situations where it is known that $y = 0$ when $x = 0$, i.e., the regression line passes through the origin. A no-intercept model is

$y_i = \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

The sum of squares due to error is given by

$S(\beta_1) = \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2.$

Minimizing $S(\beta_1)$ gives the estimator of $\beta_1$ as

$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$
Also,

$\operatorname{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},$

and an unbiased estimator of $\sigma^2$ is obtained as

$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - b_1 \sum_{i=1}^{n} x_i y_i}{n-1}.$
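A sketch of the no-intercept fit (numpy assumed, placeholder data roughly proportional to x):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])
n = len(x)

b1 = np.sum(x * y) / np.sum(x ** 2)                   # slope through the origin
s2 = (np.sum(y ** 2) - b1 * np.sum(x * y)) / (n - 1)  # unbiased sigma^2 estimate
```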


2.7 Testing of hypotheses and confidence interval estimation for the intercept term:

Now we consider the tests of hypotheses and confidence interval estimation for the intercept term $\beta_0$ of the model under two cases: when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known
Hypothesis: $H_0: \beta_0 = \beta_{00}$ against $H_1: \beta_0 \neq \beta_{00}$.

When $\sigma^2$ is known, then using the result that $b_0 \sim N\left( \beta_0, \, \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \right)$, since $b_0$ is a linear combination of normally distributed random variables, the following statistic

$Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}}$

has a $N(0,1)$ distribution under $H_0$.

Reject $H_0$ if $|Z_0| > z_{\alpha/2}$.
The confidence interval for $\beta_0$ when $\sigma^2$ is known is obtained from

$P\left( -z_{\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} \leq z_{\alpha/2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$b_0 \pm z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}.$


Case 2: When $\sigma^2$ is unknown
Hypothesis: $H_0: \beta_0 = \beta_{00}$ against $H_1: \beta_0 \neq \beta_{00}$.

We know that

$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}$

and

$E(s^2) = \sigma^2.$

When $\sigma^2$ is unknown, the following statistic is constructed:

$t_0 = \frac{b_0 - \beta_{00}}{\sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}},$

which follows a $t$ distribution with $(n-2)$ degrees of freedom under $H_0$.

Reject $H_0$ if $|t_0| > t_{\alpha/2, n-2}$.
The confidence interval for $\beta_0$ when $\sigma^2$ is unknown is obtained from

$P\left( -t_{\alpha/2, n-2} \leq \frac{b_0 - \beta_0}{\sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$b_0 \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}.$
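A sketch of the unknown-$\sigma^2$ test and interval for the intercept (numpy and scipy assumed; placeholder data; the null value $\beta_{00} = 0$ is an illustrative choice):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / S_xx))
t0 = (b0 - 0.0) / se_b0                    # test H0: beta0 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)      # 95% two-sided critical value
ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```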


2.8 Testing of hypotheses and confidence interval estimation for the slope parameter:

Now we consider the tests of hypotheses and confidence interval estimation for the slope parameter $\beta_1$ of the model under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known
Hypothesis: $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \neq \beta_{10}$.

When $\sigma^2$ is known, then using the result that $b_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right)$, since $b_1$ is a linear combination of normally distributed random variables, the following statistic

$Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}}$

has a $N(0,1)$ distribution under $H_0$.

Reject $H_0$ if $|Z_1| > z_{\alpha/2}$.
The confidence interval for $\beta_1$ when $\sigma^2$ is known is obtained from

$P\left( -z_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}} \leq z_{\alpha/2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$b_1 \pm z_{\alpha/2} \sqrt{\frac{\sigma^2}{S_{xx}}}.$

Case 2: When $\sigma^2$ is unknown
Hypothesis: $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \neq \beta_{10}$.
When $\sigma^2$ is unknown, the following statistic is constructed:

$t_1 = \frac{b_1 - \beta_{10}}{\sqrt{s^2 / S_{xx}}},$

which follows a $t$ distribution with $(n-2)$ degrees of freedom under $H_0$.

Reject $H_0$ if $|t_1| > t_{\alpha/2, n-2}$.
The confidence interval for $\beta_1$ when $\sigma^2$ is unknown is obtained from

$P\left( -t_{\alpha/2, n-2} \leq \frac{b_1 - \beta_1}{\sqrt{s^2 / S_{xx}}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$b_1 \pm t_{\alpha/2, n-2} \sqrt{\frac{s^2}{S_{xx}}}.$
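A corresponding sketch for the slope (numpy and scipy assumed; placeholder data; $\beta_{10} = 0$ as an illustrative null value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

se_b1 = np.sqrt(s2 / S_xx)
t1 = (b1 - 0.0) / se_b1                     # test H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t1), df=n - 2) # two-sided p-value
ci = (b1 - stats.t.ppf(0.975, n - 2) * se_b1,
      b1 + stats.t.ppf(0.975, n - 2) * se_b1)
```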


Test of hypothesis for $\sigma^2$

We have considered two types of test statistics for testing hypotheses about the intercept term and the slope parameter: when $\sigma^2$ is known and when $\sigma^2$ is unknown. While dealing with the case of known $\sigma^2$, the value of $\sigma^2$ is known from some external source such as past experience, long association of the experimenter with the experiment, past studies, etc. In such situations, the experimenter may also like to test a hypothesis of the form

$H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2,$

where $\sigma_0^2$ is specified. The test statistic is based on the result $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$. So the test statistic is

$\chi_0^2 = \frac{SS_{res}}{\sigma_0^2} \sim \chi^2_{n-2} \quad \text{under } H_0.$

Reject $H_0$ if $\chi_0^2 > \chi^2_{\alpha/2, n-2}$ or $\chi_0^2 < \chi^2_{1-\alpha/2, n-2}$.
Confidence interval for $\sigma^2$

The corresponding $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$\frac{SS_{res}}{\chi^2_{\alpha/2, n-2}} \leq \sigma^2 \leq \frac{SS_{res}}{\chi^2_{1-\alpha/2, n-2}}.$
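A sketch of this chi-square test and interval (numpy and scipy assumed; placeholder data; the hypothesized value $\sigma_0^2$ is illustrative):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
SS_res = np.sum((y - b0 - b1 * x) ** 2)

sigma2_0 = 0.05                            # hypothesized value (illustrative)
chi2_0 = SS_res / sigma2_0                 # chi^2 with n-2 df under H0

lo = SS_res / stats.chi2.ppf(0.975, n - 2) # 95% CI for sigma^2
hi = SS_res / stats.chi2.ppf(0.025, n - 2)
```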


2.9 Analysis of variance:

The technique of analysis of variance is usually used for testing hypotheses related to the equality of more than one parameter, such as population means or slope parameters. It is more meaningful in the case of the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here to introduce the related basic concepts and fundamentals, which will be used in developing the analysis of variance in the next module on the multiple linear regression model, where there are two or more explanatory variables.

Total variation in the data
The question is: how much of the variation is explained by the model? We have

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$

that is,

$SS_T = SS_{reg} + SS_{res}.$
The degrees of freedom of $SS_T$ is $(n-1)$ (since there are $n$ terms $y_i - \bar{y}$ and one constraint, $\sum_i (y_i - \bar{y}) = 0$).

The degrees of freedom of $SS_{res}$ is $(n-2)$ (since there are $n$ residuals and two constraints, $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$), and the degrees of freedom of $SS_{reg}$ is $1$.
Analysis of Variance (ANOVA) Table
(To test the hypothesis $H_0: \beta_1 = 0$)

Source of Variation    d.f.     Sum of Squares (SS)    Mean Sum of Squares (MSS)      F
Regression             1        $SS_{reg}$             $MS_{reg} = SS_{reg}/1$        $F = MS_{reg}/MS_{res}$
Residual               $n-2$    $SS_{res}$             $MS_{res} = SS_{res}/(n-2)$
Total                  $n-1$    $SS_T$

Therefore

$F = \frac{MS_{reg}}{MS_{res}} \sim F_{1, n-2} \quad \text{under } H_0.$

To test $H_0: \beta_1 = 0$ we compute $F$ and reject $H_0$ if $F > F_{\alpha; 1, n-2}$.
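A sketch of the ANOVA decomposition and F test (numpy and scipy assumed; placeholder data):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SS_T = np.sum((y - y.mean()) ** 2)        # total sum of squares, n-1 df
SS_reg = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares, 1 df
SS_res = np.sum((y - y_hat) ** 2)         # residual sum of squares, n-2 df

F = (SS_reg / 1) / (SS_res / (n - 2))
p_value = stats.f.sf(F, 1, n - 2)         # reject H0: beta1 = 0 for small p
```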


2.10 Goodness of fit of regression

A fitted model can be said to be good when the residuals are small. Since $SS_{res}$ is based on the residuals, a measure of the quality of the fitted model can be based on $SS_{res}$. When an intercept term is present in the model, a measure of goodness of fit of the model is given by

$R^2 = \frac{SS_{reg}}{SS_T} = 1 - \frac{SS_{res}}{SS_T}.$

This is known as the coefficient of determination. This measure is based on the idea of how much of the variation in the $y$'s, as measured by $SS_T$, is explainable by $SS_{reg}$, and how much of the unexplainable part is contained in $SS_{res}$. The ratio $SS_{reg}/SS_T$ describes the proportion of variability that is explained by the regression in relation to the total variability of $y$. The ratio $SS_{res}/SS_T$ describes the proportion of variability that is not explained by the regression. Clearly, $0 \leq R^2 \leq 1$.
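A one-step sketch of this measure (numpy assumed, placeholder data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```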
2.11 Prediction of values of the study variable

An important use of linear regression modeling is to predict the average and actual values of the study variable. The term prediction of the value of the study variable corresponds to knowing the value of $E(y)$ (in the case of the average value) or the value of $y$ (in the case of the actual value) for a given value of the explanatory variable. We consider both cases.

Case 1: Prediction of the average value
Suppose we want to predict the value of $E(y)$ for a given value $x = x_0$. Then the predictor is given by

$\hat{\mu}_{y|x_0} = b_0 + b_1 x_0.$

Predictive bias:
The prediction error is given as

$\hat{\mu}_{y|x_0} - E(y|x_0) = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0),$

which has zero expectation since $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$. Thus the predictor $\hat{\mu}_{y|x_0}$ is an unbiased predictor of $E(y|x_0)$.


Predictive variance:
The predictive variance of $\hat{\mu}_{y|x_0}$ is

$\operatorname{Var}(\hat{\mu}_{y|x_0}) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$

Estimate of predictive variance:

$\widehat{\operatorname{Var}}(\hat{\mu}_{y|x_0}) = s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$


Prediction interval estimation:
The prediction interval for $E(y|x_0)$ is obtained as follows. The predictor $\hat{\mu}_{y|x_0}$ is a linear combination of normally distributed random variables, so it is also normally distributed:

$\hat{\mu}_{y|x_0} \sim N\left( \beta_0 + \beta_1 x_0, \; \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \right).$

When $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval for $E(y|x_0)$ is

$\hat{\mu}_{y|x_0} \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}.$

Note that the width of the interval is a function of $x_0$. The interval width is minimum at $x_0 = \bar{x}$. This is expected, as the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.
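A sketch of the interval for the average value (numpy and scipy assumed; placeholder data; $x_0 = 3.5$ is an illustrative choice):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

x0 = 3.5                                   # illustrative new x value
mu_hat = b0 + b1 * x0
se_mean = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / S_xx))
t_crit = stats.t.ppf(0.975, n - 2)
ci_mean = (mu_hat - t_crit * se_mean, mu_hat + t_crit * se_mean)
```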

Case 2: Prediction of the actual value

If $x_0$ is the value of the explanatory variable, then the predictor of the actual value of $y$ is $\hat{y}_0 = b_0 + b_1 x_0$. The true value of $y$ in the prediction period is given by $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of the predictor is the same as that of the average value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.
Predictive bias:
The predictive error of $\hat{y}_0$ is given by

$\hat{y}_0 - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0),$

and $E(\hat{y}_0 - y_0) = 0$, which implies that $\hat{y}_0$ is an unbiased predictor of $y_0$.

Predictive variance:

$\operatorname{Var}(\hat{y}_0 - y_0) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$

Estimate of predictive variance:

$\widehat{\operatorname{Var}}(\hat{y}_0 - y_0) = s^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$
Prediction interval:
The prediction interval for $y_0$ is obtained as follows. The predictor $\hat{y}_0$ is a linear combination of normally distributed random variables, so $\hat{y}_0 - y_0$ is also normally distributed. If $\sigma^2$ is known, then the distribution of

$\frac{\hat{y}_0 - y_0}{\sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}}$

is $N(0,1)$, and the $100(1-\alpha)\%$ prediction interval for $y_0$ is $\hat{y}_0 \pm z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}$. When $\sigma^2$ is unknown, $\sigma^2$ is replaced by $s^2$ and the $100(1-\alpha)\%$ prediction interval becomes

$\hat{y}_0 \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}.$
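A sketch of the prediction interval for an actual value (numpy and scipy assumed; placeholder data; note the extra "1 +" term compared with the interval for the mean):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

x0 = 3.5                                   # illustrative new x value
y0_hat = b0 + b1 * x0
se_pred = np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / S_xx))
t_crit = stats.t.ppf(0.975, n - 2)
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
```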
Example: Simple Regression Analysis (A Complete Example)
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)    Monthly Auto Insurance Premium ($)
 5                            64
 2                            87
12                            50
 9                            71
15                            44
 6                            56
25                            42
16                            60

a. Does the insurance premium depend on the driving experience, or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute $S_{xx}$, $S_{yy}$ and $S_{xy}$.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of $b_0$ and $b_1$ calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate $r$ and $R^2$ and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
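A sketch of the computations for this example in Python (numpy assumed; the premium is taken as the dependent variable, since it plausibly depends on experience, and a negative relationship is expected):

```python
import numpy as np

x = np.array([5, 2, 12, 9, 15, 6, 25, 16], dtype=float)      # driving experience (years)
y = np.array([64, 87, 50, 71, 44, 56, 42, 60], dtype=float)  # monthly premium ($)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = S_xy / S_xx               # slope: expected negative (premium falls with experience)
b0 = y.mean() - b1 * x.mean()  # intercept: premium for a hypothetical new driver

r = S_xy / np.sqrt(S_xx * S_yy)  # correlation coefficient
R2 = r ** 2                      # coefficient of determination

y_at_10 = b0 + b1 * 10.0         # part g: predicted premium at 10 years of experience
print(f"y_hat = {b0:.2f} + ({b1:.2f}) x;  r = {r:.3f}, R^2 = {R2:.3f}, prediction = {y_at_10:.2f}")
```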
