
CHAPTER 1
INTRODUCTION
&
CHAPTER 2
SIMPLE LINEAR REGRESSION MODEL
1.1 Objectives of Regression Analysis

The determination of the explicit form of the regression equation is the ultimate objective of regression analysis. The aim is a good and valid relationship between the study variable and the explanatory variables. Such a regression equation can be used for several purposes: for example, to determine the role of any explanatory variable in the joint relationship for policy formulation, or to forecast the values of the response variable for a given set of values of the explanatory variables. The regression equation also helps in understanding the interrelationships among the variables.

In regression analysis we have mainly two types of variables: predictor variables and response variables. By predictor variables we shall usually mean variables that can either be set to a desired value or else take values that can be observed but not controlled. As the predictor variables change, the response variable changes.
Throughout the course we shall most often be concerned with relationships of the form

$y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon,$

where $f$ is some well-defined function and $\beta_1, \beta_2, \ldots, \beta_k$ are the parameters which characterize the role and contribution of $X_1, X_2, \ldots, X_k$ respectively. The term $\varepsilon$ reflects the stochastic nature of the relationship between $y$ and $X_1, \ldots, X_k$ and indicates that such a relationship is not exact in nature. When $\varepsilon = 0$, the relationship is called a mathematical model; otherwise it is a statistical model.
 
A model or relationship is termed linear if it is linear in the parameters, and nonlinear if it is not linear in the parameters. In other words, if all the partial derivatives of $y$ with respect to each of the parameters $\beta_1, \ldots, \beta_k$ are independent of the parameters, then the model is called a linear model.

Example 1.1: The model $y = \beta_1 + \beta_2 x + \beta_3 x^2 + \varepsilon$ is linear, since it is linear in the parameters even though it is quadratic in $x$. The model $y = \beta_1 e^{\beta_2 x} + \varepsilon$ is nonlinear, since $\partial y / \partial \beta_1 = e^{\beta_2 x}$ depends on the parameter $\beta_2$.
Steps in regression analysis
Regression analysis includes the following steps:
- Statement of the problem under consideration
- Choice of relevant variables
- Collection of data on relevant variables
- Specification of the model
- Choice of method for fitting the data
- Fitting of the model
- Model validation and criticism
- Using the chosen model(s) for the solution of the posed problem

CHAPTER 2
SIMPLE LINEAR REGRESSION MODEL
2.1 The Simple Linear Regression Model

We consider the modeling between the dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model. When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.
The simple linear regression model is

$y = \beta_0 + \beta_1 x + \varepsilon,$

where
$y$ - response variable
$x$ - regressor (control, independent, explanatory) variable
$\beta_0$ - intercept
$\beta_1$ - slope
$\varepsilon$ - random error
Basic assumptions on the model:

- $\varepsilon$ is a random variable with zero mean and (unknown) variance $\sigma^2$, that is, $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$.
- $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated for $i \neq j$, so $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$.
- $\varepsilon_i$ is a normally distributed independent random variable with zero mean and variance $\sigma^2$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$.

Therefore $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
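As a quick illustration of these assumptions, the following Python sketch generates data from the model (numpy is assumed to be available; the parameter values are arbitrary choices for illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative parameter values
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0, 10, size=n)         # regressor values
eps = rng.normal(0.0, sigma, size=n)   # i.i.d. N(0, sigma^2) errors
y = beta0 + beta1 * x + eps            # y_i ~ N(beta0 + beta1*x_i, sigma^2)
```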

2.2 Least-Squares Estimation of the Parameters

Suppose we have a sample of $n$ pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$. These observations are assumed to satisfy the simple linear regression model, and so we can write

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

The parameters $\beta_0$ and $\beta_1$ are unknown and must be estimated using the sample data. We have to fit the straight line that best describes the given data. The line fitted by least squares is the one that makes the sum of squares of all vertical displacements as small as possible. The vertical displacement of the $i$th observation is

$e_i = y_i - \beta_0 - \beta_1 x_i.$
The least squares estimators minimize

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$

The least squares estimators $b_0$ and $b_1$ of $\beta_0$ and $\beta_1$ must satisfy

$\left.\frac{\partial S}{\partial \beta_0}\right|_{b_0, b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0,$

$\left.\frac{\partial S}{\partial \beta_1}\right|_{b_0, b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) x_i = 0.$

These equations are called the normal equations. By solving the two equations we can show that

$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x},$

where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of $\beta_0$ and $\beta_1$.
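A minimal sketch of these formulas in Python (numpy assumed; the data arrays are placeholders for illustration):

```python
import numpy as np

# Placeholder data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

b1 = S_xy / S_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate
print(b0, b1)
```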
Further, we have

$\frac{\partial^2 S}{\partial \beta_0^2} = 2n, \quad \frac{\partial^2 S}{\partial \beta_1^2} = 2\sum_{i=1}^{n} x_i^2, \quad \frac{\partial^2 S}{\partial \beta_0 \, \partial \beta_1} = 2\sum_{i=1}^{n} x_i.$

The Hessian matrix, which is the matrix of second-order partial derivatives, is in this case given as

$H = \begin{pmatrix} 2n & 2\sum x_i \\ 2\sum x_i & 2\sum x_i^2 \end{pmatrix}.$

Since $H$ is positive definite (provided the $x_i$ are not all equal), $S(\beta_0, \beta_1)$ has a global minimum at $(b_0, b_1)$.


2.3 Useful properties of the least squares estimators

(a) The sum of the residuals in any regression model that contains an intercept is always zero: $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - \hat{y}_i) = 0.$

(b) The sum of the observed values equals the sum of the fitted values: $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i.$

(c) The least squares regression line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.

(d) The sum of the residuals weighted by the regressor is zero, $\sum_{i=1}^{n} x_i e_i = 0$; likewise $\sum_{i=1}^{n} \hat{y}_i e_i = 0.$

(e) Both $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$ respectively.
(f) $\operatorname{Var}(b_1) = \frac{\sigma^2}{S_{xx}}$, $\operatorname{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$ and $\operatorname{Cov}(b_0, b_1) = -\frac{\sigma^2 \bar{x}}{S_{xx}}$.

Both the variances and the covariance of $b_0$ and $b_1$ involve $\sigma^2$, whose value is unknown; it is therefore replaced by an estimator of $\sigma^2$.

(g) The residual sum of squares is given as

$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = S_{yy} - b_1 S_{xy},$

where $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$. Therefore

$MS_{res} = \frac{SS_{res}}{n-2}$ (the residual mean square) is an unbiased estimator of $\sigma^2$.
2.4.1 Estimation of $\sigma^2$

The estimator of $\sigma^2$ is obtained from the residual sum of squares as follows. Assuming that $\varepsilon_i$ is normally distributed, it follows that $SS_{res}/\sigma^2$ has a $\chi^2$ distribution with $(n-2)$ degrees of freedom, so

$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}.$

Thus, using the result about the expectation of a chi-square random variable, we have

$E(SS_{res}) = (n-2)\,\sigma^2.$

Thus an unbiased estimator of $\sigma^2$ is

$s^2 = \frac{SS_{res}}{n-2}.$

Note that $SS_{res}$ has only $(n-2)$ degrees of freedom; the two degrees of freedom are lost due to the estimation of $b_0$ and $b_1$. Since $s^2$ depends on these estimates, it is a model-dependent estimate of $\sigma^2$.
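A sketch of this computation (numpy assumed, placeholder data as before):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
SS_res = np.sum(resid ** 2)   # residual sum of squares
s2 = SS_res / (n - 2)         # unbiased estimator of sigma^2 (residual mean square)
```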
2.4.2 Estimates of the variances of $b_0$ and $b_1$:

The estimators of the variances of $b_0$ and $b_1$ are obtained by replacing $\sigma^2$ by its estimate $s^2$ as follows:

$\widehat{\operatorname{Var}}(b_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$ and $\widehat{\operatorname{Var}}(b_1) = \frac{s^2}{S_{xx}}.$
2.5 Centered Model:

Sometimes it is useful to measure the independent variable around its mean. In such a case the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ has a centered version as follows:

$y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \beta_1 \bar{x} + \varepsilon_i = \beta_0^{*} + \beta_1 (x_i - \bar{x}) + \varepsilon_i,$

where $\beta_0^{*} = \beta_0 + \beta_1 \bar{x}$. The sum of squares due to error is given by

$S(\beta_0^{*}, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0^{*} - \beta_1 (x_i - \bar{x}) \right)^2.$

Solving

$\frac{\partial S}{\partial \beta_0^{*}} = 0 \quad \text{and} \quad \frac{\partial S}{\partial \beta_1} = 0,$

we can obtain

$b_0^{*} = \bar{y}$

and

$b_1 = \frac{S_{xy}}{S_{xx}},$

respectively.
Under the assumption that $E(\varepsilon_i) = 0$, it follows that

$E(b_0^{*}) = \beta_0^{*}$ and $E(b_1) = \beta_1.$

The fitted model is

$\hat{y} = \bar{y} + b_1 (x - \bar{x}),$

and the predicted values are

$\hat{y}_i = \bar{y} + b_1 (x_i - \bar{x}), \quad i = 1, 2, \ldots, n.$

Note that in the centered model the estimators $b_0^{*}$ and $b_1$ are uncorrelated.
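A small sketch (numpy assumed, placeholder data) showing that the centered fit reproduces the same fitted line as the uncentered one:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xc = x - x.mean()                                  # centered regressor
b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
b0_star = y.mean()                                 # intercept of the centered model

y_hat = b0_star + b1 * xc  # identical to b0 + b1*x from the uncentered fit
```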


2.6 No-intercept model:

Sometimes in practice a model without an intercept term is used, in those situations where it is known that $y = 0$ when $x = 0$, i.e., the regression line passes through the origin. A no-intercept model is

$y_i = \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

The sum of squares due to error is given by

$S(\beta_1) = \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2.$

Minimizing $S(\beta_1)$ gives the estimator of $\beta_1$ as

$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$
Also,

$\operatorname{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},$

and an unbiased estimator of $\sigma^2$ is obtained as

$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - b_1 \sum_{i=1}^{n} x_i y_i}{n-1}.$
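A sketch of the no-intercept fit (numpy assumed, placeholder data roughly proportional to x):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])
n = len(x)

b1 = np.sum(x * y) / np.sum(x ** 2)                   # slope through the origin
s2 = (np.sum(y ** 2) - b1 * np.sum(x * y)) / (n - 1)  # unbiased sigma^2 estimate
```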


2.7 Testing of hypotheses and confidence interval estimation for the intercept term:

Now we consider the tests of hypotheses and confidence interval estimation for the intercept term $\beta_0$ of the model under two cases: when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known
Hypothesis: $H_0: \beta_0 = \beta_{00}$ against $H_1: \beta_0 \neq \beta_{00}$.

When $\sigma^2$ is known, then using the result that $b_0 \sim N\left( \beta_0, \, \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \right)$, since $b_0$ is a linear combination of normally distributed random variables, the following statistic

$Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}}$

has a $N(0,1)$ distribution under $H_0$.

Reject $H_0$ if $|Z_0| > z_{\alpha/2}$.
The confidence interval for $\beta_0$ when $\sigma^2$ is known is obtained from

$P\left( -z_{\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} \leq z_{\alpha/2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$b_0 \pm z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}.$


Case 2: When $\sigma^2$ is unknown
Hypothesis: $H_0: \beta_0 = \beta_{00}$ against $H_1: \beta_0 \neq \beta_{00}$.

We know that

$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}$

and

$E(s^2) = \sigma^2.$

When $\sigma^2$ is unknown, the following statistic is constructed:

$t_0 = \frac{b_0 - \beta_{00}}{\sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}},$

which follows a $t$ distribution with $(n-2)$ degrees of freedom under $H_0$.

Reject $H_0$ if $|t_0| > t_{\alpha/2, n-2}$.
The confidence interval for $\beta_0$ when $\sigma^2$ is unknown is obtained from

$P\left( -t_{\alpha/2, n-2} \leq \frac{b_0 - \beta_0}{\sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$b_0 \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}.$
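A sketch of the unknown-$\sigma^2$ test and interval for the intercept (numpy and scipy assumed; placeholder data; the null value $\beta_{00} = 0$ is an illustrative choice):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / S_xx))
t0 = (b0 - 0.0) / se_b0                    # test H0: beta0 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)      # 95% two-sided critical value
ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```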


2.8 Testing of hypotheses and confidence interval estimation for the slope parameter:

Now we consider the tests of hypotheses and confidence interval estimation for the slope parameter $\beta_1$ of the model under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known
Hypothesis: $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \neq \beta_{10}$.

When $\sigma^2$ is known, then using the result that $b_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right)$, since $b_1$ is a linear combination of normally distributed random variables, the following statistic

$Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}}$

has a $N(0,1)$ distribution under $H_0$.

Reject $H_0$ if $|Z_1| > z_{\alpha/2}$.
The confidence interval for $\beta_1$ when $\sigma^2$ is known is obtained from

$P\left( -z_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}} \leq z_{\alpha/2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$b_1 \pm z_{\alpha/2} \sqrt{\frac{\sigma^2}{S_{xx}}}.$

Case 2: When $\sigma^2$ is unknown
Hypothesis: $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \neq \beta_{10}$.
When $\sigma^2$ is unknown, the following statistic is constructed:

$t_1 = \frac{b_1 - \beta_{10}}{\sqrt{s^2 / S_{xx}}},$

which follows a $t$ distribution with $(n-2)$ degrees of freedom under $H_0$.

Reject $H_0$ if $|t_1| > t_{\alpha/2, n-2}$.
The confidence interval for $\beta_1$ when $\sigma^2$ is unknown is obtained from

$P\left( -t_{\alpha/2, n-2} \leq \frac{b_1 - \beta_1}{\sqrt{s^2 / S_{xx}}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha.$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$b_1 \pm t_{\alpha/2, n-2} \sqrt{\frac{s^2}{S_{xx}}}.$
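A corresponding sketch for the slope (numpy and scipy assumed; placeholder data; $\beta_{10} = 0$ as an illustrative null value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

se_b1 = np.sqrt(s2 / S_xx)
t1 = (b1 - 0.0) / se_b1                     # test H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t1), df=n - 2) # two-sided p-value
ci = (b1 - stats.t.ppf(0.975, n - 2) * se_b1,
      b1 + stats.t.ppf(0.975, n - 2) * se_b1)
```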


Test of hypothesis for $\sigma^2$

We have considered two types of test statistics for testing hypotheses about the intercept term and the slope parameter: when $\sigma^2$ is known and when $\sigma^2$ is unknown. While dealing with the case of known $\sigma^2$, the value of $\sigma^2$ is known from some external source such as past experience, long association of the experimenter with the experiment, past studies, etc. In such situations, the experimenter may also like to test a hypothesis of the form

$H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2,$

where $\sigma_0^2$ is specified. The test statistic is based on the result $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$. So the test statistic is

$\chi_0^2 = \frac{SS_{res}}{\sigma_0^2} \sim \chi^2_{n-2} \quad \text{under } H_0.$

Reject $H_0$ if $\chi_0^2 > \chi^2_{\alpha/2, n-2}$ or $\chi_0^2 < \chi^2_{1-\alpha/2, n-2}$.
Confidence interval for $\sigma^2$

The corresponding $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$\frac{SS_{res}}{\chi^2_{\alpha/2, n-2}} \leq \sigma^2 \leq \frac{SS_{res}}{\chi^2_{1-\alpha/2, n-2}}.$
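A sketch of this chi-square test and interval (numpy and scipy assumed; placeholder data; the hypothesized value $\sigma_0^2$ is illustrative):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
SS_res = np.sum((y - b0 - b1 * x) ** 2)

sigma2_0 = 0.05                            # hypothesized value (illustrative)
chi2_0 = SS_res / sigma2_0                 # chi^2 with n-2 df under H0

lo = SS_res / stats.chi2.ppf(0.975, n - 2) # 95% CI for sigma^2
hi = SS_res / stats.chi2.ppf(0.025, n - 2)
```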


2.9 Analysis of variance:

The technique of analysis of variance is usually used for testing hypotheses related to the equality of more than one parameter, such as population means or slope parameters. It is more meaningful in the case of the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here to introduce the related basic concepts and fundamentals, which will be used in developing the analysis of variance in the next module on the multiple linear regression model, where there are two or more explanatory variables.

Total variation in the data
The question is: how much of the variation is explained by the model? We have

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$

that is,

$SS_T = SS_{reg} + SS_{res}.$
The degrees of freedom of $SS_T$ is $(n-1)$ (since there are $n$ terms $y_i - \bar{y}$ and one constraint, $\sum_i (y_i - \bar{y}) = 0$).

The degrees of freedom of $SS_{res}$ is $(n-2)$ (since there are $n$ residuals and two constraints, $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$), and the degrees of freedom of $SS_{reg}$ is $1$.
Analysis of Variance (ANOVA) Table
(To test the hypothesis $H_0: \beta_1 = 0$)

Source of Variation    d.f.     Sum of Squares (SS)    Mean Sum of Squares (MSS)      F
Regression             1        $SS_{reg}$             $MS_{reg} = SS_{reg}/1$        $F = MS_{reg}/MS_{res}$
Residual               $n-2$    $SS_{res}$             $MS_{res} = SS_{res}/(n-2)$
Total                  $n-1$    $SS_T$

Therefore

$F = \frac{MS_{reg}}{MS_{res}} \sim F_{1, n-2} \quad \text{under } H_0.$

To test $H_0: \beta_1 = 0$ we compute $F$ and reject $H_0$ if $F > F_{\alpha; 1, n-2}$.
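A sketch of the ANOVA decomposition and F test (numpy and scipy assumed; placeholder data):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SS_T = np.sum((y - y.mean()) ** 2)        # total sum of squares, n-1 df
SS_reg = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares, 1 df
SS_res = np.sum((y - y_hat) ** 2)         # residual sum of squares, n-2 df

F = (SS_reg / 1) / (SS_res / (n - 2))
p_value = stats.f.sf(F, 1, n - 2)         # reject H0: beta1 = 0 for small p
```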


2.10 Goodness of fit of regression

A fitted model can be said to be good when the residuals are small. Since $SS_{res}$ is based on the residuals, a measure of the quality of the fitted model can be based on $SS_{res}$. When an intercept term is present in the model, a measure of goodness of fit of the model is given by

$R^2 = \frac{SS_{reg}}{SS_T} = 1 - \frac{SS_{res}}{SS_T}.$

This is known as the coefficient of determination. This measure is based on the idea of how much of the variation in the $y$'s, as measured by $SS_T$, is explainable by $SS_{reg}$, and how much of the unexplainable part is contained in $SS_{res}$. The ratio $SS_{reg}/SS_T$ describes the proportion of variability that is explained by the regression in relation to the total variability of $y$. The ratio $SS_{res}/SS_T$ describes the proportion of variability that is not explained by the regression. Clearly, $0 \leq R^2 \leq 1$.
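A one-step sketch of this measure (numpy assumed, placeholder data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```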
2.11 Prediction of values of the study variable

An important use of linear regression modeling is to predict the average and actual values of the study variable. The term prediction of the value of the study variable corresponds to knowing the value of $E(y)$ (in the case of the average value) or the value of $y$ (in the case of the actual value) for a given value of the explanatory variable. We consider both cases.

Case 1: Prediction of the average value
Suppose we want to predict the value of $E(y)$ for a given value $x = x_0$. Then the predictor is given by

$\hat{\mu}_{y|x_0} = b_0 + b_1 x_0.$

Predictive bias:
The prediction error is given as

$\hat{\mu}_{y|x_0} - E(y|x_0) = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0),$

which has zero expectation since $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$. Thus the predictor $\hat{\mu}_{y|x_0}$ is an unbiased predictor of $E(y|x_0)$.


Predictive variance:
The predictive variance of $\hat{\mu}_{y|x_0}$ is

$\operatorname{Var}(\hat{\mu}_{y|x_0}) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$

Estimate of predictive variance:

$\widehat{\operatorname{Var}}(\hat{\mu}_{y|x_0}) = s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$


Prediction interval estimation:
The prediction interval for $E(y|x_0)$ is obtained as follows. The predictor $\hat{\mu}_{y|x_0}$ is a linear combination of normally distributed random variables, so it is also normally distributed:

$\hat{\mu}_{y|x_0} \sim N\left( \beta_0 + \beta_1 x_0, \; \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) \right).$

When $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval for $E(y|x_0)$ is

$\hat{\mu}_{y|x_0} \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}.$

Note that the width of the interval is a function of $x_0$. The interval width is minimum at $x_0 = \bar{x}$. This is expected, as the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.
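A sketch of the interval for the average value (numpy and scipy assumed; placeholder data; $x_0 = 3.5$ is an illustrative choice):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

x0 = 3.5                                   # illustrative new x value
mu_hat = b0 + b1 * x0
se_mean = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / S_xx))
t_crit = stats.t.ppf(0.975, n - 2)
ci_mean = (mu_hat - t_crit * se_mean, mu_hat + t_crit * se_mean)
```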

Case 2: Prediction of the actual value

If $x_0$ is the value of the explanatory variable, then the predictor of the actual value of $y$ is $\hat{y}_0 = b_0 + b_1 x_0$. The true value of $y$ in the prediction period is given by $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of the predictor is the same as that of the average value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.
Predictive bias:
The predictive error of $\hat{y}_0$ is given by

$\hat{y}_0 - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0),$

and $E(\hat{y}_0 - y_0) = 0$, which implies that $\hat{y}_0$ is an unbiased predictor of $y_0$.

Predictive variance:

$\operatorname{Var}(\hat{y}_0 - y_0) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$

Estimate of predictive variance:

$\widehat{\operatorname{Var}}(\hat{y}_0 - y_0) = s^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).$
Prediction interval:
The prediction interval for $y_0$ is obtained as follows. The predictor $\hat{y}_0$ is a linear combination of normally distributed random variables, so $\hat{y}_0 - y_0$ is also normally distributed. If $\sigma^2$ is known, then the distribution of

$\frac{\hat{y}_0 - y_0}{\sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}}$

is $N(0,1)$, and the $100(1-\alpha)\%$ prediction interval for $y_0$ is $\hat{y}_0 \pm z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}$. When $\sigma^2$ is unknown, $\sigma^2$ is replaced by $s^2$ and the $100(1-\alpha)\%$ prediction interval becomes

$\hat{y}_0 \pm t_{\alpha/2, n-2} \sqrt{s^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}.$
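A sketch of the prediction interval for an actual value (numpy and scipy assumed; placeholder data; note the extra "1 +" term compared with the interval for the mean):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

x0 = 3.5                                   # illustrative new x value
y0_hat = b0 + b1 * x0
se_pred = np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / S_xx))
t_crit = stats.t.ppf(0.975, n - 2)
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
```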
Example: Simple Regression Analysis (A Complete Example)
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)    Monthly Auto Insurance Premium ($)
 5                            64
 2                            87
12                            50
 9                            71
15                            44
 6                            56
25                            42
16                            60

a. Does the insurance premium depend on the driving experience, or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute $S_{xx}$, $S_{yy}$ and $S_{xy}$.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of $b_0$ and $b_1$ calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate $r$ and $R^2$ and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
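A sketch of the computations for this example in Python (numpy assumed; the premium is taken as the dependent variable, since it plausibly depends on experience, and a negative relationship is expected):

```python
import numpy as np

x = np.array([5, 2, 12, 9, 15, 6, 25, 16], dtype=float)      # driving experience (years)
y = np.array([64, 87, 50, 71, 44, 56, 42, 60], dtype=float)  # monthly premium ($)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = S_xy / S_xx               # slope: expected negative (premium falls with experience)
b0 = y.mean() - b1 * x.mean()  # intercept: premium for a hypothetical new driver

r = S_xy / np.sqrt(S_xx * S_yy)  # correlation coefficient
R2 = r ** 2                      # coefficient of determination

y_at_10 = b0 + b1 * 10.0         # part g: predicted premium at 10 years of experience
print(f"y_hat = {b0:.2f} + ({b1:.2f}) x;  r = {r:.3f}, R^2 = {R2:.3f}, prediction = {y_at_10:.2f}")
```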
