Chapters 1 and 2
INTRODUCTION
&
CHAPTER 2: SIMPLE LINEAR REGRESSION MODEL
Consider the model y = f(X₁, X₂, ..., Xₖ; β₁, β₂, ..., βₖ) + ε, where f is some well-defined function and β₁, β₂, ..., βₖ are the parameters which characterize the role and contribution of X₁, X₂, ..., Xₖ respectively. The term ε reflects the stochastic nature of the relationship between y and X₁, X₂, ..., Xₖ and
indicates that such a relationship is not exact in nature. When ε = 0, the relationship is called a mathematical
model; otherwise it is called a statistical model.
A model or relationship is termed linear if it is linear in the parameters, and
nonlinear if it is not linear in the parameters. In other words, if all the partial derivatives of y with respect to each of
the parameters β₁, β₂, ..., βₖ are independent of the parameters, then the model is called a linear model.
Example 1.1: The model y = β₁ + β₂X + ε is linear in the parameters, since ∂y/∂β₁ = 1 and ∂y/∂β₂ = X do not depend on β₁ or β₂. The model y = β₁e^(β₂X) + ε is nonlinear, since ∂y/∂β₁ = e^(β₂X) depends on the parameter β₂.
Steps in regression analysis
Regression analysis includes the following steps:
Statement of the problem under consideration
Choice of relevant variables
Collection of data on relevant variables
Specification of model
Choice of method for fitting the data
Fitting of model
Model validation and criticism
Using the chosen model(s) for the solution of the posed problem.
CHAPTER 2: SIMPLE LINEAR REGRESSION MODEL
2.1 The Simple Linear Regression Model
We consider modeling the relationship between a dependent variable and one independent variable. When there is only one
independent variable in the linear regression model, the model is generally termed a simple linear
regression model. When there is more than one independent variable in the model, the linear
model is termed a multiple linear regression model.
The simple linear regression model is
y = β₀ + β₁x + ε, where
y - response variable
x - regressor (control, independent, explanatory) variable
β₀ - intercept
β₁ - slope
ε - random error term.
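The model above can be illustrated by simulation; a minimal sketch in Python, where the values β₀ = 10, β₁ = 2.5 and σ = 1, the function name, and the range of x are all hypothetical choices for illustration:

```python
import random

# Minimal sketch: draw data from y = b0 + b1*x + e with e ~ N(0, sigma^2).
# The coefficient values below are invented, chosen only for illustration.
def simulate_slr(n, b0=10.0, b1=2.5, sigma=1.0, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]           # regressor values
    ys = [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]    # responses with noise
    return xs, ys

xs, ys = simulate_slr(100)
```

Fitting such simulated data and comparing the estimates with the known β₀ and β₁ is a useful sanity check for any regression code.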
Assumptions on the model:
Basic assumptions: E(ε) = 0, Var(ε) = σ², and the errors are uncorrelated; for inference it is further assumed that ε ~ N(0, σ²).
Therefore E(y) = β₀ + β₁x and Var(y) = σ².
The parameters β₀ and β₁ are unknown and must be estimated using sample data (xᵢ, yᵢ), i = 1, 2, ..., n.
We have to fit, for the given data, the straight line that is best in some sense. The line fitted by least squares is the one that
makes the sum of squares of all vertical displacements as small as possible.
The vertical displacement of the iᵗʰ observation is εᵢ = yᵢ − β₀ − β₁xᵢ.
The least squares estimators minimize S(β₀, β₁) = Σᵢ (yᵢ − β₀ − β₁xᵢ)². Setting ∂S/∂β₀ = 0 and ∂S/∂β₁ = 0 yields
b₁ = sxy/sxx and b₀ = ȳ − b₁x̄,
where sxy = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) and sxx = Σᵢ (xᵢ − x̄)².
The solutions of these two equations are called the direct regression estimators, or usually the
ordinary least squares (OLS) estimators, of β₀ and β₁.
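The closed-form estimators b₁ = sxy/sxx and b₀ = ȳ − b₁x̄ are easy to compute directly; a sketch (the function name ols_fit is my own):

```python
def ols_fit(xs, ys):
    """Return the OLS estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # corrected sum of squares of x
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # corrected cross products
    b1 = sxy / sxx           # slope estimate
    b0 = ybar - b1 * xbar    # intercept estimate
    return b0, b1
```

On points lying exactly on the line y = 2x, for example, it recovers b₀ = 0 and b₁ = 2.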
Further, we have ∂²S/∂β₀² = 2n, ∂²S/∂β₁² = 2Σᵢ xᵢ² and ∂²S/∂β₀∂β₁ = 2Σᵢ xᵢ.
The Hessian matrix, which is the matrix of second-order partial derivatives, in this case is given as
H = 2 [ n     Σxᵢ
        Σxᵢ   Σxᵢ² ],
which is positive definite, so the solution indeed minimizes the sum of squares.
(b).
(c).
(d).
(e). Both b₀ and b₁ are unbiased estimators of β₀ and β₁ respectively.
(f). Var(b₁) = σ²/sxx, Var(b₀) = σ²(1/n + x̄²/sxx) and Cov(b₀, b₁) = −σ²x̄/sxx.
The variances and covariance of b₀ and b₁ involve σ², whose value is unknown; it is therefore replaced by an estimator of σ².
(g). The residual sum of squares is given as SSres = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ eᵢ².
Therefore, under the normality assumption, SSres/σ² follows a χ² distribution with n − 2 degrees of freedom.
Thus, using the result about the expectation of a chi-square random variable, we have E(SSres) = (n − 2)σ².
Thus an unbiased estimator of σ² is s² = SSres/(n − 2).
Note that SSres has only n − 2 degrees of freedom. The two degrees of freedom are lost due to the estimation
of b₀ and b₁. Since s² depends on these estimates, it is a model-dependent estimate of σ².
2.4.2 Estimates of the variances of b₀ and b₁:
The estimators of the variances of b₀ and b₁ are obtained by replacing σ² by its estimate s² as follows:
V̂ar(b₀) = s²(1/n + x̄²/sxx) and V̂ar(b₁) = s²/sxx.
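These formulas can be checked numerically; a sketch that computes s² and the estimated standard errors (the helper name and the tiny dataset used in the test are invented for illustration):

```python
import math

def ols_summary(xs, ys):
    """OLS fit plus s^2 = SSres/(n-2) and estimated standard errors."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = ss_res / (n - 2)                                # unbiased estimate of sigma^2
    se_b1 = math.sqrt(s2 / sxx)                          # sqrt of Var-hat(b1)
    se_b0 = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))    # sqrt of Var-hat(b0)
    return b0, b1, s2, se_b0, se_b1
```

The standard errors returned here are exactly the square roots of V̂ar(b₀) and V̂ar(b₁) above.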
2.5 Centered Model:
Sometimes it is useful to measure the independent variable around its mean. In such a case, the
model y = β₀ + β₁x + ε has a centered version as follows:
y = β₀* + β₁(x − x̄) + ε, where β₀* = β₀ + β₁x̄.
Solving the corresponding least squares equations,
we can obtain
b₀* = ȳ and b₁ = sxy/sxx
respectively.
In the centered model the estimators b₀* and b₁ are uncorrelated, and the fitted model is ŷ = ȳ + b₁(x − x̄), with the estimator of σ² given as before by s² = SSres/(n − 2).
Also, b₀ and b₁ are linear combinations of y₁, y₂, ..., yₙ. We know that
b₁ ~ N(β₁, σ²/sxx) and b₀ ~ N(β₀, σ²(1/n + x̄²/sxx)).
Case 1: When σ² is known
Using the result that b₁ is a linear combination of normally distributed random variables, the statistic
Z = (b₁ − β₁)/√(σ²/sxx)
follows the N(0, 1) distribution. So the 100(1 − α)% confidence interval of β₁ is
( b₁ − z_{α/2}√(σ²/sxx), b₁ + z_{α/2}√(σ²/sxx) ).
Case 2: When σ² is unknown
Hypothesis: H₀: β₁ = β₁⁰ against H₁: β₁ ≠ β₁⁰,
where β₁⁰ is specified. The test statistic is based on the result (b₁ − β₁)/√(s²/sxx) ~ t_{n−2}. So the test statistic is
t₀ = (b₁ − β₁⁰)/√(s²/sxx).
Reject H₀ if t₀ > t_{α/2, n−2} or t₀ < −t_{α/2, n−2}.
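The test is easy to carry out by hand; a sketch for H₀: β₁ = 0 on an invented four-point dataset, with the critical value t_{0.025, 2} = 4.303 taken from standard t tables:

```python
import math

# Test H0: beta1 = 0 vs H1: beta1 != 0 at alpha = 0.05 (dataset invented).
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s2 = ss_res / (n - 2)
t0 = b1 / math.sqrt(s2 / sxx)    # test statistic with beta1^0 = 0
t_crit = 4.303                   # t_{0.025, n-2} for n - 2 = 2, from t tables
reject_h0 = t0 > t_crit or t0 < -t_crit
```

Here t₀ is far beyond the critical value, so H₀ is rejected: the slope is significantly different from zero.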
A confidence interval for σ² follows from the result SSres/σ² ~ χ²_{n−2}: the 100(1 − α)% confidence interval of σ² is
( (n − 2)s²/χ²_{α/2, n−2}, (n − 2)s²/χ²_{1−α/2, n−2} ).
The analysis of variance partitions the total variability of y:
Source of variation   Degrees of freedom   Sum of squares
Regression            1                    SSreg
Residual              n − 2                SSres
Total                 n − 1                SST = SSreg + SSres
Therefore
R² = SSreg/SST = 1 − SSres/SST.
This is known as the coefficient of determination. This measure is based on the concept of how much of the
variation in the yᵢ, measured by SST, is explained by SSreg and how much of the unexplained part is contained in SSres. The ratio SSreg/SST
describes the proportion of variability that is explained by the regression in relation to the total variability of y.
The ratio SSres/SST describes the proportion of variability that is not explained by the regression.
Clearly 0 ≤ R² ≤ 1.
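Both expressions for R² give the same value, which is easy to verify numerically (dataset invented for illustration):

```python
# Verify R^2 = SSreg/SST = 1 - SSres/SST on a small invented dataset.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
ss_t = sum((y - ybar) ** 2 for y in ys)                         # total SS
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
ss_reg = ss_t - ss_res                                          # regression SS
r2_from_res = 1 - ss_res / ss_t
r2_from_reg = ss_reg / ss_t
```

For this dataset almost all of the variability in y is explained by the regression, so R² is close to 1.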
2.11 Prediction of values of study variable
An important use of linear regression modeling is to predict the average and actual values of the study variable. The term
prediction of a value of the study variable corresponds to knowing the value of E(y) (in the case of the average value) or the value of y (in
the case of the actual value) for a given value of the explanatory variable. We consider both cases.
Case 1: Prediction of average value
Suppose we want to predict the value of E(y) for a given value of x = x₀. Then the predictor is given by p = b₀ + b₁x₀.
Predictive bias: E(p − E(y)) = (β₀ + β₁x₀) − (β₀ + β₁x₀) = 0, so p is an unbiased predictor of E(y).
The variance of the prediction error is then given as Var(p) = σ²(1/n + (x₀ − x̄)²/sxx).
When σ² is unknown, it is replaced by s² = SSres/(n − 2), and (p − E(y))/√(s²(1/n + (x₀ − x̄)²/sxx)) follows the t distribution with n − 2 degrees of freedom, giving the 100(1 − α)% confidence interval
p ± t_{α/2, n−2}√(s²(1/n + (x₀ − x̄)²/sxx)).
Note that the width of the prediction interval is a function of x₀. The interval width is minimum for x₀ = x̄ and widens as |x₀ − x̄| increases. This is
also expected, as the best estimates of y are made at x-values near the center of the data, and the
precision of estimation deteriorates as we move toward the boundary of the x-space.
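This behavior is easy to see numerically: the half-width of the interval for E(y) grows with (x₀ − x̄)². A sketch on an invented four-point dataset, with t_{0.025, 2} = 4.303 taken from t tables:

```python
import math

xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
t_crit = 4.303  # t_{0.025, 2}, from t tables

def half_width(x0):
    # Half-width of the CI for E(y) at x0: t * sqrt(s^2 (1/n + (x0-xbar)^2/sxx)).
    return t_crit * math.sqrt(s2 * (1 / n + (x0 - xbar) ** 2 / sxx))

w_center = half_width(xbar)   # narrowest interval, at x0 = xbar
w_edge = half_width(4.0)      # wider, away from the center of the data
```

Comparing the two widths shows directly that the interval is narrowest at x̄.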
Case 2: Prediction of actual value
When σ² is unknown, the corresponding 100(1 − α)% prediction interval for the actual value of y at x₀ is
p ± t_{α/2, n−2}√(s²(1 + 1/n + (x₀ − x̄)²/sxx));
the extra 1 accounts for the variability of the new observation around its mean.
Example: Simple Regression Analysis (A Complete Example)
A random sample of eight drivers insured with a company and having similar auto insurance policies was
selected. The following table lists their driving experiences (in years) and monthly auto insurance
premiums.
Driving Experience (years) Monthly Auto Insurance Premium ($)
5 64
2 87
12 50
9 71
15 44
6 56
25 42
16 60
a. Does the insurance premium depend on the driving experience or does the driving experience depend on
the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute sxx, syy and sxy.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based
on your answer in part a.
d. Interpret the meaning of the values of b₀ and b₁ calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate r and R² and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
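Parts b, c, f and g can be worked through directly from the formulas in this chapter; a sketch (variable names are my own):

```python
import math

# Driving experience (years) as x, monthly premium ($) as y.
exp_years = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(exp_years)
xbar = sum(exp_years) / n
ybar = sum(premium) / n
sxx = sum((x - xbar) ** 2 for x in exp_years)
syy = sum((y - ybar) ** 2 for y in premium)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(exp_years, premium))
b1 = sxy / sxx                    # slope: premium change per extra year
b0 = ybar - b1 * xbar             # intercept
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
r2 = r * r                        # coefficient of determination
pred_10 = b0 + b1 * 10            # part g: predicted premium at 10 years
```

The negative slope confirms the expected negative relationship in part a: each extra year of driving experience lowers the estimated monthly premium by about |b₁| dollars.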