
CHAPTER 4

SELECTING THE BEST REGRESSION MODEL
4.1 Introduction
In multiple linear regression the number of regressor variables is more than one, but some of them may be irrelevant and can be removed from the regression equation.

The basic idea behind finding the best regression model is that we need to find an appropriate subset of regressors that explains the variability in the response variable well. The problem of finding this subset of regressor variables is called the variable selection problem.

While choosing a subset of explanatory variables, there are two possible options:

1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.

2. In order to make the model as simple as possible, the analyst may include only a few explanatory variables.
There can be two types of incorrect model specifications.

1. Omission/exclusion of relevant variables.

2. Inclusion of irrelevant variables.

Evaluation of Subset Regression Model

After selecting subsets of candidate variables for the model, a question arises: how do we judge which subset yields the better regression model? Various criteria have been proposed in the literature to evaluate and compare subset regression models.
4.2 Coefficient of Multiple Determination ($R_p^2$)
The coefficient of determination is the square of the multiple correlation coefficient between the study variable and the set of explanatory variables; for a subset model with $p$ terms (an intercept and $p-1$ explanatory variables) it is denoted $R_p^2$. The coefficient of determination based on such a model is

$$R_p^2 = \frac{SS_{reg}(p)}{SS_T} = 1 - \frac{SS_{res}(p)}{SS_T},$$

where $SS_{reg}(p)$ and $SS_{res}(p)$ are the sums of squares due to regression and residuals, respectively, in the subset model with $p$ terms. Since there are $k$ explanatory variables available and we select only $p-1$ of them, there are $\binom{k}{p-1}$ possible choices of subsets for each value of $p$.
So proceed as follows:

 Choose an appropriate value of $p$, fit the model and obtain $R_p^2$.

 Add one variable, fit the model and again obtain $R_{p+1}^2$.

 Obviously $R_{p+1}^2 \ge R_p^2$. If the increase is small, then stop and choose that value of $p$ for the subset regression.

 If the increase is large, then keep on adding variables up to the point where an additional variable does not produce a large change in the value of $R_p^2$, i.e., the increment in $R_p^2$ becomes small.

To find such a value of $p$, create a plot of $R_p^2$ versus $p$.
The curve of $R_p^2$ versus $p$ rises and then flattens out as $p$ increases.
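As a minimal sketch of this procedure (the synthetic data and the helper name r2_for_subset are purely illustrative), $R_p^2$ can be tracked as regressors are added one at a time:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                     # four candidate regressors
y = 2 + 3 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=50)

def r2_for_subset(X, y, cols):
    """Fit an intercept-plus-subset OLS model and return its R^2 (= R_p^2)."""
    Xs = sm.add_constant(X[:, list(cols)])       # p = len(cols) + 1 terms
    return sm.OLS(y, Xs).fit().rsquared

# Add regressors one at a time and watch how R_p^2 grows with p.
for m in range(1, X.shape[1] + 1):
    print("p =", m + 1, " R_p^2 =", round(r2_for_subset(X, y, range(m)), 4))
```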

4.3 Adjusted Coefficient of Determination

The adjusted coefficient of determination $\bar{R}_p^2$ has certain advantages over the usual coefficient of determination. The adjusted coefficient of determination based on a $p$-term model is

$$\bar{R}_p^2 = 1 - \frac{n-1}{n-p}\left(1 - R_p^2\right).$$

An advantage of $\bar{R}_p^2$ is that it does not necessarily increase as $p$ increases.
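As a small arithmetic sketch (the numbers are illustrative; $n$ is the number of observations and $p$ the number of terms including the intercept):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a p-term model fitted to n observations."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)

# e.g. an R_p^2 of 0.80 from a 3-term model fitted to 20 observations:
print(round(adjusted_r2(0.80, n=20, p=3), 4))    # approximately 0.7765
```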


4.4 Residual Mean Square

A model is said to have a better fit if its residuals are small. The residual mean square based on a $p$-term subset regression model is defined as

$$MS_{res}(p) = \frac{SS_{res}(p)}{n-p}.$$

As $p$ increases, $MS_{res}(p)$ initially decreases, then stabilizes, and finally may increase when the reduction in $SS_{res}(p)$ from adding another regressor is not sufficient to compensate for the loss of one degree of freedom in the denominator $n-p$. When $MS_{res}(p)$ is plotted versus $p$, the curve typically decreases, levels off, and then turns slightly upward.
So:

 Plot $MS_{res}(p)$ versus $p$.

 Choose $p$ corresponding to the minimum value of $MS_{res}(p)$.

 Choose $p$ such that $MS_{res}(p)$ is approximately equal to $MS_{res}$ based on the full model.

 Choose $p$ near the point where the smallest value of $MS_{res}(p)$ turns upward.

Such a minimum value of $MS_{res}(p)$ will produce a $\bar{R}_p^2$ with maximum value.

Thus the two criteria, minimum $MS_{res}(p)$ and maximum $\bar{R}_p^2$, are equivalent.
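A minimal sketch of this criterion (synthetic data; statsmodels exposes $MS_{res}(p)$ directly as the mse_resid attribute of a fitted OLS result):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 1 + 2 * X[:, 0] - X[:, 2] + rng.normal(size=40)

# Track MS_res(p) = SS_res(p) / (n - p) as regressors are added one at a time.
for m in range(1, X.shape[1] + 1):
    fit = sm.OLS(y, sm.add_constant(X[:, :m])).fit()
    print("p =", m + 1, " MS_res(p) =", round(fit.mse_resid, 4))
```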


4.5 Mallows' $C_p$ Statistic:
Mallows' $C_p$ statistic is based on the total mean squared error of the fitted values, and so measures both the bias and the variance in the fitted model,

$$\Gamma_p = \frac{1}{\sigma^2}\sum_{i=1}^{n}\Big[\operatorname{Var}(\hat{y}_i) + \big(E(\hat{y}_i) - E(y_i)\big)^2\Big],$$

where $\hat{y}_i$ is the fitted value and $E(y_i)$ is the expected response for the regression model. Estimating $\sigma^2$ by $MS_{res}$ from the full model leads to

$$C_p = \frac{SS_{res}(p)}{MS_{res}} - n + 2p,$$

which is Mallows' $C_p$ statistic for the $p$-term model.

When different subset models are considered, the models with the smallest $C_p$ are considered better than those with a higher $C_p$; a lower $C_p$ is preferable.
In the plot of $C_p$ versus $p$ for the candidate regression equations, the reference line $C_p = p$ is a straight line through the origin with unit slope: subset models with little bias fall close to this line, while models with substantial bias fall above it.
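A minimal sketch of this computation (the full-model residual mean square serves as the estimate of $\sigma^2$; the nested ordering of the regressors is illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = 4 + 2 * X[:, 0] + X[:, 1] + rng.normal(size=30)
n, k = X.shape

sigma2_hat = sm.OLS(y, sm.add_constant(X)).fit().mse_resid   # full-model MS_res

# C_p = SS_res(p) / sigma2_hat - n + 2p for a sequence of nested subset models.
for m in range(1, k + 1):
    fit = sm.OLS(y, sm.add_constant(X[:, :m])).fit()
    p = m + 1                                                 # terms incl. intercept
    print("p =", p, " C_p =", round(fit.ssr / sigma2_hat - n + 2 * p, 2))
```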
4.6 Akaike’s Information Criterion (AIC)

Akaike's information criterion statistic is based on the maximized log-likelihood penalized by the number of estimated parameters,

$$AIC = -2\ln(L) + 2p,$$

where $L$ is the maximized likelihood of the $p$-term model. For the linear regression model, the AIC is defined (up to an additive constant) as

$$AIC_p = n\ln\!\left(\frac{SS_{res}(p)}{n}\right) + 2p.$$

A model with a smaller value of AIC is preferable.
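A small sketch using the $SS_{res}$-based form above (statsmodels' own aic attribute is computed from the full log-likelihood and differs from this expression by an additive constant depending only on $n$, so either can be used to compare subset models):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = 1 + X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=60)
n = len(y)

# AIC_p = n * ln(SS_res(p) / n) + 2p for a sequence of nested subset models.
for m in range(1, X.shape[1] + 1):
    fit = sm.OLS(y, sm.add_constant(X[:, :m])).fit()
    p = m + 1
    print("p =", p, " AIC_p =", round(n * np.log(fit.ssr / n) + 2 * p, 2))
```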


4.7 Bayesian Information Criterion (BIC)

Similar to AIC, the Bayesian information criterion is based on maximizing the posterior probability of the model given the observations. In the case of the linear regression model, it is defined (up to an additive constant) as

$$BIC_p = n\ln\!\left(\frac{SS_{res}(p)}{n}\right) + p\ln(n).$$

A model with a smaller value of BIC is preferable.
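The same sketch carries over with the heavier $\ln(n)$ penalty (again an $SS_{res}$-based form; statsmodels also exposes a bic attribute computed from the log-likelihood):

```python
import numpy as np

def bic_p(ss_res, n, p):
    """SS_res-based BIC for a p-term linear model fitted to n observations."""
    return n * np.log(ss_res / n) + p * np.log(n)

# Used with the SS_res of each candidate subset model; smaller BIC is preferred.
print(round(bic_p(ss_res=52.3, n=60, p=3), 2))   # illustrative numbers only
```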


4.8 Partial F-statistic
 
The partial F-test is the most common method of testing for a nested normal linear regression model.
"Nested" model is just a fancy way of saying a reduced model in terms of variables included.
For illustration, suppose that you wish to test the hypothesis that $q$ of the coefficients are zero, and thus these variables can be omitted from the model, and that the full model has $k$ coefficients (including the intercept). The test is based on the comparison of the residual sums of squares (RSS), and thus you need to run two separate regressions and save the RSS from each one. For the full model the RSS will be lower, since the addition of new variables invariably leads to a reduction of the RSS (and an increase in the explained sum of squares; this is closely related to $R^2$). What we are testing, therefore, is whether the difference is so large that the removal of the variables would be detrimental to the model. Let's be a little more specific.
The test takes the following form:

$$F = \frac{(RSS_{reduced} - RSS_{full})/q}{RSS_{full}/(n-k)}.$$

It can be shown that the quantities in the numerator and the denominator, when scaled by $\sigma^2$, are independent $\chi^2$ variables with degrees of freedom $q$ and $n-k$ respectively; hence the ratio is an F-distributed random variable with parameters $q$ and $n-k$. You reject the null hypothesis that the reduced model is appropriate if the statistic exceeds a critical value from the said distribution, which in turn will happen if your model loses too much explanatory power after removing the variables.
The statistic can actually be derived from a likelihood ratio point of view and therefore has some good
properties when the standard assumptions of the linear model are met, for instance constant variance,
normality and so on. It is also more powerful than a series of individual tests, not to mention that it has the
desired level of significance.
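A minimal sketch of the partial F-test (synthetic data; the split into a full model with $k=5$ coefficients and a reduced model that drops $q=2$ regressors is illustrative, and scipy.stats.f supplies the critical value):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import f

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))
y = 2 + 3 * X[:, 0] + rng.normal(size=50)
n = len(y)

full = sm.OLS(y, sm.add_constant(X)).fit()             # k = 5 coefficients
reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()   # drops the last q = 2 regressors
q, k = 2, 5

F = ((reduced.ssr - full.ssr) / q) / (full.ssr / (n - k))
crit = f.ppf(0.95, q, n - k)                           # 5% critical value
print("F =", round(F, 3), " reject H0:", bool(F > crit))
```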
4.9 Computational Techniques for Variable Selection
In order to select a subset model, several techniques based on computational procedures and algorithms are available. They are essentially based on two ideas: evaluate all possible subsets of explanatory variables, or select the explanatory variables stepwise.

4.9.1 Use All Possible Explanatory Variables

This methodology is based on the following steps:

• Fit all possible models with one explanatory variable.
• Fit all possible models with two explanatory variables.
• Fit all possible models with three explanatory variables,
and so on, up to the model with all explanatory variables.

Choose a suitable criterion for model selection and evaluate each of the fitted regression equations with that criterion.
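A brief sketch of this enumeration (ranking every non-empty subset by adjusted $R^2$ via itertools.combinations; feasible only for a small number of candidate regressors, since there are $2^k - 1$ subsets):

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 4))
y = 1 + 2 * X[:, 0] + X[:, 2] + rng.normal(size=40)
k = X.shape[1]

results = []
for size in range(1, k + 1):
    for cols in combinations(range(k), size):
        fit = sm.OLS(y, sm.add_constant(X[:, list(cols)])).fit()
        results.append((fit.rsquared_adj, cols))

# Report the five best subset models by adjusted R^2.
for adj_r2, cols in sorted(results, reverse=True)[:5]:
    print(cols, "adj R^2 =", round(adj_r2, 4))
```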
4.9.2 Stepwise Regression Techniques

This methodology is based on choosing the explanatory variables in the subset model in steps, either adding one variable at a time or deleting one variable at a time. Based on this, there are three procedures:

- Forward selection,

- Backward elimination, and

- Stepwise regression.

These procedures are computer-intensive and are executed using software.
4.9.3 Forward Selection Procedure:
This methodology begins with no explanatory variable in the model except an intercept term. It adds variables one at a time and tests the fitted model at each step using a suitable criterion. It has the following steps (a sketch follows the list):

 Consider only the intercept term and insert one variable at a time.

 All possible models with one regressor are considered and the F-statistic for each regressor is computed. The regressor having the highest F-statistic value is added to the model if that value exceeds a preselected threshold $F_{\text{in}}$ (F-to-enter).

 Partial F-statistics are computed for all of the remaining regressors in the presence of the previously selected regressors, and the one yielding the highest partial F is added to the model if that value exceeds $F_{\text{in}}$.

 Forward selection terminates when the highest partial F-statistic at a particular stage does not exceed $F_{\text{in}}$, or when the last candidate regressor has been added.
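A compact sketch of forward selection (the partial F is computed from the drop in $SS_{res}$ when a candidate enters; the F-to-enter threshold of 4.0 is an illustrative choice, not a standard value):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 5))
y = 1 + 3 * X[:, 1] + 2 * X[:, 4] + rng.normal(size=50)
n, k = X.shape
F_IN = 4.0                                   # illustrative F-to-enter threshold

selected, remaining = [], list(range(k))
while remaining:
    base = sm.OLS(y, sm.add_constant(X[:, selected]) if selected
                  else np.ones((n, 1))).fit()
    best = None
    for j in remaining:                      # try each candidate regressor
        fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        partial_F = (base.ssr - fit.ssr) / fit.mse_resid
        if best is None or partial_F > best[0]:
            best = (partial_F, j)
    if best[0] <= F_IN:                      # no candidate clears the threshold
        break
    selected.append(best[1])
    remaining.remove(best[1])

print("selected columns:", selected)
```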
4.9.4 Backward Elimination Procedure:
The backward elimination methodology begins with all explanatory variables and keeps on deleting one
variable at a time until a suitable model is obtained.

It is based on the following steps (a sketch follows the list):

 Start with the full model.

 Compute the partial F-statistic for each regressor in the presence of the other regressors in the model.

 Choose a preselected value $F_{\text{out}}$ (F-to-remove).

 The regressor with the smallest partial F value is removed from the model if that value is less than $F_{\text{out}}$.

 Partial F-statistics are computed for this new model and the process repeats.

 Backward elimination terminates when the smallest partial F-statistic is not less than $F_{\text{out}}$.
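A matching sketch of backward elimination (the partial F for each regressor is obtained by refitting the model without it; the F-to-remove threshold of 4.0 is again only illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 5))
y = 2 + 2 * X[:, 0] - 3 * X[:, 3] + rng.normal(size=50)
n = len(y)
F_OUT = 4.0                                  # illustrative F-to-remove threshold

selected = list(range(X.shape[1]))           # start with the full model
while selected:
    full = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
    worst = None
    for j in selected:                       # partial F of each regressor
        cols = [c for c in selected if c != j]
        reduced = sm.OLS(y, sm.add_constant(X[:, cols])
                         if cols else np.ones((n, 1))).fit()
        partial_F = (reduced.ssr - full.ssr) / full.mse_resid
        if worst is None or partial_F < worst[0]:
            worst = (partial_F, j)
    if worst[0] >= F_OUT:                    # every regressor is still significant
        break
    selected.remove(worst[1])

print("retained columns:", selected)
```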


4.9.5 Stepwise Regression Procedure:
Stepwise regression is a combination of the forward selection and backward elimination procedures. It is a modification of the forward selection procedure and has the following steps (a sketch follows the list):

 Start with no regressor in the model.

 All possible models with one regressor are considered and the F-statistic for each regressor is computed. The regressor having the highest F-statistic value is added to the model.

 Partial F-statistics are computed for all of the remaining regressors in the presence of the previously selected regressors, and the one yielding the highest partial F is added to the model if that value exceeds the entry threshold.

 All variables in the model are then evaluated with a partial F-test to see whether each one is still significant. At this step, any regressor that is no longer significant is dropped from the model.

 Stepwise selection terminates when no remaining regressor yields a partial F greater than the entry threshold and all regressors in the model remain significant.
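A condensed sketch of stepwise regression (one forward step followed by a backward check on each iteration; F_IN and F_OUT are illustrative thresholds, with F_OUT not taken larger than F_IN):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = rng.normal(size=(60, 5))
y = 1 + 2 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=60)
n, k = X.shape
F_IN, F_OUT = 4.0, 4.0                       # illustrative entry/removal thresholds

def fit_cols(cols):
    """OLS fit of the intercept plus the given columns of X."""
    Z = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
    return sm.OLS(y, Z).fit()

selected = []
while True:
    # Forward step: add the candidate with the largest partial F, if it clears F_IN.
    base = fit_cols(selected)
    entries = []
    for j in range(k):
        if j not in selected:
            fit = fit_cols(selected + [j])
            entries.append(((base.ssr - fit.ssr) / fit.mse_resid, j))
    if not entries or max(entries)[0] <= F_IN:
        break
    selected.append(max(entries)[1])

    # Backward check: drop any regressor whose partial F has fallen below F_OUT.
    full = fit_cols(selected)
    for j in list(selected):
        reduced = fit_cols([c for c in selected if c != j])
        if (reduced.ssr - full.ssr) / full.mse_resid < F_OUT:
            selected.remove(j)
            full = fit_cols(selected)

print("selected columns:", selected)
```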
General comments:
1. None of the methods among forward selection, backward elimination or stepwise regression guarantees the
best subset model.

2. The order in which the explanatory variables enter or leave the model does not indicate their order of importance.

3. In forward selection, no explanatory variable can be removed if entered in the model. Similarly in backward
elimination, no explanatory variable can be added if removed from the model.

4. All procedures may lead to different models.

5. Different model selection criteria may give different subset models.


Example 4.1: HALD cement data

The HALD cement data consist of four explanatory variables ($x_1$ to $x_4$, the composition of the cement mix) and the response $y$ (the heat evolved):

x1   x2   x3   x4      y
 7   26    6   60    78.5
 1   29   15   52    74.3
 1   56    8   20   104.3
11   31    8   47    87.6
 7   52    6   33    95.9
11   55    9   22   109.2
 3   71   17    6   102.7
 1   31   22   44    72.5
 2   54   18   22    93.1
21   47    4   26   115.9
 1   40   23   34    83.8
11   66    9   12   113.3
10   68    8   12   109.4
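As a usage sketch, the criteria above can be applied to these data, for example an all-subsets scan ranked by Mallows' $C_p$ (the values are typed in exactly as tabulated, and the labels x1 to x4 follow the table's columns):

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

# HALD cement data: four regressors and the heat evolved y.
data = np.array([
    [ 7, 26,  6, 60,  78.5], [ 1, 29, 15, 52,  74.3], [ 1, 56,  8, 20, 104.3],
    [11, 31,  8, 47,  87.6], [ 7, 52,  6, 33,  95.9], [11, 55,  9, 22, 109.2],
    [ 3, 71, 17,  6, 102.7], [ 1, 31, 22, 44,  72.5], [ 2, 54, 18, 22,  93.1],
    [21, 47,  4, 26, 115.9], [ 1, 40, 23, 34,  83.8], [11, 66,  9, 12, 113.3],
    [10, 68,  8, 12, 109.4],
])
X, y = data[:, :4], data[:, 4]
n, k = X.shape

sigma2_hat = sm.OLS(y, sm.add_constant(X)).fit().mse_resid   # full-model MS_res
for size in range(1, k + 1):
    for cols in combinations(range(k), size):
        fit = sm.OLS(y, sm.add_constant(X[:, list(cols)])).fit()
        cp = fit.ssr / sigma2_hat - n + 2 * (size + 1)
        print([f"x{c + 1}" for c in cols], "C_p =", round(cp, 2))
```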
