
Variable Selection for Multiple Linear Models
Variable selection
Definition
• Variable selection is finding the appropriate subset of regressors for the regression model.
• It aims to improve model accuracy and address overfitting.
• Overfitted models describe random error or noise instead of the underlying relationship.
Objectives
Building a regression model that includes only a subset of the available regressors involves two conflicting objectives:
• We would like the model to include as many regressors as possible so that the information content in these factors can influence the predicted value of y (Overfitting).
• We want the model to include as few regressors as possible because the variance of the prediction of y increases as the number of regressors increases (Parsimony).
Selection process
Steps in the Selection Process
a. Specify the maximum model to be considered.
b. Specify a criterion for selecting a model.
c. Specify a strategy for selecting variables.
d. Conduct the specified analysis.
e. Evaluate the validity of the model chosen.
Selection Algorithm
Forward Selection
• This methodology assumes that there is no explanatory variable in the model except an intercept term.
• It includes variables one by one and tests the fitted model at each step using a suitable criterion.
• At each step, the variable showing the biggest improvement to the model is added and remains in the model.
Selection Algorithm
Forward Selection Algorithm
1. Start with only the intercept term and insert one variable at a time.
2. Calculate the simple correlations of each candidate x_j with y.
3. Choose the x_j that has the largest correlation with y.
4. If x_1 is the variable with the highest correlation with y, then x_1 will also produce the largest F-statistic.
5. Choose a prespecified cut-off value F_IN (F-to-enter).
6. If F > F_IN, then accept x_1, and x_1 enters the model.
7. Adjust for the effect of x_1 on y, re-compute the correlations of the remaining x_j with y, and so obtain the partial correlations.
8. Choose the variable with the second-largest correlation with y, i.e., the variable with the highest partial correlation with y.
9. Suppose this variable is x_2. Then the largest partial F-statistic is F = SS_R(x_2 | x_1) / MS_res(x_1, x_2).
10. Repeat the process: at each step, the partial correlations are computed, and the explanatory variable with the highest partial correlation with y is added to the model.
11. Continue the selection until, at some step, the partial F-statistic fails to exceed F_IN, or until the last explanatory variable has been added to the model. (An R sketch follows this list.)
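A minimal sketch of these steps in R, assuming a data frame dat with response y and candidate regressors x1 to x4 (hypothetical placeholder names); add1() reports the partial F-statistic each candidate would have on entering:

null_model <- lm(y ~ 1, data = dat)                    # intercept term only
add1(null_model, scope = ~ x1 + x2 + x3 + x4, test = "F")
# enter the variable with the largest F if it exceeds F_IN, e.g. x1:
fit1 <- update(null_model, . ~ . + x1)
add1(fit1, scope = ~ x1 + x2 + x3 + x4, test = "F")    # next step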
Selection Algorithm
Backward Elimination
• Begins with all explanatory variables and deletes one variable at a time until a suitable model is obtained.
• Variables are deleted from the model one by one until all the variables remaining in the model are significant and exceed the chosen criterion.
• At each step, the variable contributing the smallest improvement to the model is deleted and is never considered again in later iterations.
Selection Algorithm
Backward Elimination Algorithm
1. Consider all k explanatory variables and fit the full model.
2. Compute the partial F-statistic for each explanatory variable as if it were the last variable to enter the model.
3. Choose a preselected cut-off value F_OUT (F-to-remove).
4. Compare the smallest of the partial F-statistics with F_OUT. If it is less than F_OUT, remove the corresponding explanatory variable from the model.
5. The model will now have (k − 1) explanatory variables.
6. Fit the model with these (k − 1) explanatory variables, compute the partial F-statistics for the new model, and compare them with F_OUT; if the smallest is less than F_OUT, remove the corresponding variable.
7. Repeat this procedure.
8. Stop when the smallest partial F-statistic exceeds F_OUT. (An R sketch follows this list.)
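A minimal sketch of one backward step in R (mtcars used as a stand-in data set); drop1() computes the partial F-statistic for each variable as if it were the last to enter:

full_model <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)
drop1(full_model, test = "F")    # remove the term with the smallest F if it is below F_OUT
reduced <- update(full_model, . ~ . - disp)    # supposing disp had the smallest F
drop1(reduced, test = "F")       # repeat until the smallest F exceeds F_OUT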
Selection Algorithm
Stepwise Selection
• A combination of the forward selection and backward elimination procedures.
• Stepwise regression is a modified version of forward selection that permits re-examination, at every step, of the variables incorporated in the model in previous steps.
• A variable that entered at an early stage may become superfluous at a later stage because of its relationship with other variables subsequently added to the model.
Stepwise Selection
Stepwise Selection Algorithm
1. Consider all the explanatory variables entered into the model at the previous step.
2. Add a new variable and evaluate the fitted terms via their partial F-statistics.
3. An explanatory variable that was added at an earlier step may now have become insignificant due to its relationship with the explanatory variables currently present in the model.
4. If the partial F-statistic for an explanatory variable is smaller than F_OUT, then this variable is deleted from the model.
5. Stepwise selection needs two cut-off values, F_IN and F_OUT. Sometimes F_IN = F_OUT or F_IN > F_OUT is considered.
6. The choice F_IN > F_OUT makes it relatively more difficult to add an explanatory variable than to delete one. (An R sketch follows this list.)
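For reference, R's built-in step() performs an AIC-based stepwise search rather than the partial-F version with F_IN and F_OUT described above; a p-value-based stepwise is available in the olsrr package (ols_step_both_p). A sketch on the built-in mtcars data:

null <- lm(mpg ~ 1, data = mtcars)    # start from the intercept
step(null, scope = ~ wt + hp + qsec + disp, direction = "both")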
Selection Algorithm in R
Forward Selection Algorithm
step(model, direction = "forward")
stepAIC(model, direction = "forward")
Backward Elimination Algorithm
step(model, direction = "backward")
stepAIC(model, direction = "backward")
Stepwise Selection Algorithm
step(model, direction = "both")
stepAIC(model, direction = "both")
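A complete run on the built-in mtcars data. Note that stepAIC() comes from the MASS package, and that direction = "forward" only searches upward when a scope (the maximum model) is supplied:

library(MASS)                          # provides stepAIC()
full <- lm(mpg ~ ., data = mtcars)     # maximum model considered
best <- stepAIC(full, direction = "backward", trace = FALSE)
summary(best)
null <- lm(mpg ~ 1, data = mtcars)
stepAIC(null, scope = formula(full), direction = "forward", trace = FALSE)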
Information Criteria
F-test statistic: comparison between a reduced model with p − 1 regressors and the full model with k regressors using

F = [(SS_res(reduced) − SS_res(full)) / (k − p + 1)] / [SS_res(full) / (n − k − 1)]

compared to an F-distribution with k − p + 1 and n − k − 1 degrees of freedom. If F is not significant, we can use the smaller (p − 1 variable) model.
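This partial F-test is what anova() reports when given two nested models (mtcars again as a stand-in):

reduced <- lm(mpg ~ wt, data = mtcars)
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)
anova(reduced, full)    # if F is not significant, keep the reduced model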
Information Criteria
Mallows' Cp: Mallows' Cp statistic estimates the size of the bias that is introduced into the predicted responses by having an underspecified model. For a subset model with p terms,

Cp = SS_res(p) / MS_res(full) − n + 2p

Use Mallows' Cp to choose between multiple regression models: prefer a model where Cp is small and close to the number of predictors in the model plus the constant (p).
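A sketch using ols_mallows_cp() from the olsrr package (listed on the R-function slide); here p = 3 for the candidate model, i.e. two predictors plus the constant:

library(olsrr)
full <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)
cand <- lm(mpg ~ wt + hp, data = mtcars)
ols_mallows_cp(cand, full)    # want Cp small and close to p = 3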
Information Criteria
Coefficient of Determination
The square of the multiple correlation coefficient between the study variable and the set of explanatory variables, denoted R²:

R² = SS_reg / SS_T = 1 − SS_res / SS_T

where SS_reg and SS_res are the sums of squares due to regression and residuals, respectively, and SS_T is the total sum of squares.
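Computing R² from the sums of squares, with summary() as a check (mtcars as a stand-in):

fit <- lm(mpg ~ wt + hp, data = mtcars)
ss_res <- sum(resid(fit)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - ss_res / ss_tot    # equals summary(fit)$r.squared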
Information Criteria
Adjusted Coefficient of Determination
A corrected goodness-of-fit (model accuracy) measure for linear models. It measures the percentage of variance in the target field that is explained by the model. Denoted R²_adj:

R²_adj = 1 − [(n − 1) / (n − p)] (1 − R²)
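Unlike R², the adjusted version can fall when unhelpful regressors are added; compare two nested fits:

small <- lm(mpg ~ wt, data = mtcars)
big <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)
summary(small)$adj.r.squared
summary(big)$adj.r.squared    # rises only if the extra terms carry real signal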
Information Criteria
Residual Mean Square
The sum of squares due to residuals divided by its degrees of freedom:

MS_res = SS_res / (n − p)

A model with a smaller MS_res is preferable; the minimum value of MS_res corresponds to the maximum value of R²_adj.
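MS_res can be read directly off an lm fit; sigma() returns the residual standard error, whose square is the residual mean square:

fit <- lm(mpg ~ wt + hp, data = mtcars)
sigma(fit)^2                            # residual mean square MS_res
sum(resid(fit)^2) / df.residual(fit)    # the same value computed by hand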


Information Criteria
Akaike's Information Criterion
AIC is a mathematical method for evaluating how well a model fits the data. It determines the relative information value of the model using the maximum likelihood estimate and the number of parameters (independent variables) in the model.
It is defined as:

AIC = 2k − 2 ln(L̂)

where k is the number of estimated parameters and L̂ is the maximized value of the likelihood function. A model with a smaller AIC is preferable.
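Base R offers two AIC variants for lm fits: AIC() uses 2k − 2 ln(L̂) with the error variance counted as a parameter, while extractAIC() uses the n ln(SS_res/n) + 2p scale that step() works with:

fit <- lm(mpg ~ wt + hp, data = mtcars)
AIC(fit)           # 2k - 2*logLik
extractAIC(fit)    # (edf, AIC) on the scale used by step()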
Information Criteria
Bayesian Information Criterion
Similar to AIC, the Bayesian information criterion is based on maximizing the posterior distribution of the model given the observations.
It is defined as:

BIC = k ln(n) − 2 ln(L̂)

where n is the number of observations. A model with a smaller value of BIC is preferable.
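BIC in base R; note that it is AIC() with the per-parameter penalty set to ln(n):

fit <- lm(mpg ~ wt + hp, data = mtcars)
BIC(fit)
AIC(fit, k = log(nrow(mtcars)))    # identical value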


Information Criteria
Predicted Residual Error Sum of Squares (PRESS)
A form of cross-validation used in regression analysis to provide a summary measure of the fit of a model to observations that were not themselves used to estimate the model.
It is defined as:

PRESS = Σᵢ (yᵢ − ŷ₍ᵢ₎)² = Σᵢ (eᵢ / (1 − hᵢᵢ))²

where ŷ₍ᵢ₎ is the prediction of the ith observation from a model fitted without it, eᵢ is the ith ordinary residual, and hᵢᵢ is the ith diagonal element of the hat matrix. This criterion is used along the same lines as MS_res; a subset regression model with a smaller PRESS value is preferred.
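PRESS can be computed without refitting the model n times via the hat-matrix identity above; the PRESS() function in the qpcR package (listed on the R-function slide) computes this statistic:

fit <- lm(mpg ~ wt + hp, data = mtcars)
sum((resid(fit) / (1 - hatvalues(fit)))^2)    # PRESS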
R function
Selection algorithms: "step" function; "stepAIC" function
Mallows' Cp: ols_mallows_cp function from the olsrr package
F-test: anova function
AIC and BIC: AIC function; BIC function
RMSE: RMSE function from the qpcR package
PRESS: PRESS function from the qpcR package
