
Linear Regression Models

Copyright © SAS Institute Inc. All rights reserved.


What Is SAS Visual Statistics?
With the SAS Visual Statistics add-on, you can use SAS Visual Analytics
to rapidly build and modify predictive models.

• Use SAS Visual Analytics to prepare and explore data.
• Use predictive and exploratory modeling techniques:
  • linear and logistic regression analyses
  • generalized linear and additive models
  • decision trees
  • clustering

[Figure: SAS Visual Statistics objects]


Objectives
• Create a linear regression model in SAS Visual Statistics.
• Implement predictor variable selection for a linear regression model.
• Apply filters.
• Create interaction variables.
• Assess model goodness of fit using generated statistics and diagnostic
plots.

Regression Models
• Regression models enable you to predict the value of a response variable
(dependent variable) as a linear function of one or more effects
(independent or predictor variables).

• Examples: determining trends in data
  • in health care, to study observational data and determine which factors put
    people at risk for certain diseases
  • in finance, for risk assessment
  • in economics, for modeling inventory, labor supply and demand, and for developing
    spending models

Linear Regression
• Linear regression models assume that the relationship between the response variable and
the input variables is linear:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$$

where $Y$ is the response, $b_0$ is the intercept, $b_1$ is the parameter associated with
predictor variable $X_1$ (and likewise up to $b_k$ for $X_k$), and $e$ is the error term.

• There is only one response variable: Y.
• There must be at least one predictor variable X.


• Linear regression uses the least squares method to determine the parameter estimates.

• Model effects (explanatory variables) can be any of the following:
  • continuous variables (continuous effects)
  • categorical variables (classification effects)
  • interaction terms (interaction effects), e.g., X1*X2
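To make the three effect types concrete, here is a minimal Python/NumPy sketch of how they could become columns of a design matrix. The function name `design_matrix` and the data are hypothetical, and reference (0/1 dummy) coding of the classification effect is an assumption; SAS Visual Statistics may parameterize classification effects differently.

```python
import numpy as np

def design_matrix(x1, x2, group, levels):
    """Columns for a hypothetical model with an intercept, two continuous
    effects, their interaction, and one classification effect."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    intercept = np.ones_like(x1)
    interaction = x1 * x2  # interaction effect: elementwise product X1*X2
    # Classification effect: one 0/1 column per level except the first,
    # which acts as the reference level (reference coding is an assumption).
    dummies = [np.array([g == lev for g in group], dtype=float) for lev in levels[1:]]
    return np.column_stack([intercept, x1, x2, interaction, *dummies])

X = design_matrix([1, 2, 3], [4, 5, 6], ["a", "b", "c"], ["a", "b", "c"])
print(X.shape)  # (3, 6)
```

A three-level category contributes two columns here, because the reference level is absorbed into the intercept.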

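Linear Regression: Least Squares Method

Goal: Find the line that fits the data as closely as possible, that is, the line that
minimizes the sum of the squared vertical distances (residuals) between the observed
and predicted values.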
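A minimal sketch of the least squares idea for a single predictor, assuming the standard closed-form solution for a straight line (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The helper names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1*x))**2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

Nudging either estimate, for example `sse(xs, ys, b0 + 0.1, b1)`, gives a larger sum of squared residuals, which is what "fits as closely as possible" means here.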

Linear Regression Roles

• Response – one measure
• Continuous effects – one or more measures
• Classification effects – one or more categories
• Interaction effects – one or more interaction effects

Linear Regression Options
• Linear Regression
  • Informative missingness
    - By default, observations with missing values are dropped.
    - If selected, SAS models the missing values instead: an indicator variable is
      created to denote missing values.
  • Variable selection method
    - Forward, backward, or stepwise
    - Significance level that an effect must meet to remain in the model
  • Number of bins
    - Used in the assessment plot. Increasing the number of bins increases the
      accuracy of the assessment at the expense of computing time.
  • Tolerance (for convergence)
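The idea behind informative missingness can be sketched as follows. This is a hypothetical illustration, not SAS's implementation: it keeps every row, adds a 0/1 indicator that flags missing values, and, as an assumption, fills the missing entries with the mean of the observed values so the column can still enter the model.

```python
def add_missing_indicator(values):
    """Return (filled, indicator) for a list that may contain None.

    Hypothetical sketch: missing entries are flagged with 1 in the indicator
    and filled with the mean of the observed values (mean fill is an
    assumption), so the rows are kept instead of dropped.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator

income = [40.0, None, 60.0, None, 50.0]  # hypothetical predictor
filled, flag = add_missing_indicator(income)
print(filled)  # [40.0, 50.0, 60.0, 50.0, 50.0]
print(flag)    # [0, 1, 0, 1, 0]
```

With the default behavior, only the three complete rows would be used; with the indicator, all five rows remain and the model can estimate whether "missing" itself predicts the response.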

Linear Regression Options
• Model Display
  • Plot layout
  • Statistic to show
  • Use histogram
  • Y axis
  • Legend visibility
  • Plot to show

Examining Predicted vs. Residual (“The Residual Plot”)

Residual Plots: A Diagnostic Tool

Linear Regression Results
[Figure: linear regression results, showing the Summary Bar, Fit Summary, Residual Plot, and Assessment panes]

Analyzing Linear Regression Results
Three panes appear under the summary bar. They help you analyze the results of
the linear regression model. Two additional plots, marked *, are optional.
• The Fit Summary window displays how significant each effect is for predicting
the response variable.
• The Residual Plot window displays the relationship between the predicted values and the
residuals.
• The Assessment window displays the values of the observed response
and the model’s predicted response.
• The Influence Plot* window displays the observations that might influence
the overall analysis.
• The Variable Selection Plot* window displays the change in value of the variable
selection statistic as effects are added or removed.

Summary Bar
For all models, general model information appears at the top of the canvas.
• model type
• name of response variable
• model evaluation criterion (selected from the model toolbar or the options)
• number of observations used to build the model
• number of observations not used

Evaluation Criteria Choices for Linear Regression

Fit Summary Window
• Shows how significant each effect (predictor variable) is for the response variable
by displaying its relative importance as a p-value.
• The vertical significance-level line is plotted at -log(0.05).
• The smaller the p-value, the more significant the effect is.
• The significance level is 0.05 by default, but the line is movable. Every variable
above the threshold is significant.
• If a variable is not significant, we can remove it.
  > Keep the model simple!
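Because the bars are on a -log scale, a bar crosses the significance line exactly when its p-value is below the significance level. A small sketch with hypothetical p-values; base-10 logarithms are an assumption, and the comparison gives the same answer for any base since -log is decreasing.

```python
import math

def significant_effects(p_values, alpha=0.05):
    """Names of effects whose -log10(p) exceeds -log10(alpha), i.e. p < alpha."""
    threshold = -math.log10(alpha)  # position of the significance line
    return [name for name, p in p_values.items() if -math.log10(p) > threshold]

# Hypothetical p-values for three effects
p_values = {"X1": 0.001, "X2": 0.20, "X1*X2": 0.04}
print(significant_effects(p_values))              # ['X1', 'X1*X2']
print(significant_effects(p_values, alpha=0.01))  # ['X1']
```

Moving the line (changing `alpha`) changes which effects count as significant, which is why the interaction drops out at the stricter 0.01 level.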

Residual Plot
• The Residual Plot window displays the relationship between the predicted values and the
residuals, using a scatter plot or a heat map.
• It is used to assess the quality of the model and to identify outlier observations.
• To change the statistic that is plotted, right-click the plot.
• To show or filter selected data, right-click the plot.
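The two axes of a residual plot can be sketched directly: fit the model, then plot residual = observed - predicted against the predicted value. This NumPy sketch also flags points whose residual is more than two sample standard deviations from zero; that cutoff is a common rule of thumb assumed here, not necessarily the statistic SAS plots. The data are hypothetical.

```python
import numpy as np

def residual_plot_data(x, y):
    """Least squares fit of y on x (plus intercept); returns both plot axes."""
    X1 = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    predicted = X1 @ beta
    residuals = y - predicted  # observed minus predicted
    return predicted, residuals

def flag_outliers(residuals, cutoff=2.0):
    """Indices of residuals more than `cutoff` sample SDs from zero (assumed rule)."""
    z = residuals / residuals.std(ddof=1)
    return np.flatnonzero(np.abs(z) > cutoff)

# Hypothetical data: the exact line y = 2x + 1 with one shifted point at index 7
x = np.arange(10.0)
y = 2 * x + 1
y[7] += 10
predicted, residuals = residual_plot_data(x, y)
print(flag_outliers(residuals))  # [7]
```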

Assessment
• The Assessment window displays a plot of the observed response values
and the model’s predicted response values.
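The Assessment plot is built from binned results (see the Number of bins option). A hypothetical sketch of one way to bin: sort observations by predicted value, split them into groups of nearly equal size, and average the observed and predicted responses in each group. SAS's exact binning rule may differ; this is an assumption for illustration.

```python
import numpy as np

def binned_assessment(observed, predicted, n_bins=4):
    """(mean observed, mean predicted) per bin of the sorted predicted values."""
    order = np.argsort(predicted)            # rank observations by prediction
    bins = np.array_split(order, n_bins)     # nearly equal-sized groups
    return [(observed[idx].mean(), predicted[idx].mean()) for idx in bins]

# Hypothetical observed and predicted responses
observed = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0])
predicted = np.array([1.5, 2.5, 2.0, 4.5, 4.0, 6.5, 7.5, 7.0])
for obs_mean, pred_mean in binned_assessment(observed, predicted):
    print(f"observed {obs_mean:.2f}  predicted {pred_mean:.2f}")
```

For a well-fitting model, the two averages track each other closely in every bin; more bins give a finer picture at the cost of more computation.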

Influence Plot
After choosing a variable selection method…
1. Choose plot

2. Plot appears

• The Influence Plot window displays the
observations that might influence
the overall analysis.
• To change the statistic that is
plotted, right-click the plot.
• To show or filter selected data,
right-click the plot.
• An observation is influential if
removing it from the model
substantially changes the
regression parameter estimates.
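The definition above translates directly into a leave-one-out check: refit without each observation and see how far the estimates move. This NumPy sketch is a simple illustration of that definition, not the statistic SAS plots (SAS offers measures such as Cook's D); the function names and data are hypothetical.

```python
import numpy as np

def fit(x, y):
    """Least squares estimates (intercept, slope) for y on x."""
    X1 = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def estimate_shifts(x, y):
    """For each observation, refit without it and return the largest absolute
    change in any parameter estimate."""
    full = fit(x, y)
    shifts = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop observation i
        shifts.append(np.max(np.abs(fit(x[keep], y[keep]) - full)))
    return np.array(shifts)

# Hypothetical data: points on y = 2x + 1, except the last one (edge of the x range)
x = np.arange(10.0)
y = 2 * x + 1
y[9] += 10
shifts = estimate_shifts(x, y)
print(int(np.argmax(shifts)))  # 9
```

The shifted point sits at the edge of the x range, so removing it changes the intercept and slope far more than removing any other point.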

If an observation is both influential and an outlier, investigate it…
… and probably remove it.

Select one or more observations, right-click, and choose Exclude Selected to remove them.

Variable Selection Plot
• The Variable Selection Plot window displays the change in value of the variable
selection statistic (e.g., R²) as effects are added or removed.

• Available only if a variable selection method is specified
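To see what the Variable Selection Plot tracks, here is a sketch of greedy forward selection that records R² after each added effect. Using the largest R² as the entry rule and a fixed number of steps instead of a stopping rule are assumptions for illustration; SAS's selection methods use their own criteria (such as a significance level). The data are simulated and hypothetical.

```python
import numpy as np

def r_squared(cols, y):
    """R² of a least squares fit of y on the given columns plus an intercept."""
    X1 = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_selection(candidates, y, max_steps=None):
    """At each step add the effect giving the highest R²; return (effect, R²) per step."""
    chosen, history = [], []
    remaining = list(candidates)
    for _ in range(max_steps or len(remaining)):
        scores = {name: r_squared([candidates[c] for c in chosen + [name]], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
        history.append((best, scores[best]))
    return history

# Hypothetical data: y depends strongly on X1, weakly on X2, not at all on X3
rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 50))
y = 3 * x1 + x2 + 0.1 * rng.normal(size=50)
history = forward_selection({"X1": x1, "X2": x2, "X3": x3}, y)
for name, r2 in history:
    print(f"add {name}: R² = {r2:.3f}")
```

R² never decreases as effects are added, so the last step (X3) barely moves the curve; that flat tail is the kind of evidence a stopping rule uses to leave an effect out.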

Maximize: Linear Regression Details Table
• The details table provides detailed statistics about the model
on several tabs; which tabs appear depends on the model.
• To display the details table, click (Maximize) on the object toolbar
to enter maximize mode.

Linear Regression: Dimensions
The Dimensions tab displays information about the number of effects
and observations included in the model.

Linear Regression: Overall ANOVA
The Overall ANOVA tab provides information about how well the model fits the data.

• Source – source of the variation in the data
• Deg Freedom – degrees of freedom associated with each source (the model degrees of freedom
equal the number of estimated parameters, including the intercept, minus 1)
• Sum of Squares – sum of squares associated with each source of variation
• Mean Square – Sum of Squares ÷ Deg Freedom
• F Value – Model Mean Square ÷ Error Mean Square
• Pr > F – p-value associated with the F statistic
• R-Square – the proportion of variation in the response variable explained by the effects in the model
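The quantities in the Overall ANOVA table follow directly from a least squares fit. This NumPy sketch computes them for a model with an intercept on a small made-up data set. Pr > F is omitted because it needs the F distribution's survival function (for example, `scipy.stats.f.sf(F, df_model, df_error)`), which this sketch does not use.

```python
import numpy as np

def overall_anova(x, y):
    """Degrees of freedom, sums of squares, F value, and R-square for an OLS fit."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), x])     # intercept + predictor column(s)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    predicted = X1 @ beta
    p = X1.shape[1]                           # parameters, including the intercept
    ss_model = np.sum((predicted - y.mean()) ** 2)
    ss_error = np.sum((y - predicted) ** 2)
    df_model, df_error = p - 1, n - p
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {"df_model": df_model, "df_error": df_error,
            "ss_model": ss_model, "ss_error": ss_error,
            "F": ms_model / ms_error,
            "r_square": ss_model / (ss_model + ss_error)}

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
print(overall_anova(x, y))
```

For these data the fitted line is y = 2.2 + 0.6x, so Model SS = 3.6, Error SS = 2.4, F = 3.6 ÷ (2.4 ÷ 3) = 4.5, and R-Square = 3.6 ÷ 6 = 0.6.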

Linear Regression: Fit Statistics
The Fit Statistics tab displays statistics about the estimated model.

• Some statistics are used to compare models with the same number of predictors.

• Others are used to compare models with different numbers of predictors.
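Why the distinction matters can be sketched with three familiar statistics. R-square never decreases when predictors are added, so it only ranks models of the same size fairly; adjusted R-square and AIC penalize extra parameters. The AIC form n·ln(SSE/n) + 2p is one common definition assumed here; SAS procedures may add a constant, which does not change rankings of models fit to the same data. The SSE values are hypothetical.

```python
import math

def fit_statistics(sse, sst, n, p):
    """R-square, adjusted R-square, and AIC; p counts parameters incl. the intercept."""
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    aic = n * math.log(sse / n) + 2 * p   # assumed common form of AIC
    return r2, adj_r2, aic

# Two hypothetical models on the same 11 observations (total SS = 100)
small = fit_statistics(sse=20.0, sst=100.0, n=11, p=3)
large = fit_statistics(sse=18.0, sst=100.0, n=11, p=6)
print("small:", small)
print("large:", large)
```

The larger model has the higher R-square (0.82 vs. 0.80), but its adjusted R-square (0.64 vs. 0.75) and its AIC both favor the smaller model.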

Linear Regression: Parameter Estimates
The Parameter Estimates tab displays the parameter estimates (coefficients)
of each model effect and their associated statistics.

• Check that the estimates of the parameters make sense (for example, that their
signs agree with what you expect).

• Check the significance (p-value) of each estimate.
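Behind each row of the Parameter Estimates tab are the estimate, its standard error, and a t value. A NumPy sketch using the textbook OLS formulas on the same kind of made-up data; the two-sided p-value would be 2·P(T > |t|) with n - p degrees of freedom (for example, via `scipy.stats.t.sf`), which this sketch does not compute.

```python
import numpy as np

def parameter_estimates(x, y):
    """Estimates, standard errors, and t values for an OLS fit with an intercept."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), x])
    p = X1.shape[1]
    xtx_inv = np.linalg.inv(X1.T @ X1)
    beta = xtx_inv @ X1.T @ y                  # least squares estimates
    resid = y - X1 @ beta
    mse = (resid @ resid) / (n - p)            # error mean square
    se = np.sqrt(mse * np.diag(xtx_inv))       # standard errors
    return beta, se, beta / se                 # t value = estimate / SE

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
beta, se, t = parameter_estimates(x, y)
print(beta, se, t)
```

With a single predictor, the slope's t value squared equals the overall F value (here 4.5), a quick consistency check between the two tabs.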

Linear Regression: Selection Info
The Selection Info tab displays a summary of the variable selection
methodology.

Linear Regression: Selection Summary
The Selection Summary tab displays a summary of the variable selection
results at each step in the selection process.

Linear Regression: Assessment
The Assessment tab displays the binned assessment results that are used
to generate the Assessment plot.
