Advanced Regression with JMP PRO – Handout
German JMP User Meeting
Holzminden – June 22, 2017
Silvio Miccio
Overview
• Introduction
• Some Details on Parameter Estimation and Model Selection
• Generalized Linear Models
• Penalized Regression Models in JMP PRO
• Example:
• Analysis of Time to Event Data (Parametric Survival Models)
• Classification Model with Missing Informative Data
• Linear Mixed Models in JMP PRO
• Example:
• Nested Intercept
• Repeated Measure (Consumer Research)
Introduction
• Multiple Linear Regression (MLR) is one of the most
commonly used methods in Empirical Modeling
• MLR is highly efficient as long as all assumptions are met
• Observational data in particular often do not meet these
assumptions, resulting in problems with coefficient estimation
and model selection, and with this in model validity
• Hence, Advanced Regression Methods, like those available in JMP
PRO, have to be applied to benefit from the ease of
interpretation of regression methods
Linear Regression
y = β₀ + β₁x₁ + ⋯ + βₖxₖ + ε (scalar form)
y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + β₁₁x₁² + β₂₂x₂² + ε (example with interaction and quadratic terms)
y = Xβ + ε (matrix form)

β is a p × 1 vector of unknown constants
ε is an n × 1 vector of random errors N(0, σ²)
Standard Least Squares Estimate
y = Xβ + ε
L = ε′ε
L = (y − Xβ)′(y − Xβ)            using (AB)′ = B′A′
L = y′y − β′X′y − y′Xβ + β′X′Xβ  with β′X′y = y′Xβ (a scalar)
L = y′y − 2β′X′y + β′X′Xβ        a quadratic function of β
∂L/∂β = −2X′y + 2X′Xβ = 0
X′Xβ = X′y
β̂ = (X′X)⁻¹X′y
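The normal-equations solution above can be sketched numerically. A minimal illustration with made-up data (not from the handout):

```python
# Minimal sketch: solving the normal equations X'X b = X'y with NumPy.
# The data are synthetic, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x1, x2])   # intercept + two factors
beta_true = np.array([2.0, 1.5, -0.5])
y = X @ beta_true + rng.normal(0, 0.1, n)

# beta_hat = (X'X)^-1 X'y; solve() avoids forming the explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With low noise, `beta_hat` recovers the true coefficients closely.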
X Matrix (coded) – 2³ Full Factorial Design with 3 Center Points

      Int  X1  X2  X3  X1X2  X1X3  X2X3
       1   -1  -1  -1    1     1     1
       1    1  -1  -1   -1    -1     1
       1   -1   1  -1   -1     1    -1
       1    1   1  -1    1    -1    -1
X =    1   -1  -1   1    1    -1    -1
       1    1  -1   1   -1     1    -1
       1   -1   1   1   -1    -1     1
       1    1   1   1    1     1     1
       1    0   0   0    0     0     0
       1    0   0   0    0     0     0
       1    0   0   0    0     0     0

• X matrix of a full factorial 2³ design with three center points
• 1st column: intercept
• 2nd – 4th columns: main effects
• 5th – 7th columns: interactions
X’X (Covariance Matrix): X′X = diag(11, 8, 8, 8, 8, 8, 8)

(X’X)⁻¹ (Inverted Covariance Matrix):
1/11   0    0    0    0    0    0
 0    1/8   0    0    0    0    0
 0     0   1/8   0    0    0    0
 0     0    0   1/8   0    0    0
 0     0    0    0   1/8   0    0
 0     0    0    0    0   1/8   0
 0     0    0    0    0    0   1/8
• The “degrees of freedom“ for estimating the model coefficients appear on the
diagonal
• This is only true if all off-diagonal elements are 0 (all factors are independent of
each other)
• Non-zero off-diagonal elements indicate correlated factors
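The orthogonality of the design can be checked directly: rebuilding the coded X matrix gives a diagonal X′X. A sketch (the run order differs from the slide, which does not change X′X):

```python
# Sketch: rebuilding the coded X matrix (2^3 full factorial + 3 center
# points; intercept, 3 main effects, 3 two-factor interactions) and
# verifying that X'X is diagonal, i.e. the design is orthogonal.
import numpy as np
from itertools import product

runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)  # 8 runs
F = np.vstack([runs, np.zeros((3, 3))])                         # + 3 centers
X = np.column_stack([
    np.ones(len(F)), F[:, 0], F[:, 1], F[:, 2],
    F[:, 0] * F[:, 1], F[:, 0] * F[:, 2], F[:, 1] * F[:, 2],
])
XtX = X.T @ X    # diagonal: [11, 8, 8, 8, 8, 8, 8]
```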
[Figure: factor spaces of an orthogonal vs. a multicollinear design]
Effects of Multicollinearity
• Singular matrix → no solution
• High variance in the coefficients
• High variance in the predictions
• Often a high R², but (all) factors are insignificant
• Small changes in the data may have a big effect on the
coefficients (not robust)
• Best subset selection, e.g. via Stepwise Regression, may
become almost impossible
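The variance inflation can be illustrated with synthetic data: since Var(β̂) = σ²(X′X)⁻¹, the diagonal of (X′X)⁻¹ drives the coefficient variances. A sketch (not from the handout):

```python
# Sketch: correlated factors inflate coefficient variance. Since
# Var(beta_hat) = sigma^2 (X'X)^-1, the diagonal of (X'X)^-1 drives the
# coefficient variances. Data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                     # independent of x1
x2_corr = 0.98 * x1 + 0.02 * rng.normal(size=n)   # nearly collinear with x1

def coef_var_factors(x2):
    X = np.column_stack([np.ones(n), x1, x2])
    return np.diag(np.linalg.inv(X.T @ X))        # ~ Var(beta_hat) / sigma^2

v_indep = coef_var_factors(x2_indep)
v_corr = coef_var_factors(x2_corr)
# the slope entries of v_corr are orders of magnitude larger than v_indep's
```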
Some Details on Linear
Regression
Model Selection
Model Selection
• The overall goal in Empirical Modeling is to identify the model with
the lowest expected prediction error

Expected Prediction Error =
Irreducible Error (inherent noise of the system) +
Squared Bias (depends on model selection) +
Variance (depends on model selection)

• This requires finding the model with optimum complexity (e.g.
number of factors, number of sub-models, functional form of
model terms, modeling method)
• Model Selection: “estimating the performance of different
models in order to choose the (approximate) best one”
Bias-Variance Trade Off
• If model complexity is too low
the model is biased (important
features of the system not
captured by the model)
• If model complexity is too high
the model is fit too hard to the
data, which results in a poor
generalization of the prediction
(high prediction variance)
• Training error: variation in the data not explained by the model
• Test error: expected prediction error based on independent test data
• The challenge is to identify the model with the optimum trade-off
between bias and variance
Methods for Model Selection
• When it is not possible to split the data into training,
validation and test sets (as is the case for designed
experiments or small data sets), model selection
can be done via measures that approximate
the validation/test error, such as AIC, AICc and BIC
• Here the estimated value itself usually is not of direct
interest; it is the relative size that matters
• Alternative methods based on re-sampling (e.g. cross-
validation) provide a direct estimate of the expected
prediction error (and can be used for model assessment)
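As a sketch of such a measure, AICc for a Gaussian least-squares model can be computed directly (up to an additive constant; only the relative values matter, as noted above):

```python
# Sketch: comparing least-squares models by AICc (smaller is better).
# Gaussian least-squares form, up to an additive constant; only relative
# AICc values matter. Data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2, n)       # true model is linear

def aicc(y, X):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                           # coefficients + error variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction

X_int = np.ones((n, 1))                          # intercept-only model
X_lin = np.column_stack([np.ones(n), x])         # true (linear) model
# AICc strongly favors the linear model over the intercept-only model
```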
Generalized Linear Models
Modeling discrete responses and non-normal distributed
errors
Generalized Linear Model (GLM)
• A GLM is a generalization of the linear model to non-normal
responses, where the error variance is a function of the mean
o Binomial – dichotomous data (yes/no, pass/fail)
o Poisson – count data
o Log Normal – data restricted to non-negative values (transformed data
normally distributed)
o and many more…
• Components of GLM
1. Random Component
2. Systematic Component
3. Link Function
Random Component
Identifies the distribution and variance of the response.
It is usually a member of the exponential family of distributions,
but is not restricted to it.
Link Function
Specifies the link between the random and systematic
components. It is an invertible function that defines the
relation between the response mean and the linear predictor:

η = Xβ = g(μ)
μ = g⁻¹(η)
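For binomial data the usual link is the logit; a minimal sketch of g and its inverse:

```python
# Sketch: the logit link for binomial data; g maps the mean mu in (0,1)
# to the linear-predictor scale, g_inv maps back.
import numpy as np

def g(mu):                   # link: eta = g(mu) = log(mu / (1 - mu))
    return np.log(mu / (1 - mu))

def g_inv(eta):              # inverse link: mu = 1 / (1 + exp(-eta))
    return 1 / (1 + np.exp(-eta))

eta = np.array([-2.0, 0.0, 2.0])
mu = g_inv(eta)              # round trip: g(mu) recovers eta
```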
Common Variance and Link Functions
Comparison Standard Least Squares vs. GLM
Standard Least Squares Regression:
y = Xβ + ε
β̂ = (X′X)⁻¹X′y

Iteratively Re-Weighted Least Squares:
μ = g⁻¹(η), η = Xβ
β̂ = (X′WX)⁻¹X′Wz

y is an n × 1 vector of responses
X is an n × p matrix of the factor variables
β is a p × 1 vector of unknown constants
ε is an n × 1 vector of random errors N(0, σ²)
X′ is the transpose of X
X′X is a p × p matrix of cross-products of the factors
η is the linear predictor
g⁻¹ is the inverse link function
W is a diagonal matrix of weights wᵢ
z is a working response vector with entries zᵢ
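A minimal sketch of IRLS for logistic regression, using the update β = (X′WX)⁻¹X′Wz with wᵢ = μᵢ(1 − μᵢ) and working response zᵢ = ηᵢ + (yᵢ − μᵢ)/wᵢ (synthetic data; a real implementation iterates to convergence rather than a fixed count):

```python
# Sketch: IRLS for logistic regression, matching the slide's update
# beta = (X'WX)^-1 X'W z. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.5])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

beta = np.zeros(2)
for _ in range(25):                      # fixed iteration count (sketch)
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                    # diagonal of W
    z = eta + (y - mu) / w               # working response
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
```

The converged `beta` is the maximum likelihood estimate, close to `beta_true` for this sample size.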
Generalized Linear Regression
Penalized Regression
Generalized Linear Regression (GLR)
GLR can be seen as an extension of GLM that, in addition, is able
to deal with:
• Multicollinearity, and to perform
• Model Selection (for p > n/2 as well as for p > n)
This is achieved by penalized regression methods, which
attempt to fit better models by shrinking the parameter
estimates
Although shrunken estimates are biased, the resulting
models are usually better, i.e. they have a lower prediction error
Ridge Regression
• Ridge Regression was developed as a remedy for multicollinearity
• It attempts to minimize the penalized residual sum of squares:

β̂(ridge) = argmin over β of  Σᵢ₌₁..ₙ (yᵢ − β₀ − Σⱼ₌₁..ₚ xᵢⱼβⱼ)² + λ Σⱼ₌₁..ₚ βⱼ²

• β̃ⱼ is the maximum likelihood estimate if it exists: for normally
distributed data the least squares estimate, for non-normally
distributed data the ridge solution
• Adaptive models attempt to ensure the Oracle Properties:
o Identification of the true active factors
o Correct estimation of the parameter estimates
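For a fixed λ, the ridge estimate has the closed form β̂ = (X′X + λI)⁻¹X′y. A sketch with centered data so the intercept is not penalized (synthetic, nearly collinear factors):

```python
# Sketch: ridge regression via the closed form
# beta = (X'X + lambda I)^-1 X'y, on centered data so the intercept is
# not penalized. Synthetic, nearly collinear factors.
import numpy as np

rng = np.random.default_rng(4)
n = 40
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.5, n)

Xc = X - X.mean(axis=0)                      # centering handles the intercept
yc = y - y.mean()

def ridge(lam):
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

b_ols = ridge(0.0)                           # no penalty: least squares
b_ridge = ridge(10.0)                        # shrunken estimates
```

Increasing λ shrinks the coefficient vector toward zero, trading bias for variance.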
Tuning Parameters (from JMP Help)
• LASSO and Ridge are determined by one tuning parameter (L1 or L2)
• Elastic Nets are determined by two tuning parameters (L1 and L2),
where the Elastic Net Alpha is the weight between the penalties
• The higher the tuning parameter, the stronger the penalty (a
tuning parameter of zero yields the Maximum Likelihood solution (MLE), i.e. no penalty)
o When the tuning parameter is too small the model is likely to overfit
o When the tuning parameter is too big the model is biased
• To obtain a solution the tuning parameter is increased over a fine grid
• The optimum solution is the one achieving the best fit over the entire
tuning parameter grid
Tuning Parameters – Elastic Net Alpha (from JMP Help)
• Determines the mix between the L1 and L2 penalty
• Default value is 0.9, meaning the coefficient on the L1
penalty is set to 0.9 and the coefficient on the L2 penalty
to 0.1
• If Elastic Net Alpha is not set, the algorithm
computes the Lasso, Elastic Net, and Ridge fits, in that
order, and keeps the “best” solution
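The penalty mix can be sketched as a simple function, mirroring the weighting described above (JMP's exact internal scaling may differ):

```python
# Sketch of the elastic-net penalty as a weighted mix of L1 and L2 terms.
# alpha = 1 gives the Lasso penalty, alpha = 0 gives the Ridge penalty.
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))           # Lasso part
    l2 = np.sum(beta ** 2)              # Ridge part
    return lam * (alpha * l1 + (1 - alpha) * l2)

beta = [1.0, -2.0, 0.0]
# alpha = 1.0 -> lam * 3.0 (pure L1); alpha = 0.0 -> lam * 5.0 (pure L2)
```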
Model Tuning
• Try different Estimation Methods,
settings for Advanced Controls, and
Validation Methods to find the best model
• All models are displayed in the model
report and can be individually saved as
script and prediction formula
• Note: k-fold or random holdback
validation is not recommended for
DOE data
Data Set 1 - Parametric Survival Analysis
• 4 factors (E = Equipment Set-Up, P = Process Setting, F1 & F2 = Product
Formulation) have been investigated in a designed experiment
• The column Censor contains the censoring variable (0 = no censoring, 1 =
censoring)
• The response is the time the sample resists a force applied to it
• For feasibility reasons the measurement is stopped after a pre-defined
maximum test time. This leads to so-called “right censoring”, because not
all samples fail within the maximum test time.
• The objective is to create a model for predicting the survival time of the
sample
• The data file “GLR Survival” contains the scripts for the parametric
survival model for JMP (does not allow for automated model selection)
and JMP PRO
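As a sketch of what a parametric survival fit does internally (this is not the JMP script): censored runs contribute the log survival function log S(t), uncensored runs the log density log f(t), here for an assumed Weibull model with a crude grid-search MLE on simulated data:

```python
# Sketch: maximum likelihood for right-censored lifetimes under an assumed
# Weibull model. Censored runs contribute log S(t), uncensored runs log f(t).
# Data are simulated; the grid search stands in for a proper optimizer.
import numpy as np

rng = np.random.default_rng(5)
n = 300
t = 10.0 * rng.weibull(2.0, n)          # true shape 2, true scale 10
t_max = 12.0                            # pre-defined maximum test time
censored = t > t_max                    # right-censored runs
t_obs = np.minimum(t, t_max)

def log_lik(k, lam):
    z = t_obs / lam
    log_f = np.log(k / lam) + (k - 1.0) * np.log(z) - z ** k   # density part
    log_S = -(z ** k)                                          # survival part
    return np.sum(np.where(censored, log_S, log_f))

grid_k = np.linspace(0.5, 4.0, 71)      # candidate shapes
grid_lam = np.linspace(5.0, 20.0, 151)  # candidate scales
ll = np.array([[log_lik(k, lam) for lam in grid_lam] for k in grid_k])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
shape_hat, scale_hat = grid_k[i], grid_lam[j]
```

Ignoring the censoring (treating stopped runs as failures) would bias the estimates downward; the `np.where` term is what corrects for it.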
Data 2 - Credit Risk Scoring
• The data set is called Equity.jmp and taken from the JMP Sample Data Library located in the JMP Help
menu
• It is based on historical data gathered to determine whether a customer is a good or bad credit risk for a
home equity loan (watch out for missing data: they are set to Informative Missing in JMP PRO, because
they contain important information)
• Predictors:
• LOAN = how much was the loan
• MORTDUE = how much they need to pay on their mortgage
• VALUE = assessed valuation
• REASON = reason for loan
• JOB = broad job category
• YOJ = years on the job
• DEROG = number of derogatory reports
• DELINQ = number of delinquent trade lines
• CLAGE = age of oldest trade line
• NINQ = number of recent credit enquiries
• CLNO = number of trade lines
• DEBTINC = debt-to-income ratio
• Response is Credit Risk, predict good and bad credit risks
• Data file “Credit Risk” contains scripts for JMP PRO (GLR for model selection, informative missing,
validation column) and JMP (logistic regression, it is possible to do stepwise and manual informative
missing coding – see JMP home page for details)
Linear Mixed Models
G-Side and R-Side Random Effects
Fixed Factors
• Usually the factors, e.g. in a designed experiment,
are varied across fixed factor levels
• With fixed factors we can make statistical inferences
within the investigated model space, based on the
factor effects
• When the factor levels are randomly chosen from a
larger population of factor levels, the factor is said
to be a random factor
Random Factors
• Random factors allow drawing conclusions about the entire
population of possible factor levels
• The population of possible factor levels is considered to be
infinite
• Random effects models are of special interest for identifying
sources of variation, because they allow estimating variance
components
• Examples of random effects: machines, operators, panelists
• Random effects also have to be considered for split-plot designs,
correlated responses, spatial data, and repeated measurements
Random Effects Model

y = Xβ + Zγ + ε
γ ~ N(0, G),  ε ~ N(0, R)
E[y] = Xβ;  Var[y] = ZGZ′ + R

• y is a vector of responses
• X is the regression matrix of the fixed effects
• β is a vector of unknown fixed-effect parameters
• Z is the regression matrix of the random effects
• γ is a vector of unknown random-effect parameters
• ε is a vector of random errors (not required to be independent or homogeneous)
• G is the variance-covariance matrix of the random effects
• R is the variance-covariance matrix of the model errors
• G-side effects are specified by the Z matrix (random effects)
• R-side effects are specified by the covariance structure (repeated structure)
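The decomposition Var(y) = ZGZ′ + R can be sketched for a toy one-way random-effects model (two groups, three observations each; all variance values are illustrative):

```python
# Sketch: assembling Var(y) = Z G Z' + R for a toy one-way random-effects
# model: 2 groups, 3 observations each. Variance values are made up.
import numpy as np

sigma_g2 = 4.0                              # random-effect variance (G-side)
sigma_e2 = 1.0                              # residual variance (R-side)

Z = np.kron(np.eye(2), np.ones((3, 1)))     # 6x2 group-membership matrix
G = sigma_g2 * np.eye(2)
R = sigma_e2 * np.eye(6)
V = Z @ G @ Z.T + R

# Within a group: Var = sigma_g2 + sigma_e2, Cov = sigma_g2
# Across groups:  Cov = 0
```

The block structure shows why observations sharing a random-effect level are correlated while observations in different groups are not.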
Repeated Covariance Structure Requirements
(taken from JMP Help)