AB1202 Statistics and Analysis: Non-Linear Regression


AB1202

Statistics and Analysis


Lecture 11
Non-Linear Regression

Chin Chee Kai


cheekai@ntu.edu.sg
Nanyang Business School
Nanyang Technological University

Non-Linear Regression
• Variable Transformation
• Exponential Models
• Power Index Models
• Logistic Regression

Non-Linear Regression – Why?
• Many measurements are non-linear, especially in multiple regression where many variables influence the outcome.
• Perhaps underlying theory or experience tells us a certain non-linear relationship exists.
• Perhaps trying a non-linear term suddenly explains most of the outcomes, even though we don't know why.
• Perhaps having a tighter fit to data for prediction is important, where we don't care whether the model is linear or non-linear.
• Perhaps we have completely no idea about how the outcome should be modeled, and using a non-linear model is more general to begin with.

Model Type          | Non-Linear Form
Exponential Models  | $y = b_0\, b_1^{x_1} b_2^{x_2}$
Power Index Models  | $y = b_0\, x_1^{b_1} x_2^{b_2}$
Logistic Models     | $p = \frac{e^{b_0 + b_1 x_1 + b_2 x_2}}{1 + e^{b_0 + b_1 x_1 + b_2 x_2}}$
Polynomial Models   | $y = b_0 + b_1 x + b_2 x^2$
Variable Transformation
• What we want is to re-use (linear) multiple regression techniques.
• So we transform the variables so that the data aligns more closely with a line, plane, or hyperplane in higher dimensions.
• In theory we can use any complicated function (even one that perfectly aligns the data to a straight line). But in practice we stick to standard functions, used in as simple and understandable a way as possible:
  ▫ $x \to \log(x)$, $x \to \ln(x)$, $x \to \sqrt{x}$, $x \to x^2$, $x \to \frac{1}{x}$, $x \to \cos(x)$, $x \to \sin(x)$, $x \to e^{-x}$
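As an illustration of the transform-then-regress workflow, here is a minimal R sketch (the data values and the choice of a log transform are assumptions for illustration only):

  # Hypothetical data where y grows roughly linearly in ln(x)
  df <- data.frame(x = c(1, 2, 4, 8, 16, 32),
                   y = c(2.1, 3.0, 3.8, 5.1, 5.9, 7.2))
  df$lnx <- log(df$x)               # transform the explanatory variable
  model  <- lm(y ~ lnx, data = df)  # ordinary linear regression on transformed data
  summary(model)                    # coefficients are on the ln(x) scale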
Exponential Models
• In exponential models, the relationship between the outcome variable $y$ and explanatory variables $x_1$ and $x_2$ is:
  $y = b_0\, b_1^{x_1}\, b_2^{x_2}$
• Applying the $\ln()$ function, we get the transformed linear model:
  $\ln y = \ln b_0 + x_1 \ln b_1 + x_2 \ln b_2$
• This transformed model is linear in $x_1$ and $x_2$ for the transformed outcome $\ln(y)$. So we can apply multiple regression on this transformed model.
• Notice that the coefficients we get for $x_1$ and $x_2$ are not $b_1$, $b_2$ but $\ln b_1$, $\ln b_2$ – we have to take exponentials of the fitted coefficients to get $b_1$, $b_2$.
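A minimal R sketch of this fit-then-back-transform step (the data frame d with columns y, x1, x2 is an assumption for illustration):

  # Fit the transformed linear model ln(y) = ln(b0) + x1*ln(b1) + x2*ln(b2)
  fit <- lm(log(y) ~ x1 + x2, data = d)
  # Exponentiate the fitted coefficients to recover b0, b1, b2
  exp(coef(fit))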
Exponential Regression – Example

 y    x    ln(y)
 8.5  42   2.140066
 6.5  37   1.871802
 7.1  23   1.960095
 9.5  18   2.251292
 7.6  35   2.028148
 3.4  56   1.223775
 6.6  31   1.887070

Consider the data on the left. Find the simple linear regression model. Find the exponential model. Which is better?

  # datatext is assumed to hold the whitespace-separated y/x table above
  datatext <- "y x\n8.5 42\n6.5 37\n7.1 23\n9.5 18\n7.6 35\n3.4 56\n6.6 31"
  d <- read.delim(textConnection(datatext),
                  header=TRUE, sep="", strip.white=TRUE)
  model_lin = lm(d$y ~ d$x)
  summary(model_lin)
  lny = log(d$y)
  d$lny = lny
  model = lm(d$lny ~ d$x)
  summary(model)

Call: lm(formula = d$y ~ d$x)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.99757    1.66547   6.603   0.0012 **
d$x         -0.11481    0.04567  -2.514   0.0536 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.402 on 5 degrees of freedom
Multiple R-squared: 0.5582, Adjusted R-squared: 0.4699
F-statistic: 6.318 on 1 and 5 DF, p-value: 0.05359

Linear regression model: y = 10.9976 – 0.1148x. Model significance: 0.0536 → model is NOT significant at 5%.

Call: lm(formula = d$lny ~ d$x)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.61343    0.27493   9.506 0.000218 ***
d$x         -0.02038    0.00754  -2.703 0.042632 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2314 on 5 degrees of freedom
Multiple R-squared: 0.5937, Adjusted R-squared: 0.5124
F-statistic: 7.306 on 1 and 5 DF, p-value: 0.04263

Exponential model: ln(y) = 2.6134 – 0.0204x, so $y = 13.6454 \times 0.9798^x$. Model significance: 0.04263 → model is significant at 5%. $R^2$ and adj-$R^2$ are better than the linear model's.
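To recover the exponential-form coefficients directly in R (continuing the session above):

  exp(coef(model))   # (Intercept) -> b0 = 13.6454, d$x slope -> b1 = 0.9798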
Power Index Models
• In power index models, the relationship between the outcome variable $y$ and explanatory variables $x_1$ and $x_2$ is:
  $y = b_0\, x_1^{b_1}\, x_2^{b_2}$
• Again, applying the $\ln()$ function, we get the transformed linear model:
  $\ln y = \ln b_0 + b_1 \ln x_1 + b_2 \ln x_2$
• This transformed model is linear in $\ln(x_1)$ and $\ln(x_2)$ for the transformed outcome $\ln(y)$. So we can apply multiple regression on this transformed model.
• This time, differently from exponential models, the coefficients we get are directly the $b_1$ and $b_2$ we need in the non-linear model; only the intercept, $\ln b_0$, still needs exponentiating.
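A minimal R sketch (again assuming a data frame d with columns y, x1, x2):

  # Linear in ln(x1) and ln(x2); the intercept comes back as ln(b0)
  fit <- lm(log(y) ~ log(x1) + log(x2), data = d)
  b0 <- exp(coef(fit)[1])   # back-transform the intercept only
  b1 <- coef(fit)[2]        # slopes are already b1 and b2
  b2 <- coef(fit)[3]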
Power Index Regression – Example

 y    x    ln(y)      ln(x)
 8.5  42   2.140066   3.737670
 6.5  37   1.871802   3.610918
 7.1  23   1.960095   3.135494
 9.5  18   2.251292   2.890372
 7.6  35   2.028148   3.555348
 3.4  56   1.223775   4.025352
 6.6  31   1.887070   3.433987

Consider again the data on the left. Find the power index model of the data.

  d <- read.delim(textConnection(datatext),
                  header=TRUE, sep="", strip.white=TRUE)
  lny = log(d$y)
  lnx = log(d$x)
  d$lny = lny
  d$lnx = lnx
  model = lm(d$lny ~ d$lnx)
  summary(model)

Call: lm(formula = d$lny ~ d$lnx)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.0625     0.9760   4.162   0.0088 **
d$lnx        -0.6181     0.2787  -2.218   0.0774 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2577 on 5 degrees of freedom
Multiple R-squared: 0.4958, Adjusted R-squared: 0.395
F-statistic: 4.918 on 1 and 5 DF, p-value: 0.07736

Power index model: ln(y) = 4.0625 – 0.6181 ln(x), so $y = 58.1194 \times x^{-0.6181}$. Model significance: 0.07736 → model is NOT significant at 5%, but significant at 10%. $R^2$ and adj-$R^2$ are very low and not acceptable.
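To write out the power form and predict at a new point (continuing the session above; x = 30 is an arbitrary illustrative value):

  b0   <- exp(coef(model)[1])   # 58.1194
  b1   <- coef(model)[2]        # -0.6181
  yhat <- b0 * 30^b1            # predicted y at x = 30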
Logistic Regression
• Logistic regression is used to find a model that predicts a likely binary outcome from other known data.
• E.g., campaign voting usually ends up with two outcomes. Brexit: Leave vs Remain; US Presidential Election: Clinton vs Trump; NTU Student Union Presidential Election: Keller vs Wayne.
• If we know demographics of sample voters and their vote,
can we find a model that describes the voting outcome of any
given individual?
• Other applications:
▫ Predicting rain/no-rain knowing past wind speeds, cloud
heights, sunny days, etc
▫ Predicting business or investment success/failure, knowing
revenue, profit, asset, liability, management experience, etc
▫ Predicting surgery success/failure, knowing patient vital signs,
medical treatment done, etc
Logistic Models – Challenges
• The outcome variable $y$ is coded as 1 for success and 0 for failure. But that's the problem! Multiple regression like $y = b_0 + b_1 x_1 + b_2 x_2$ does not stick to strictly 0 and 1.
• We try using the probability of $y$ being 1 or 0, $p = P(y = 1)$.
  ▫ Still has a problem – a linear model can give negative or larger-than-one probabilities.
• We try using the odds of $y$ being 1. Odds are the success-vs-failure ratio.
  ▫ E.g. $P(y = 1) = p = 0.2$ means the odds of success are $0.2 : (1 - 0.2)$, i.e. $1:4$ or $\frac{1}{4}:1$.
  ▫ In general, the odds of success are $p : (1 - p)$, or $\frac{p}{1-p} : 1$.
  ▫ Given odds of success $u : v$, we recover the probability of success $p = \frac{u}{u+v}$ (and failure probability $1 - p = \frac{v}{u+v}$).
  ▫ Still has a problem – the odds of success sky-rocket too rapidly when $p$ is close to 1. Taking the reciprocal just shifts the problem.
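A small R sketch of these conversions (the helper names are mine, for illustration):

  odds_from_p <- function(p) p / (1 - p)          # odds of success, as "odds : 1"
  p_from_odds <- function(u, v = 1) u / (u + v)   # recover p from odds u : v
  odds_from_p(0.2)    # 0.25, i.e. 1:4
  p_from_odds(1, 4)   # 0.2
  odds_from_p(0.999)  # 999 -- odds sky-rocket as p nears 1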
The Logit Way
• We take the log of the odds, aptly called the logit, $\ln\frac{p}{1-p}$, and regress it against explanatory variables: $\ln\frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2$
• In the 1-variable case, this looks like:
  $\ln\frac{p}{1-p} = b_0 + b_1 x_1 \quad [1] \qquad \Longleftrightarrow \qquad \frac{p}{1-p} = e^{b_0 + b_1 x_1} \quad [2]$
  $\Rightarrow p = \frac{e^{b_0 + b_1 x_1}}{1 + e^{b_0 + b_1 x_1}} \quad [3]$
  Note that the model predicts the logit of success when given $x_1$, not the probability.
• In logistic regression, we say "regress the logit $\ln\frac{p}{1-p}$ against the explanatory variable $x_1$ (and $x_2$, ..., $x_k$)".
• Use equation [1] to get the logit of success when given $x_1$.
• Use equation [2] to get the odds of success when given $x_1$.
• Use equation [3] to get the probability of success when given $x_1$.
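As a sketch, the three forms in R (the coefficient values and x1 are illustrative only):

  b0 <- -2.0; b1 <- 1.5; x1 <- 1   # illustrative coefficients and input
  logit <- b0 + b1 * x1            # [1] logit of success given x1
  odds  <- exp(logit)              # [2] odds of success given x1
  p     <- odds / (1 + odds)       # [3] probability of success given x1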
Logistic Regression – Example

 x     y
 1.63  0
 1.39  0
 0.41  0
 0.69  1
 1.03  1
 0.02  1
 1.71  1
 1.61  1
 1.48  0
 0.87  1
 0.49  0
 0.11  0
 1.54  1
 0.50  0
 1.61  1
 1.56  1
 0.24  0
 0.28  0
 1.27  0
 1.78  1
 0.21  0
 0.57  0
 1.39  1
 1.94  1
 1.07  0
 1.45  0
 1.85  1
 0.80  0
 1.55  0
 1.08  0

A bank has records of mortgage loans ($x$ in $millions) and default status ($y$; 1 means defaulted). A person is applying for a $1 million loan. How likely is this person to default, based purely on loan quantum? If a probability of default > 0.4 is not acceptable to the bank, should the bank lend?

  # datatext is assumed to hold the whitespace-separated x/y table above
  d <- read.delim(textConnection(datatext),
                  header=TRUE, sep="", strip.white=TRUE)
  log_model = glm(d$y ~ d$x, family="binomial")
  summary(log_model)

Deviance Residuals:
    Min      1Q   Median     3Q    Max
-1.4431 -0.9863  -0.5511 0.9462 2.1305
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.1948     1.0144  -2.164   0.0305 *
d$x           1.7182     0.7876   2.181   0.0291 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 41.054 on 29 degrees of freedom
Residual deviance: 35.186 on 28 degrees of freedom
AIC: 39.186

Number of Fisher Scoring iterations: 4

Logistic regression model: logit = -2.1948 + 1.7182x. Model significance: $P(\chi^2 > 41.054 - 35.186) = 0.0154$ (d.f. = 1) → model is significant at 5%.

The loan is $1 million, so x = 1, and logit = -2.1948 + 1.7182 = -0.4766. Odds = $e^{-0.4766} = 0.6209$, so P(default | x = 1) = $\frac{0.6209}{1 + 0.6209} = 0.3831$.

For this person's application, the probability of default is < 0.4. So the bank can lend to the person based on the quantum criterion.
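As a cross-check of the hand computation above (continuing the session; plogis() is base R's logistic CDF, i.e. equation [3]):

  co <- coef(log_model)
  plogis(co[1] + co[2] * 1)   # P(default | x = 1), approximately 0.3831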
Residual & Null Deviances

Null deviance: 41.054 on 29 degrees of freedom
Residual deviance: 35.186 on 28 degrees of freedom
AIC: 39.186

• R's output contains the Residual Deviance value after applying the model.
  ▫ Think of Residual Deviance as playing a role similar to SSE, though the value is calculated differently.
  ▫ A smaller Residual Deviance means more of the fluctuation in the data is explained by the model, and so the better the model is.
• Null Deviance is the residual deviance after applying the null model, which is just $y_{null} = b_{0,null}$, essentially a constant.
  ▫ Because the null model doesn't even attempt to explain the fluctuations, Null Deviance is larger than Residual Deviance, which is the deviance left after our model has explained the fluctuations.
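Both quantities can be read straight off the fitted glm object (continuing the example session):

  log_model$null.deviance   # 41.054
  log_model$deviance        # 35.186
  log_model$df.null         # 29
  log_model$df.residual     # 28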
Testing Logistic Model

Null deviance: 41.054 on 29 degrees of freedom
Residual deviance: 35.186 on 28 degrees of freedom
AIC: 39.186

• The difference of the Null and Residual Deviances is:
  $D_d = \text{Null Deviance} - \text{Residual Deviance}$
• The corresponding difference in degrees of freedom is:
  $df_d = df_{Null} - df_{Residual}$
• $D_d$ follows a Chi-Squared distribution with $df_d$ degrees of freedom, i.e. $D_d \sim \chi^2(df_d)$.
• To test for significance of the test statistic $D_d$, with hypotheses
  $H_0$: Logistic model NOT significant, $H_1$: Logistic model significant,
  we find: $p\text{-value} = P(\chi^2 > D_d)$ at df $= df_d$.

Example: $D_d = 41.054 - 35.186 = 5.868$, $df_d = 29 - 28 = 1$,
$p\text{-value} = P(\chi^2 > 5.868) = 0.01542$.
Since p-value = 0.01542 < $\alpha = 0.05$, we REJECT $H_0$ and conclude that the logistic regression model is significant at $\alpha = 0.05$ (critical value $\chi^2_{c,0.05} = 3.8415$).
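The same numbers in R:

  Dd <- 41.054 - 35.186                    # difference of deviances = 5.868
  pchisq(Dd, df = 1, lower.tail = FALSE)   # p-value = 0.01542
  qchisq(0.95, df = 1)                     # critical value = 3.8415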