
Applying Regression Analysis

Jean-Philippe Gauvin
Université de Montréal

January 7, 2016
Goals for Today

What is regression? How do we do it?

First hour: OLS
- Bivariate regression
- Multiple regression
- Interactions
- Regression diagnostics

Second hour: MLE/GLMs
- Logit
- Ordered logit
- Multinomial logit

All material is on my website: www.jpgauvin.com


Introducing Regression

Part I – The World of OLS


Bivariate Regression
Multiple Regression
Predicted Values
Categorical Predictors
Interactions
Regression Diagnostics

Part II – The World of MLE


Introducing MLE
Logit Models
Ordered Logit Models
Multinomial Logit Models
What is a regression?

In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').

Ordinary least squares (OLS) regression is a family of regression models that fit a line by minimizing the residual sum of squares. It can only be used when dealing with a continuous dependent variable.
The Scatterplot

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age)
The Basic Linear Model (OLS)

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age)
Why a Line?

- It's accurate – it gives an idea of the general pattern of the data.
- It's easy to estimate – the computational cost is low, since a single formula can be used.
- It's easy to describe – it can summarize the relationship between variables with only one number: the slope.

In other words, it's a very good summary of "what happens".


The Components of OLS

The model equation can be expressed as such:

Yi = α + βXi + εi

Where:
- Yi is the dependent variable
- α is the intercept
- β is the slope
- Xi is the predictor
- εi is the error term
The Components of OLS

The OLS equation can then be re-expressed like this:

Ŷi = α + βXi

Where is the error term? OLS aims to keep the residual ε as small as possible. In other words,

Yi = Ŷi + εi
εi = Yi − Ŷi

OLS is the line that minimizes the residuals.


What the Software Does
If the OLS is the line that minimizes the residual sum of squares (RSS)... how does it do that? How do we get a slope (β) and an intercept (α)?

RSS = Σ εi² = Σ (Yi − Ŷi)²

β̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

α̂ = Ȳ − β̂X̄

+----+---+----+-------+--------+
|    | X | Y  |  Xbar |   Ybar |
|----+---+----+-------+--------|
| 1. | 2 | 10 | 4.375 | 12.625 |
| 2. | 5 | 15 | 4.375 | 12.625 |
| 3. | 6 | 16 | 4.375 | 12.625 |
| 4. | 4 | 12 | 4.375 | 12.625 |
| 5. | 8 | 14 | 4.375 | 12.625 |
| 6. | 1 | 12 | 4.375 | 12.625 |
| 7. | 4 | 12 | 4.375 | 12.625 |
| 8. | 5 | 10 | 4.375 | 12.625 |
+----+---+----+-------+--------+
Estimating the Regression Line

. egen Xbar = mean(X)
. egen Ybar = mean(Y)
. egen sumXY = sum((X-Xbar)*(Y-Ybar))
. egen sumX2 = sum((X-Xbar)^2)
.
. disp "Beta is " sumXY/sumX2
Beta is .56457565

. disp "Alpha is " Ybar - (sumXY/sumX2)*Xbar
Alpha is 10.154982

Or let Stata do it!


. reg Y X
--------------------------------------------------------------------------
Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+----------------------------------------------------------------
X | .5645756 .3369605 1.68 0.145 -.259937 1.389088
_cons | 10.15498 1.629127 6.23 0.001 6.168652 14.14131
--------------------------------------------------------------------------

α is 10.15 and β is .56, which means Ŷi = 10.15 + .56(Xi)
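
For comparison, a minimal R sketch of the same hand computation (using the eight observations from the table above):

# the example data from the table
X <- c(2, 5, 6, 4, 8, 1, 4, 5)
Y <- c(10, 15, 16, 12, 14, 12, 12, 10)

beta  <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
alpha <- mean(Y) - beta * mean(X)
c(alpha = alpha, beta = beta)    # 10.155 and .5646, as above
coef(lm(Y ~ X))                  # lm() returns the same estimates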


The Basic Linear Model (OLS)

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age with the fitted OLS line: alpha = -.45, beta = .004)
The Assumptions of OLS

Sorry: The goal is not to get stars.

Instead, we want the Best Linear Unbiased Estimators (BLUE), given by five assumptions:

1. The relationship is linear.
2. The errors are normally distributed.
3. The errors have constant variance (no heteroskedasticity, no autocorrelation).
4. X is fixed on repeated sampling (no selection bias).
5. No exact linear relationships between independent variables, and more observations than independent variables.
Other Consideration: Model fit

R² is the most used measure of fit. It is expressed as the ratio of the explained variance to the total variance. In other words, R² = s²Ŷ / s²Y.

But what's a good fit in the social sciences? What about other measures (AIC, BIC, etc.)?
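
As a quick sketch, the definition can be checked by hand in R (assuming m1 is a fitted lm object, as in the example that follows):

# R-squared as explained variance over total variance (OLS with an intercept)
y.obs <- model.response(model.frame(m1))   # the Y values actually used in the fit
var(fitted(m1)) / var(y.obs)               # matches summary(m1)$r.squared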
Example 1: The Bivariate Regression
Stata: scatter harper leftright

R: plot(harper ~ leftright, main="Example 1", xlab="Ideology", ylab="Feelings for Harper")

Figure: scatterplot of feelings about Stephen Harper (0-100) against left/right self-placement (0-10)
Example 1: The Bivariate Regression
Stata:
twoway (scatter harper leftright, jitter(20)) (lfit harper leftright)

R: plot(jitter(harper,10) ~ jitter(leftright, 4), main="Example 1", xlab="Ideology", ylab="Feelings for Harper")
abline(lm(harper~leftright), col="red")

Figure: jittered scatterplot of feelings about Stephen Harper against left/right self-placement, with the fitted OLS line
Example 1: Stata

. reg harper leftright

Source | SS df MS Number of obs = 1467


-------------+------------------------------ F( 1, 1465) = 410.77
Model | 322914.797 1 322914.797 Prob > F = 0.0000
Residual | 1151659.74 1465 786.115861 R-squared = 0.2190
-------------+------------------------------ Adj R-squared = 0.2185
Total | 1474574.53 1466 1005.84893 Root MSE = 28.038

------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
leftright | 7.452856 .3677241 20.27 0.000 6.731535 8.174178
_cons | 5.717891 1.995533 2.87 0.004 1.803484 9.632297
------------------------------------------------------------------------------
F̂H = 5.72 + 7.45(LR)
Example 1: R

> m1 <- (lm(harper~leftright, data = data))


> summary(m1)

Call:
lm(formula = harper ~ leftright)

Residuals:
Min 1Q Median 3Q Max
-79.246 -22.982 2.112 21.924 79.376

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7179 1.9955 2.865 0.00423 **
leftright 7.4529 0.3677 20.268 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 28.04 on 1465 degrees of freedom


(2841 observations deleted due to missingness)
Multiple R-squared: 0.219,Adjusted R-squared: 0.2185
F-statistic: 410.8 on 1 and 1465 DF, p-value: < 2.2e-16
Multiple regression

In a bivariate model, OLS fits a line in a 2D space. But in a multiple regression, it fits a line for each covariate in N-dimensional space. You could theoretically draw a plane in a 3D space, but what about 4, 5 or 20 dimensions?

However, we can extend the bivariate logic to multiple predictors, where the coefficients lead to multiple slopes piercing through N-dimensional space.
The Multiple Regression Equation

When extending the bivariate model equation to multiple


predictors, we get this:

Yi = β0 + β1X1 + β2X2 + ... + βnXn + εi

Ex2. Does age also have an effect on feelings toward Harper?


The OLS equation should read

FH = β0 + β1LR + β2Age + ε
Example 2. The Multiple Regression

Figure: two panels of feelings about Stephen Harper (0-100) – against ideology (0-10, B = 7.45) and against age (20-100, B = 0.12)
Example 2. Stata

. reg harper leftright age

Source | SS df MS Number of obs = 1440


----------+------------------------------ F( 2, 1437) = 208.47
Model | 325425.809 2 162712.904 Prob > F = 0.0000
Residual | 1121580.18 1437 780.501169 R-squared = 0.2249
----------+------------------------------ Adj R-squared = 0.2238
Total | 1447005.99 1439 1005.56358 Root MSE = 27.937

---------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
leftright | 7.488689 .3689223 20.30 0.000 6.765005 8.212373
age | .0417154 .0500968 0.83 0.405 -.0565554 .1399862
_cons | 2.990736 3.432887 0.87 0.384 -3.74327 9.724742
---------------------------------------------------------------------------
Example 2. R
> m2 <- lm(harper ~ leftright + age, data=data)
> summary(m2)

Call:
lm(formula = harper ~ leftright + age, data = data)

Residuals:
Min 1Q Median 3Q Max
-79.339 -21.988 2.211 21.543 79.612

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.99074 3.43289 0.871 0.384
leftright 7.48869 0.36892 20.299 <2e-16 ***
age 0.04172 0.05010 0.833 0.405
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27.94 on 1437 degrees of freedom


(2868 observations deleted due to missingness)
Multiple R-squared: 0.2249,Adjusted R-squared: 0.2238
F-statistic: 208.5 on 2 and 1437 DF, p-value: < 2.2e-16
Predicting Ŷ ?

What if we want to predict a value on this new slope? Can we just plug it into the equation? What if we want to predict Ŷ when leftright is 1 and age is 50? Is it that easy?

F̂H = 2.99 + 7.49(LR) + .04(Age)
    = 2.99 + 7.49(1) + .04(50)...
    = 12.48

In Stata, we could easily check it after running the regression.

. quietly: reg harper leftright age
. disp _b[_cons] + _b[leftright]*1 + _b[age]*50
12.565196

(The gap between 12.48 and 12.57 comes from rounding the coefficients in the hand calculation.)
Other Ways of Predicting Ŷ in Stata
*Predicting yhat for leftright=1 and age=50
. predict yhat
(option xb assumed; fitted values)
(2849 missing values generated)

. sum yhat if leftright==1 & age==50 /*lucky we have one obs = 12.56*/

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
yhat | 1 12.5652 . 12.5652 12.5652

*Or simply use margins command


. margins, at(leftright=1 age=50)
at : leftright = 1
age = 50

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 12.5652 1.691971 7.43 0.000 9.246198 15.88419
------------------------------------------------------------------------------
Predicting Ŷ in R

In R, you can easily specify your scenario values through a new


dataframe.
> m3 <- (lm(harper~leftright + age , data=data))
> newdat <- data.frame(
+ leftright = 1,
+ age = 50)
> predict(m3, newdat, interval="confidence")
fit lwr upr
1 12.5652 9.246198 15.88419
>

Note that for a binary predictor, we could store something like male="Male" in newdat, depending on the label (if the dummy is a factor). If you wanted multiple values, you could predict using a vector like leftright=c(1,2,3,4,5), as in the sketch below.
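
A minimal sketch of that multi-value prediction (reusing m3, with age held at 50):

> newdat2 <- data.frame(leftright = c(1,2,3,4,5), age = 50)
> predict(m3, newdat2, interval = "confidence")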
Using Categorical Predictors

When adding variables, it is possible to include categorical predictors. This is done by adding binary (dummy) variables.

One common example is gender:


. reg harper leftright male
---------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
leftright | 7.463482 .3686871 20.24 0.000 6.740271 8.186694
male | -.6208031 1.470247 -0.42 0.673 -3.504819 2.263213
_cons | 5.957086 2.07492 2.87 0.004 1.886952 10.02722
---------------------------------------------------------------------------

But what does the coefficient mean? Being a male, rather than a
female, decreases feelings for Harper by 0.62 (not statistically
significant).
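
From this output, the fitted line for women is F̂H = 5.96 + 7.46(LR); for men, the intercept simply shifts by the male coefficient: F̂H = (5.96 − 0.62) + 7.46(LR) = 5.34 + 7.46(LR).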
Adding a Binary Variable

Y = a + b1*LR + b2*Gender + e

Figure: two parallel fitted lines of linear prediction over the left/right scale – the female intercept is b0, the male intercept is b0 + b2, and b2 is the vertical gap between the lines
Ex 4. Categorical Variables in Stata
In Stata, you can use tab varname, gen(newvar) to
automatically create dummies, or you can simply use the prefix i.
or even specify the baseline with b1. b2. b3. etc.
. reg harper i.votechoice /*liberal is baseline*/
------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Tories | 45.08578 1.168005 38.60 0.000 42.79546 47.3761
NDP | .1385281 1.388459 0.10 0.921 -2.584075 2.861131
BQ | -3.868295 1.912745 -2.02 0.043 -7.61896 -.1176298
Greens | -1.171874 2.515343 -0.47 0.641 -6.104162 3.760414
_cons | 29.37576 .9241736 31.79 0.000 27.56356 31.18795
------------------------------------------------------------------------------

. reg harper b2.votechoice /*Conservative as baseline*/


------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Libs | -45.08578 1.168005 -38.60 0.000 -47.3761 -42.79546
NDP | -44.94725 1.258515 -35.71 0.000 -47.41505 -42.47945
BQ | -48.95408 1.820614 -26.89 0.000 -52.52408 -45.38407
Greens | -46.25765 2.446016 -18.91 0.000 -51.054 -41.46131
_cons | 74.46154 .7142404 104.25 0.000 73.061 75.86208
------------------------------------------------------------------------------
Ex 4. Categorical Variables in R
As with Stata, R automatically recognizes categorical variables.
> m4 <- lm(harper~votechoice, data=data)
> summary(m4)

Call:
lm(formula = harper ~ votechoice, data = data)

Residuals:
Min 1Q Median 3Q Max
-74.462 -19.462 0.538 15.538 70.624

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.3758 0.9242 31.786 <2e-16 ***
votechoiceTories 45.0858 1.1680 38.601 <2e-16 ***
votechoiceNDP 0.1385 1.3885 0.100 0.9205
votechoiceBQ -3.8683 1.9127 -2.022 0.0432 *
votechoiceGreens -1.1719 2.5153 -0.466 0.6413
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Ex 4. Categorical Variables in R

In R, you need to change the order of the labels to change the


baseline.
levels(data$votechoice) #current order
data$votechoice2 = factor(data$votechoice, c("Tories","Libs","NDP",
+ "BQ","Greens"))
levels(data$votechoice2)

m4.2 <- lm(harper~votechoice2, data=data)


summary(m4.2)

You can also test if a variable is a factor by using


is.factor(male). If it isn’t, you could specify it with
male.f <- factor(male, labels = c("male", "female"))
Interactions
We might have reasons to believe that the effect of a variable varies over the range of another variable. This is called an interaction.

Y = a + b1*Soc + b2*Gender + e
Y = a + b1*Soc + b2*Gender + b3*Soc*Gender + e

Figure: predicted writing score against social studies score for males and females – without the interaction term (left) the two fitted lines are parallel; with it (right), their slopes differ
Example 5. Stata

In Stata, the # sign identifies the interaction. Keep in mind that, by default, Stata treats interacted variables as factors. Continuous terms need the c. prefix in order to work properly.
regress write i.female##c.socst

-------------------------------------------------------------
write | Coef. Std. Err. t P>|t|
----------------+--------------------------------------------
female | 15.00001 5.09795 2.94 0.004
socst | .6247968 .0670709 9.32 0.000
|
female#c.socst | -.2047288 .0953726 -2.15 0.033
|
_cons | 17.7619 3.554993 5.00 0.000
-------------------------------------------------------------
Example 5. R
In R, variables are already defined as factors or not. Interactions
are thus handled automatically.
> m5 <- lm(write~ female*socst, data=data2)
> summary(m5)

Call:
lm(formula = write ~ female * socst, data = data2)

Residuals:
Min 1Q Median 3Q Max
-18.6265 -4.3108 -0.0645 5.0429 16.4974

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.76190 3.55499 4.996 1.29e-06 ***
femalefemale 15.00001 5.09795 2.942 0.00365 **
socst 0.62480 0.06707 9.315 < 2e-16 ***
femalefemale:socst -0.20473 0.09537 -2.147 0.03305 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.212 on 196 degrees of freedom


Multiple R-squared: 0.4299,Adjusted R-squared: 0.4211
F-statistic: 49.26 on 3 and 196 DF, p-value: < 2.2e-16
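
Reading the two fits together: for males, the fitted line is write = 17.76 + 0.625(socst); for females, both the intercept and the slope shift, giving write = (17.76 + 15.00) + (0.625 − 0.205)(socst) = 32.76 + 0.420(socst).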
Regression Diagnostics

Remember the main assumptions:

1. The relationship is linear.
2. The expected value of the error term is zero.
3. The errors have identical, normal distributions (no heteroskedasticity, no autocorrelation).

To get unbiased and efficient estimators, we must make sure we don't violate these assumptions. We'll focus on the distribution of errors and on outliers.
Normally distributed and constant variance
Example 6. Heteroskedasticity in Stata
You can test heteroskedasticity with estat hettest or
imtest, white, or simply look at the residuals.

Figure: residuals plotted against fitted values

reg harper leftright age male


estat hettest
imtest, white
rvfplot, yline(0, lcolor(red))
reg harper leftright age male, vce(robust)
Example 6. Heteroskedasticity in R
In R, you can test for heteroskedasticity with the non-constant variance test ncvTest in the car package. The plot can be obtained with the spreadLevelPlot(m6) command (where m6 is the object containing your model).

However, getting robust standard errors like Stata is a bit more


involved. You’ll need the sandwich and lmtest packages.
> library(sandwich)
> library(lmtest)
> coeftest(m6, vcov = vcovHC(m6, "HC1"))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 3.309256 3.401537 0.9729 0.3308
leftright 7.502763 0.344853 21.7564 <2e-16 ***
age 0.042331 0.049701 0.8517 0.3945
maleMale -0.903626 1.477110 -0.6118 0.5408
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

>
Example 6. Normality of Residuals in Stata
You can plot the residuals against a theoretical normal distribution.

Figure: standardized normal probability (P-P) plot of the residuals

quietly: reg harper leftright age male


predict r, residual
pnorm r
Example 6. Normality of Residuals in R
The same thing can be done in R with the command qqPlot(m6)

Figure: Q-Q plot of studentized residuals against t quantiles
Example 6. Outlier Plot in Stata
You can identify the outliers by graphing a leverage plot with
lvr2plot

Figure: leverage plotted against normalized residual squared
Example 6. Outlier Plot in R
You can do the same thing in R with
influencePlot(m6,id.method="identify")

3
Influence Plot
3810
2
Studentized Residuals
1
0

3084
−1
−2
−3

0.002 0.004 0.006 0.008 0.010


Hat−Values
Circle size is proportial to Cook's Distance
Introducing Regression

Part I – The World of OLS


Bivariate Regression
Multiple Regression
Predicted Values
Categorical Predictors
Interactions
Regression Diagnostics

Part II – The World of MLE


Introducing MLE
Logit Models
Ordered Logit Models
Multinomial Logit Models
Categorical Outcomes: The Problem with OLS

OLS is pretty useful. However, it fails at explaining non-continuous dependent variables. Imagine a simple model where Yi is binary.

Yi = β0 + β1X1 + β2X2 + ui

OLS estimates won't be BLUE in some respects:

- OLS gives an unbounded linear prediction, while Yi can only be 0 or 1.
- OLS assumes ui to be normally distributed. But since ui = Yi − Ŷi, the residuals can only take the values 1 − Ŷi or −Ŷi.
- OLS assumes constant variance of ui (homoskedasticity), while the variance of a binary choice will not permit that.
Constant Variance is Impossible

Figure: residuals from the linear model plotted against fitted values for a binary Yi – the residuals fall along two parallel lines (1 − Ŷi and −Ŷi), so their variance cannot be constant
Maximum Likelihood Estimation
The world of MLE is the world of frequentist probability. Formally, a probability is given by:

Pr(Y|M) = Pr(Data|Model)

Ideally, we would compute the inverse probability Pr(Model|Data), but this is impossible. Luckily, the likelihood function helps us a lot.

f(Y1, Y2, ..., YN|θ) = ∏(i=1 to N) f(Yi|θ) = L(θ|Y)

p(Y|θ) = L(θ|Y)

In other words, there is a fixed value of θ, and we maximize the likelihood to estimate θ and make assumptions to generate uncertainty about the estimate.
Log Likelihood

We usually work with the log likelihood function:

ln L(θ|Y) = Σ(i=1 to N) ln f(Yi|θ)

The software then maximizes this function by taking derivatives over multiple iterations. When the derivative is 0, it has found the maximum and has therefore estimated the "most likely" θ parameters.

ML estimation can be extended to a variety of functions, including logistic regression.
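
As an illustrative sketch (not from the slides), R's optim() can do this numerical maximization for a simple Bernoulli model, where the analytical ML estimate is just the sample mean:

# simulate 0/1 data and estimate p by maximum likelihood
set.seed(1)
y <- rbinom(100, 1, 0.3)                                 # binary outcomes
negll <- function(p) -sum(dbinom(y, 1, p, log = TRUE))   # -ln L(p|y)
optim(0.5, negll, method = "Brent", lower = 0.001, upper = 0.999)$par
mean(y)                                                  # the analytical MLE, essentially the same value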
Logistic Regression for Binary Outcome
Pi = Pr(Yi = 1) = Σk βk Xik

Figure: two panels over feelings about Stephen Harper (really dislike to really like) – the linear prediction (left) and the logistic regression of Pr(Party2) (right)
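
A quick sketch of why the logistic curve stays bounded, using base R's plogis() (the inverse logit):

xb <- seq(-6, 6, by = 0.5)   # any values of the linear predictor
plogis(xb)                   # 1/(1 + exp(-xb)): always between 0 and 1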
Example 7. Logit in Stata
Logistic regression models the log odds and is estimated by maximizing the log likelihood. Coefficients are therefore hard to interpret directly.
. logit party2 harper leftright i.male age

Iteration 0: log likelihood = -691.57458


Iteration 1: log likelihood = -330.15422
Iteration 2: log likelihood = -309.76953
Iteration 3: log likelihood = -309.11928
Iteration 4: log likelihood = -309.11808
Iteration 5: log likelihood = -309.11808

Logistic regression Number of obs = 1021


LR chi2(4) = 764.91
Prob > chi2 = 0.0000
Log likelihood = -309.11808 Pseudo R2 = 0.5530

------------------------------------------------------------------------------
party2 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | .068045 .0046875 14.52 0.000 .0588576 .0772324
leftright | .5628441 .0684852 8.22 0.000 .4286156 .6970726
|
male |
Male | .1405912 .2059007 0.68 0.495 -.2629667 .5441492
age | .0091524 .0070297 1.30 0.193 -.0046257 .0229304
_cons | -7.549818 .6537195 -11.55 0.000 -8.831084 -6.268551
------------------------------------------------------------------------------
Example 7. Logit in Stata

Odds ratios can also be used.

. logit party2 harper leftright i.male age, or // to get odds ratios

Iteration 0: log likelihood = -691.57458


Iteration 1: log likelihood = -330.15422
Iteration 2: log likelihood = -309.76953
Iteration 3: log likelihood = -309.11928
Iteration 4: log likelihood = -309.11808
Iteration 5: log likelihood = -309.11808

Logistic regression Number of obs = 1021


LR chi2(4) = 764.91
Prob > chi2 = 0.0000
Log likelihood = -309.11808 Pseudo R2 = 0.5530

------------------------------------------------------------------------------
party2 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | 1.070413 .0050176 14.52 0.000 1.060624 1.080293
leftright | 1.755659 .1202366 8.22 0.000 1.535131 2.007866
|
male |
Male | 1.150954 .2369822 0.68 0.495 .7687675 1.723142
age | 1.009194 .0070944 1.30 0.193 .995385 1.023195
_cons | .0005262 .000344 -11.55 0.000 .0001461 .001895
------------------------------------------------------------------------------
Example 7. Logit in R
> m7 <- glm(party2~harper+leftright+male+age, data=data, family="binomial")
> summary(m7)

Call:
glm(formula = party2 ~ harper + leftright + male + age, family = "binomial",
data = data)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.4018 -0.3470 -0.1040 0.4138 3.1348

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.549818 0.653717 -11.549 <2e-16 ***
harper 0.068045 0.004688 14.516 <2e-16 ***
leftright 0.562844 0.068485 8.219 <2e-16 ***
maleMale 0.140591 0.205900 0.683 0.495
age 0.009152 0.007030 1.302 0.193
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1383.15 on 1020 degrees of freedom


Residual deviance: 618.24 on 1016 degrees of freedom
(3287 observations deleted due to missingness)
AIC: 628.24

Number of Fisher Scoring iterations: 6


Example 7. Logit in R

Odds ratios can be extracted easily with exp(coef(m7)). To get


CI, simply state:
> exp(cbind(OR = coef(m7), confint(m7)))
Waiting for profiling to be done...
OR 2.5 % 97.5 %
(Intercept) 0.000526206 0.0001387989 0.001807294
harper 1.070413461 1.0610310197 1.080745614
leftright 1.755658711 1.5399522657 2.015034731
maleMale 1.150954049 0.7684742050 1.724720106
age 1.009194408 0.9954186183 1.023275717
Example 7. Predicted Probabilities in Stata
In GLM terms, marginal effects are easier to understand. However, only the MEs for discrete variables behave as expected. MEs for continuous predictors give the instantaneous rate of change at the mean, which can vary across values.
. margins, dydx(*) atmeans // to get marginal effects

Conditional marginal effects Number of obs = 1021


Model VCE : OIM

Expression : Pr(party2), predict()


dy/dx w.r.t. : harper leftright 1.male age
at : harper = 45.00098 (mean)
leftright = 5.127326 (mean)
0.male = .5220372 (mean)
1.male = .4779628 (mean)
age = 58.86876 (mean)

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | .0134041 .0008582 15.62 0.000 .011722 .0150861
leftright | .1108737 .0135569 8.18 0.000 .0843027 .1374448
male | .0277301 .040754 0.68 0.496 -.0521462 .1076065
age | .0018029 .0013874 1.30 0.194 -.0009164 .0045222
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Example 7. Predicted Probabilities in Stata
You should then plot the predicted probabilities over a continuous predictor, either with margins or mcp, which will give you this graph:

Figure: predicted Pr(party2) plotted over the 0-100 range of harper

margins, at(harper=(1(1)100))
marginsplot

*or simply
mcp harper
Example 7. Marginal Effects in R
You can get marginal effects with the mfx package
install.packages("mfx")
library(mfx)
logitmfx(party2~harper+leftright+male+age, data=data, atmean =T)

Call:
logitmfx(formula = party2 ~ harper + leftright + male + age,
data = data, atmean = T)

Marginal Effects:
dF/dx Std. Err. z P>|z|
harper 0.0134041 0.0008582 15.6188 < 2.2e-16 ***
leftright 0.1108737 0.0135569 8.1784 2.876e-16 ***
maleMale 0.0277301 0.0407539 0.6804 0.4962
age 0.0018029 0.0013874 1.2995 0.1938
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

dF/dx is for discrete change for the following variables:

[1] "maleMale"
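
To mirror the Stata margins plot in R, here is a hedged sketch with predict() and type="response" (the covariate values are illustrative, chosen near the sample means, and male is assumed to be a factor with a "Male" level as in m7):

> newdat <- data.frame(harper = 0:100, leftright = 5, male = "Male", age = 59)
> pp <- predict(m7, newdat, type = "response")
> plot(0:100, pp, type = "l", xlab = "harper", ylab = "Pr(party2)")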
Ordered Logit

Sometimes we have categorical variables that are still ordered. For example, a statement reads "We have gone too far pushing bilingualism in this country" and is coded:

- 1 for strongly disagree
- 2 for disagree
- 3 for agree
- 4 for strongly agree

Be careful! The assumption behind ordered logit is that the conceptual cut between 1 and 2 is the same as between 2 and 3, etc.
Example 8. Ordered Logit in Stata
Coefficients (here odds ratios) give the odds of moving up one outcome category.
. ologit toobilingual i.votechoice harper leftright male age, or

Ordered logistic regression Number of obs = 970


LR chi2(8) = 172.97
Prob > chi2 = 0.0000
Log likelihood = -1189.651 Pseudo R2 = 0.0678

------------------------------------------------------------------------------
toobilingual | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Tories | 2.302053 .462271 4.15 0.000 1.553056 3.412271
NDP | 1.08919 .1980942 0.47 0.639 .7625939 1.555658
BQ | .3370468 .0866534 -4.23 0.000 .2036337 .5578672
Greens | 1.267178 .3692448 0.81 0.416 .7158219 2.243213
|
harper | .999143 .0027409 -0.31 0.755 .9937854 1.00453
leftright | 1.20156 .0428846 5.14 0.000 1.12038 1.288622
male | 1.223467 .1457727 1.69 0.090 .9686658 1.545292
age | 1.010642 .0041668 2.57 0.010 1.002508 1.018842
-------------+----------------------------------------------------------------
/cut1 | -.0463056 .3173242 -.6682495 .5756383
/cut2 | 2.024309 .3232185 1.390812 2.657806
/cut3 | 3.722098 .3387192 3.058221 4.385976
------------------------------------------------------------------------------
Example 8. Ordered Logit in Stata

Be careful: predicted probabilities now need to be computed for every outcome.

Figure: four panels of predictive margins with 95% CIs, one for each outcome – Pr(toobilingual==1) through Pr(toobilingual==4) – each plotted over the 0-10 left/right scale
margins, at(leftright=(0(1)10)) predict(outcome(1))


margins, at(leftright=(0(1)10)) predict(outcome(2))
margins, at(leftright=(0(1)10)) predict(outcome(3))
margins, at(leftright=(0(1)10)) predict(outcome(4))
Example 8. Ordered Logit in R
Ordered logit can be used with the MASS package.
install.packages("MASS")
require(MASS)

data$toobil.f <- factor(data$toobilingual)


is.factor(data$toobil.f)
m8 <- polr(toobil.f ~votechoice+harper+leftright+male+age, data = data, Hess=TRUE)
summary(m8)
Call:
polr(formula = toobil.f ~ votechoice + harper + leftright + male +
age, data = data, Hess = TRUE)

Coefficients:
Value Std. Error t value
votechoiceTories 0.8338049 0.200806 4.1523
votechoiceNDP 0.0854366 0.181873 0.4698
votechoiceBQ -1.0875314 0.257097 -4.2300
votechoiceGreens 0.2367981 0.291392 0.8126
harper -0.0008574 0.002744 -0.3125
leftright 0.1836207 0.035691 5.1448
maleMale 0.2016890 0.119147 1.6928
age 0.0105862 0.004124 2.5672

Intercepts:
Value Std. Error t value
1|2 -0.0463 0.3174 -0.1459
2|3 2.0243 0.3232 6.2626
3|4 3.7221 0.3387 10.9880

Residual Deviance: 2379.302


AIC: 2401.302
(3338 observations deleted due to missingness)
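
As in Stata, you then want one predicted probability per outcome category; a minimal sketch with the fitted polr object:

> head(predict(m8, type = "probs"))   # one column per category of toobil.f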
Multinomial Logit Models
Finally, what if a categorical variable is unordered? In political science, such situations often arise when studying vote choice, where the dependent variable might be coded like this:

- 1 for Liberals
- 2 for Tories
- 3 for NDP
- 4 for BQ
- 5 for Greens

Multinomial logit estimates the probability of choosing one outcome over the others, relative to a baseline category.

Keep in mind that there are assumptions linked to this, one of which is the Independence of Irrelevant Alternatives (IIA). This needs to be tested for.
Example 9. Multinomial Logit in Stata
mlogit votechoice harper leftright male age, rr
Multinomial logistic regression Number of obs = 1021
LR chi2(16) = 812.47
Prob > chi2 = 0.0000
Log likelihood = -985.4517 Pseudo R2 = 0.2919

------------------------------------------------------------------------------
votechoice | RRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Libs |
harper | .9331527 .0046223 -13.97 0.000 .924137 .9422562
leftright | .6151022 .0443692 -6.74 0.000 .5340077 .7085118
male | .8724498 .1928675 -0.62 0.537 .5656791 1.345584
age | 1.003075 .0077044 0.40 0.689 .9880881 1.01829
_cons | 331.5598 228.216 8.43 0.000 86.03424 1277.77
-------------+----------------------------------------------------------------
Tories | (base outcome)
-------------+----------------------------------------------------------------
NDP |
harper | .935527 .0050303 -12.39 0.000 .9257195 .9454383
leftright | .5065051 .0400814 -8.60 0.000 .4337361 .5914828
male | .7520023 .1827952 -1.17 0.241 .4669934 1.210954
age | .9872603 .0082014 -1.54 0.123 .9713161 1.003466
_cons | 1235.203 891.2959 9.87 0.000 300.2823 5080.975
-------------+----------------------------------------------------------------
BQ |
harper | .9345699 .0061696 -10.25 0.000 .9225556 .9467407
leftright | .5878576 .0555462 -5.62 0.000 .4884756 .7074593
male | 1.078164 .3300636 0.25 0.806 .5917011 1.964569
age | .9714043 .01007 -2.80 0.005 .9518667 .991343
_cons | 520.7184 433.5257 7.51 0.000 101.8433 2662.4
-------------+----------------------------------------------------------------
Greens |
harper | .9379658 .0070879 -8.47 0.000 .9241761 .9519612
Example 9. Multinomial Logit in Stata

Change the base outcome with mlogit votechoice harper leftright male age, b(1).

You can also test for IIA with the spost package:

net install spost9_ado.pkg
quietly: mlogit votechoice harper leftright male age, b(1)
. mlogtest, iia

**** Hausman tests of IIA assumption (N=1021)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted | chi2 df P>chi2 evidence


---------+------------------------------------
Tories | 22.757 15 0.089 for Ho
NDP | -13.273 15 --- ---
BQ | -0.535 15 --- ---
Greens | -0.547 15 --- ---
----------------------------------------------
Note: If chi2<0, the estimated model does not
meet asymptotic assumptions of the test.
Example 9. Multinomial Logit in R
Multinomial logits can be used with the nnet package.
install.packages("nnet")
library(nnet)

m10 <- multinom(votechoice ~ harper + leftright + age + male, data = data)


summary(m10)

Call:
multinom(formula = votechoice ~ harper + leftright + age + male,
data = data)

Coefficients:
(Intercept) harper leftright age maleMale
Tories -5.8035817 0.069186831 0.48596775 -0.003074969 0.1365095
NDP 1.3150999 0.002542445 -0.19424343 -0.015891507 -0.1485712
BQ 0.4514183 0.001519217 -0.04532072 -0.032082301 0.2116362
Greens 1.4727807 0.005145085 -0.22967514 -0.046365245 0.2165980

Std. Errors:
(Intercept) harper leftright age maleMale
Tories 0.6883018 0.004953379 0.07213321 0.007680753 0.2210648
NDP 0.4518466 0.003892178 0.05559792 0.006537470 0.1906751
BQ 0.6231527 0.005509566 0.07768518 0.009155438 0.2705429
Greens 0.6837026 0.006601243 0.09366259 0.010719374 0.3188646

Residual Deviance: 1970.903


AIC: 2010.903

#choose baseline
data$vote2 <- relevel(data$votechoice, ref = "Tories")
m10.2 <- multinom(vote2 ~ harper + leftright + age + male, data = data)
summary(m10.2)
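
A short sketch of getting relative risk ratios and predicted probabilities from the multinom fit (analogous to Stata's rr option and margins):

> exp(coef(m10))                      # relative risk ratios
> head(predict(m10, type = "probs"))  # one column of probabilities per party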
Thank You!

Any questions?

Don’t hesitate to write if you do.

jean-philippe.gauvin@umontreal.ca
