
Applying Regression Analysis

Jean-Philippe Gauvin
Université de Montréal

January 7, 2016
Goals for Today

What is regression? How do we do it?

First hour: OLS
- Bivariate regression
- Multiple regression
- Interactions
- Regression diagnostics

Second hour: MLE/GLMs
- Logit
- Ordered logit
- Multinomial logit

All material is on my website: www.jpgauvin.com


Introducing Regression

Part I – The World of OLS


Bivariate Regression
Multiple Regression
Predicted Values
Categorical Predictors
Interactions
Regression Diagnostics

Part II – The World of MLE


Introducing MLE
Logit Models
Ordered Logit Models
Multinomial Logit Models
What is a regression?

In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').

Ordinary least squares (OLS) regression is a family of regression models that fit a line by minimizing the residual sum of squares. It can only be used when dealing with a continuous dependent variable.
The Scatterplot

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age)
The Basic Linear Model (OLS)

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age)
Why a Line?

- It's accurate – it gives an idea of the general pattern of the data.
- It's easy to estimate – the computational cost is low, since a single formula can be used.
- It's easy to describe – it can summarize the relationship between variables with only one number: the slope.

In other words, it's a very good summary of "what happens".


The Components of OLS

The model equation can be expressed as such:

Yi = α + βXi + εi

Where:
- Yi is the dependent variable
- α is the intercept
- β is the slope
- Xi is the predictor
- εi is the error term
The Components of OLS

The OLS equation can then be re-expressed like this:

Ŷi = α + βXi

Where is the error term? OLS aims to keep the residual ε as small as possible. In other words,

Yi = Ŷi + εi
εi = Yi − Ŷi

OLS is the line that minimizes the residuals.


What the Software Does
If the OLS is the line that minimizes the residual sum of squares (RSS)... how does it do that? How do we get a slope (β) and an intercept (α)?

RSS = Σ εi² = Σ (Yi − Ŷi)²

β̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

α̂ = Ȳ − β̂X̄

+----+---+----+-------+--------+
|    | X | Y  |  Xbar |   Ybar |
|----+---+----+-------+--------|
| 1. | 2 | 10 | 4.375 | 12.625 |
| 2. | 5 | 15 | 4.375 | 12.625 |
| 3. | 6 | 16 | 4.375 | 12.625 |
| 4. | 4 | 12 | 4.375 | 12.625 |
| 5. | 8 | 14 | 4.375 | 12.625 |
| 6. | 1 | 12 | 4.375 | 12.625 |
| 7. | 4 | 12 | 4.375 | 12.625 |
| 8. | 5 | 10 | 4.375 | 12.625 |
+----+---+----+-------+--------+
Estimating the Regression Line

. egen Xbar = mean(X)
. egen Ybar = mean(Y)
. egen sumXY = sum((X-Xbar)*(Y-Ybar))
. egen sumX2 = sum((X-Xbar)^2)
.
. disp "Beta is " sumXY/sumX2
Beta is .56457565

. disp "Alpha is " Ybar - (sumXY/sumX2)*Xbar
Alpha is 10.154982

Or let Stata do it!


. reg Y X
--------------------------------------------------------------------------
Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+----------------------------------------------------------------
X | .5645756 .3369605 1.68 0.145 -.259937 1.389088
_cons | 10.15498 1.629127 6.23 0.001 6.168652 14.14131
--------------------------------------------------------------------------

α is 10.15 and β is .56, which means Ŷi = 10.15 + .56(Xi)
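
For comparison, a minimal R sketch of the same hand computation (using the eight observations from the table above):

# the example data from the table
X <- c(2, 5, 6, 4, 8, 1, 4, 5)
Y <- c(10, 15, 16, 12, 14, 12, 12, 10)

beta  <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
alpha <- mean(Y) - beta * mean(X)
c(alpha = alpha, beta = beta)    # 10.155 and .5646, as above
coef(lm(Y ~ X))                  # lm() returns the same estimates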


The Basic Linear Model (OLS)

Figure: Effect of Age on Social Ideology (scatterplot of social left-right ideology against age with the fitted OLS line: alpha = -.45, beta = .004)
The Assumptions of OLS

Sorry: The goal is not to get stars.

Instead, we want the Best Linear Unbiased Estimators (BLUE), given by five assumptions:

1. The relationship is linear.
2. The errors are normally distributed.
3. The errors have constant variance (no heteroskedasticity, no autocorrelation).
4. X is fixed on repeated sampling (no selection bias).
5. No exact linear relationships between independent variables, and more observations than independent variables.
Other Consideration: Model fit

R² is the most used measure of fit. It is expressed as the ratio of the explained variance to the total variance. In other words, R² = s²Ŷ / s²Y.

But what's a good fit in the social sciences? What about other measures (AIC, BIC, etc.)?
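
As a quick sketch, the definition can be checked by hand in R (assuming m1 is a fitted lm object, as in the example that follows):

# R-squared as explained variance over total variance (OLS with an intercept)
y.obs <- model.response(model.frame(m1))   # the Y values actually used in the fit
var(fitted(m1)) / var(y.obs)               # matches summary(m1)$r.squared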
Example 1: The Bivariate Regression
Stata: scatter harper leftright

R: plot(harper ~ leftright, main="Example 1", xlab="Ideology", ylab="Feelings for Harper")

Figure: scatterplot of feelings about Stephen Harper (0-100) against left/right self-placement (0-10)
Example 1: The Bivariate Regression
Stata:
twoway (scatter harper leftright, jitter(20)) (lfit harper leftright)

R: plot(jitter(harper,10) ~ jitter(leftright, 4), main="Example 1", xlab="Ideology", ylab="Feelings for Harper")
abline(lm(harper~leftright), col="red")

Figure: jittered scatterplot of feelings about Stephen Harper against left/right self-placement, with the fitted OLS line
Example 1: Stata

. reg harper leftright

Source | SS df MS Number of obs = 1467


-------------+------------------------------ F( 1, 1465) = 410.77
Model | 322914.797 1 322914.797 Prob > F = 0.0000
Residual | 1151659.74 1465 786.115861 R-squared = 0.2190
-------------+------------------------------ Adj R-squared = 0.2185
Total | 1474574.53 1466 1005.84893 Root MSE = 28.038

------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
leftright | 7.452856 .3677241 20.27 0.000 6.731535 8.174178
_cons | 5.717891 1.995533 2.87 0.004 1.803484 9.632297
------------------------------------------------------------------------------
F̂H = 5.72 + 7.45(LR)
Example 1: R

> m1 <- (lm(harper~leftright, data = data))


> summary(m1)

Call:
lm(formula = harper ~ leftright)

Residuals:
Min 1Q Median 3Q Max
-79.246 -22.982 2.112 21.924 79.376

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7179 1.9955 2.865 0.00423 **
leftright 7.4529 0.3677 20.268 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 28.04 on 1465 degrees of freedom


(2841 observations deleted due to missingness)
Multiple R-squared: 0.219,Adjusted R-squared: 0.2185
F-statistic: 410.8 on 1 and 1465 DF, p-value: < 2.2e-16
Multiple regression

In a bivariate model, OLS fits a line in a 2D space. But in a multiple regression, it fits a line for each covariate in N-dimensional space. You could theoretically draw a plane in a 3D space, but what about 4, 5 or 20 dimensions?

However, we can extend the bivariate logic to multiple predictors, where the coefficients lead to multiple slopes piercing through N-dimensional space.
The Multiple Regression Equation

When extending the bivariate model equation to multiple


predictors, we get this:

Yi = β0 + β1X1 + β2X2 + ... + βnXn + εi

Ex2. Does age also have an effect on feelings toward Harper?


The OLS equation should read

FH = β0 + β1LR + β2Age + ε
Example 2. The Multiple Regression

Figure: two panels of feelings about Stephen Harper (0-100) – against ideology (0-10, B = 7.45) and against age (20-100, B = 0.12)
Example 2. Stata

. reg harper leftright age

Source | SS df MS Number of obs = 1440


----------+------------------------------ F( 2, 1437) = 208.47
Model | 325425.809 2 162712.904 Prob > F = 0.0000
Residual | 1121580.18 1437 780.501169 R-squared = 0.2249
----------+------------------------------ Adj R-squared = 0.2238
Total | 1447005.99 1439 1005.56358 Root MSE = 27.937

---------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
leftright | 7.488689 .3689223 20.30 0.000 6.765005 8.212373
age | .0417154 .0500968 0.83 0.405 -.0565554 .1399862
_cons | 2.990736 3.432887 0.87 0.384 -3.74327 9.724742
---------------------------------------------------------------------------
Example 2. R
> m2 <- lm(harper ~ leftright + age, data=data)
> summary(m2)

Call:
lm(formula = harper ~ leftright + age, data = data)

Residuals:
Min 1Q Median 3Q Max
-79.339 -21.988 2.211 21.543 79.612

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.99074 3.43289 0.871 0.384
leftright 7.48869 0.36892 20.299 <2e-16 ***
age 0.04172 0.05010 0.833 0.405
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27.94 on 1437 degrees of freedom


(2868 observations deleted due to missingness)
Multiple R-squared: 0.2249,Adjusted R-squared: 0.2238
F-statistic: 208.5 on 2 and 1437 DF, p-value: < 2.2e-16
Predicting Ŷ ?

What if we want to predict a value on this new slope? Can we just plug it into the equation? What if we want to predict Ŷ when leftright is 1 and age is 50? Is it that easy?

F̂H = 2.99 + 7.49(LR) + .04(Age)
    = 2.99 + 7.49(1) + .04(50)...
    = 12.48

In Stata, we could easily check it after running the regression.

. quietly: reg harper leftright age
. disp _b[_cons] + _b[leftright]*1 + _b[age]*50
12.565196

(The gap between 12.48 and 12.57 comes from rounding the coefficients in the hand calculation.)
Other Ways of Predicting Ŷ in Stata
*Predicting yhat for leftright=1 and age=50
. predict yhat
(option xb assumed; fitted values)
(2849 missing values generated)

. sum yhat if leftright==1 & age==50 /*lucky we have one obs = 12.56*/

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
yhat | 1 12.5652 . 12.5652 12.5652

*Or simply use margins command


. margins, at(leftright=1 age=50)
at : leftright = 1
age = 50

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 12.5652 1.691971 7.43 0.000 9.246198 15.88419
------------------------------------------------------------------------------
Predicting Ŷ in R

In R, you can easily specify your scenario values through a new


dataframe.
> m3 <- (lm(harper~leftright + age , data=data))
> newdat <- data.frame(
+ leftright = 1,
+ age = 50)
> predict(m3, newdat, interval="confidence")
fit lwr upr
1 12.5652 9.246198 15.88419
>

Note that for a binary predictor, we could store something like male="Male" in newdat, depending on the label (if the dummy is a factor). If you wanted multiple values, you could predict using a vector like leftright=c(1,2,3,4,5), as in the sketch below.
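
A minimal sketch of that multi-value prediction (reusing m3, with age held at 50):

> newdat2 <- data.frame(leftright = c(1,2,3,4,5), age = 50)
> predict(m3, newdat2, interval = "confidence")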
Using Categorical Predictors

When adding variables, it is possible to include categorical predictors. This is done by adding binary (dummy) variables.

One common example is gender:


. reg harper leftright male
---------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
leftright | 7.463482 .3686871 20.24 0.000 6.740271 8.186694
male | -.6208031 1.470247 -0.42 0.673 -3.504819 2.263213
_cons | 5.957086 2.07492 2.87 0.004 1.886952 10.02722
---------------------------------------------------------------------------

But what does the coefficient mean? Being a male, rather than a
female, decreases feelings for Harper by 0.62 (not statistically
significant).
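
From this output, the fitted line for women is F̂H = 5.96 + 7.46(LR); for men, the intercept simply shifts by the male coefficient: F̂H = (5.96 − 0.62) + 7.46(LR) = 5.34 + 7.46(LR).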
Adding a Binary Variable

Y = a + b1*LR + b2*Gender + e

Figure: two parallel fitted lines of linear prediction over the left/right scale – the female intercept is b0, the male intercept is b0 + b2, and b2 is the vertical gap between the lines
Ex 4. Categorical Variables in Stata
In Stata, you can use tab varname, gen(newvar) to
automatically create dummies, or you can simply use the prefix i.
or even specify the baseline with b1. b2. b3. etc.
. reg harper i.votechoice /*liberal is baseline*/
------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Tories | 45.08578 1.168005 38.60 0.000 42.79546 47.3761
NDP | .1385281 1.388459 0.10 0.921 -2.584075 2.861131
BQ | -3.868295 1.912745 -2.02 0.043 -7.61896 -.1176298
Greens | -1.171874 2.515343 -0.47 0.641 -6.104162 3.760414
_cons | 29.37576 .9241736 31.79 0.000 27.56356 31.18795
------------------------------------------------------------------------------

. reg harper b2.votechoice /*Conservative as baseline*/


------------------------------------------------------------------------------
harper | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Libs | -45.08578 1.168005 -38.60 0.000 -47.3761 -42.79546
NDP | -44.94725 1.258515 -35.71 0.000 -47.41505 -42.47945
BQ | -48.95408 1.820614 -26.89 0.000 -52.52408 -45.38407
Greens | -46.25765 2.446016 -18.91 0.000 -51.054 -41.46131
_cons | 74.46154 .7142404 104.25 0.000 73.061 75.86208
------------------------------------------------------------------------------
Ex 4. Categorical Variables in R
As with Stata, R automatically recognizes categorical variables.
> m4 <- lm(harper~votechoice, data=data)
> summary(m4)

Call:
lm(formula = harper ~ votechoice, data = data)

Residuals:
Min 1Q Median 3Q Max
-74.462 -19.462 0.538 15.538 70.624

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.3758 0.9242 31.786 <2e-16 ***
votechoiceTories 45.0858 1.1680 38.601 <2e-16 ***
votechoiceNDP 0.1385 1.3885 0.100 0.9205
votechoiceBQ -3.8683 1.9127 -2.022 0.0432 *
votechoiceGreens -1.1719 2.5153 -0.466 0.6413
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Ex 4. Categorical Variables in R

In R, you need to change the order of the labels to change the


baseline.
levels(data$votechoice) #current order
data$votechoice2 = factor(data$votechoice, c("Tories","Libs","NDP",
+ "BQ","Greens"))
levels(data$votechoice2)

m4.2 <- lm(harper~votechoice2, data=data)


summary(m4.2)

You can also test if a variable is a factor by using


is.factor(male). If it isn’t, you could specify it with
male.f <- factor(male, labels = c("male", "female"))
Interactions
We might have reasons to believe that the effect of a variable varies over the range of another variable. This is called an interaction.

Y = a + b1*Soc + b2*Gender + e
Y = a + b1*Soc + b2*Gender + b3*Soc*Gender + e

Figure: predicted writing score against social studies score for males and females – without the interaction term (left) the two fitted lines are parallel; with it (right), their slopes differ
Example 5. Stata

In Stata, the # sign identifies the interaction. Keep in mind that, by default, Stata treats interacted variables as factors. Continuous terms need the c. prefix in order to work properly.
regress write i.female##c.socst

-------------------------------------------------------------
write | Coef. Std. Err. t P>|t|
----------------+--------------------------------------------
female | 15.00001 5.09795 2.94 0.004
socst | .6247968 .0670709 9.32 0.000
|
female#c.socst | -.2047288 .0953726 -2.15 0.033
|
_cons | 17.7619 3.554993 5.00 0.000
-------------------------------------------------------------
Example 5. R
In R, variables are already defined as factors or not. Interactions
are thus handled automatically.
> m5 <- lm(write~ female*socst, data=data2)
> summary(m5)

Call:
lm(formula = write ~ female * socst, data = data2)

Residuals:
Min 1Q Median 3Q Max
-18.6265 -4.3108 -0.0645 5.0429 16.4974

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.76190 3.55499 4.996 1.29e-06 ***
femalefemale 15.00001 5.09795 2.942 0.00365 **
socst 0.62480 0.06707 9.315 < 2e-16 ***
femalefemale:socst -0.20473 0.09537 -2.147 0.03305 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.212 on 196 degrees of freedom


Multiple R-squared: 0.4299,Adjusted R-squared: 0.4211
F-statistic: 49.26 on 3 and 196 DF, p-value: < 2.2e-16
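
Reading the two fits together: for males, the fitted line is write = 17.76 + 0.625(socst); for females, both the intercept and the slope shift, giving write = (17.76 + 15.00) + (0.625 − 0.205)(socst) = 32.76 + 0.420(socst).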
Regression Diagnostics

Remember the main assumptions:

1. The relationship is linear.
2. The expected value of the error term is zero.
3. The errors have identical, normal distributions (no heteroskedasticity, no autocorrelation).

To get unbiased and efficient estimators, we must make sure we don't violate these assumptions. We'll focus on the distribution of errors and on outliers.
Normally distributed and constant variance
Example 6. Heteroskedasticity in Stata
You can test heteroskedasticity with estat hettest or
imtest, white, or simply look at the residuals.

Figure: residuals plotted against fitted values

reg harper leftright age male


estat hettest
imtest, white
rvfplot, yline(0, lcolor(red))
reg harper leftright age male, vce(robust)
Example 6. Heteroskedasticity in R
In R, you can test for heteroskedasticity with the non-constant variance test ncvTest in the car package. The plot can be obtained with the spreadLevelPlot(m6) command (where m6 is the object containing your model).

However, getting robust standard errors like Stata is a bit more


involved. You’ll need the sandwich and lmtest packages.
> library(sandwich)
> library(lmtest)
> coeftest(m6, vcov = vcovHC(m6, "HC1"))

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 3.309256 3.401537 0.9729 0.3308
leftright 7.502763 0.344853 21.7564 <2e-16 ***
age 0.042331 0.049701 0.8517 0.3945
maleMale -0.903626 1.477110 -0.6118 0.5408
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

>
Example 6. Normality of Residuals in Stata
You can plot the residuals against a theoretical normal distribution.

Figure: standardized normal probability (P-P) plot of the residuals

quietly: reg harper leftright age male


predict r, residual
pnorm r
Example 6. Normality of Residuals in R
The same thing can be done in R with the command qqPlot(m6)

Figure: Q-Q plot of studentized residuals against t quantiles
Example 6. Outlier Plot in Stata
You can identify the outliers by graphing a leverage plot with
lvr2plot

Figure: leverage plotted against normalized residual squared
Example 6. Outlier Plot in R
You can do the same thing in R with
influencePlot(m6,id.method="identify")

3
Influence Plot
3810
2
Studentized Residuals
1
0

3084
−1
−2
−3

0.002 0.004 0.006 0.008 0.010


Hat−Values
Circle size is proportial to Cook's Distance
Introducing Regression

Part I – The World of OLS


Bivariate Regression
Multiple Regression
Predicted Values
Categorical Predictors
Interactions
Regression Diagnostics

Part II – The World of MLE


Introducing MLE
Logit Models
Ordered Logit Models
Multinomial Logit Models
Categorical Outcomes: The Problem with OLS

OLS is pretty useful. However, it fails at explaining non-continuous dependent variables. Imagine a simple model where Yi is binary.

Yi = β0 + β1X1 + β2X2 + ui

OLS estimates won't be BLUE in some respects:

- OLS gives an unbounded linear prediction, while Yi can only be 0 or 1.
- OLS assumes ui to be normally distributed. But since ui = Yi − Ŷi, the residuals can only take the values 1 − Ŷi or −Ŷi.
- OLS assumes constant variance of ui (homoskedasticity), while the variance of a binary choice will not permit that.
Constant Variance is Impossible

Figure: residuals from the linear model plotted against fitted values for a binary Yi – the residuals fall along two parallel lines (1 − Ŷi and −Ŷi), so their variance cannot be constant
Maximum Likelihood Estimation
The world of MLE is the world of frequentist probability. Formally, a probability is given by:

Pr(Y|M) = Pr(Data|Model)

Ideally, we would compute the inverse probability Pr(Model|Data), but this is impossible. Luckily, the likelihood function helps us a lot.

f(Y1, Y2, ..., YN|θ) = ∏(i=1 to N) f(Yi|θ) = L(θ|Y)

p(Y|θ) = L(θ|Y)

In other words, there is a fixed value of θ, and we maximize the likelihood to estimate θ and make assumptions to generate uncertainty about the estimate.
Log Likelihood

We usually work with the log likelihood function:

ln L(θ|Y) = Σ(i=1 to N) ln f(Yi|θ)

The software then maximizes this function by taking derivatives over multiple iterations. When the derivative is 0, it has found the maximum and has therefore estimated the "most likely" θ parameters.

ML estimation can be extended to a variety of functions, including logistic regression.
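
As an illustrative sketch (not from the slides), R's optim() can do this numerical maximization for a simple Bernoulli model, where the analytical ML estimate is just the sample mean:

# simulate 0/1 data and estimate p by maximum likelihood
set.seed(1)
y <- rbinom(100, 1, 0.3)                                 # binary outcomes
negll <- function(p) -sum(dbinom(y, 1, p, log = TRUE))   # -ln L(p|y)
optim(0.5, negll, method = "Brent", lower = 0.001, upper = 0.999)$par
mean(y)                                                  # the analytical MLE, essentially the same value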
Logistic Regression for Binary Outcome
Pi = Pr(Yi = 1) = Σk βk Xik

Figure: two panels over feelings about Stephen Harper (really dislike to really like) – the linear prediction (left) and the logistic regression of Pr(Party2) (right)
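
A quick sketch of why the logistic curve stays bounded, using base R's plogis() (the inverse logit):

xb <- seq(-6, 6, by = 0.5)   # any values of the linear predictor
plogis(xb)                   # 1/(1 + exp(-xb)): always between 0 and 1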
Example 7. Logit in Stata
Logistic regression models the log odds and is estimated by maximizing the log likelihood. Coefficients are therefore hard to interpret directly.
. logit party2 harper leftright i.male age

Iteration 0: log likelihood = -691.57458


Iteration 1: log likelihood = -330.15422
Iteration 2: log likelihood = -309.76953
Iteration 3: log likelihood = -309.11928
Iteration 4: log likelihood = -309.11808
Iteration 5: log likelihood = -309.11808

Logistic regression Number of obs = 1021


LR chi2(4) = 764.91
Prob > chi2 = 0.0000
Log likelihood = -309.11808 Pseudo R2 = 0.5530

------------------------------------------------------------------------------
party2 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | .068045 .0046875 14.52 0.000 .0588576 .0772324
leftright | .5628441 .0684852 8.22 0.000 .4286156 .6970726
|
male |
Male | .1405912 .2059007 0.68 0.495 -.2629667 .5441492
age | .0091524 .0070297 1.30 0.193 -.0046257 .0229304
_cons | -7.549818 .6537195 -11.55 0.000 -8.831084 -6.268551
------------------------------------------------------------------------------
Example 7. Logit in Stata

Odds ratios can also be used.

. logit party2 harper leftright i.male age, or // to get odds ratios

Iteration 0: log likelihood = -691.57458


Iteration 1: log likelihood = -330.15422
Iteration 2: log likelihood = -309.76953
Iteration 3: log likelihood = -309.11928
Iteration 4: log likelihood = -309.11808
Iteration 5: log likelihood = -309.11808

Logistic regression Number of obs = 1021


LR chi2(4) = 764.91
Prob > chi2 = 0.0000
Log likelihood = -309.11808 Pseudo R2 = 0.5530

------------------------------------------------------------------------------
party2 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | 1.070413 .0050176 14.52 0.000 1.060624 1.080293
leftright | 1.755659 .1202366 8.22 0.000 1.535131 2.007866
|
male |
Male | 1.150954 .2369822 0.68 0.495 .7687675 1.723142
age | 1.009194 .0070944 1.30 0.193 .995385 1.023195
_cons | .0005262 .000344 -11.55 0.000 .0001461 .001895
------------------------------------------------------------------------------
Example 7. Logit in R
> m7 <- glm(party2~harper+leftright+male+age, data=data, family="binomial")
> summary(m7)

Call:
glm(formula = party2 ~ harper + leftright + male + age, family = "binomial",
data = data)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.4018 -0.3470 -0.1040 0.4138 3.1348

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.549818 0.653717 -11.549 <2e-16 ***
harper 0.068045 0.004688 14.516 <2e-16 ***
leftright 0.562844 0.068485 8.219 <2e-16 ***
maleMale 0.140591 0.205900 0.683 0.495
age 0.009152 0.007030 1.302 0.193
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1383.15 on 1020 degrees of freedom


Residual deviance: 618.24 on 1016 degrees of freedom
(3287 observations deleted due to missingness)
AIC: 628.24

Number of Fisher Scoring iterations: 6


Example 7. Logit in R

Odds ratios can be extracted easily with exp(coef(m7)). To get


CI, simply state:
> exp(cbind(OR = coef(m7), confint(m7)))
Waiting for profiling to be done...
OR 2.5 % 97.5 %
(Intercept) 0.000526206 0.0001387989 0.001807294
harper 1.070413461 1.0610310197 1.080745614
leftright 1.755658711 1.5399522657 2.015034731
maleMale 1.150954049 0.7684742050 1.724720106
age 1.009194408 0.9954186183 1.023275717
Example 7. Predicted Probabilities in Stata
In GLM terms, marginal effects are easier to understand. However, only the MEs for discrete variables behave as expected. MEs for continuous predictors give the instantaneous rate of change at the mean, which can vary across values.
. margins, dydx(*) atmeans // to get marginal effects

Conditional marginal effects Number of obs = 1021


Model VCE : OIM

Expression : Pr(party2), predict()


dy/dx w.r.t. : harper leftright 1.male age
at : harper = 45.00098 (mean)
leftright = 5.127326 (mean)
0.male = .5220372 (mean)
1.male = .4779628 (mean)
age = 58.86876 (mean)

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
harper | .0134041 .0008582 15.62 0.000 .011722 .0150861
leftright | .1108737 .0135569 8.18 0.000 .0843027 .1374448
male | .0277301 .040754 0.68 0.496 -.0521462 .1076065
age | .0018029 .0013874 1.30 0.194 -.0009164 .0045222
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Example 7. Predicted Probabilities in Stata
You should then plot the predicted probabilities over a continuous predictor, either with margins or mcp, which will give you this graph:

Figure: predicted Pr(party2) plotted over the 0-100 range of harper

margins, at(harper=(1(1)100))
marginsplot

*or simply
mcp harper
Example 7. Marginal Effects in R
You can get marginal effects with the mfx package
install.packages("mfx")
library(mfx)
logitmfx(party2~harper+leftright+male+age, data=data, atmean =T)

Call:
logitmfx(formula = party2 ~ harper + leftright + male + age,
data = data, atmean = T)

Marginal Effects:
dF/dx Std. Err. z P>|z|
harper 0.0134041 0.0008582 15.6188 < 2.2e-16 ***
leftright 0.1108737 0.0135569 8.1784 2.876e-16 ***
maleMale 0.0277301 0.0407539 0.6804 0.4962
age 0.0018029 0.0013874 1.2995 0.1938
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

dF/dx is for discrete change for the following variables:

[1] "maleMale"
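
To mirror the Stata margins plot in R, here is a hedged sketch with predict() and type="response" (the covariate values are illustrative, chosen near the sample means, and male is assumed to be a factor with a "Male" level as in m7):

> newdat <- data.frame(harper = 0:100, leftright = 5, male = "Male", age = 59)
> pp <- predict(m7, newdat, type = "response")
> plot(0:100, pp, type = "l", xlab = "harper", ylab = "Pr(party2)")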
Ordered Logit

Sometimes we have categorical variables that are still ordered. For example, a statement reads "We have gone too far pushing bilingualism in this country" and is coded:

- 1 for strongly disagree
- 2 for disagree
- 3 for agree
- 4 for strongly agree

Be careful! The assumption behind ordered logit is that the conceptual cut between 1 and 2 is the same as between 2 and 3, etc.
Example 8. Ordered Logit in Stata
Coefficients (here odds ratios) give the odds of moving up one outcome category.
. ologit toobilingual i.votechoice harper leftright male age, or

Ordered logistic regression Number of obs = 970


LR chi2(8) = 172.97
Prob > chi2 = 0.0000
Log likelihood = -1189.651 Pseudo R2 = 0.0678

------------------------------------------------------------------------------
toobilingual | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
votechoice |
Tories | 2.302053 .462271 4.15 0.000 1.553056 3.412271
NDP | 1.08919 .1980942 0.47 0.639 .7625939 1.555658
BQ | .3370468 .0866534 -4.23 0.000 .2036337 .5578672
Greens | 1.267178 .3692448 0.81 0.416 .7158219 2.243213
|
harper | .999143 .0027409 -0.31 0.755 .9937854 1.00453
leftright | 1.20156 .0428846 5.14 0.000 1.12038 1.288622
male | 1.223467 .1457727 1.69 0.090 .9686658 1.545292
age | 1.010642 .0041668 2.57 0.010 1.002508 1.018842
-------------+----------------------------------------------------------------
/cut1 | -.0463056 .3173242 -.6682495 .5756383
/cut2 | 2.024309 .3232185 1.390812 2.657806
/cut3 | 3.722098 .3387192 3.058221 4.385976
------------------------------------------------------------------------------
Example 8. Ordered Logit in Stata

Be careful: predicted probabilities now need to be computed for every outcome.

Figure: four panels of predictive margins with 95% CIs, one for each outcome – Pr(toobilingual==1) through Pr(toobilingual==4) – each plotted over the 0-10 left/right scale
margins, at(leftright=(0(1)10)) predict(outcome(1))


margins, at(leftright=(0(1)10)) predict(outcome(2))
margins, at(leftright=(0(1)10)) predict(outcome(3))
margins, at(leftright=(0(1)10)) predict(outcome(4))
Example 8. Ordered Logit in R
Ordered logit can be used with the MASS package.
install.packages("MASS")
require(MASS)

data$toobil.f <- factor(data$toobilingual)


is.factor(data$toobil.f)
m8 <- polr(toobil.f ~votechoice+harper+leftright+male+age, data = data, Hess=TRUE)
summary(m8)
Call:
polr(formula = toobil.f ~ votechoice + harper + leftright + male +
age, data = data, Hess = TRUE)

Coefficients:
Value Std. Error t value
votechoiceTories 0.8338049 0.200806 4.1523
votechoiceNDP 0.0854366 0.181873 0.4698
votechoiceBQ -1.0875314 0.257097 -4.2300
votechoiceGreens 0.2367981 0.291392 0.8126
harper -0.0008574 0.002744 -0.3125
leftright 0.1836207 0.035691 5.1448
maleMale 0.2016890 0.119147 1.6928
age 0.0105862 0.004124 2.5672

Intercepts:
Value Std. Error t value
1|2 -0.0463 0.3174 -0.1459
2|3 2.0243 0.3232 6.2626
3|4 3.7221 0.3387 10.9880

Residual Deviance: 2379.302


AIC: 2401.302
(3338 observations deleted due to missingness)
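
As in Stata, you then want one predicted probability per outcome category; a minimal sketch with the fitted polr object:

> head(predict(m8, type = "probs"))   # one column per category of toobil.f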
Multinomial Logit Models
Finally, what if a categorical variable is unordered? In political science, such situations often arise when studying vote choice, where the dependent variable might be coded like this:

- 1 for Liberals
- 2 for Tories
- 3 for NDP
- 4 for BQ
- 5 for Greens

Multinomial logit estimates the probability of choosing one outcome over the others, relative to a baseline category.

Keep in mind that there are assumptions linked to this, one of which is the Independence of Irrelevant Alternatives (IIA). This needs to be tested for.
Example 9. Multinomial Logit in Stata
mlogit votechoice harper leftright male age, rr
Multinomial logistic regression Number of obs = 1021
LR chi2(16) = 812.47
Prob > chi2 = 0.0000
Log likelihood = -985.4517 Pseudo R2 = 0.2919

------------------------------------------------------------------------------
votechoice | RRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Libs |
harper | .9331527 .0046223 -13.97 0.000 .924137 .9422562
leftright | .6151022 .0443692 -6.74 0.000 .5340077 .7085118
male | .8724498 .1928675 -0.62 0.537 .5656791 1.345584
age | 1.003075 .0077044 0.40 0.689 .9880881 1.01829
_cons | 331.5598 228.216 8.43 0.000 86.03424 1277.77
-------------+----------------------------------------------------------------
Tories | (base outcome)
-------------+----------------------------------------------------------------
NDP |
harper | .935527 .0050303 -12.39 0.000 .9257195 .9454383
leftright | .5065051 .0400814 -8.60 0.000 .4337361 .5914828
male | .7520023 .1827952 -1.17 0.241 .4669934 1.210954
age | .9872603 .0082014 -1.54 0.123 .9713161 1.003466
_cons | 1235.203 891.2959 9.87 0.000 300.2823 5080.975
-------------+----------------------------------------------------------------
BQ |
harper | .9345699 .0061696 -10.25 0.000 .9225556 .9467407
leftright | .5878576 .0555462 -5.62 0.000 .4884756 .7074593
male | 1.078164 .3300636 0.25 0.806 .5917011 1.964569
age | .9714043 .01007 -2.80 0.005 .9518667 .991343
_cons | 520.7184 433.5257 7.51 0.000 101.8433 2662.4
-------------+----------------------------------------------------------------
Greens |
harper | .9379658 .0070879 -8.47 0.000 .9241761 .9519612
Example 9. Multinomial Logit in Stata

Change the base outcome with mlogit votechoice harper leftright male age, b(1).

You can also test for IIA with the spost package:

net install spost9_ado.pkg
quietly: mlogit votechoice harper leftright male age, b(1)
. mlogtest, iia

**** Hausman tests of IIA assumption (N=1021)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted | chi2 df P>chi2 evidence


---------+------------------------------------
Tories | 22.757 15 0.089 for Ho
NDP | -13.273 15 --- ---
BQ | -0.535 15 --- ---
Greens | -0.547 15 --- ---
----------------------------------------------
Note: If chi2<0, the estimated model does not
meet asymptotic assumptions of the test.
Example 9. Multinomial Logit in R
Multinomial logits can be used with the nnet package.
install.packages("nnet")
library(nnet)

m10 <- multinom(votechoice ~ harper + leftright + age + male, data = data)


summary(m10)

Call:
multinom(formula = votechoice ~ harper + leftright + age + male,
data = data)

Coefficients:
(Intercept) harper leftright age maleMale
Tories -5.8035817 0.069186831 0.48596775 -0.003074969 0.1365095
NDP 1.3150999 0.002542445 -0.19424343 -0.015891507 -0.1485712
BQ 0.4514183 0.001519217 -0.04532072 -0.032082301 0.2116362
Greens 1.4727807 0.005145085 -0.22967514 -0.046365245 0.2165980

Std. Errors:
(Intercept) harper leftright age maleMale
Tories 0.6883018 0.004953379 0.07213321 0.007680753 0.2210648
NDP 0.4518466 0.003892178 0.05559792 0.006537470 0.1906751
BQ 0.6231527 0.005509566 0.07768518 0.009155438 0.2705429
Greens 0.6837026 0.006601243 0.09366259 0.010719374 0.3188646

Residual Deviance: 1970.903


AIC: 2010.903

#choose baseline
data$vote2 <- relevel(data$votechoice, ref = "Tories")
m10.2 <- multinom(vote2 ~ harper + leftright + age + male, data = data)
summary(m10.2)
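
A short sketch of getting relative risk ratios and predicted probabilities from the multinom fit (analogous to Stata's rr option and margins):

> exp(coef(m10))                      # relative risk ratios
> head(predict(m10, type = "probs"))  # one column of probabilities per party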
Thank You!

Any questions?

Don’t hesitate to write if you do.

jean-philippe.gauvin@umontreal.ca
