Logistic Regression
• The following table shows the relationship between race and sentence:
Odds for Independent Variable Groups
• We can compute the odds of receiving a death penalty for each of the
groups:
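The race-by-sentence table referred to above did not survive extraction, so the counts below are hypothetical. As a minimal Python sketch, the odds for one group are the number of cases with the event (here, a death sentence) divided by the number of cases without it, and an odds ratio compares the odds of two groups.

```python
def odds(events: int, non_events: int) -> float:
    """Odds of the event within one group: cases with the event / cases without it."""
    if non_events == 0:
        raise ValueError("odds are undefined when no case lacks the event")
    return events / non_events


def odds_ratio(group_a: tuple[int, int], group_b: tuple[int, int]) -> float:
    """Odds for group A divided by odds for group B; each group is (events, non_events)."""
    return odds(*group_a) / odds(*group_b)


# Hypothetical counts, not taken from the race/sentence table:
# group A has 30 death sentences and 70 other sentences; group B has 10 and 90.
print(round(odds(30, 70), 3))                    # 0.429
print(round(odds_ratio((30, 70), (10, 90)), 3))  # 3.857
```

An odds ratio above 1 means the first group's odds of the event are higher than the second group's.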
Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Logistic regression
• Logistic regression is used to analyze relationships between a
dichotomous dependent variable and metric or dichotomous
independent variables. (SPSS now supports Multinomial Logistic
Regression that can be used with more than two groups, but our
focus here is on binary logistic regression for two groups.)
• Logistic regression combines the independent variables to estimate
the probability that a particular event will occur, i.e. a subject will
be a member of one of the groups defined by the dichotomous
dependent variable. In SPSS, the model is always constructed to
predict the group with the higher numeric code. If responses are
coded 1 for Yes and 2 for No, SPSS will predict membership in the
No category. If responses are coded 1 for No and 2 for Yes, SPSS will
predict membership in the Yes category. We will refer to the
predicted event for a particular analysis as the modeled event.
• This rule creates some awkward wording in our problems. The only way to change which
group is modeled is to recode the dependent variable.
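A small Python sketch of the coding rule and the recode, assuming a hypothetical variable coded 1 = Yes and 2 = No. In SPSS itself the swap would be done with a recode; the Python below only mirrors the logic, and the function names are illustrative.

```python
def modeled_code(codes):
    """Return the code binary logistic regression in SPSS treats as the modeled event:
    the higher of the two numeric codes in the dependent variable."""
    values = sorted(set(codes))
    if len(values) != 2:
        raise ValueError("the dependent variable must have exactly two codes")
    return values[1]


def swap_codes(codes, low=1, high=2):
    """Swap the two codes so that the other category becomes the modeled event."""
    return [high if c == low else low if c == high else c for c in codes]


YES, NO = 1, 2                       # hypothetical original coding
responses = [YES, NO, NO, YES, NO]
print(modeled_code(responses))       # 2 -> "No" is modeled
recoded = swap_codes(responses)      # after the swap, 2 means "Yes"
print(recoded)                       # [2, 1, 1, 2, 1]
```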
What logistic regression predicts
• The variate or value produced by logistic regression is a probability
value between 0.0 and 1.0.
• For any given case, logistic regression computes the probability that a
case with a particular set of values for the independent variables is a
member of the modeled category.
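The probability described above comes from the logistic function applied to a weighted sum of the independent variables. A minimal Python sketch, assuming hypothetical coefficients; the 0.5 default matches the cut value of .500 reported under the classification tables in this section, and treating a probability exactly equal to the cut value as "other" is an assumption of this sketch.

```python
import math


def predicted_probability(constant, coefficients, values):
    """P(modeled category) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    logit = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))


def predicted_group(probability, cut_value=0.5):
    """Classify into the modeled category when the probability exceeds the cut value."""
    return "modeled" if probability > cut_value else "other"


# Hypothetical model: constant -1.0 and one IV with b = 0.5, for a case with x = 4.
p = predicted_probability(-1.0, [0.5], [4])
print(round(p, 3), predicted_group(p))  # 0.731 modeled
```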
SW388R7
Data Analysis & Computers II
Level of measurement requirements
• Logistic regression analysis requires that the
dependent variable be dichotomous.
Classification Table(a)
                                                   Predicted
                                                   EXPECT U.S. IN WORLD WAR IN 10 YEARS
Step 1  Observed                                   YES     NO     Percentage Correct
        EXPECT U.S. IN WORLD WAR IN 10 YEARS  YES   20     34     37.0
                                              NO    10     72     87.8
        Overall Percentage                                        67.6
a. The cut value is .500

Classification Table(a,b)
                                                   Predicted
                                                   EXPECT U.S. IN WORLD WAR IN 10 YEARS
Step 0  Observed                                   YES     NO     Percentage Correct
        EXPECT U.S. IN WORLD WAR IN 10 YEARS  YES    0     54       .0
                                              NO     0     82    100.0
        Overall Percentage                                        60.3
a. Constant is included in the model.
b. The cut value is .500
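The percentages in these classification tables are simple ratios of the cell counts. This Python sketch recomputes them from the counts shown in the two tables; the nested-dictionary layout (observed group, then predicted group) is just one convenient way to hold a two-group table.

```python
def classification_summary(table):
    """table[observed][predicted] holds case counts for a two-group classification table.
    Returns the percentage correct for each observed group and overall."""
    per_group = {}
    correct = total = 0
    for group, row in table.items():
        row_total = sum(row.values())
        per_group[group] = 100.0 * row[group] / row_total
        correct += row[group]
        total += row_total
    return per_group, 100.0 * correct / total


step0 = {"YES": {"YES": 0, "NO": 54}, "NO": {"YES": 0, "NO": 82}}
step1 = {"YES": {"YES": 20, "NO": 34}, "NO": {"YES": 10, "NO": 72}}
for name, table in (("Step 0", step0), ("Step 1", step1)):
    per_group, overall = classification_summary(table)
    print(name, {g: round(pct, 1) for g, pct in per_group.items()}, round(overall, 1))
# Step 0 {'YES': 0.0, 'NO': 100.0} 60.3
# Step 1 {'YES': 37.0, 'NO': 87.8} 67.6
```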
Problem 1
The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews] were useful
predictors for distinguishing between groups based on responses to "seen x-rated movie in last year"
[xmovie]. These predictors differentiate survey respondents who have not seen an x-rated movie from
survey respondents who have seen an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one unit
increase in age increased the odds that survey respondents have not seen an x-rated movie by 3.9%.
Survey respondents who were female were approximately six and three quarters times more likely to have
not seen an x-rated movie. Survey respondents who were more conservative were more likely to have not
seen an x-rated movie. A one unit increase in liberal or conservative political views increased the odds that
survey respondents have not seen an x-rated movie by approximately one and a quarter times.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating
the statistical relationship.
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Dissecting problem 1 - 2
The variables listed first in the problem statement are the independent variables (IVs): "age" [age],
"sex" [sex], and "liberal or conservative political views" [polviews].
The variable used to define groups is the dependent variable (DV): "seen x-rated movie in last year"
[xmovie].
When a problem states that a list of independent variables can distinguish among groups and does not
identify a control variable or an order of importance for the variables, we do a logistic regression
entering all of the variables simultaneously.
Dissecting problem 1 - 3
SPSS logistic regression models the relationship by computing the changes in the likelihood of
falling in the category of the dependent variable that has the highest numeric code.
The statements of the specific relationships between the independent variables and the dependent
variable are all phrased in terms of impact on not seeing an x-rated movie.
Dissecting problem 1 - 4
The specific relationships for the independent variables listed in the problem indicate the direction
of the relationship (increasing or decreasing the likelihood of falling in the modeled group) and the
amount of change in the odds associated with a one-unit change in the independent variable.
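Exp(B), the exponent of a b coefficient, is the factor by which the odds of the modeled event change for a one-unit increase in an independent variable, so (Exp(B) - 1) x 100 gives the percent change. A minimal Python sketch; the Exp(B) values 1.039 and 0.331 in the example are back-calculated from the percentages stated in the problems (3.9% and 66.9%), not read from the SPSS output, which is not reproduced here.

```python
import math


def percent_change_in_odds(exp_b):
    """Percent change in the odds of the modeled event per one-unit increase in the IV."""
    return (exp_b - 1.0) * 100.0


def describe_change(exp_b):
    """Phrase the change the way the problem statements do."""
    change = percent_change_in_odds(exp_b)
    if change > 0:
        return f"increased the odds by {change:.1f}%"
    if change < 0:
        return f"decreased the odds by {-change:.1f}%"
    return "left the odds unchanged"


print(describe_change(1.039))          # increased the odds by 3.9%
print(describe_change(0.331))          # decreased the odds by 66.9%
print(describe_change(math.exp(0.0)))  # B = 0 -> left the odds unchanged
```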
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent variable be dichotomous and the independent variables
be metric or dichotomous. "Seen x-rated movie in last year" [xmovie] is a dichotomous variable, which
satisfies the level of measurement requirement for the dependent variable.
LEVEL OF MEASUREMENT - 2
"Age" [age] is an interval level variable, which satisfies the level of measurement requirements for
logistic regression analysis.
"Sex" [sex] is a dichotomous or dummy-coded nominal variable, which may be included in logistic
regression.
"Liberal or conservative political views" [polviews] is an ordinal level variable. If we follow the
convention of treating ordinal level variables as metric variables, the level of measurement
requirement for logistic regression analysis is satisfied. Since some data analysts do not agree with
this convention, a note of caution should be included in our interpretation.
Request simultaneous logistic regression
Selecting the dependent variable
Selecting the independent variables
Specifying the method for including variables
Click on the OK button to request the output for the logistic regression.
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step         39.668    3   .000
         Block        39.668    3   .000
         Model        39.668    3   .000
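The Sig. value in the omnibus table is the probability of a chi-square at least as large as 39.668 with 3 degrees of freedom if all b coefficients were zero. The Python standard library has no chi-square distribution, so this sketch uses the closed form that exists for exactly 3 degrees of freedom; with SciPy available, scipy.stats.chi2.sf(39.668, 3) should give the same value.

```python
import math


def chi_square_sf_df3(x):
    """Upper-tail probability P(chi-square with 3 df > x), using the closed form
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2), valid only for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


p = chi_square_sf_df3(39.668)
print(f"{p:.3f}")  # 0.000, reported by SPSS as .000
print(p <= 0.05)   # True: the overall relationship is significant at alpha = 0.05
```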
NUMERICAL PROBLEMS
Variables in the Equation
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
The probability of the Wald statistic for the variable liberal or
conservative political views was 0.024, less than or equal to the
level of significance of 0.05. The null hypothesis that the b
coefficient for liberal or conservative political views was equal to
zero was rejected. This supports the relationship that "survey
respondents who were more conservative were more likely to have
not seen an x-rated movie." Liberal or conservative political views is
an ordinal variable that is coded so that higher numeric values are
associated with survey respondents who were more conservative.
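The Wald statistic tested above is the square of B divided by its standard error, compared against a chi-square distribution with 1 degree of freedom. The B and S.E. values for polviews are not reproduced in this text, so the example inputs below are hypothetical; only the 0.024 versus 0.05 comparison comes from the output described above.

```python
import math


def wald_test(b, se, alpha=0.05):
    """Return the Wald statistic (B/SE)^2, its 1-df p-value, and whether H0: b = 0 is rejected."""
    wald = (b / se) ** 2
    # Upper tail of chi-square with 1 df equals 2 * (1 - Phi(|B/SE|)) = erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value, p_value <= alpha


# Hypothetical coefficient and standard error, chosen only to illustrate the calculation.
wald, p_value, reject = wald_test(b=0.46, se=0.20)
print(round(wald, 2), round(p_value, 3), reject)

# The decision rule applied to the reported probability for polviews:
print(0.024 <= 0.05)  # True -> reject H0
```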
Classification Table(a,b)
                                              Predicted
                                              SEEN X-RATED MOVIE IN LAST YEAR
Step 0  Observed                              YES     NO     Percentage Correct
        SEEN X-RATED MOVIE IN LAST YEAR  YES    0     45       .0
                                         NO     0    132    100.0
        Overall Percentage                                   74.6
a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by first calculating the proportion of cases for
each group based on the number of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the "NO" group is 132/177 = 0.746.
Then, we square and sum the proportion of cases in each group (0.254² + 0.746² = 0.621). 0.621 is the
proportional by chance accuracy rate.
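The same arithmetic as a short Python sketch, taking the Step 0 group counts as input.

```python
def proportional_by_chance(group_counts):
    """Sum of squared group proportions, using the group sizes from the Step 0 table."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


rate = proportional_by_chance([45, 132])  # YES and NO counts at Step 0
print(round(rate, 3))  # 0.621
```

Passing the problem 2 counts, [57, 106], gives 0.545, the rate reported for that problem.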
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy
Classification Table(a)
                                              Predicted
                                              SEEN X-RATED MOVIE IN LAST YEAR
Step 1  Observed                              YES     NO     Percentage Correct
        SEEN X-RATED MOVIE IN LAST YEAR  YES   19     26     42.2
                                         NO     9    123     93.2
        Overall Percentage                                   80.2
a. The cut value is .500
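The criteria for classification accuracy are not spelled out in this text. This Python sketch assumes a commonly used rule of thumb, that the model's overall accuracy should exceed the proportional by chance rate by at least 25% (1.25 x the by-chance rate); the 1.25 factor is an assumption, exposed as a parameter so it can be changed.

```python
def accuracy_check(step0_counts, step1_table, improvement=1.25):
    """Compare Step 1 overall accuracy with improvement x the proportional by chance rate.

    step0_counts: group sizes from the Step 0 table.
    step1_table:  step1_table[observed][predicted] case counts from the Step 1 table.
    improvement:  assumed rule-of-thumb multiplier (not stated in the source).
    """
    total = sum(step0_counts)
    by_chance = sum((n / total) ** 2 for n in step0_counts)
    correct = sum(step1_table[g][g] for g in step1_table)
    cases = sum(sum(row.values()) for row in step1_table.values())
    accuracy = correct / cases
    threshold = improvement * by_chance
    return accuracy, threshold, accuracy >= threshold


step1 = {"YES": {"YES": 19, "NO": 26}, "NO": {"YES": 9, "NO": 123}}
accuracy, threshold, ok = accuracy_check([45, 132], step1)
print(f"{accuracy:.3f} vs {threshold:.3f}: {ok}")  # 0.802 vs 0.776: True
```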
Answering the question in problem 1 - 1
We found a statistically significant overall relationship between the combination of independent
variables and the dependent variable.
There was no evidence of numerical problems in the solution.
The answer to the question is true with caution. A caution is added because of the inclusion of
ordinal level variables.
Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variables "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 2 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Dissecting problem 2 - 2
The variables listed first in the problem statement are the independent variables (IVs): "sex" [sex],
"general happiness" [happy], and "confidence in the executive branch of the federal government"
[confed].
The variable used to define groups is the dependent variable (DV): "should marijuana be made legal"
[grass].
When a problem identifies control variables, we do a hierarchical logistic regression, entering the
variables in SPSS blocks.
Dissecting problem 2 - 3
SPSS logistic regression models the relationship by computing the changes in the likelihood of
falling in the category of the dependent variable that has the highest numeric code.
The statements of the specific relationships between the independent variables and the dependent
variable are all phrased in terms of impact on being less supportive of legalizing marijuana.
Dissecting problem 2 - 4
The specific relationships for the independent variables listed in the problem indicate the direction
of the relationship (increasing or decreasing the likelihood of falling in the modeled group) and the
amount of change in the odds associated with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the relationship between the predictors and
the dependent variable must be statistically significant after entering the control variables in a
previous stage, there must be no evidence of a flawed numerical analysis, the classification accuracy
rate must be substantially better than could be obtained by chance alone, and each significant
relationship must be interpreted correctly.
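The conditions above, together with the caution rule used in problem 1 for ordinal independent variables, can be summarized as a decision function. This is a hypothetical Python sketch of the reasoning, not part of SPSS; the parameter names are invented, and returning "Inappropriate application of a statistic" when the level of measurement requirements fail is an assumption.

```python
def logistic_answer(levels_ok, significant, numerical_problems,
                    accuracy_ok, interpreted_correctly, ordinal_ivs):
    """Map the checks on a logistic regression problem to one of the four answer choices."""
    if not levels_ok:
        return "Inappropriate application of a statistic"
    if not significant or numerical_problems or not accuracy_ok or not interpreted_correctly:
        return "False"
    # Treating ordinal IVs as metric is a disputed convention, so add a caution.
    return "True with caution" if ordinal_ivs else "True"


# Problem 1: every check passed and polviews is ordinal.
print(logistic_answer(True, True, False, True, True, True))  # True with caution
```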
LEVEL OF MEASUREMENT - 1
Logistic regression analysis requires that the dependent variable be dichotomous and the independent
variables be metric or dichotomous. "Should marijuana be made legal" [grass] is a dichotomous variable,
which satisfies the level of measurement requirement for the dependent variable. It contains two
categories:
• survey respondents who have been less supportive that the use of marijuana should be made legal
• survey respondents who have been more supportive that the use of marijuana should be made legal
LEVEL OF MEASUREMENT - 2
"Sex" [sex] is a dichotomous or dummy-coded nominal variable, which may be included in logistic
regression analysis.
Request hierarchical logistic regression
Selecting the dependent variable
First, highlight the dependent variable grass in the list of variables.
Selecting the control independent variables
Adding the predictor independent variables
Specifying the method for including variables
In our hierarchical regression, we will specify that all of the variables in each block be entered
simultaneously when the block is entered.
Completing the logistic regression request
Click on the OK button to request the output for the logistic regression.
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
The probability of the Wald statistic for the variable confidence in the
executive branch of the federal government was 0.038, less than or
equal to the level of significance of 0.05. The null hypothesis that the
b coefficient for confidence in the executive branch of the federal
government was equal to zero was rejected. This supports the
relationship that "survey respondents who had less confidence in the
executive branch of the federal government were less likely to have
been less supportive that the use of marijuana should be made legal."
Confidence in the executive branch of the federal government is an
ordinal variable that is coded so that lower numeric values are
associated with survey respondents who had more confidence in the
executive branch of the federal government.
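The Wald test behind this decision can be sketched in Python. This is a minimal illustration under stated assumptions, not SPSS's own code: it assumes the Wald statistic for a single coefficient is (b / s.e.)² referred to a chi-square distribution with 1 degree of freedom, and the b and s.e. in the usage line are hypothetical values rather than numbers from this output.

```python
import math


def wald_test(b, se):
    """Return the Wald chi-square (1 df) and its p-value for one coefficient."""
    wald = (b / se) ** 2
    # With 1 degree of freedom, P(chi-square > W) equals erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


def reject_null(p_value, alpha=0.05):
    """Reject H0: b = 0 when p is less than or equal to alpha."""
    return p_value <= alpha


# Hypothetical coefficient and standard error, for illustration only.
wald, p = wald_test(b=0.5, se=0.24)
print(f"Wald = {wald:.3f}, p = {p:.3f}, reject H0: {reject_null(p)}")
```

The "less than or equal to" comparison in reject_null mirrors the wording used on these slides.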
Variables in the Equation
Classification Table(a,b)
                                     Predicted: SHOULD MARIJUANA BE MADE LEGAL
Observed                                LEGAL   NOT LEGAL   Percentage Correct
Step 0  SHOULD MARIJUANA   LEGAL            0          57                   .0
        BE MADE LEGAL      NOT LEGAL        0         106                100.0
        Overall Percentage                                                65.0
a. Constant is included in the model.
b. The cut value is .500
The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group
based on the number of cases in each group in the classification table at Step 0, and then squaring and
summing the proportion of cases in each group (0.350² + 0.650² = 0.545).
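The proportional by chance calculation above is short enough to check by hand, but a small Python helper makes the arithmetic explicit. A minimal sketch; the function name is illustrative, not an SPSS feature.

```python
def proportional_by_chance(group_counts):
    """Sum of squared group proportions from the Step 0 classification table."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


# Step 0 counts for grass: 57 LEGAL, 106 NOT LEGAL.
rate = proportional_by_chance([57, 106])
print(f"{rate:.3f}")  # 0.545
```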
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy
Classification Table(a)
                                     Predicted: SHOULD MARIJUANA BE MADE LEGAL
Observed                                LEGAL   NOT LEGAL   Percentage Correct
Step 1  SHOULD MARIJUANA   LEGAL           18          39                 31.6
        BE MADE LEGAL      NOT LEGAL       13          93                 87.7
        Overall Percentage                                                68.1
a. The cut value is .500
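The Percentage Correct column and the Overall Percentage can be reproduced from the four counts in the Step 1 table. A minimal sketch in Python, assuming rows are observed groups and columns are predicted groups in the same order, as in the SPSS table above.

```python
def classification_accuracy(table):
    """Percent correct per observed group and overall.

    table[i][j] is the number of cases observed in group i and predicted in
    group j, with groups listed in the same order along both axes.
    """
    per_group = [100 * row[i] / sum(row) for i, row in enumerate(table)]
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return per_group, 100 * correct / total


# Step 1 counts for grass: rows are observed LEGAL / NOT LEGAL,
# columns are predicted LEGAL / NOT LEGAL.
step1 = [[18, 39],
         [13, 93]]
per_group, overall = classification_accuracy(step1)
print([round(p, 1) for p in per_group], round(overall, 1))  # [31.6, 87.7] 68.1
```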
Answering the question in problem 2 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
We found a statistically significant overall relationship between the predictor independent variables and
the dependent variable. There was no evidence of numerical problems in the solution. Moreover, the
classification accuracy surpassed the proportional by chance accuracy criterion, supporting the utility of
the model.
Answering the question in problem 2 - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
We verified that each statement about the relationship between an independent variable and the dependent
variable was correct in both the direction of the relationship and the change in likelihood associated with a
one-unit change of the independent variable.
The answer to the question is true with caution. A caution is added because of the inclusion of ordinal
level variables.
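The percentages in these statements are changes in the odds of the modeled event for a one-unit increase in the predictor. A minimal sketch, assuming the usual conversion from the odds ratio Exp(B): percent change = (Exp(B) − 1) × 100. The Exp(B) values in the usage lines are back-calculated from the percentages quoted in the problem statements (for example, a 66.9% decrease corresponds to Exp(B) = 0.331); they are not copied from the SPSS output.

```python
import math


def odds_ratio(b):
    """Exp(B): the factor by which the odds are multiplied per one-unit increase."""
    return math.exp(b)


def percent_change_in_odds(exp_b):
    """Percent change in the odds of the modeled event per one-unit increase."""
    return (exp_b - 1) * 100


def describe(exp_b):
    """Phrase the change; assumes Exp(B) != 1 (below 1 decreases the odds)."""
    change = percent_change_in_odds(exp_b)
    direction = "increased" if change > 0 else "decreased"
    return f"{direction} the odds by {abs(change):.1f}%"


# Exp(B) values implied by the percentages in the problem statements.
print(describe(0.331))  # happy:    decreased the odds by 66.9%
print(describe(0.572))  # confed:   decreased the odds by 42.8%
print(describe(1.100))  # income98: increased the odds by 10.0%
```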
Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting Problem 3 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating
the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
For these problems, we will assume that there is no problem with missing data, outliers, or influential
cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Dissecting Problem 3 - 2
The variables listed first in the problem statement are the independent variables (IVs): "highest academic
degree" [degree], "total family income" [income98], and "satisfaction with financial situation" [satfin].
The variable used to define groups is the dependent variable (DV): "expect u.s. in world war in 10 years"
[uswary].
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between
groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the
United States would fight in another world war within the next ten years from survey respondents who
have been more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
Since the problem identifies the most useful or most important predictor, we do a stepwise logistic
regression.
Dissecting Problem 3 - 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the
United States would fight in another world war within the next ten years from survey respondents who
have been more positive that the United States would fight in another world war within the next ten
years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
The SPSS output will model the changes in the likelihood of being
less positive that the United States would fight in another world
war within the next ten years.
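SPSS builds the binary model for the category with the higher numeric code. A minimal sketch of that rule, assuming [uswary] is coded 1 = YES and 2 = NO; this coding is an assumption consistent with the YES/NO order of the classification tables, not something this section confirms, so check the value labels in GSS2000.sav. Under that assumption the modeled event is NO, i.e., being less positive about a world war.

```python
def modeled_event(value_labels):
    """Label of the category SPSS models: the one with the higher numeric code."""
    return value_labels[max(value_labels)]


def encode_outcome(responses, value_labels):
    """Recode to 1 for the modeled (higher-code) group and 0 otherwise.

    Assumes missing responses were removed beforehand.
    """
    event_code = max(value_labels)
    return [1 if r == event_code else 0 for r in responses]


# Assumed coding for [uswary].
uswary_labels = {1: "YES", 2: "NO"}
print(modeled_event(uswary_labels))                 # NO
print(encode_outcome([1, 2, 2, 1], uswary_labels))  # [0, 1, 1, 0]
```

Reversing the codes (1 = NO, 2 = YES) would flip the modeled event, which is why recoding is the only way to change the wording of the results.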
Dissecting Problem 3 - 4
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.
The statements of the specific relationships between independent variables and the dependent variable are
all phrased in terms of impact on being less positive that the United States would fight in another world war
within the next ten years.
Dissecting Problem 3 - 5
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The specific relationships for the independent variables listed in the problem indicate the direction of the
relationship, increasing or decreasing the likelihood of falling in the modeled group, and the amount of
change associated with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the
relationship between the predictors selected for inclusion and the
dependent variable must be statistically significant, there must be
no evidence of a flawed numerical analysis, the classification
accuracy rate must be substantially better than could be obtained
by chance alone, and the order of entry and each significant
relationship must be interpreted correctly.
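The requirements above, together with the cautions applied in these problems, can be written as a small decision function. This is a hypothetical sketch of the grading checklist, not an SPSS feature: the class and field names are invented for illustration, and the s.e. > 2.0 threshold for numerical problems is taken from the flowchart steps at the end of this section.

```python
from dataclasses import dataclass


@dataclass
class LogisticFindings:
    """Hypothetical summary of the checks used to grade a problem statement."""
    overall_relationship_significant: bool
    largest_standard_error: float
    accuracy_beats_chance: bool
    entry_order_correct: bool
    relationships_correct: bool
    ordinal_predictors: bool = False
    preferred_sample_size_met: bool = True


def grade(f: LogisticFindings, se_limit: float = 2.0) -> str:
    """Return the answer choice implied by the checks described above."""
    if not f.overall_relationship_significant:
        return "False"
    if f.largest_standard_error > se_limit:  # evidence of numerical problems
        return "False"
    if not f.accuracy_beats_chance:
        return "False"
    if not (f.entry_order_correct and f.relationships_correct):
        return "False"
    if f.ordinal_predictors or not f.preferred_sample_size_met:
        return "True with caution"
    return "True"


# Problem 3 as answered in this section; the standard error is a hypothetical value.
problem3 = LogisticFindings(
    overall_relationship_significant=True,
    largest_standard_error=0.1,
    accuracy_beats_chance=True,
    entry_order_correct=True,
    relationships_correct=True,
    ordinal_predictors=True,
    preferred_sample_size_met=False,
)
print(grade(problem3))  # True with caution
```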
LEVEL OF MEASUREMENT - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Logistic regression analysis requires that the dependent variable be dichotomous and the independent
variables be metric or dichotomous.
"Expect u.s. in world war in 10 years" [uswary] is a dichotomous variable, which satisfies the level of
measurement requirement for the dependent variable. It contains two categories: survey respondents who
have been less positive that the United States would fight in another world war within the next ten years
and survey respondents who have been more positive that the United States would fight in another world
war within the next ten years.
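The data side of the dependent-variable requirement can be checked in code: a dichotomous variable has exactly two distinct non-missing values. A minimal sketch with hypothetical response codes; whether an ordinal independent variable may be treated as metric is a judgment about the variable's scale and cannot be settled by inspecting the values.

```python
def is_dichotomous(values, missing=(None,)):
    """True when the non-missing values take exactly two distinct codes."""
    observed = {v for v in values if v not in missing}
    return len(observed) == 2


# Hypothetical response codes, for illustration only.
uswary = [1, 2, 2, None, 1, 2]
satfin = [1, 2, 3, 3, None, 2]
print(is_dichotomous(uswary))  # True
print(is_dichotomous(satfin))  # False
```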
LEVEL OF MEASUREMENT - 2
"Highest academic degree" [degree], "total family income" [income98], and "satisfaction with financial
situation" [satfin] are ordinal level variables. If we follow the convention of treating ordinal level
variables as metric variables, the level of measurement requirement for logistic regression analysis is
satisfied. Since some data analysts do not agree with this convention, a note of caution should be included
in our interpretation.
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
Request stepwise logistic regression
Selecting the dependent variable
Adding the independent variables
Specifying the method for including variables
Adding options to the output
Specifications for stepwise method
Click on the Continue button to close the dialog box.
Completing the logistic regression request
Click on the OK button to request the output for the logistic regression.
Sample size – ratio of cases to variables
Case Processing Summary
Unweighted Cases(a)                          N     Percent
Selected Cases    Included in Analysis     136        50.4
                  Missing Cases            134        49.6
                  Total                    270       100.0
Unselected Cases                             0          .0
Total                                      270       100.0
a. If weight is in effect, see classification table for the total number of cases.
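The ratio of cases to variables follows directly from the 136 cases included in the analysis and the three candidate predictors. A minimal sketch; the threshold values in the loop are hypothetical placeholders, since this section does not state the minimum or preferred ratio. Note that the answer to this problem reports that the preferred sample size is not met, which implies a preferred ratio above 45.3.

```python
def cases_per_variable(n_included, n_independent):
    """Ratio of cases included in the analysis to independent variables."""
    return n_included / n_independent


def meets_ratio(ratio, required):
    """True when the observed ratio reaches the required ratio."""
    return ratio >= required


# 136 cases included (Case Processing Summary) and 3 candidate predictors.
ratio = cases_per_variable(136, 3)
print(f"{ratio:.1f}")  # 45.3

# Hypothetical thresholds: substitute the ratios your course or text recommends.
for required in (10, 50):
    print(required, meets_ratio(ratio, required))
```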
NUMERICAL PROBLEMS
Variables in the Equation
The probability of the Wald statistic for the variable total family
income was 0.004, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for total family
income was equal to zero was rejected. This supports the
relationship that "survey respondents who had higher total family
incomes were more likely to have been less positive that the
United States would fight in another world war within the next
ten years." Total family income is an ordinal variable that is
coded so that higher numeric values are associated with survey
respondents who had higher total family incomes.
Step Summary
Classification Table(a,b)
                                     Predicted: EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed                                  YES      NO   Percentage Correct
Step 0  EXPECT U.S. IN WORLD   YES          0      54                   .0
        WAR IN 10 YEARS        NO           0      82                100.0
        Overall Percentage                                            60.3
a. Constant is included in the model.
b. The cut value is .500
The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group
based on the number of cases in each group in the classification table at Step 0, and then squaring and
summing the proportion of cases in each group (0.397² + 0.603² = 0.521).
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy
Classification Table(a)
                                     Predicted: EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed                                  YES      NO   Percentage Correct
Step 1  EXPECT U.S. IN WORLD   YES         20      34                 37.0
        WAR IN 10 YEARS        NO          10      72                 87.8
        Overall Percentage                                            67.6
a. The cut value is .500
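The Step 1 accuracy can be compared with the proportional by chance rate of 0.521 computed from the Step 0 table. A minimal sketch, assuming the common rule of thumb that overall accuracy should be at least 25% higher than the by-chance rate; the 25% margin is an assumption and is exposed as the improvement parameter.

```python
def accuracy_criterion(by_chance_rate, improvement=0.25):
    """By-chance rate raised by an assumed improvement margin (25% by default)."""
    return by_chance_rate * (1 + improvement)


def beats_chance(overall_accuracy, by_chance_rate, improvement=0.25):
    """True when accuracy reaches the assumed criterion."""
    return overall_accuracy >= accuracy_criterion(by_chance_rate, improvement)


by_chance = 0.521                   # proportional by chance rate from Step 0
overall_accuracy = (20 + 72) / 136  # correctly classified cases at Step 1
print(f"{overall_accuracy:.3f} vs {accuracy_criterion(by_chance):.3f}")  # 0.676 vs 0.651
print(beats_chance(overall_accuracy, by_chance))  # True
```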
Answering the question in problem 3 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The answer to the question is true with caution.
A caution is added to the findings because of the inclusion of ordinal level independent variables. A
caution is added to the findings because the preferred sample size is not met.
Steps in binary logistic regression:
level of measurement and initial sample size
Steps in logistic regression:
overall relationship and numerical problems
• Hierarchical method of entry used to include independent variables? (No / Yes)
• Standard errors of coefficients indicate presence of numerical problems (s.e. > 2.0)? Yes: False. No: continue.
Steps in logistic regression:
relationships between IV's and DV
• Entry order of variables interpreted correctly? No: False. Yes: continue.
• Relationships between individual IVs and DV groups interpreted correctly? No: False. Yes: continue.
Steps in logistic regression:
classification accuracy and adding cautions