Logistic Regression

Logistic Regression and Odds Ratios
Example of Odds Ratio Using Relationship between Death Penalty and Race
Probability and Odds
• We begin with a frequency distribution for the variable “Death Penalty
for Crime”
– The probability of receiving a death sentence is 0.34 or 34%
(50/147)
– The odds of receiving a death sentence = death sentence/not death
sentence = 50/97 = 0.5155
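The two figures above can be checked directly from the counts. This is a minimal Python sketch; the variable names are illustrative and not part of the original slides.

```python
# Counts from the frequency distribution for "Death Penalty for Crime"
death = 50
not_death = 97
total = death + not_death          # 147 defendants in all

probability = death / total        # share of all defendants sentenced to death
odds = death / not_death           # death sentences per non-death sentence

print(f"probability = {probability:.2f}")   # 0.34
print(f"odds        = {odds:.4f}")          # 0.5155
```

The odds can also be recovered from the probability as p / (1 - p), which gives the same 0.5155.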
Interpreting Odds
• The odds of 0.5155 can be stated in different ways:
– Defendants can expect to receive about one death sentence for
every two sentences of life imprisonment
– Receiving a death sentence is about half as likely as
receiving a sentence of life imprisonment
• Or, inverting the odds,
– Receiving a life imprisonment sentence is about twice as
likely as receiving the death penalty.
Impact of an Independent Variable
• If an independent variable impacts or has a relationship to a dependent
variable, it will change the odds of being in the key dependent variable
group, e.g. death sentence.
• The following table shows the relationship between race and sentence:
Odds for Independent Variable Groups
• We can compute the odds of receiving a death penalty for each of the
groups:
• The odds of receiving a death sentence if the defendant was Black =

28/45 = 0.6222
• The odds of receiving a death sentence if the defendant was not Black =
22/52 = 0.4231
The Odds Ratio Measures the Effect
• The impact of being black on receiving a death penalty is measured by the odds
ratio which equals:
= the odds if black ÷ the odds if not black
= 0.6222 ÷ 0.4231 = 1.47
• Which we interpret as:
• Black defendants are 1.47 times as likely (in terms of odds) to
receive a death sentence as non-Black defendants
• The odds of receiving a death sentence are 1.47 times greater for
Black defendants than for non-Black defendants
• The odds of a death sentence for Black defendants are 47% higher than
the odds of a death sentence for non-Black defendants. (1.47 - 1.00)
• The predicted odds for Black defendants are 1.47 times the odds for
non-Black defendants.
• A one unit change in the independent variable race (non-Black to Black)
increases the odds of receiving a death penalty by a factor of 1.47.
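The odds ratio can be reproduced from the four cell counts quoted above. The sketch below also computes the ratio of the two probabilities, purely for contrast. That second ratio is not on the slides, and it shows why the 1.47 figure is best read as a statement about odds. Variable names are illustrative.

```python
# Cell counts quoted on the slides: (death sentence, no death sentence)
black_death, black_other = 28, 45
nonblack_death, nonblack_other = 22, 52

odds_black = black_death / black_other             # 0.6222
odds_nonblack = nonblack_death / nonblack_other    # 0.4231
odds_ratio = odds_black / odds_nonblack            # 1.47

# Ratio of probabilities, shown only for contrast with the odds ratio
risk_black = black_death / (black_death + black_other)
risk_nonblack = nonblack_death / (nonblack_death + nonblack_other)
risk_ratio = risk_black / risk_nonblack

print(f"odds ratio = {odds_ratio:.2f}")
print(f"ratio of probabilities = {risk_ratio:.2f}")
```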
SPSS Output for this Relationship
The odds ratio is output in SPSS in the column labeled Exp(B). It is the
factor by which the odds change for a one unit change in the
independent variable.
Variables in the Equation
                      B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)  BLACKD     .386    .350   1.213    1    .271   1.471
           Constant   -.860   .254   11.439   1    .001   .423
a. Variable(s) entered on step 1: BLACKD.
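The SPSS coefficients tie back to the hand-computed odds. With a single dichotomous predictor, B for BLACKD is the natural log of the odds ratio, and the constant is the natural log of the odds for the group coded 0. A short sketch, assuming the cell counts quoted earlier (28/45 and 22/52):

```python
import math

odds_nonblack = 22 / 52
odds_ratio = (28 / 45) / odds_nonblack

b_blackd = math.log(odds_ratio)        # log of the odds ratio
b_constant = math.log(odds_nonblack)   # log odds for the group coded 0

print(round(b_blackd, 3), round(math.exp(b_blackd), 3))      # B and Exp(B) for BLACKD
print(round(b_constant, 3), round(math.exp(b_constant), 3))  # B and Exp(B) for Constant
```

Read this way, the Exp(B) of .423 for the constant is the odds of a death sentence for non-Black defendants.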
Logistic Regression – Basic Relationships
Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Logistic regression
• Logistic regression is used to analyze relationships between a
dichotomous dependent variable and metric or dichotomous
independent variables. (SPSS now supports Multinomial Logistic
Regression that can be used with more than two groups, but our
focus here is on binary logistic regression for two groups.)
• Logistic regression combines the independent variables to estimate
the probability that a particular event will occur, i.e. a subject will
be a member of one of the groups defined by the dichotomous
dependent variable. In SPSS, the model is always constructed to
predict the group with higher numeric code. If responses are
coded 1 for Yes and 2 for No, SPSS will predict membership in the
No category. If responses are coded 1 for No and 2 for Yes, SPSS will
predict membership in the Yes category. We will refer to the
predicted event for a particular analysis as the modeled event.
• This will create some awkward wording in our problems. Our only
option for changing this is to recode the variable.
What logistic regression predicts
• The variate or value produced by logistic regression is a probability
value between 0.0 and 1.0.
• If the probability for group membership in the modeled category is

above some cut point (the default is 0.50), the subject is predicted to
be a member of the modeled group. If the probability is below the
cut point, the subject is predicted to be a member of the other group.
• For any given case, logistic regression computes the probability that a
case with a particular set of values for the independent variable is a
member of the modeled category.
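The prediction step can be sketched in a few lines of Python. The function names are hypothetical, and the coefficients are the ones reported for the death penalty example (constant -0.860, BLACKD 0.386). Treating a probability exactly equal to the cut point as a member of the modeled group is an assumption; the slides do not say how ties are handled.

```python
import math

def predicted_probability(constant, coefficients, values):
    """Turn the linear combination of predictors (the logit) into a probability."""
    logit = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))

def predicted_group(probability, cut_point=0.50):
    """Return 1 for the modeled group, 0 for the other group (ties go to 1: an assumption)."""
    return 1 if probability >= cut_point else 0

p_black = predicted_probability(-0.860, [0.386], [1])
p_nonblack = predicted_probability(-0.860, [0.386], [0])

print(round(p_black, 3), predicted_group(p_black))
print(round(p_nonblack, 3), predicted_group(p_nonblack))
```

Both probabilities fall below 0.50, so with the default cut point every case in that example would be predicted to be in the non-modeled group.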
SW388R7 Data Analysis & Computers II
Level of measurement requirements
• Logistic regression analysis requires that the
dependent variable be dichotomous.
• If an independent variable is nominal level and not
dichotomous, the logistic regression procedure in SPSS
has an option to dummy code the variable for you.
• If an independent variable is ordinal, we will attach
the usual caution.
Assumptions
• Logistic regression does not make any
assumptions of normality, linearity, and
homogeneity of variance for the independent
variables.
• Because it does not impose these

requirements, it is preferred to discriminant
analysis when the data does not satisfy these
assumptions.
Sample size requirements
• The minimum number of cases per independent
variable is 10, using a guideline provided by Hosmer
and Lemeshow, authors of Applied Logistic
Regression, one of the main resources for Logistic
Regression.
• For preferred case-to-variable ratios, we will use 20

to 1 for simultaneous and hierarchical logistic
regression and 50 to 1 for stepwise logistic
regression.
Methods for including variables
• There are three methods available for including
variables in the regression equation:
– the simultaneous method in which all independents are included at the same time
– The hierarchical method in which control variables are entered in the analysis before
the predictors whose effects we are primarily concerned with.
– The stepwise method (forward conditional in SPSS) in which variables are selected in
the order in which they maximize the statistically significant contribution to the model.
• For all methods, the contribution to the model is
measured by the model chi-square, a statistical measure of
the fit between the dependent and independent
variables, like R².
Computational method
• Multiple regression uses the least-squares method to find the
coefficients for the independent variables in the regression equation,
i.e. it computes coefficients that minimize the sum of squared
residuals for all cases.
• Logistic regression uses maximum-likelihood estimation to compute
the coefficients for the logistic regression equation. This method
attempts to find coefficients that match the breakdown of cases on
the dependent variable.
• The overall measure of how well the model fits is given by the
likelihood value (reported by SPSS as -2 log likelihood), which is
similar to the residual or error sum of squares value for multiple
regression. A model that fits the data well will have a small
likelihood value. A perfect model would have a likelihood value of zero.
• Maximum-likelihood estimation is an iterative procedure that
successively works to get closer and closer to the correct answer.
When SPSS reports the "iterations," it is telling us how many cycles it
took to get the answer.
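To make the iterative idea concrete, here is a minimal sketch of maximum-likelihood fitting by Newton-Raphson on the death penalty counts from the earlier slides. Newton-Raphson is one common way to maximize the likelihood; whether SPSS uses exactly this scheme or would report the same number of iterations is not stated in the slides, so treat it as an illustration only.

```python
import math

# Death penalty example as grouped data: (x, death sentences, other sentences),
# where x = 1 for Black defendants and x = 0 otherwise.
groups = [(1, 28, 45), (0, 22, 52)]

def neg2_log_likelihood(b0, b1):
    """-2 log likelihood of the data for a given constant b0 and slope b1."""
    total = 0.0
    for x, events, non_events in groups:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += events * math.log(p) + non_events * math.log(1.0 - p)
    return -2.0 * total

b0, b1 = 0.0, 0.0                      # starting values
history = [neg2_log_likelihood(b0, b1)]
for iteration in range(1, 26):
    g0 = g1 = h00 = h01 = h11 = 0.0    # gradient and information-matrix entries
    for x, events, non_events in groups:
        n = events + non_events
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        residual = events - n * p      # observed minus expected events
        weight = n * p * (1.0 - p)
        g0 += residual
        g1 += residual * x
        h00 += weight
        h01 += weight * x
        h11 += weight * x * x
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det    # solve the 2x2 system by hand
    step1 = (h00 * g1 - h01 * g0) / det
    b0, b1 = b0 + step0, b1 + step1
    history.append(neg2_log_likelihood(b0, b1))
    if max(abs(step0), abs(step1)) < 1e-8:
        break

print(f"iterations: {iteration}")
print(f"constant = {b0:.3f}, B = {b1:.3f}")
print("-2 log likelihood by iteration:", [round(v, 3) for v in history])
```

Each cycle lowers the -2 log likelihood until the coefficients stop changing, which is the "closer and closer" behaviour described above.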
Overall test of relationship
• The overall test of relationship among the independent variables
and groups defined by the dependent is based on the reduction in
the likelihood values for a model which does not contain any
independent variables and the model that contains the
independent variables.
• This difference in likelihood follows a chi-square distribution, and

is referred to as the model chi-square.
• The significance test for the model chi-square is our statistical

evidence of the presence of a relationship between the dependent
variable and the combination of the independent variables.
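As a worked illustration, the model chi-square for the death penalty example can be computed by hand. The sketch assumes that, with a single dichotomous predictor, the fitted probabilities equal the observed proportions in each group (28/73 and 22/74). The one-degree-of-freedom p-value uses the identity P(chi-square > x) = erfc(sqrt(x/2)). None of these figures appear in the original slides.

```python
import math

def neg2ll(cells):
    """-2 log likelihood for a list of (events, non_events, predicted probability)."""
    return -2.0 * sum(e * math.log(p) + n * math.log(1.0 - p) for e, n, p in cells)

# Model with no independent variables: everyone gets the overall probability 50/147
null_model = neg2ll([(50, 97, 50 / 147)])

# Model with race: each group gets its own observed proportion
full_model = neg2ll([(28, 45, 28 / 73), (22, 52, 22 / 74)])

model_chi_square = null_model - full_model
p_value = math.erfc(math.sqrt(model_chi_square / 2.0))   # valid for 1 df only

print(f"-2LL without predictors: {null_model:.3f}")
print(f"-2LL with race:          {full_model:.3f}")
print(f"model chi-square:        {model_chi_square:.3f}, p = {p_value:.3f}")
```

The result is not significant at 0.05, which agrees with the Wald test reported for BLACKD in the SPSS output.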
Beginning logistic regression model
• The SPSS output for logistic

regression begins with output for a
model that contains no
independent variables. It labels
this output "Block 0: Beginning
Block" and (if we request the
optional iteration history) reports
the initial -2 Log Likelihood, which
we can think of as a measure of the
error associated trying to predict
the dependent variable without The initial -2 log
using any information from the likelihood is 213.891.
independent variables.
We will not routinely request

the iteration history because
it does not usually yield us
additional useful information.
Ending logistic regression model
• After the independent variables
are entered in Block 1, the -2
log likelihood is again measured
(180.267 in this problem).
• The difference between ending
and beginning -2 log likelihood
is the model chi-square that is
used in the test of overall
statistical significance.
• In this problem, the model chi-square is 33.625 (213.891 –
180.267), which is statistically significant at p<0.001.
Relationship of Individual Independent
Variables and Dependent Variable
• There is a test of significance for the relationship between an individual
independent variable and the dependent variable, a significance test of the
Wald statistic.
• The individual coefficients represent the change in the log odds of being a
member of the modeled category. Because they are expressed in log
units, they are not directly interpretable. However, if the b coefficient is used as
the power to which the base of the natural logarithm (2.71828) is raised, the
result represents the change in the odds of the modeled event associated with
a one-unit change in the independent variable.
• If a coefficient is positive, its exponentiated value will be greater than one,
meaning that the odds of the modeled event increase. If a coefficient is
negative, its exponentiated value will be less than one, and the odds of the
event occurring decrease. A coefficient of zero (0) has an exponentiated value
of 1.0, meaning that this coefficient does not change the odds of the event one
way or the other.
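The sign rule is easy to see numerically. In the sketch below the coefficients 0.5, 0.0 and -0.5 are hypothetical values chosen only to show the three cases, and `odds_multiplier` is an illustrative helper name.

```python
import math

def odds_multiplier(b, units=1):
    """Factor by which the odds change when the predictor rises by `units`."""
    return math.exp(b * units)

for b in (0.5, 0.0, -0.5):                    # hypothetical coefficients
    factor = odds_multiplier(b)
    percent_change = (factor - 1.0) * 100.0   # e.g. 1.47 would be a 47% increase
    print(f"b = {b:+.1f}  odds multiplied by {factor:.3f}  ({percent_change:+.1f}%)")
```

Because the effect is multiplicative, a two-unit increase multiplies the odds by the one-unit factor squared, i.e. `odds_multiplier(b, 2) == odds_multiplier(b) ** 2`.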
Numerical problems
• The maximum likelihood method used to calculate logistic regression is
an iterative fitting process that attempts to cycle through repetitions to
find an answer.
• Sometimes, the method will break down and not be able to converge or
find an answer.
• Sometimes the method will produce wildly improbable results, reporting
that a one-unit change in an independent variable increases the odds of
the modeled event by hundreds of thousands or millions. These
implausible results can be produced by multicollinearity, categories of
predictors having no cases or zero cells, and complete separation
whereby the two groups are perfectly separated by the scores on one or
more independent variables.
• The clue that we have numerical problems and should not interpret the
results is a standard error larger than 2.0 for one or more of the
independent variables.
Strength of logistic regression relationship
• While logistic regression does compute correlation
measures to estimate the strength of the
relationship (pseudo R square measures, such as
Nagelkerke's R²), these correlation measures do
not really tell us much about the accuracy or errors
associated with the model.
• A more useful measure to assess the utility of a

logistic regression model is classification accuracy,
which compares predicted group membership
based on the logistic model to the actual, known
group membership, which is the value for the
dependent variable.
Evaluating usefulness for logistic models
• The benchmark that we will use to characterize a logistic regression
model as useful is a 25% improvement over the rate of accuracy
achievable by chance alone.
• Even if the independent variables had no relationship to the groups

defined by the dependent variable, we would still expect to be
correct in our predictions of group membership some percentage
of the time. This is referred to as by chance accuracy.
• The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by summing the
squared proportion of cases in each group.
Comparing accuracy rates
• To characterize our model as useful, we compare the overall percentage
accuracy rate produced by SPSS at the last step in which variables are entered
to 25% more than the proportional by chance accuracy. (Note: SPSS does not
compute a cross-validated accuracy rate for logistic regression.)
Classification Table(a)
                                    Predicted: EXPECT U.S. IN
                                    WORLD WAR IN 10 YEARS
Observed                            YES     NO     Percentage Correct
Step 1  EXPECT U.S. IN WORLD  YES   20      34     37.0
        WAR IN 10 YEARS       NO    10      72     87.8
        Overall Percentage                         67.6
a. The cut value is .500
SPSS reports the overall accuracy rate in the last row ("Overall
Percentage") of the "Classification Table." The overall accuracy rate
computed by SPSS was 67.6%.
Computing by chance accuracy
The number of cases in each group is found in the Classification Table at Step 0
(before any independent variables are included). The proportion of cases in the
largest group is equal to the overall percentage (60.3%).
Classification Table(a,b)
                                    Predicted: EXPECT U.S. IN
                                    WORLD WAR IN 10 YEARS
Observed                            YES     NO     Percentage Correct
Step 0  EXPECT U.S. IN WORLD  YES   0       54     .0
a. Constant is included in the model.
b. The cut value is .500
The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on the
number of cases in each group in the classification table at Step
0, and then squaring and summing the proportion of cases in
each group (0.397² + 0.603² = 0.521).
The proportional by chance accuracy criteria is 65.1% (1.25 x
52.1% = 65.1%).
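The by chance calculation can be written as a small helper. This is a sketch with a hypothetical function name. The group sizes of 54 (YES) and 82 (NO) are taken from the classification tables above, and the 25% improvement benchmark is the one adopted in these slides.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)

counts = [54, 82]                           # YES and NO cases
chance = proportional_by_chance_accuracy(counts)
criterion = 1.25 * chance                   # 25% better than chance
model_accuracy = (20 + 72) / sum(counts)    # correct predictions at Step 1
useful = model_accuracy >= criterion

print(f"by chance accuracy: {chance:.3f}")
print(f"criterion:          {criterion:.3f}")
print(f"model accuracy:     {model_accuracy:.3f}  useful: {useful}")
```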
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.
The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews] were useful
predictors for distinguishing between groups based on responses to "seen x-rated movie in last year"
[xmovie]. These predictors differentiate survey respondents who have not seen an x-rated movie from
survey respondents who have seen an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one unit
increase in age increased the odds that survey respondents have not seen an x-rated movie by 3.9%.
Survey respondents who were female were approximately six and three quarters times more likely to have
not seen an x-rated movie. Survey respondents who were more conservative were more likely to have not
seen an x-rated movie. A one unit increase in liberal or conservative political views increased the odds that
survey respondents have not seen an x-rated movie by approximately one and a quarter times.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem with
missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic
regression.
The variables listed first in the problem statement are the independent
variables (IVs): "age" [age], "sex" [sex], and "liberal or conservative
political views" [polviews].
The variable used to define groups is the dependent variable (DV):
"seen x-rated movie in last year" [xmovie].
When a problem states that a list of independent variables can
distinguish among groups and does not identify a control variable or an
order of importance for the variables, we do a logistic regression
entering all of the variables simultaneously.
SPSS logistic regression models the relationship by computing the
changes in the likelihood of falling in the category of the dependent
variable which had the highest numerical code.
The responses to seeing an x-rated movie were coded: 1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of not seeing
an x-rated movie because the code for No is 2.
The statements of the specific relationships between independent
variables and the dependent variable are all phrased in terms of impact
on not seeing an x-rated movie.
The specific relationships for the independent variables listed in the
problem indicate the direction of the relationship, increasing or
decreasing the likelihood of falling in the modeled group, and the
amount of change in the odds associated with a one-unit change in the
independent variable.
In order for the logistic regression question to be true, the overall
relationship must be statistically significant, there must be no
evidence of a flawed numerical analysis, the classification accuracy
rate must be substantially better than could be obtained by chance
alone, and each significant relationship must be interpreted correctly.
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent variable be non-metric
and the independent variables be metric or dichotomous. "Seen x-rated
movie in last year" [xmovie] is a dichotomous variable, which satisfies
the level of measurement requirement.
It contains two categories: survey respondents who had seen an x-rated
movie in the last year and survey respondents who had not seen an
x-rated movie in the last year.
"Age" [age] is an interval level variable, which satisfies the level of
measurement requirements for logistic regression analysis.
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may
be included in logistic regression.
"Liberal or conservative political views" [polviews] is an ordinal
level variable. If we follow the convention of treating ordinal level
variables as metric variables, the level of measurement requirement for
logistic regression analysis is satisfied. Since some data analysts do
not agree with this convention, a note of caution should be included in
our interpretation.
Request simultaneous logistic regression
Select the Regression | Binary Logistic… command from the Analyze menu.
Selecting the dependent variable
First, highlight the dependent variable xmovie in the list of
variables. Second, click on the right arrow button to move the
dependent variable to the Dependent text box.
Selecting the independent variables
Move the independent variables listed in the problem to the Covariates
list box.
Specifying the method for including variables
SPSS provides us with two methods for including variables: to enter all
of the independent variables at one time, and a stepwise method for
selecting variables using a statistical test to determine the order in
which variables are included.
SPSS also supports the specification of "Blocks" of variables for
testing hierarchical models.
Since the problem states that there is a relationship without
requesting the best predictors, we specify Enter as the method for
including variables.
Completing the logistic regression request
Click on the OK button to request the output for the logistic
regression.
The logistic procedure supports the selection of subsets of cases,
automatic recoding of nominal variables, saving diagnostic statistics
like standardized residuals and Cook's distance, and options for
additional statistics. However, none of these are needed for this
analysis.
Sample size – ratio of cases to variables
Case Processing Summary
Unweighted Cases(a)                          N      Percent
Selected Cases    Included in Analysis       177    65.6
                  Missing Cases              93     34.4
                  Total                      270    100.0
Unselected Cases                             0      .0
Total                                        270    100.0
a. If weight is in effect, see classification table for the total
number of cases.
The minimum ratio of valid cases to independent

variables for logistic regression is 10 to 1, with a
preferred ratio of 20 to 1. In this analysis, there
are 177 valid cases and 3 independent
variables. The ratio of cases to independent
variables is 59.0 to 1, which satisfies the
minimum requirement. In addition, the ratio of
59.0 to 1 satisfies the preferred ratio of 20 to 1.
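The sample size check is simple arithmetic. The sketch below uses the 10 to 1 minimum and 20 to 1 preferred ratios adopted in these slides, with illustrative variable names.

```python
valid_cases = 177
independent_variables = 3

ratio = valid_cases / independent_variables   # cases per independent variable
meets_minimum = ratio >= 10                   # minimum guideline
meets_preferred = ratio >= 20                 # preferred ratio for simultaneous entry

print(f"{ratio:.1f} to 1  minimum: {meets_minimum}  preferred: {meets_preferred}")
```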
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    39.668       3    .000
         Block   39.668       3    .000
         Model   39.668       3    .000
The presence of a relationship between the dependent

variable and combination of independent variables is
based on the statistical significance of the model chi-
square at step 1 after the independent variables have
been added to the analysis.
In this analysis, the probability of the model chi-square

(39.668) was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that there is no
difference between the model with only a constant and
the model with independent variables was rejected. The
existence of a relationship between the independent
variables and the dependent variable was supported.
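The significance reported as .000 can be checked without a statistics package. The sketch relies on a closed-form upper-tail probability that holds only for a chi-square with exactly 3 degrees of freedom. In practice a statistics library would be used, so the helper below is an illustration specific to this problem, and its name is hypothetical.

```python
import math

def chi_square_p_value_df3(chi_square):
    """Upper-tail probability for a chi-square statistic with exactly 3 df."""
    half = chi_square / 2.0
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * chi_square / math.pi) * math.exp(-half))

p_value = chi_square_p_value_df3(39.668)
print(f"p = {p_value:.2e}")
print("reject null at 0.05:", p_value <= 0.05)
```

As a sanity check, the same function returns approximately 0.05 at the familiar 3 df critical value of 7.815.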
NUMERICAL PROBLEMS
Variables in the Equation
                       B        S.E.    Wald     df   Sig.   Exp(B)
Step 1(a)  AGE         .038     .014    7.629    1    .006   1.039
           SEX         1.901    .410    21.452   1    .000   6.689
           POLVIEWS    .306     .135    5.110    1    .024   1.358
           Constant    -4.590   1.045   19.302   1    .000   .010
a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.
Multicollinearity in the logistic regression solution is detected

by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, and
'complete separation' whereby the two groups in the
dependent event variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error for
the Constant.)
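The standard error screen amounts to a simple filter. The sketch uses the S.E. values from the Variables in the Equation table and leaves out the Constant, as the slides specify. The 2.0 threshold is the rule of thumb used in these slides.

```python
# Standard errors from the Variables in the Equation table (Constant excluded)
standard_errors = {"AGE": 0.014, "SEX": 0.410, "POLVIEWS": 0.135}

THRESHOLD = 2.0   # rule of thumb from the slides
flagged = [name for name, se in standard_errors.items() if se > THRESHOLD]

if flagged:
    print("Possible numerical problems for:", ", ".join(flagged))
else:
    print("No standard error exceeds 2.0; no evidence of numerical problems.")
```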
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
The probability of the Wald statistic for the variable age

was 0.006, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for age
was equal to zero was rejected. This supports the
relationship that "survey respondents who were older
were more likely to have not seen an x-rated movie."

The value of Exp(B) was 1.039 which implies that a

one unit increase in age increased the odds that
survey respondents have not seen an x-rated movie
by 3.9%. This confirms the statement of the amount
of change in the likelihood of belonging to the modeled
group of the dependent variable associated with a one
unit change in the independent variable, age.
The probability of the Wald statistic for the variable sex

was <0.001, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for sex
was equal to zero was rejected. This supports the
relationship that "survey respondents who were female
were approximately six and three quarters times more
likely to have not seen an x-rated movie."

The value of Exp(B) was 6.689 which implies
that a one unit increase in sex increased the
odds that survey respondents have not seen an
x-rated movie by a factor of approximately six
and three quarters.
The probability of the Wald statistic for the variable liberal or
conservative political views was 0.024, less than or equal to the
level of significance of 0.05. The null hypothesis that the b
coefficient for liberal or conservative political views was equal to
zero was rejected. This supports the relationship that "survey
respondents who were more conservative were more likely to have
not seen an x-rated movie." Liberal or conservative political views is
an ordinal variable that is coded so that higher numeric values are
associated with survey respondents who were more conservative.

The value of Exp(B) was 1.358 which implies that
a one unit increase in liberal or conservative
political views increased the odds that survey
respondents have not seen an x-rated movie by
approximately one and a quarter times.
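The Sig. and Exp(B) columns for the three predictors can be recomputed from the reported B and Wald values. The sketch assumes each Wald statistic is referred to a chi-square distribution with 1 degree of freedom, as the df column indicates, so its p-value is erfc(sqrt(Wald/2)). Because B is shown to only three decimals, a recomputed Exp(B) may differ slightly from the SPSS figure in the last digit.

```python
import math

# (B, Wald) as reported in the Variables in the Equation table
rows = {
    "AGE": (0.038, 7.629),
    "SEX": (1.901, 21.452),
    "POLVIEWS": (0.306, 5.110),
}

results = {}
for name, (b, wald) in rows.items():
    p_value = math.erfc(math.sqrt(wald / 2.0))   # chi-square upper tail, 1 df
    exp_b = math.exp(b)                          # multiplier on the odds
    percent_change = (exp_b - 1.0) * 100.0       # percent change in the odds
    results[name] = (p_value, exp_b, percent_change)
    print(f"{name:9s} p = {p_value:.3f}  Exp(B) = {exp_b:.3f}  "
          f"change in odds = {percent_change:+.1f}%")
```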
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
by chance accuracy rate
The independent variables could be characterized as useful

predictors distinguishing survey respondents who have not
seen an x-rated movie from survey respondents who have
seen an x-rated movie if the classification accuracy rate was
substantially higher than the accuracy attainable by chance
alone. Operationally, the classification accuracy rate should
be 25% or more higher than the proportional by chance
accuracy rate.
                                  Predicted: SEEN X-RATED MOVIE
                                  IN LAST YEAR
Observed                          YES     NO      Percentage Correct
Step 0  SEEN X-RATED MOVIE  YES   0       45      .0
        IN LAST YEAR        NO    0       132     100.0
b. The cut value is .500
The proportional by chance accuracy rate was computed by first
calculating the proportion of cases for each group based on the number
of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the
"No" group is 132/177 = 0.746.
Then, we square and sum the proportion of cases in each group (0.254²
+ 0.746² = 0.621). 0.621 is the proportional by chance accuracy rate.
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy
                                  Predicted: SEEN X-RATED MOVIE
                                  IN LAST YEAR
Observed                          YES     NO      Percentage Correct
Step 1  SEEN X-RATED MOVIE  YES   19      26      42.2
        IN LAST YEAR        NO    9       123     93.2
The accuracy rate computed by SPSS was 80.2%

which was greater than or equal to the
proportional by chance accuracy criteria of
77.6% (1.25 x 62.1% = 77.6%).
The criteria for classification accuracy is

satisfied.
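The accuracy figures for this problem can be rebuilt from the Step 1 classification table. In the sketch, the dictionary layout (observed group mapped to predicted YES and NO counts) is an illustrative choice, and the 1.25 multiplier is the benchmark used in these slides.

```python
# Observed group -> (predicted YES, predicted NO), from the Step 1 table
table = {"YES": (19, 26), "NO": (9, 123)}

total = sum(sum(row) for row in table.values())           # 177 cases
correct = table["YES"][0] + table["NO"][1]                 # diagonal of the table
accuracy = correct / total

percent_correct = {
    "YES": 100.0 * table["YES"][0] / sum(table["YES"]),
    "NO": 100.0 * table["NO"][1] / sum(table["NO"]),
}

group_totals = [sum(row) for row in table.values()]        # 45 and 132
chance = sum((n / total) ** 2 for n in group_totals)
criterion = 1.25 * chance

print(f"accuracy  = {accuracy:.3f}")
print(f"criterion = {criterion:.3f}")
print("criterion satisfied:", accuracy >= criterion)
```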
Answering the question in problem 1 - 1
We found a statistically significant overall relationship between the
combination of independent variables and the dependent variable.
There was no evidence of numerical problems in the solution.
Moreover, the classification accuracy surpassed the proportional by
chance accuracy criteria, supporting the utility of the model.
We verified that each statement about the relationship between an
independent variable and the dependent variable was correct in both
direction of the relationship and the change in likelihood associated
with a one-unit change of the independent variable.
The answer to the question is true with caution.
A caution is added because of the inclusion of ordinal level variables.
Problem 2
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.
1. True
3. False
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
The variables listed first in the problem statement are
the independent variables (IVs): "sex" [sex] , "general
happiness" [happy], and "confidence in the executive
branch of the federal government" [confed].
Sex is a control variable and general happiness and
confidence in the executive branch are predictors.
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass],
the variable "general happiness" [happy] and "confidence in the executive branch of the federal
government" [confed] were useful predictors for distinguishing between groups based on responses to
"should marijuana be made legal" [grass]. These predictors differentiate survey respondents who have
been less supportive that the use of marijuana should be made legal from survey respondents who have
been more supportive that the use of marijuana should be made legal.
The variable used to define groups
is the dependent variable (DV):
"should marijuana be made legal" [grass].
When a problem identifies control
variables, we do a hierarchical
logistic regression entering the
variables in SPSS blocks.
SPSS logistic regression models the relationship by computing
the changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to "should marijuana be made legal" were coded:
1 = Legal and 2 = Not Legal.
The SPSS output will model the changes in the likelihood of
being less supportive of legalizing marijuana because 2
corresponds to not legalizing marijuana.
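The coding rule can be illustrated outside SPSS. The sketch below is a minimal Python illustration, not SPSS's own procedure: the function name and the sample responses are hypothetical. It recodes a two-valued variable so that the highest numeric code becomes 1, the modeled group.

```python
def recode_modeled_category(codes):
    """Map a two-valued variable to 0/1 so the highest code becomes 1 (the modeled group)."""
    values = sorted(set(codes))
    if len(values) != 2:
        raise ValueError("dependent variable must be dichotomous")
    highest = values[1]
    return [1 if c == highest else 0 for c in codes]


# Hypothetical responses to [grass]: 1 = Legal, 2 = Not Legal
grass = [1, 2, 2, 1, 2]
print(recode_modeled_category(grass))  # [0, 1, 1, 0, 1]
```

With this recoding, a positive coefficient raises the odds of "Not Legal", which is why the problem statements are phrased in terms of being less supportive.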
The statements of the specific relationships between
independent variables and the dependent variable are all
phrased in terms of impact on being less supportive of
legalizing marijuana.
SW388R7 Data Analysis & Computers II
The specific relationships for the independent variables listed in the problem indicate the
direction of the relationship and the change in likelihood of falling in the modeled group, and the
amount of change in the odds associated with a one-unit change in the independent variable.
Survey respondents who were less happy overall were less likely to have been less supportive that the
use of marijuana should be made legal. A one unit increase in general happiness decreased the odds
that survey respondents have been less supportive that the use of marijuana should be made legal by
66.9%. Survey respondents who had less confidence in the executive branch of the federal government
were less likely to have been less supportive that the use of marijuana should be made legal. A one unit
increase in confidence in the executive branch of the federal government decreased the odds that
1. True
3. False
In order for the logistic regression question to be true, the
relationship between the predictors and the dependent variable
must be statistically significant after entering the control variables
in a previous stage, there must be no evidence of a flawed
numerical analysis, the classification accuracy rate must be
substantially better than could be obtained by chance alone, and
each significant relationship must be interpreted correctly.
Logistic regression analysis requires that the dependent variable be dichotomous and
independent variables be metric or dichotomous. "Should marijuana be made legal"
[grass] is a dichotomous variable, which satisfies the level of measurement
requirement for the dependent variable.
It contains two categories:
• survey respondents who have been less supportive that
the use of marijuana should be made legal
• survey respondents who have been more supportive that
the use of marijuana should be made legal
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which
may be included in logistic regression analysis.
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass],
the variable "general happiness" [happy] and "confidence in the executive branch of the federal
government" [confed] were useful predictors for distinguishing between groups based on responses to
"should marijuana be made legal" [grass]. These predictors differentiate survey respondents who have
been less supportive that the use of marijuana should be made legal from survey respondents who have
been more supportive that the use of marijuana should be made legal.
"General happiness" [happy] and "confidence in the
executive branch of the federal government" [confed]
are ordinal level variables. If we follow the
convention of treating ordinal level variables as
metric variables, the level of measurement
requirement for logistic regression analysis is
satisfied. Since some data analysts do not agree with
this convention, a note of caution should be included
in our interpretation.
Request hierarchical logistic regression

Binary Logistic…
command from the
Analyze menu.
dependent variable
grass in the list of
variables.
Second, click on the right

Dependent text box.
Selecting the control independent variables
First, move the control independent variable, sex, listed in the problem to the
Second, click on the Next button to add the new block that will contain the predictors.
Adding the predictor independent variables
First, move the

predictors to the
In our hierarchical
regression, we will specify
that all of the variables in
each block be entered
simultaneously when the
block is entered.
Click on the OK
button to request
the output for the
The logistic procedure supports the selection of subsets of
cases, automatic recoding of nominal variables, saving
diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However,
none of these are needed for this analysis.
Total 270 100.0
The minimum ratio of valid cases to independent
variables for logistic regression is 10 to 1, with a
preferred ratio of 20 to 1. In this analysis, there
are 163 valid cases and 3 independent
variables. The ratio of cases to independent
variables is 54.33 to 1, which satisfies the
minimum requirement. In addition, the ratio of
54.33 to 1 satisfies the preferred ratio of 20 to
1.
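The sample size check above is simple arithmetic and can be sketched in a few lines of Python. The function name and the returned dictionary keys are illustrative assumptions. The thresholds (10 to 1 minimum, 20 to 1 preferred, 50 to 1 preferred for stepwise entry) are the ones stated in these slides.

```python
def sample_size_check(valid_cases, n_independent, stepwise=False):
    """Compare the ratio of valid cases to independent variables with the slide thresholds."""
    ratio = valid_cases / n_independent
    preferred = 50 if stepwise else 20  # preferred ratio is higher for stepwise entry
    return {
        "ratio": round(ratio, 2),
        "meets_minimum": ratio >= 10,
        "meets_preferred": ratio >= preferred,
    }


# Problem 2: 163 valid cases and 3 independent variables
print(sample_size_check(163, 3))
```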
In a hierarchical logistic regression, the presence of a relationship
between the dependent variable and combination of independent
variables entered after the control variables have been included is
based on the statistical significance of the block chi-square for the
second block of variables in which the predictor independent
variables are included.
In this analysis, the probability of the block chi-square (17.467)
was <0.001, less than or equal to the level of significance of
0.05. The null hypothesis that there is no difference between the
model with only a constant and the control variables versus the
model with the predictor independent variables was rejected. The
contribution of the relationship between the predictor independent
variables and the dependent variable was supported.
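The reported probability can be checked by hand. The sketch below assumes the block chi-square has 2 degrees of freedom, one per predictor entered in the second block; the slide does not state the degrees of freedom, so this is an assumption. For 2 degrees of freedom the upper-tail probability of a chi-square value x has the closed form exp(-x/2).

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Block chi-square from the slide; df = 2 is an assumption (HAPPY and CONFED entered together)
p = chi2_upper_tail_df2(17.467)
print(f"p = {p:.6f}")
print("reject null at 0.05:", p <= 0.05)
```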
NUMERICAL PROBLEMS

Variables in the Equation
                      B      S.E.    Wald    df   Sig.   Exp(B)
Step 1a  SEX         .154    .351    .194    1    .660    1.167
         HAPPY     -1.104    .354   9.739    1    .002     .331
         CONFED     -.559    .270   4.290    1    .038     .572
         Constant   3.721   1.066  12.195    1    .000   41.308
a. Variable(s) entered on step 1: HAPPY, CONFED.
the Constant.)
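A check for numerical problems can be sketched as a scan of the standard errors. The 2.0 threshold is the one used in the decision chart in these slides. The function name is illustrative, and the Constant is left out of the dictionary on purpose.

```python
def numerical_problems(se_by_variable, threshold=2.0):
    """Return the variables whose standard error exceeds the threshold."""
    return [name for name, se in se_by_variable.items() if se > threshold]


# Standard errors from the Variables in the Equation table (Constant excluded)
standard_errors = {"SEX": 0.351, "HAPPY": 0.354, "CONFED": 0.270}
print(numerical_problems(standard_errors))  # [] means no evidence of numerical problems
```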
The probability of the Wald statistic for the variable general

happiness was 0.002, less than or equal to the level of
significance of 0.05. The null hypothesis that the b coefficient
for general happiness was equal to zero was rejected. This
supports the relationship that "survey respondents who were
less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal." General
happiness is an ordinal variable that is coded so that lower
numeric values are associated with survey respondents who
were happier overall.
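The Wald test can be reproduced approximately from the table. The sketch below assumes the usual form of the statistic, (B / S.E.) squared, referred to a chi-square distribution with 1 degree of freedom. Because the table shows rounded coefficients, the recomputed statistic differs slightly from the 9.739 printed by SPSS.

```python
import math


def wald_test(b, se):
    """Return the Wald statistic (b / se)^2 and its p-value on 1 degree of freedom."""
    wald = (b / se) ** 2
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


wald, p = wald_test(-1.104, 0.354)  # HAPPY row of the table
print(f"Wald = {wald:.3f}, p = {p:.3f}")
```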

Variables in the Equation
                      B      S.E.    Wald    df   Sig.   Exp(B)
Step 1a  SEX         .154    .351    .194    1    .660    1.167
         HAPPY     -1.104    .354   9.739    1    .002     .331
         CONFED     -.559    .270   4.290    1    .038     .572
         Constant   3.721   1.066  12.195    1    .000   41.308
The value of Exp(B) was 0.331 which implies that a
one unit increase in general happiness decreased the
odds that survey respondents have been less
supportive that the use of marijuana should be made
legal by 66.9%.
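The percentage change in the odds follows directly from Exp(B): subtract 1 and multiply by 100. A minimal sketch using the Exp(B) values from the table (the function name is illustrative):

```python
def percent_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100.0


for name, exp_b in [("HAPPY", 0.331), ("CONFED", 0.572)]:
    change = percent_change_in_odds(exp_b)
    direction = "decrease" if change < 0 else "increase"
    print(f"{name}: {abs(change):.1f}% {direction} in the odds")
```

An Exp(B) below 1 gives a negative change (the odds decrease), and an Exp(B) above 1 gives a positive change.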
The probability of the Wald statistic for the variable confidence in the
executive branch of the federal government was 0.038, less than or
equal to the level of significance of 0.05. The null hypothesis that the
b coefficient for confidence in the executive branch of the federal
government was equal to zero was rejected. This supports the
relationship that "survey respondents who had less confidence in the
executive branch of the federal government were less likely to have
been less supportive that the use of marijuana should be made legal."
Confidence in the executive branch of the federal government is an
ordinal variable that is coded so that lower numeric values are
associated with survey respondents who had more confidence in the
executive branch of the federal government.

Variables in the Equation
                      B      S.E.    Wald    df   Sig.   Exp(B)
Step 1a  SEX         .154    .351    .194    1    .660    1.167
         HAPPY     -1.104    .354   9.739    1    .002     .331
         CONFED     -.559    .270   4.290    1    .038     .572
         Constant   3.721   1.066  12.195    1    .000   41.308
The value of Exp(B) was 0.572 which implies
that a one unit increase in confidence in the
executive branch of the federal government
decreased the odds that survey respondents
have been less supportive that the use of
marijuana should be made legal by 42.8%.

predictors distinguishing survey respondents who have been
less supportive that the use of marijuana should be made
legal from survey respondents who have been more
supportive that the use of marijuana should be made legal if
the classification accuracy rate was substantially higher than
the accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
                                        Predicted
                                        SHOULD MARIJUANA BE MADE LEGAL    Percentage
Observed                                LEGAL      NOT LEGAL              Correct
Step 0  SHOULD MARIJUANA   LEGAL            0          57                    .0
        BE MADE LEGAL      NOT LEGAL        0         106                 100.0
b. The cut value is .500
The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the classification table
at Step 0, and then squaring and summing the proportion of
cases in each group (0.350² + 0.650² = 0.545).
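The by chance calculation can be sketched in Python from the Step 0 group counts (57 and 106). The function name is illustrative. The 1.25 multiplier is the "25% or more higher" criterion stated in these slides.

```python
def proportional_by_chance(group_counts):
    """Sum of squared group proportions, computed from the Step 0 group counts."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


chance = proportional_by_chance([57, 106])  # LEGAL, NOT LEGAL
criterion = 1.25 * chance                   # accuracy should be at least 25% higher
print(f"by chance rate = {chance:.3f}, criterion = {criterion:.3f}")
```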
                                        Predicted
                                        SHOULD MARIJUANA BE MADE LEGAL    Percentage
Observed                                LEGAL      NOT LEGAL              Correct
Step 1  SHOULD MARIJUANA   LEGAL           18          39                  31.6
        BE MADE LEGAL      NOT LEGAL       13          93                  87.7
The accuracy rate computed by SPSS was 68.1%
which was greater than or equal to the
proportional by chance accuracy criteria of
68.1% (1.25 x 54.5% = 68.1%).
The criteria for classification accuracy is
satisfied.
We found a statistically significant overall
relationship between the predictor independent
variables and the dependent variable.
There was no evidence of numerical problems in
the solution.
Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.
1. True
3. False
We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both
direction of the relationship and the change in
likelihood associated with a one-unit change of the
independent variable.
1. True
2. True with caution
3. False
The answer to the question is true with caution.
A caution is added because of the
inclusion of ordinal level variables.
Problem 3
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.
Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
1. True
3. False
Dissecting Problem 3 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
The variables listed first in the problem statement are the independent variables (IVs): "highest
academic degree" [degree], "total family income" [income98], and "satisfaction with financial
situation" [satfin].
The variable used to define groups is the dependent variable (DV): "expect u.s. in world war
in 10 years" [uswary].
Since the problem identifies the most useful or important predictor, we do
a stepwise logistic regression.
SPSS logistic regression models the relationship by computing the
changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to “expect u.s. in world war in 10 years” were
coded: 1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of being
less positive that the United States would fight in another world
war within the next ten years.
The statements of the specific relationships between independent
variables and the dependent variable are all phrased in terms of impact on being
less positive that the United States would fight in another world war within the next
ten years.
Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.
The specific relationships for the independent variables listed in the problem indicate the
direction of the relationship and the change in likelihood of falling in the modeled group, and the
amount of change associated with a one-unit change in the independent variable.
Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.
1. True
3. False
In order for the logistic regression question to be true, the
relationship between the predictors selected for inclusion and the
dependent variable must be statistically significant, there must be
no evidence of a flawed numerical analysis, the classification
accuracy rate must be substantially better than could be obtained
by chance alone, and the order of entry and each significant
relationship must be interpreted correctly.
Logistic regression analysis requires that the dependent variable
be dichotomous and the independent variables be metric or
dichotomous. "Expect u.s. in world war in 10 years" [uswary] is a
dichotomous variable, which satisfies the level of measurement
requirement for the dependent variable.
It contains two categories:
• survey respondents who have been less positive that the United
States would fight in another world war within the next ten years
• survey respondents who have been more positive that the United
States would fight in another world war within the next ten years.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
"Highest academic degree" [degree], "total family
income" [income98], and "satisfaction with financial
situation" [satfin] are ordinal level variables. If we
follow the convention of treating ordinal level
variables as metric variables, the level of
measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not
agree with this convention, a note of caution should
be included in our interpretation.
Request stepwise logistic regression

Binary Logistic…
command from the
Analyze menu.

dependent variable
uswary in the list
of variables.
Second, click on the right
Dependent text box.
Adding the independent variables
First, move the

predictors to the
In our stepwise logistic

regression, we specify
the Forward
Conditional method for
adding variables.
Adding options to the output
To add a summary of steps

at the end of the analysis
and specifications for
stepwise method, click on
the Options… button.
Including a summary of steps
To obtain a summary of the steps

on which variables were added or
removed from the analysis, mark
the option button At last step in
the Display panel.
Specifications for stepwise method
Click on the
Continue button to
close the dialog box.
We can change the criteria for adding and

removing variables from the analysis by
changing the probability for entry and removal.
We will use the default level of significance of
0.05 for entry and 0.10 for removal.
Click on the OK
button to request
the output for the
Total 270 100.0
The minimum ratio of valid cases to independent
variables for stepwise logistic regression is 10 to
1, with a preferred ratio of 50 to 1. In this
analysis, there are 136 valid cases and 3
independent variables. The ratio of cases to
independent variables is 45.33 to 1, which
satisfies the minimum requirement. However,
the ratio of 45.33 to 1 does not satisfy the
preferred ratio of 50 to 1. A caution should be
added to the interpretation of the analysis and a
split sample validation should be conducted.
The presence of a relationship between the dependent variable
and combination of independent variables is based on the
statistical significance of the model chi-square.
In this analysis, the probability of the model chi-square (9.001)
was 0.003, less than or equal to the level of significance of 0.05.
The null hypothesis that there is no difference between the model
with only a constant and the model with independent variables
was rejected. The existence of a relationship between the
independent variables and the dependent variable was supported.
NUMERICAL PROBLEMS

Variables in the Equation
                      B      S.E.    Wald    df   Sig.   Exp(B)
Step 1a  INCOME98    .095    .033   8.436    1    .004    1.100
         Constant  -1.033    .527   3.847    1    .050     .356
a. Variable(s) entered on step 1: INCOME98.
the Constant.)
VARIABLES TO DEPENDENT VARIABLE
The probability of the Wald statistic for the variable total family
income was 0.004, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for total family
income was equal to zero was rejected. This supports the
relationship that "survey respondents who had higher total family
incomes were more likely to have been less positive that the
United States would fight in another world war within the next
ten years." Total family income is an ordinal variable that is
coded so that higher numeric values are associated with survey
respondents who had higher total family incomes.

Variables in the Equation
                      B      S.E.    Wald    df   Sig.   Exp(B)
Step 1a  INCOME98    .095    .033   8.436    1    .004    1.100
         Constant  -1.033    .527   3.847    1    .050     .356
a. Variable(s) entered on step 1: INCOME98.
The value of Exp(B) was 1.100 which implies that a
one unit increase in total family income increased the
odds that survey respondents have been less positive
that the United States would fight in another world war
within the next ten years by 10.0%.
IMPORTANCE OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE
The order of importance is based on the entry
order of the variables included in the stepwise
logistic regression. The entry order is
summarized in the Step Summary table, in
which we see which variable was added or
removed at each step.
Step Summary (a, b)
        Improvement                Model                      Correct
Step    Chi-square   df   Sig.     Chi-square   df   Sig.     Class %    Variable
1       9.001        1    .003     9.001        1    .003     67.6%      IN: INCOME98
a. No more variables can be deleted from or added to the current model.
b. End block: 1
The most important predictor for identifying
survey respondents who have been less
positive that the United States would fight in
another world war within the next ten years
was total family income [INCOME98].
The importance of the predictors stated in
the problem is correct.

predictors distinguishing survey respondents who have been
less positive that the United States would fight in another
world war within the next ten years from survey respondents
who have been more positive that the United States would
fight in another world war within the next ten years if the
classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
                                    Predicted
                                    EXPECT U.S. IN WORLD WAR IN 10 YEARS    Percentage
Step 0  EXPECT U.S. IN WORLD   YES       0          54                          .0
b. The cut value is .500
The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the classification table
at Step 0, and then squaring and summing the proportion of
cases in each group (0.397² + 0.603² = 0.521).
                                    Predicted
                                    EXPECT U.S. IN WORLD WAR IN 10 YEARS    Percentage
Step 1  EXPECT U.S. IN WORLD   YES      20          34                        37.0
The accuracy rate computed by SPSS was
67.6% which was greater than or equal to the
proportional by chance accuracy criteria of
65.2% (1.25 x 52.1% = 65.2%).
The criteria for classification accuracy is
satisfied.
We found a statistically significant overall
relationship between the predictor independent
variables and the dependent variable.
There was no evidence of numerical problems in
the solution.
Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.
1. True
3. False
We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both
direction of the relationship and the change in
likelihood associated with a one-unit change of the
independent variable.
We also verified the order of importance for the
independent variables included in the stepwise
analysis.
The answer to the question is true
with caution.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
A caution is added to the findings
because of the inclusion of ordinal
level independent variables. A
caution is added to the findings
because the preferred sample
size is not met.
Steps in binary logistic regression:
level of measurement and initial sample size
The following is a guide to the decision process for answering problems about the basic relationships in logistic regression:
• Is the dependent variable dichotomous, and are the independent variables metric or dichotomous?
– No: Inappropriate application of a statistic
– Yes: continue
• Is the ratio of cases to independent variables at least 10 to 1?
– No: Inappropriate application of a statistic
– Yes: continue
• Run logistic regression, using the method for including variables identified in the research question.
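The screening decisions above can be sketched in Python. This is only an illustration: the function name `screen_problem` and its parameters are hypothetical and are not part of SPSS or the course materials.

```python
INAPPROPRIATE = "Inappropriate application of a statistic"


def screen_problem(dv_is_dichotomous, ivs_metric_or_dichotomous, n_cases, n_ivs):
    """Return a verdict string if the problem fails screening, else None.

    All parameter names are hypothetical; the caller is assumed to have
    already classified the level of measurement of each variable.
    """
    if n_ivs <= 0:
        raise ValueError("at least one independent variable is required")
    # Level of measurement: dichotomous DV, metric or dichotomous IVs.
    if not (dv_is_dichotomous and ivs_metric_or_dichotomous):
        return INAPPROPRIATE
    # Minimum sample size: at least 10 valid cases per independent variable.
    if n_cases / n_ivs < 10:
        return INAPPROPRIATE
    # Passed screening: go on to run the logistic regression.
    return None
```

A return value of `None` means the analysis may proceed; any other value is the final answer to the problem.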
Steps in logistic regression:
overall relationship and numerical problems
• Was the hierarchical method of entry used to include independent variables?
– No: Is the presence of a relationship confirmed by the test of model chi-square? If no, the answer is False; if yes, continue.
– Yes: Is the presence of a relationship confirmed by the test of block chi-square? If no, the answer is False; if yes, continue.
• Do the standard errors of the coefficients indicate the presence of numerical problems (s.e. > 2.0)?
– Yes: False
– No: continue
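As a rough sketch of these two checks, assuming Python: the function name, its parameters, and the default significance level of 0.05 are assumptions made for illustration. The level of significance actually required is whatever the problem statement specifies.

```python
def overall_relationship_ok(chi_square_sig, standard_errors,
                            alpha=0.05, se_limit=2.0):
    """Return True if the solution passes both checks, else False.

    chi_square_sig: significance of the model chi-square, or of the
        block chi-square when hierarchical entry was used.
    standard_errors: the S.E. values of the coefficients.
    alpha: assumed level of significance (hypothetical default).
    se_limit: standard errors above this value signal numerical problems.
    """
    # The relationship is confirmed only if the chi-square test is significant.
    if chi_square_sig > alpha:
        return False
    # Any standard error larger than 2.0 indicates a numerical problem.
    if any(se > se_limit for se in standard_errors):
        return False
    return True
```

A result of `False` corresponds to answering "False" to the problem; `True` means the decision process continues to the individual relationships.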
Steps in logistic regression:
relationships between IV's and DV
• Was the stepwise method of entry used to include independent variables?
– Yes: Was the entry order of variables interpreted correctly? If no, the answer is False; if yes, continue.
– No: continue
• Were the relationships between individual IVs and DV groups interpreted correctly?
– No: False
– Yes: continue
Steps in logistic regression:
classification accuracy and adding cautions
• Is the overall accuracy rate 25% greater than the proportional by chance accuracy rate?
– No: False
– Yes: continue
• Does the analysis satisfy the preferred ratio of cases to IV's of 20 to 1 (50 to 1 for stepwise)?
– No: True with caution
– Yes: continue
• Are one or more IV's ordinal level variables?
– Yes: True with caution
– No: True
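The final decisions can be sketched in Python as follows. The function and parameter names are hypothetical. The sketch takes the proportional by chance accuracy rate to be the sum of the squared group proportions, and it treats an accuracy rate exactly equal to 1.25 times that rate as passing; both are assumptions here.

```python
def proportional_by_chance_rate(group_counts):
    """Sum of squared group proportions for the dependent variable groups."""
    total = sum(group_counts)
    return sum((count / total) ** 2 for count in group_counts)


def final_answer(accuracy, group_counts, n_cases, n_ivs,
                 stepwise=False, any_ordinal_iv=False):
    """Return "False", "True with caution", or "True".

    accuracy: overall accuracy rate as a proportion (e.g. 0.80).
    group_counts: number of cases in each dependent variable group.
    """
    # Criterion: 25% higher than the proportional by chance accuracy rate.
    criterion = 1.25 * proportional_by_chance_rate(group_counts)
    if accuracy < criterion:
        return "False"
    # Preferred ratio of cases to IVs: 20 to 1, or 50 to 1 for stepwise.
    preferred_ratio = 50 if stepwise else 20
    if n_cases / n_ivs < preferred_ratio:
        return "True with caution"
    # Ordinal level independent variables also call for a caution.
    if any_ordinal_iv:
        return "True with caution"
    return "True"
```

For example, the death sentence frequencies from the opening slides (50 death sentences, 97 other sentences) could be passed as `group_counts=[50, 97]` to obtain the by chance criterion for that table.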
