SW388R7
Data Analysis & Computers II
Logistic Regression Basic Relationships
Slide 1
Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Logistic regression
Slide 2
Logistic regression is used to analyze relationships between a
dichotomous dependent variable and metric or dichotomous
independent variables. (SPSS now supports Multinomial Logistic
Regression that can be used with more than two groups, but our
focus here is on binary logistic regression for two groups.)
Logistic regression combines the independent variables to
estimate the probability that a particular event will occur, i.e.
that a subject will be a member of one of the groups defined by the
dichotomous dependent variable. In SPSS, the model is always
constructed to predict the group with the higher numeric code. If
responses are coded 1 for Yes and 2 for No, SPSS will predict
membership in the No category. If responses are coded 1 for No
and 2 for Yes, SPSS will predict membership in the Yes category.
We will refer to the predicted event for a particular analysis as
the modeled event.
This will create some awkward wording in our problems. Our
only option for changing this is to recode the variable.
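The coding rule above can be sketched in a few lines of Python. This is only an illustration of the rule as the slide states it, not SPSS code; the function names and the dictionary layout are assumptions made for the example.

```python
def modeled_category(value_labels):
    """Return the label that would be modeled: the one with the higher numeric code."""
    return value_labels[max(value_labels)]

def swap_codes(values, low=1, high=2):
    """Recode a two-valued variable so the other category gets the higher code."""
    swap = {low: high, high: low}
    return [swap[v] for v in values]

print(modeled_category({1: "Yes", 2: "No"}))   # No
print(modeled_category({1: "No", 2: "Yes"}))   # Yes
print(swap_codes([1, 2, 2, 1]))                # [2, 1, 1, 2]
```

Swapping the codes is the recoding step the slide refers to: after it, the category that used to be coded 1 becomes the modeled event.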
What logistic regression predicts
Slide 3
The variate or value produced by logistic regression is a
probability value between 0.0 and 1.0.
If the probability for group membership in the modeled
category is above some cut point (the default is 0.50), the
subject is predicted to be a member of the modeled group. If
the probability is below the cut point, the subject is predicted
to be a member of the other group.
For any given case, logistic regression computes the probability
that a case with a particular set of values for the independent
variables is a member of the modeled category.
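A minimal sketch of this prediction step, assuming the standard logistic function is used to turn a weighted sum of the independent variables into a probability. The constant, coefficients, and case values below are hypothetical, and the function names are illustrative only.

```python
import math

def modeled_probability(constant, coefficients, values):
    """Probability of the modeled category for one case (logistic function)."""
    logit = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))

def predicted_group(probability, cut_point=0.50):
    """Apply the cut point; treating an exact tie as 'other' is an assumption."""
    return "modeled" if probability > cut_point else "other"

# Hypothetical constant, coefficients, and case values (not from the slides).
p = modeled_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(round(p, 3), predicted_group(p))  # 0.731 modeled
```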
Level of measurement requirements
Slide 4
Logistic regression analysis requires that the dependent
variable be dichotomous.
Logistic regression analysis requires that the independent
variables be metric or dichotomous.
If an independent variable is nominal level and not
dichotomous, the logistic regression procedure in SPSS has an
option to dummy code the variable for you.
If an independent variable is ordinal, we will attach the usual
caution.
Assumptions
Slide 5
Logistic regression does not make any assumptions of normality,
linearity, or homogeneity of variance for the independent
variables.
Because it does not impose these requirements, it is preferred
to discriminant analysis when the data do not satisfy these
assumptions.
Sample size requirements
Slide 6
The minimum number of cases per independent variable is 10,
using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
Logistic Regression.
For preferred case-to-variable ratios, we will use 20 to 1 for
simultaneous and hierarchical logistic regression and 50 to 1 for
stepwise logistic regression.
Methods for including variables
Slide 7
There are three methods available for including variables in the
regression equation:
the simultaneous method, in which all independents are
included at the same time.
the hierarchical method, in which control variables are
entered in the analysis before the predictors whose effects
we are primarily concerned with.
the stepwise method (forward conditional in SPSS), in which
variables are selected in the order in which they maximize
the statistically significant contribution to the model.
For all methods, the contribution to the model is measured by
the model chi-square, a statistical measure of the fit between the
dependent and independent variables, like R².
Computational method
Slide 8
Multiple regression uses the least-squares method to find the
coefficients for the independent variables in the regression
equation, i.e. it computes coefficients that minimize the
sum of squared residuals for all cases.
Logistic regression uses maximum-likelihood estimation to
compute the coefficients for the logistic regression equation.
This method attempts to find coefficients that match the
breakdown of cases on the dependent variable.
The overall measure of how well the model fits is given by the
likelihood value (reported by SPSS as -2 log likelihood), which is
similar to the residual or error sum of squares value for multiple
regression. A model that fits the data well will have a small
likelihood value. A perfect model would have a likelihood value
of zero.
Maximum-likelihood estimation is an iterative procedure that
successively works to get closer and closer to the correct
answer. When SPSS reports the "iterations," it is telling us how
many cycles it took to get the answer.
Overall test of relationship
Slide 9
The overall test of relationship among the independent
variables and groups defined by the dependent variable is based on
the reduction in the likelihood value from a model which does not
contain any independent variables to the model that contains
the independent variables.
This difference in likelihood follows a chi-square distribution,
and is referred to as the model chi-square.
The significance test for the model chi-square is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.
Slide 10
Beginning logistic regression model
The SPSS output for logistic regression begins with output
for a model that contains no independent variables. It labels
this output "Block 0: Beginning Block" and (if we request the
optional iteration history) reports the initial -2 Log
Likelihood, which we can think of as a measure of the error
associated with trying to predict the dependent variable without
using any information from the independent variables.
The initial -2 log likelihood is 213.891.
We will not routinely request the iteration history because
it does not usually yield additional useful information.
Slide 11
Ending logistic regression model
After the independent variables are entered in
Block 1, the -2 log likelihood is again measured (180.267 in
this problem).
The difference between the beginning and ending -2 log
likelihood is the model chi-square that is used in the
test of overall statistical significance.
In this problem, the model chi-square is 33.625 (213.891 -
180.267, allowing for rounding), which is statistically
significant at p < 0.001.
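The subtraction described on this slide can be checked directly. This is a small sketch using the two -2 log likelihood values quoted above; the function name is illustrative. Because the inputs are already rounded to three decimals, the difference comes out as 33.624 rather than the 33.625 that SPSS reports from unrounded values.

```python
def model_chi_square(beginning_neg2ll, ending_neg2ll):
    """Model chi-square: the reduction in -2 log likelihood from Block 0 to Block 1."""
    return beginning_neg2ll - ending_neg2ll

chi_square = model_chi_square(213.891, 180.267)
print(round(chi_square, 3))  # 33.624
```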
Slide 12
Relationship of Individual Independent Variables and Dependent Variable
There is a test of significance for the relationship between an
individual independent variable and the dependent variable: a
significance test of the Wald statistic.
The individual coefficients represent the change in the log odds of
being a member of the modeled category. Individual coefficients are
expressed in log units and are not directly interpretable. However, if
the b coefficient is used as the power to which the base of the natural
logarithm (2.71828) is raised, the result, which SPSS labels Exp(B),
represents the change in the odds of the modeled event associated
with a one-unit change in the independent variable.
If a coefficient is positive, Exp(B) will be greater
than one, meaning that the modeled event is more likely to occur. If a
coefficient is negative, Exp(B) will be less than one,
and the odds of the event occurring decrease. A coefficient of zero (0)
has an Exp(B) of 1.0, meaning that this coefficient does
not change the odds of the event one way or the other.
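The transformation from a b coefficient to a change in odds can be sketched as follows. The three coefficients are hypothetical values chosen to show the positive, zero, and negative cases; the function names are illustrative.

```python
import math

def odds_multiplier(b):
    """Raise e to the power b: the factor by which the odds change per one-unit increase."""
    return math.exp(b)

def percent_change_in_odds(b):
    """The same change expressed as a percentage increase or decrease in the odds."""
    return (math.exp(b) - 1.0) * 100.0

for b in (0.5, 0.0, -0.5):  # hypothetical coefficients, not from the slides
    print(b, round(odds_multiplier(b), 3), round(percent_change_in_odds(b), 1))
```

A positive coefficient gives a multiplier above 1, a negative one gives a multiplier below 1, and a coefficient of zero gives exactly 1.0.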
Slide 13
Numerical problems
The maximum likelihood method used to calculate logistic
regression is an iterative fitting process that attempts to cycle
through repetitions to find an answer.
Sometimes, the method will break down and not be able to
converge or find an answer.
Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation whereby the
two groups are perfectly separated by the scores on one or
more independent variables.
The clue that we have numerical problems and should not
interpret the results is a standard error larger than 2.0 for
one or more independent variables.
Slide 14
Strength of logistic regression relationship
While logistic regression does compute correlation measures to
estimate the strength of the relationship (pseudo R square
measures, such as Nagelkerke's R²), these correlation measures
do not really tell us much about the accuracy or errors
associated with the model.
A more useful measure to assess the utility of a logistic
regression model is classification accuracy, which compares
predicted group membership based on the logistic model to the
actual, known group membership, which is the value for the
dependent variable.
Slide 15
Evaluating usefulness for logistic models
The benchmark that we will use to characterize a logistic
regression model as useful is a 25% improvement over the rate
of accuracy achievable by chance alone.
Even if the independent variables had no relationship to the
groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.
The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by summing
the squared proportion of cases in each group.
Slide 16
Comparing accuracy rates
To characterize our model as useful, we compare the overall
percentage accuracy rate produced by SPSS at the last step in which
variables are entered to 25% more than the proportional by chance
accuracy. (Note: SPSS does not compute a cross-validated accuracy
rate for logistic regression.)

Classification Table (Step 1)
Observed (rows) and predicted (columns): EXPECT U.S. IN WORLD WAR IN 10 YEARS

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                              20             34                 37.0
NO                               10             72                 87.8
Overall Percentage                                                 67.6

a. The cut value is .500

SPSS reports the overall accuracy rate in the Overall Percentage
row of the table "Classification Table." The overall accuracy rate
computed by SPSS was 67.6%.
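The percentages in the classification table follow from the four cell counts. The sketch below recomputes them; the nested-dictionary layout (observed group first, then predicted group) and the function name are assumptions made for the example.

```python
def classification_accuracy(table):
    """Percent correct per observed group and overall, from table[observed][predicted]."""
    per_group = {}
    correct = total = 0
    for observed, row in table.items():
        row_total = sum(row.values())
        per_group[observed] = 100.0 * row[observed] / row_total
        correct += row[observed]
        total += row_total
    return per_group, 100.0 * correct / total

# Cell counts from the Step 1 classification table above.
step1 = {"YES": {"YES": 20, "NO": 34}, "NO": {"YES": 10, "NO": 72}}
per_group, overall = classification_accuracy(step1)
print({k: round(v, 1) for k, v in per_group.items()}, round(overall, 1))
```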
Slide 17
Computing by chance accuracy
The number of cases in each group is found in the Classification Table at
Step 0 (before any independent variables are included). The proportion
of cases in the largest group is equal to the overall percentage (60.3%).

Classification Table (Step 0)
Observed (rows) and predicted (columns): EXPECT U.S. IN WORLD WAR IN 10 YEARS

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                               0             54                   .0
NO                                0             82                100.0
Overall Percentage                                                 60.3

a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on the
number of cases in each group in the classification table at Step
0, and then squaring and summing the proportion of cases in
each group (0.397² + 0.603² = 0.521).
The proportional by chance accuracy criterion is 65.2% (1.25 x
52.1% = 65.2%).
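The by chance calculation can be sketched from the Step 0 group counts (54 and 82). The function names are illustrative. Working from unrounded proportions, the criterion comes to about 65.1%; the 65.2% on the slide results from squaring the rounded proportions 0.397 and 0.603. The model's 67.6% accuracy exceeds the criterion either way.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of the squared proportions of cases in each group."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)

def by_chance_criterion(group_counts, improvement=0.25):
    """By chance accuracy rate increased by the required 25% improvement."""
    return (1.0 + improvement) * proportional_by_chance_accuracy(group_counts)

counts = [54, 82]  # YES and NO groups from the Step 0 classification table
chance = proportional_by_chance_accuracy(counts)
criterion = by_chance_criterion(counts)
print(round(chance, 3), round(criterion, 3), 0.676 >= criterion)
```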
Slide 18
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews]
were useful predictors for distinguishing between groups based on responses to "seen x-rated
movie in last year" [xmovie]. These predictors differentiate survey respondents who have not
seen an x-rated movie from survey respondents who have seen an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one
unit increase in age increased the odds that survey respondents have not seen an x-rated movie
by 3.9%. Survey respondents who were female were approximately six and three quarters times
more likely to have not seen an x-rated movie. Survey respondents who were more
conservative were more likely to have not seen an x-rated movie. A one unit increase in liberal
or conservative political views increased the odds that survey respondents have not seen an
x-rated movie by approximately one and a quarter times.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 19
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem
with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the
logistic regression.
Slide 20
The variables listed first in the problem statement are the
independent variables (IVs): "age" [age], "sex" [sex], and "liberal
or conservative political views" [polviews].
The variable used to define groups is the dependent
variable (DV): "seen x-rated movie in last year" [xmovie].
When a problem states that a list of independent variables can
distinguish among groups and does not identify a control
variable or an order of importance for the variables, we do a
logistic regression entering all of the variables simultaneously.
Slide 21
SPSS logistic regression models the relationship by computing
the changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to seeing an x-rated movie were coded:
1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of
not seeing an x-rated movie because the code for No is 2.
The statements of the specific relationships between
independent variables and the dependent variable are all
phrased in terms of impact on not seeing an x-rated movie.
Slide 22
The specific relationships for the independent variables
listed in the problem indicate the direction
of the relationship, increasing or decreasing the
likelihood of falling in the modeled group, and the
amount of change in the odds associated with a
one-unit change in the independent variable.
In order for the logistic regression question to be
true, the overall relationship must be statistically
significant, there must be no evidence of a flawed
numerical analysis, the classification accuracy
rate must be substantially better than could be
obtained by chance alone, and each significant
relationship must be interpreted correctly.
Slide 23
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent
variable be non-metric and the independent
variables be metric or dichotomous. "Seen x-rated
movie in last year" [xmovie] is a dichotomous
variable, which satisfies the level of
measurement requirement.
It contains two categories: survey respondents
who had seen an x-rated movie in the last year
and survey respondents who had not seen an
x-rated movie in the last year.
Slide 24
"Age" [age] is an interval level variable, which satisfies the
level of measurement requirements for logistic regression analysis.
"Sex" [sex] is a dichotomous or dummy-coded nominal
variable which may be included in logistic regression.
"Liberal or conservative political views" [polviews]
is an ordinal level variable. If we follow the
convention of treating ordinal level variables as metric
variables, the level of measurement
requirement for logistic regression
analysis is satisfied. Since some data
analysts do not agree with this
convention, a note of caution should be
included in our interpretation.
Slide 25
Request simultaneous logistic regression
Select the Regression | Binary Logistic command from the
Analyze menu.
Slide 26
Selecting the dependent variable
First, highlight the dependent variable xmovie in the list
of variables.
Second, click on the right arrow button to move the
dependent variable to the Dependent text box.
Slide 27
Selecting the independent variables
Move the independent variables listed in the
problem to the Covariates list box.
Slide 28
Specifying the method for including variables
SPSS provides us with two methods for including
variables: entering all of the independent variables
at one time, and a stepwise method for selecting
variables using a statistical test to determine the
order in which variables are included.
SPSS also supports the specification of "Blocks" of
variables for testing hierarchical models.
Since the problem states that there is a
relationship without requesting the best
predictors, we specify Enter as the method for
including variables.
Slide 29
Completing the logistic regression request
Click on the OK button to request the output for the
logistic regression.
The logistic procedure supports the selection of subsets of
cases, automatic recoding of nominal variables, saving
diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However,
none of these are needed for this analysis.
Slide 30
Sample size ratio of cases to variables

Case Processing Summary
Unweighted Cases                              N     Percent
Selected Cases     Included in Analysis     177        65.6
                   Missing Cases             93        34.4
                   Total                    270       100.0
Unselected Cases                              0          .0
Total                                       270       100.0

a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to
independent variables for logistic regression is
10 to 1, with a preferred ratio of 20 to 1. In this
analysis, there are 177 valid cases and 3
independent variables. The ratio of cases to
independent variables is 59.0 to 1, which
satisfies the minimum requirement. In addition,
the ratio of 59.0 to 1 satisfies the preferred
ratio of 20 to 1.
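The ratio check on this slide is simple arithmetic. This sketch uses the 177 valid cases and 3 independent variables from the analysis; the function name and the threshold constants are illustrative names for the guidelines given earlier.

```python
MINIMUM_RATIO = 10    # minimum cases per independent variable
PREFERRED_RATIO = 20  # preferred ratio for simultaneous entry

def case_to_variable_ratio(valid_cases, n_independent_variables):
    """Number of valid cases per independent variable."""
    return valid_cases / n_independent_variables

ratio = case_to_variable_ratio(177, 3)
print(ratio, ratio >= MINIMUM_RATIO, ratio >= PREFERRED_RATIO)  # 59.0 True True
```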
Slide 31
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Omnibus Tests of Model Coefficients
                   Chi-square   df   Sig.
Step 1   Step          39.668    3   .000
         Block         39.668    3   .000
         Model         39.668    3   .000

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the model
chi-square at step 1 after the independent variables have
been added to the analysis.
In this analysis, the probability of the model chi-square
(39.668) was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that there is
no difference between the model with only a constant
and the model with independent variables was rejected.
The existence of a relationship between the
independent variables and the dependent variable was
supported.
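The significance value that SPSS prints as .000 can be approximated by hand. The sketch below computes the upper-tail chi-square probability for integer degrees of freedom with a closed-form series and only the standard library; it is an independent check, not the routine SPSS uses, and the function name is illustrative.

```python
import math

def chi_square_p_value(chi_square, df):
    """Upper-tail probability of a chi-square statistic for integer df >= 1."""
    half = chi_square / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite correction sum.
    terms = (half ** (k - 0.5) / math.gamma(k + 0.5)
             for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)

p = chi_square_p_value(39.668, 3)  # model chi-square and df from the table
print(p < 0.001, p <= 0.05)  # True True
```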
Slide 32
NUMERICAL PROBLEMS

Variables in the Equation
                           B     S.E.     Wald   df   Sig.   Exp(B)
Step 1 (a)  AGE         .038     .014    7.629    1   .006    1.039
            SEX        1.901     .410   21.452    1   .000    6.689
            POLVIEWS    .306     .135    5.110    1   .024    1.358
            Constant  -4.590    1.045   19.302    1   .000     .010

a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

Multicollinearity in the logistic regression solution is detected
by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, or
'complete separation' whereby the two groups in the
dependent variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error
for the Constant.)
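The standard error check can be written as a short filter over the table. The B and S.E. values are copied from the Variables in the Equation table for this problem; the function name and dictionary layout are assumptions made for the example.

```python
# (B, S.E.) pairs from the Variables in the Equation table for problem 1.
variables_in_equation = {
    "AGE": (0.038, 0.014),
    "SEX": (1.901, 0.410),
    "POLVIEWS": (0.306, 0.135),
    "Constant": (-4.590, 1.045),
}

def flag_numerical_problems(table, threshold=2.0, skip=("Constant",)):
    """Names of independent variables whose standard error exceeds the threshold."""
    return [name for name, (_, se) in table.items()
            if name not in skip and se > threshold]

print(flag_numerical_problems(variables_in_equation))  # []
```

An empty list corresponds to the slide's conclusion that no independent variable had a standard error larger than 2.0.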
Slide 33
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
The probability of the Wald statistic for the variable age
was 0.006, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for age
was equal to zero was rejected. This supports the
relationship that "survey respondents who were older
were more likely to have not seen an x-rated movie."
Variables in the Equation (Step 1), AGE: B = .038, S.E. = .014,
Wald = 7.629, df = 1, Sig. = .006, Exp(B) = 1.039
The value of Exp(B) was 1.039, which implies that a
one unit increase in age increased the odds that
survey respondents have not seen an x-rated movie
by 3.9%. This confirms the statement of the amount
of change in the likelihood of belonging to the
modeled group of the dependent variable associated
with a one unit change in the independent variable,
age.
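The Sig. values for the Wald statistics can be reproduced by treating each Wald value as a chi-square statistic with 1 degree of freedom, which matches the df column in the table. This is a sketch using the reported Wald values for the three predictors; the function name is illustrative.

```python
import math

def wald_p_value(wald):
    """Upper-tail chi-square probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

# Wald statistics reported in the Variables in the Equation table.
reported_wald = {"AGE": 7.629, "SEX": 21.452, "POLVIEWS": 5.110}
alpha = 0.05
for name, wald in reported_wald.items():
    p = wald_p_value(wald)
    print(name, round(p, 3), p <= alpha)
```

Each probability is at or below 0.05, in line with the decisions to reject the null hypothesis for age, sex, and political views.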
Slide 34
The probability of the Wald statistic for the variable sex
was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that the b
coefficient for sex was equal to zero was rejected. This
supports the relationship that "survey respondents who
were female were approximately six and three quarters
times more likely to have not seen an x-rated movie."
Variables in the Equation (Step 1), SEX: B = 1.901, S.E. = .410,
Wald = 21.452, df = 1, Sig. = .000, Exp(B) = 6.689
The value of Exp(B) was 6.689, which implies
that a one unit increase in sex increased the
odds that survey respondents have not seen an
x-rated movie by approximately six and three
quarters times.
Slide 35
The probability of the Wald statistic for the variable liberal or
conservative political views was 0.024, less than or equal to the
level of significance of 0.05. The null hypothesis that the b
coefficient for liberal or conservative political views was equal to
zero was rejected. This supports the relationship that "survey
respondents who were more conservative were more likely to have
not seen an x-rated movie." Liberal or conservative political views is
an ordinal variable that is coded so that higher numeric values are
associated with survey respondents who were more conservative.
Variables in the Equation (Step 1), POLVIEWS: B = .306, S.E. = .135,
Wald = 5.110, df = 1, Sig. = .024, Exp(B) = 1.358
The value of Exp(B) was 1.358, which implies that
a one unit increase in liberal or conservative
political views increased the odds that survey
respondents have not seen an x-rated movie by
approximately one and a quarter times.
Slide 36
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have not
seen an x-rated movie from survey respondents who have
seen an x-rated movie if the classification accuracy rate was
substantially higher than the accuracy attainable by chance
alone. Operationally, the classification accuracy rate should
be at least 25% higher than the proportional by chance
accuracy rate.

Classification Table (Step 0)
Observed (rows) and predicted (columns): SEEN X-RATED MOVIE IN LAST YEAR

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                               0             45                   .0
NO                                0            132                100.0
Overall Percentage                                                 74.6

b. The cut value is .500

The proportional by chance accuracy rate was computed by first
calculating the proportion of cases for each group based on the number
of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the
"NO" group is 132/177 = 0.746.
Then, we square and sum the proportion of cases in each group (0.254²
+ 0.746² = 0.621). 0.621 is the proportional by chance accuracy rate.
Slide 37
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy

Classification Table (Step 1)
Observed (rows) and predicted (columns): SEEN X-RATED MOVIE IN LAST YEAR

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                              19             26                 42.2
NO                                9            123                 93.2
Overall Percentage                                                 80.2

The accuracy rate computed by SPSS was 80.2%,
which was greater than or equal to the
proportional by chance accuracy criterion of
77.6% (1.25 x 62.1% = 77.6%).
The criterion for classification accuracy is
satisfied.
Slide 38
Answering the question in problem 1 - 1
The variables "age" [age], "sex" [sex], and "liberal or conservative political views"
[polviews] were useful predictors for distinguishing between groups based on responses to
"seen x-rated movie in last year" [xmovie]. These predictors differentiate survey
respondents who have not seen an x-rated movie from survey respondents who have seen
an x-rated movie.
We found a statistically significant overall relationship
between the combination of independent variables and the
dependent variable.
There was no evidence of numerical problems in the solution.
Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.
Slide 39
We verified that each statement about the
relationship between an independent variable and
the dependent variable was correct in both the
direction of the relationship and the change in
likelihood associated with a one-unit change of the
independent variable.
The answer to the question is true with caution.
A caution is added because of the
inclusion of ordinal level variables.
ters II
Slide
40
Problem 2
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 41
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 42
The variables listed first in the problem statement are the independent variables (IVs):
"sex" [sex], "general happiness" [happy], and "confidence in the executive branch of the
federal government" [confed].
Sex is a control variable and general happiness and confidence in the executive branch are
predictors.
The variable used to define groups is the dependent variable (DV): "should marijuana be made
legal" [grass].
When a problem identifies control variables, we do a hierarchical logistic regression, entering
the variables in SPSS blocks.
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made
legal" [grass], the variable "general happiness" [happy] and "confidence in the executive
branch of the federal government" [confed] were useful predictors for distinguishing
between groups based on responses to "should marijuana be made legal" [grass]. These
predictors differentiate survey respondents who have been less supportive that the use of
marijuana should be made legal from survey respondents who have been more supportive that
the use of marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of marijuana
should be made legal by 66.9%. Survey respondents who had less confidence in the executive
branch of the federal government were less likely to have been less supportive that the use of
marijuana should be made legal. A one unit increase in confidence in the executive branch of
the federal government decreased the odds that survey respondents have been less supportive
that the use of marijuana should be made legal by 42.8%.
Slide 43
SPSS logistic regression models the relationship by computing the changes in the likelihood of
falling in the category of the dependent variable which had the highest numerical code.
The responses to "should marijuana be made legal" [grass] were coded: 1 = Legal and
2 = Not Legal.
The SPSS output will model the changes in the likelihood of being less supportive of legalizing
marijuana because 2 corresponds to not legalizing marijuana.
The statements of the specific relationships between independent variables and the dependent
variable are all phrased in terms of impact on being less supportive of legalizing marijuana.
Slide 44
The specific relationships for the independent variables listed in the problem indicate the
direction of the relationship (increasing or decreasing) and the amount of change in the odds
associated with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the relationship between the predictors
and the dependent variable must be statistically significant after entering the control
variables in a previous stage, there must be no evidence of a flawed numerical analysis, the
classification accuracy rate must be substantially better than could be obtained by chance alone,
and each significant relationship must be interpreted correctly.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 45
Logistic regression analysis requires that the dependent variable be dichotomous and the
independent variables be metric or dichotomous. "Should marijuana be made legal" [grass] is a
dichotomous variable, which satisfies the level of measurement requirement for the dependent
variable.
It contains two categories:
survey respondents who have been less supportive that the use of marijuana should be made legal
survey respondents who have been more supportive that the use of marijuana should be made legal
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 46
"Sex" [sex] is a dichotomous or
dummy-coded nominal variable which may be included in logistic regression analysis.
"General happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] are ordinal level variables. If we follow the convention of treating ordinal level
variables as metric variables, the level of measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.
Slide 47
Request hierarchical logistic regression

Select the Regression | Binary Logistic command from the Analyze menu.
Slide 48

Highlight the dependent variable grass in the list of variables, then move it to the
Dependent text box.
Slide 49
Selecting the control independent variables
First, move the control independent variable, sex, listed in the problem into the first block
of independent variables.
Second, click on the Next button to add the new block that will contain the predictors.
Slide 50
Adding the predictor independent variables
First, move the predictors, happy and confed, into the new block.
Slide 51
In our hierarchical
regression, we will specify
that all of the variables in
each block be entered
simultaneously when the
block is entered.
Slide 52
Click on the OK button to request the output.
The logistic procedure supports the selection of subsets of cases, automatic recoding of
nominal variables, saving diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However, none of these are needed for this
analysis.
Slide 53

Unweighted Cases                              N    Percent
Selected Cases     Included in Analysis     163       60.4
                   Missing Cases            107       39.6
                   Total                    270      100.0
Unselected Cases                              0         .0
Total                                       270      100.0

number of cases.

The minimum ratio of valid cases to independent variables for logistic regression is
10 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 163 valid cases and 3
independent variables. The ratio of cases to independent variables is 54.33 to 1, which
satisfies the minimum requirement. In addition, the ratio of 54.33 to 1 satisfies the preferred
ratio of 20 to 1.
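The ratio check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide; the function name and the dictionary keys are hypothetical, and the thresholds are passed in because the preferred ratio differs between the hierarchical and stepwise problems.

```python
def check_sample_size(valid_cases, n_independent, minimum=10.0, preferred=20.0):
    """Compare the ratio of valid cases to independent variables with two thresholds.

    The names and the returned dictionary layout are illustrative assumptions,
    not part of SPSS.
    """
    if n_independent <= 0:
        raise ValueError("need at least one independent variable")
    ratio = valid_cases / n_independent
    return {
        "ratio": round(ratio, 2),
        "meets_minimum": ratio >= minimum,
        "meets_preferred": ratio >= preferred,
    }


# Problem 2: 163 valid cases and 3 independent variables (sex, happy, confed).
result = check_sample_size(163, 3)
print(result)
```

Both flags come back true for this problem, matching the statement that the minimum and preferred ratios are satisfied.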
Slide 54

In a hierarchical logistic regression, the presence of a relationship

between the dependent variable and combination of independent
variables entered after the control variables have been included is
based on the statistical significance of the block chi-square for
the second block of variables in which the predictor independent
variables are included.
In this analysis, the probability of the block chi-square (17.467)
was <0.001, less than or equal to the level of significance of
0.05. The null hypothesis that there is no difference between the
model with only a constant and the control variables versus the
model with the predictor independent variables was rejected. The
contribution of the relationship between the predictor
independent variables and the dependent variable was supported.
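As a rough check on the reported probability, the upper-tail area of the block chi-square can be computed by hand. The degrees of freedom are not shown in the text above, so this sketch assumes df = 2 because two predictors (happy and confed) were entered in the second block. For 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

ALPHA = 0.05  # level of significance stated in the problem


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square statistic with exactly 2 df."""
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    return math.exp(-x / 2.0)


block_chi_square = 17.467                       # block chi-square from the slide
p_value = chi2_upper_tail_df2(block_chi_square)  # df = 2 is an assumption
reject_null = p_value <= ALPHA
print(f"p = {p_value:.5f}, reject null: {reject_null}")
```

Under that assumption the probability is well below 0.001, which agrees with the value SPSS reports.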
Slide 55
NUMERICAL PROBLEMS
                         B     S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)  SEX        .154     .351     .194    1   .660    1.167
           HAPPY    -1.104     .354    9.739    1   .002     .331
           CONFED    -.559     .270    4.290    1   .038     .572
           Constant  3.721    1.066   12.195    1   .000   41.308
a. Variable(s) entered on step 1: HAPPY, CONFED.

for the Constant.)
Slide 56

The probability of the Wald statistic for the variable general
happiness was 0.002, less than or equal to the level of
significance of 0.05. The null hypothesis that the b coefficient
for general happiness was equal to zero was rejected. This
supports the relationship that "survey respondents who were
less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal." General
happiness is an ordinal variable that is coded so that lower
numeric values are associated with survey respondents who
were happier overall.
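The Wald test described above can be reproduced approximately from the table values. The sketch below assumes the usual single-coefficient form, Wald = (B / S.E.)², referred to a chi-square distribution with 1 degree of freedom. Because the table shows B and S.E. rounded to three decimals, the recomputed Wald values differ slightly from the ones SPSS prints. The function names are illustrative.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for one coefficient: the squared ratio of B to its standard error."""
    return (b / se) ** 2


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 df (via the normal tail)."""
    return math.erfc(math.sqrt(x / 2.0))


# (B, S.E., Wald as printed by SPSS) for the two predictors in the table.
rows = {
    "HAPPY": (-1.104, 0.354, 9.739),
    "CONFED": (-0.559, 0.270, 4.290),
}

for name, (b, se, reported_wald) in rows.items():
    recomputed = wald_statistic(b, se)
    p_value = chi2_upper_tail_df1(reported_wald)
    print(f"{name}: Wald from rounded B and S.E. = {recomputed:.3f}, "
          f"reported = {reported_wald}, p = {p_value:.3f}")
```

The probabilities computed from the reported Wald statistics round to the Sig. values in the table, .002 for general happiness and .038 for confidence in the executive branch.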

                         B     S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)  SEX        .154     .351     .194    1   .660    1.167
           HAPPY    -1.104     .354    9.739    1   .002     .331
           CONFED    -.559     .270    4.290    1   .038     .572
           Constant  3.721    1.066   12.195    1   .000   41.308
The value of Exp(B) was 0.331 which implies that a one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of marijuana
should be made legal by 66.9%.
Slide 57

The probability of the Wald statistic for the variable confidence in the
executive branch of the federal government was 0.038, less than or
equal to the level of significance of 0.05. The null hypothesis that the
b coefficient for confidence in the executive branch of the federal
government was equal to zero was rejected. This supports the
relationship that "survey respondents who had less confidence in the
executive branch of the federal government were less likely to have
been less supportive that the use of marijuana should be made legal."
Confidence in the executive branch of the federal government is an
ordinal variable that is coded so that lower numeric values are
associated with survey respondents who had more confidence in the
executive branch of the federal government.
                         B     S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)  SEX        .154     .351     .194    1   .660    1.167
           HAPPY    -1.104     .354    9.739    1   .002     .331
           CONFED    -.559     .270    4.290    1   .038     .572
           Constant  3.721    1.066   12.195    1   .000   41.308
The value of Exp(B) was 0.572 which implies that a one unit increase in confidence in the
executive branch of the federal government decreased the odds that survey respondents have
been less supportive that the use of marijuana should be made legal by 42.8%.
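The percentages quoted for the two predictors follow directly from Exp(B): the change in odds for a one-unit increase is (Exp(B) - 1) x 100. The sketch below uses the Exp(B) values as printed in the table; the function name is illustrative. It also checks that exp(B) computed from the rounded B coefficients agrees with the printed Exp(B) to within rounding.

```python
import math


def percent_change_in_odds(exp_b):
    """Percent change in the odds of the modeled event for a one-unit increase."""
    return (exp_b - 1.0) * 100.0


# (B, Exp(B)) as printed in the table for the two predictors.
predictors = {
    "HAPPY": (-1.104, 0.331),
    "CONFED": (-0.559, 0.572),
}

changes = {}
for name, (b, exp_b) in predictors.items():
    changes[name] = round(percent_change_in_odds(exp_b), 1)
    # exp(B) from the rounded coefficient is close to, but not exactly, the printed Exp(B).
    print(f"{name}: exp(B) = {math.exp(b):.4f}, printed Exp(B) = {exp_b}, "
          f"change in odds = {changes[name]}%")
```

A negative result is a decrease in the odds, so the values correspond to the 66.9% and 42.8% decreases stated in the problem.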
Slide 58
The independent variables could be characterized as useful predictors distinguishing survey
respondents who have been less supportive that the use of marijuana should be made legal from
survey respondents who have been more supportive that the use of marijuana should be made
legal if the classification accuracy rate was substantially higher than the accuracy attainable
by chance alone. Operationally, the classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
Step 0
                                      Predicted
                                      SHOULD MARIJUANA BE MADE LEGAL    Percentage
Observed                              LEGAL        NOT LEGAL            Correct
SHOULD MARIJUANA     LEGAL                0               57                 .0
BE MADE LEGAL        NOT LEGAL            0              106              100.0
Overall Percentage                                                         65.0
b. The cut value is .500
The proportional by chance accuracy rate was computed by calculating the proportion of cases
for each group based on the number of cases in each group in the classification table at
Step 0, and then squaring and summing the proportion of cases in each group
(0.350² + 0.650² = 0.545).
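The squaring-and-summing step can be written out as a short Python sketch. The function name is illustrative; the group counts are the observed totals from the Step 0 classification table.

```python
def proportional_by_chance(group_counts):
    """Sum of squared group proportions, given the number of cases in each group."""
    counts = list(group_counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one case")
    return sum((n / total) ** 2 for n in counts)


# Observed group sizes at Step 0: 57 LEGAL and 106 NOT LEGAL (163 cases).
step0_counts = {"LEGAL": 57, "NOT LEGAL": 106}
chance_rate = proportional_by_chance(step0_counts.values())
print(f"proportional by chance accuracy rate = {chance_rate:.3f}")
```

The result rounds to 0.545, the rate used on the next slide to set the classification accuracy criterion.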
Slide 59
Step 1
                                      Predicted
                                      SHOULD MARIJUANA BE MADE LEGAL    Percentage
Observed                              LEGAL        NOT LEGAL            Correct
SHOULD MARIJUANA     LEGAL               18               39               31.6
BE MADE LEGAL        NOT LEGAL           13               93               87.7
Overall Percentage                                                         68.1
The accuracy rate computed by SPSS was 68.1% which was greater than or equal to the
proportional by chance accuracy criteria of 68.1% (1.25 x 54.5% = 68.1%).
The criteria for classification accuracy is satisfied.
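The comparison on this slide can be retraced from the Step 1 classification table. The sketch below assumes, as the slide does, that both percentages are compared after rounding to one decimal place; the variable names are illustrative. Unrounded, the accuracy rate (about 68.10%) is marginally below 1.25 x 54.5% = 68.125%, so the criterion is met only at the precision SPSS reports.

```python
# Step 1 classification table: keys are (observed group, predicted group).
table = {
    ("LEGAL", "LEGAL"): 18,
    ("LEGAL", "NOT LEGAL"): 39,
    ("NOT LEGAL", "LEGAL"): 13,
    ("NOT LEGAL", "NOT LEGAL"): 93,
}

total = sum(table.values())
correct = sum(n for (observed, predicted), n in table.items() if observed == predicted)
accuracy_pct = 100.0 * correct / total   # overall percentage correct

chance_pct = 54.5                        # proportional by chance rate from the slide
criterion_pct = 1.25 * chance_pct        # 25% higher than the by chance rate

# Compare at one decimal place, the precision used on the slide.
meets_criterion = round(accuracy_pct, 1) >= round(criterion_pct, 1)
print(f"accuracy = {accuracy_pct:.3f}%, criterion = {criterion_pct:.3f}%, "
      f"meets criterion (rounded): {meets_criterion}")
```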
Slide 60

We found a statistically significant overall relationship between the predictor independent
variables and the dependent variable.
There was no evidence of numerical problems in the solution.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 61

We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both the direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The answer to the question is true with caution.
A caution is added because of the inclusion of ordinal level variables.
Slide 62
Problem 3
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A
one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 63
Dissecting Problem 3 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 64
The variables listed first in the problem statement are the independent variables (IVs):
"highest academic degree" [degree], "total family income" [income98], and "satisfaction with
financial situation" [satfin].
The variable used to define groups is the dependent variable (DV): "expect u.s. in world war in
10 years" [uswary].
Since the problem identifies the most useful or important predictor, we do a stepwise logistic
regression.
Slide 65
From the list of variables "highest academic degree" [degree], "total family income" [income98],
and "satisfaction with financial situation" [satfin], the most useful predictor for distinguishing
between groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total
family income" [income98]. These predictors differentiate survey respondents who have been
less positive that the United States would fight in another world war within the next ten
years from survey respondents who have been more positive that the United States would
fight in another world war within the next ten years.
SPSS logistic regression models the relationship by computing the changes in the likelihood of
falling in the category of the dependent variable which had the highest numerical code.
The responses to expect u.s. in world war in 10 years were coded: 1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of being less positive that the United
States would fight in another world war within the next ten years.
Slide 66
The statements of the specific relationships between independent variables and the dependent
variable are all phrased in terms of impact on being less positive that the United States would
fight in another world war within the next ten years.
Slide 67
The specific relationships for the independent variables listed in the problem indicate the
direction of the relationship (increasing or decreasing) and the amount of change associated
with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the relationship between the predictors
selected for inclusion and the dependent variable must be statistically significant, there must
be no evidence of a flawed numerical analysis, the classification accuracy rate must be
substantially better than could be obtained by chance alone, and the order of entry and each
significant relationship must be interpreted correctly.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 68
Logistic regression analysis requires that the dependent variable be dichotomous and the
independent variables be metric or dichotomous. "Expect u.s. in world war in 10 years"
[uswary] is a dichotomous variable, which satisfies the level of measurement requirement for
the dependent variable.
It contains two categories:
survey respondents who have been less positive that the United States would fight in another
world war within the next ten years
survey respondents who have been more positive that the United States would fight in another
world war within the next ten years
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 69
"Highest academic degree" [degree], "total family
income" [income98], and "satisfaction with financial
situation" [satfin] are ordinal level variables. If we
follow the convention of treating ordinal level
variables as metric variables, the level of
measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.
Slide 70
Request stepwise logistic regression

Select the Regression | Binary Logistic command from the Analyze menu.
Slide 71

Highlight the dependent variable uswary in the list of variables, then move it to the
Dependent text box.
Slide 72
Adding the independent variables
First, move the predictors, degree, income98, and satfin, into the list of independent
variables.
Slide 73
In our stepwise logistic

regression, we specify
the Forward
Conditional method for
adding variables.
Slide 74
Adding options to the output
To add a summary of steps

at the end of the analysis
and specifications for
stepwise method, click on
the Options button.
Slide 75
Including a summary of steps
To obtain a summary of the steps

on which variables were added or
removed from the analysis, mark
the option button At last step in
the Display panel.
Slide 76
Specifications for stepwise method
We can change the criteria for adding and removing variables from the analysis by changing the
probability for entry and removal. We will use the default level of significance of 0.05 for
entry and 0.10 for removal.
Click on the Continue button to close the dialog box.
Slide 77
Click on the OK button to request the output.
Slide 78

Unweighted Cases                              N    Percent
Selected Cases     Included in Analysis     136       50.4
                   Missing Cases            134       49.6
                   Total                    270      100.0
Unselected Cases                              0         .0
Total                                       270      100.0

number of cases.

The minimum ratio of valid cases to independent variables for stepwise logistic
regression is 10 to 1, with a preferred ratio of 50 to 1. In this analysis, there are 136 valid
cases and 3 independent variables. The ratio of cases to independent variables is 45.33 to 1,
which satisfies the minimum requirement. However, the ratio of 45.33 to 1 does not satisfy
the preferred ratio of 50 to 1. A caution should be added to the interpretation of the analysis
and a split sample validation should be conducted.
Slide 79

The presence of a relationship between the dependent variable

and combination of independent variables is based on the
statistical significance of the model chi-square.
In this analysis, the probability of the model chi-square (9.001)
was 0.003, less than or equal to the level of significance of 0.05.
The null hypothesis that there is no difference between the model
with only a constant and the model with independent variables
was rejected. The existence of a relationship between the
independent variables and the dependent variable was supported.
Slide 80
NUMERICAL PROBLEMS
                           B     S.E.    Wald   df   Sig.   Exp(B)
Step 1(a)  INCOME98     .095     .033   8.436    1   .004    1.100
           Constant   -1.033     .527   3.847    1   .050     .356
a. Variable(s) entered on step 1: INCOME98.

for the Constant.)
Slide 81

VARIABLES TO DEPENDENT VARIABLE
The probability of the Wald statistic for the variable total family
income was 0.004, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for total family
income was equal to zero was rejected. This supports the
relationship that "survey respondents who had higher total
family incomes were more likely to have been less positive that
the United States would fight in another world war within the
next ten years." Total family income is an ordinal variable that is
coded so that higher numeric values are associated with survey
respondents who had higher total family incomes.

                      B        S.E.    Wald     df    Sig.    Exp(B)
Step 1(a)  INCOME98   .095     .033    8.436    1     .004    1.100
           Constant   -1.033   .527    3.847    1     .050    .356
a. Variable(s) entered on step 1: INCOME98.

A one unit increase in total family income increased the
odds that survey respondents have been less positive
that the United States would fight in another world
war within the next ten years by 10.0%.
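The link between the B coefficient, Exp(B), and the percentage change in odds can be sketched as follows. The function names are illustrative; the coefficients are the rounded values printed in the table, so the results match the SPSS output only to the displayed precision.

```python
import math

def odds_ratio(b):
    """Exp(B): the multiplicative change in odds for a one unit increase."""
    return math.exp(b)

def percent_change_in_odds(b):
    """Percentage change in odds for a one unit increase."""
    return (math.exp(b) - 1.0) * 100.0

b_income = 0.095      # B for INCOME98
b_constant = -1.033   # B for the Constant

exp_b_income = odds_ratio(b_income)
exp_b_constant = odds_ratio(b_constant)
pct_income = percent_change_in_odds(b_income)

print(f"Exp(B) for INCOME98 = {exp_b_income:.3f}")
print(f"Exp(B) for Constant = {exp_b_constant:.3f}")
print(f"Change in odds per unit of INCOME98 = {pct_income:.1f}%")
```

A positive B gives an Exp(B) above 1.0 (odds increase), and a negative B gives an Exp(B) below 1.0 (odds decrease).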
IMPORTANCE OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE
Slide 82
The order of importance is based on the entry

order of the variables included in the stepwise
logistic regression. The entry order is
summarized in the Step Summary table, in
which we see which variable was added or
removed at each step.
Step Summary(a,b)

        Improvement                  Model                        Correct
Step    Chi-square   df    Sig.     Chi-square   df    Sig.      Class %    Variable
1       9.001        1     .003     9.001        1     .003      67.6%      IN: INCOME98
a. No more variables can be deleted from or added to the current model.
b. End block: 1

The most important predictor for identifying
survey respondents who have been less
positive that the United States would fight in
another world war within the next ten years
was total family income [INCOME98].
The importance of the predictors stated in
the problem is correct.
Slide 83
predictors distinguishing survey respondents who have been
less positive that the United States would fight in another
world war within the next ten years from survey respondents
who have been more positive that the United States would
fight in another world war within the next ten years if the
classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
                                  Predicted
                                  EXPECT U.S. IN WORLD WAR IN 10 YEARS    Percentage
        Observed                  YES       NO                            Correct
Step 0  WAR IN 10 YEARS    YES    0         54                            .0
                           NO     0         82                            100.0
        Overall Percentage                                                60.3
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the classification table
at Step 0, and then squaring and summing the proportion of
cases in each group (0.397² + 0.603² = 0.521).
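The proportional by chance accuracy computation can be sketched directly from the Step 0 group counts. The function name is hypothetical; the counts (54 and 82) come from the classification table.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions."""
    total = sum(group_counts)
    return sum((count / total) ** 2 for count in group_counts)

# Observed group sizes at Step 0: 54 YES, 82 NO (136 cases in total).
group_counts = [54, 82]
proportions = [count / sum(group_counts) for count in group_counts]
by_chance = proportional_by_chance_accuracy(group_counts)

print("group proportions:", [round(p, 3) for p in proportions])
print(f"proportional by chance accuracy = {by_chance:.3f}")
```

Note that the proportions are squared before they are summed; adding the raw proportions would always give 1.0.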
Slide 84

                                  Predicted
                                  EXPECT U.S. IN WORLD WAR IN 10 YEARS    Percentage
        Observed                  YES       NO                            Correct
Step 1  WAR IN 10 YEARS    YES    20        34                            37.0
                           NO     10        72                            87.8
        Overall Percentage                                                67.6

The accuracy rate computed by SPSS was
67.6% which was greater than or equal to the
proportional by chance accuracy criteria of
65.2% (1.25 x 52.1% = 65.2%).
The criteria for classification accuracy is
satisfied.
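The comparison of the SPSS accuracy rate with the by chance criterion can be reproduced from the two classification tables. This is a sketch with illustrative variable names. Computed from unrounded proportions the criterion comes out near 65.1%, a hair below the 65.2% shown on the slide, which appears to reflect rounding of intermediate values; the conclusion is the same either way.

```python
# Step 1 classification table: observed group -> predicted group -> count.
table = {
    "YES": {"YES": 20, "NO": 34},
    "NO": {"YES": 10, "NO": 72},
}

total = sum(sum(row.values()) for row in table.values())
correct = sum(table[group][group] for group in table)
accuracy = correct / total

# Percentage correct within each observed group.
group_accuracy = {
    group: table[group][group] / sum(table[group].values()) for group in table
}

# Proportional by chance accuracy from the observed group sizes,
# and the criterion of 25% improvement over chance.
by_chance = sum((sum(row.values()) / total) ** 2 for row in table.values())
criterion = 1.25 * by_chance
useful = accuracy >= criterion

print(f"overall accuracy = {accuracy:.1%}")
print({group: f"{value:.1%}" for group, value in group_accuracy.items()})
print(f"criterion = {criterion:.3f}, satisfied: {useful}")
```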
Slide 85

family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.

We found a statistically significant overall
relationship between the independent variables
and the dependent variable.

There was no evidence of numerical problems in
the solution.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 86

significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.
family income.
A one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.

We verified that each statement about the
dependent variable was correct in both
direction of the relationship and the change in
likelihood associated with a one-unit change
of the independent variable.

We also verified the order of importance for the
independent variables included in the stepwise
analysis.

The answer to the question is true
with caution.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

A caution is added to the findings
because of the inclusion of ordinal
level independent variables. A
caution is added to the findings
because the preferred sample
size is not met.
Slide 87
Steps in binary logistic regression: level of measurement and initial sample size
The following is a guide to the decision process for answering
problems about the basic relationships in logistic regression:

Dependent dichotomous? Independent variables metric or dichotomous?
    No: Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 10 to 1?
    No: Inappropriate application of a statistic
    Yes: Run logistic regression, using method for including
         variables identified in the research question.
Steps in logistic regression: overall relationship and numerical problems
Slide 88

Hierarchical method of entry used to include independent variables?
    No: Presence of relationship confirmed by test of model chi-square?
        No: False
        Yes: continue
    Yes: Presence of relationship confirmed by test of block chi-square?
        No: False
        Yes: continue

Standard errors of coefficients indicate presence of numerical
problems (s.e. > 2.0)?
    Yes: False
    No: continue
Slide 89

relationships between IV's and DV

Stepwise method of entry used to include independent variables?
    Yes: Entry order of variables interpreted correctly?
        No: False
        Yes: continue
    No: continue

Relationships between individual IVs and DV groups interpreted correctly?
    No: False
    Yes: continue
Slide 90

classification accuracy and adding cautions

Overall accuracy rate is at least 25% higher than the proportional
by chance accuracy rate?
    No: False
    Yes: continue

Satisfies preferred ratio of cases to IV's of 20 to 1
(50 to 1 for stepwise)?
    No: True with caution
    Yes: continue

One or more IV's are ordinal level variables?
    No: True
    Yes: True with caution
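The decision process on the preceding flowchart slides can be summarized as a single function. This is only a sketch of the flowcharts as read here, not part of the course materials: the function name and all parameter names are hypothetical, and ordinal independent variables are assumed to pass the level of measurement check (treated as metric, with a caution added at the end).

```python
def evaluate_problem(dv_dichotomous, ivs_metric_or_dichotomous, n_cases, n_ivs,
                     relationship_significant, max_standard_error, stepwise,
                     entry_order_correct, relationships_correct,
                     accuracy_rate, by_chance_rate, any_ordinal_iv):
    """Return one of the four answers used in the sample problems."""
    # Level of measurement and minimum sample size (10 cases per IV).
    if not (dv_dichotomous and ivs_metric_or_dichotomous):
        return "Inappropriate application of a statistic"
    ratio = n_cases / n_ivs
    if ratio < 10:
        return "Inappropriate application of a statistic"

    # Overall relationship (model or block chi-square) and numerical problems.
    if not relationship_significant:
        return "False"
    if max_standard_error > 2.0:
        return "False"

    # Entry order (stepwise only) and individual relationships.
    if stepwise and not entry_order_correct:
        return "False"
    if not relationships_correct:
        return "False"

    # Classification accuracy: 25% improvement over by chance accuracy.
    if accuracy_rate < 1.25 * by_chance_rate:
        return "False"

    # Cautions: preferred sample size and ordinal independent variables.
    preferred_ratio = 50 if stepwise else 20
    if ratio < preferred_ratio or any_ordinal_iv:
        return "True with caution"
    return "True"


# Values from the stepwise sample problem worked above.
answer = evaluate_problem(
    dv_dichotomous=True, ivs_metric_or_dichotomous=True,
    n_cases=136, n_ivs=3,
    relationship_significant=True, max_standard_error=0.527,
    stepwise=True, entry_order_correct=True, relationships_correct=True,
    accuracy_rate=0.676, by_chance_rate=0.521, any_ordinal_iv=True,
)
print(answer)
```

For the sample problem, both cautions apply (the 45.33 to 1 ratio is below 50 to 1, and ordinal predictors are included), so the function returns "True with caution", matching the answer given on the slides.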
