
SW388R7

Data Analysis & Computers II

Logistic Regression Basic Relationships

Slide 1

Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems


Logistic regression

Slide 2

Logistic regression is used to analyze relationships between a
dichotomous dependent variable and metric or dichotomous
independent variables. (SPSS now supports Multinomial Logistic
Regression that can be used with more than two groups, but our
focus here is on binary logistic regression for two groups.)

Logistic regression combines the independent variables to
estimate the probability that a particular event will occur, i.e.
a subject will be a member of one of the groups defined by the
dichotomous dependent variable. In SPSS, the model is always
constructed to predict the group with higher numeric code. If
responses are coded 1 for Yes and 2 for No, SPSS will predict
membership in the No category. If responses are coded 1 for No
and 2 for Yes, SPSS will predict membership in the Yes category.
We will refer to the predicted event for a particular analysis as
the modeled event.

This will create some awkward wording in our problems. Our
only option for changing this is to recode the variable.


What logistic regression predicts

Slide 3

The variate or value produced by logistic regression is a
probability value between 0.0 and 1.0.

If the probability for group membership in the modeled
category is above some cut point (the default is 0.50), the
subject is predicted to be a member of the modeled group. If
the probability is below the cut point, the subject is predicted
to be a member of the other group.

For any given case, logistic regression computes the probability
that a case with a particular set of values for the independent
variable is a member of the modeled category.
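
The cut-point rule can be sketched in a few lines of Python. This is a
minimal illustration, not SPSS's implementation: the intercept, the b
coefficients, and the case values are hypothetical, and treating a
probability exactly equal to the cut point as "not modeled" is an
assumption, since the rule above only covers values above or below it.

```python
import math

def modeled_probability(intercept, coefs, values):
    """Probability of the modeled event for one case (logistic function)."""
    logit = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-logit))

def predict_group(prob, cut=0.50):
    """True when the case is predicted to be in the modeled group.

    Assumption: a probability exactly equal to the cut point is
    assigned to the other group.
    """
    return prob > cut

# Hypothetical coefficients and case values, for illustration only.
p = modeled_probability(intercept=-1.0, coefs=[0.8, 0.05], values=[1, 10])
print(round(p, 3), predict_group(p))
```

Whatever the coefficients, the result always falls between 0.0 and 1.0,
which is why it can be read as a probability of group membership.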


Level of measurement requirements

Slide 4

Logistic regression analysis requires that the dependent
variable be dichotomous.

Logistic regression analysis requires that the independent
variables be metric or dichotomous.

If an independent variable is nominal level and not
dichotomous, the logistic regression procedure in SPSS has an
option to dummy code the variable for you.

If an independent variable is ordinal, we will attach the usual
caution.


Assumptions

Slide 5

Logistic regression does not make any assumptions of normality,
linearity, or homogeneity of variance for the independent
variables.

Because it does not impose these requirements, it is preferred
to discriminant analysis when the data do not satisfy these
assumptions.


Sample size requirements

Slide 6

The minimum number of cases per independent variable is 10,
using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
Logistic Regression.

For preferred case-to-variable ratios, we will use 20 to 1 for
simultaneous and hierarchical logistic regression and 50 to 1 for
stepwise logistic regression.


Methods for including variables

Slide 7

There are three methods available for including variables in the
regression equation:
  - The simultaneous method, in which all independents are
    included at the same time.
  - The hierarchical method, in which control variables are
    entered in the analysis before the predictors whose effects
    we are primarily concerned with.
  - The stepwise method (forward conditional in SPSS), in which
    variables are selected in the order in which they maximize
    the statistically significant contribution to the model.

For all methods, the contribution to the model is measured by
the model chi-square, a statistical measure of the fit between the
dependent and independent variables, like R square in multiple
regression.


Computational method

Slide 8

Multiple regression uses the least-squares method to find the
coefficients for the independent variables in the regression
equation, i.e. it computes coefficients that minimize the
squared residuals summed over all cases.

Logistic regression uses maximum-likelihood estimation to
compute the coefficients for the logistic regression equation.
This method attempts to find coefficients that match the
breakdown of cases on the dependent variable.

The overall measure of how well the model fits is given by the
likelihood value (reported by SPSS as -2 log likelihood), which is
similar to the residual or error sum of squares value for multiple
regression. A model that fits the data well will have a small
likelihood value. A perfect model would have a likelihood value
of zero.

Maximum-likelihood estimation is an iterative procedure that
successively works to get closer and closer to the correct
answer. When SPSS reports the "iterations," it is telling us how
many cycles it took to get the answer.
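
To make the likelihood value concrete, the sketch below computes a -2
log likelihood for a set of 0/1 outcomes and predicted probabilities.
The outcomes and probabilities are hypothetical and only illustrate
that better predictions give a smaller value; the function name is ours
and is not an SPSS feature.

```python
import math

def neg2_log_likelihood(outcomes, probs):
    """-2 log likelihood for 0/1 outcomes, given each case's predicted
    probability of the outcome coded 1 (the modeled event)."""
    log_likelihood = 0.0
    for y, p in zip(outcomes, probs):
        log_likelihood += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_likelihood

# Hypothetical data: 1 = modeled event occurred, 0 = it did not.
y = [1, 1, 1, 0, 0, 0, 0, 0]

# With no predictors, every case gets the same probability:
# the overall proportion of cases coded 1.
base_rate = sum(y) / len(y)
null_fit = neg2_log_likelihood(y, [base_rate] * len(y))

# Hypothetical probabilities from a model that separates the groups better.
better_fit = neg2_log_likelihood(y, [0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.3, 0.2])

print(round(null_fit, 3), round(better_fit, 3))
```

As the predicted probabilities approach 1 for cases in the modeled
group and 0 for the others, the value approaches zero.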


Overall test of relationship

Slide 9

The overall test of relationship among the independent
variables and groups defined by the dependent is based on the
reduction in the likelihood values for a model which does not
contain any independent variables and the model that contains
the independent variables.

This difference in likelihood follows a chi-square distribution,
and is referred to as the model chi-square.

The significance test for the model chi-square is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.

Slide 10

Beginning logistic regression model

The SPSS output for logistic regression begins with output
for a model that contains no independent variables. It labels
this output "Block 0: Beginning Block" and (if we request the
optional iteration history) reports the initial -2 Log
Likelihood, which we can think of as a measure of the error
associated with trying to predict the dependent variable without
using any information from the independent variables.

The initial -2 log likelihood is 213.891.

We will not routinely request the iteration history because
it does not usually give us additional useful information.

Slide 11

Ending logistic regression model

After the independent variables are entered in Block 1, the
-2 log likelihood is again measured (180.267 in this problem).
The difference between the beginning and ending -2 log
likelihood is the model chi-square that is used in the
test of overall statistical significance.
In this problem, the model chi-square is 33.625
(213.891 - 180.267), which is statistically significant at
p < 0.001.

Model chi-square is 33.625, significant at p < 0.001.

Relationship of Individual Independent Variables and Dependent Variable

Slide 12

There is a test of significance for the relationship between an
individual independent variable and the dependent variable, a
significance test of the Wald statistic.

The individual coefficients represent change in the log odds of being
a member of the modeled category. Because individual coefficients are
expressed in log units, they are not directly interpretable. However, if
the b coefficient is used as the power to which the base of the natural
logarithm (2.71828) is raised, the result represents the change in the
odds of the modeled event associated with a one-unit change in the
independent variable.

If a coefficient is positive, its transformed log value will be greater
than one, meaning that the modeled event is more likely to occur. If a
coefficient is negative, its transformed log value will be less than one,
and the odds of the event occurring decrease. A coefficient of zero (0)
has a transformed log value of 1.0, meaning that this coefficient does
not change the odds of the event one way or the other.
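
The three cases (positive, zero, and negative coefficients) can be
checked with a short Python sketch. The b values here are hypothetical
and chosen only to show the direction of the effect.

```python
import math

def odds_ratio(b):
    """Raise e (2.71828...) to the power b: the multiplicative change in
    the odds of the modeled event for a one-unit change in the predictor."""
    return math.exp(b)

# Hypothetical coefficients: one positive, one zero, one negative.
for b in (0.5, 0.0, -0.5):
    ratio = odds_ratio(b)
    percent_change = (ratio - 1.0) * 100.0
    print(f"b = {b:+.1f}  exp(b) = {ratio:.3f}  change in odds = {percent_change:+.1f}%")
```

A ratio above 1.0 means the odds increase, a ratio below 1.0 means they
decrease, and a ratio of exactly 1.0 means no change.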

Slide 13

Numerical problems

The maximum likelihood method used to calculate logistic
regression is an iterative fitting process that attempts to cycle
through repetitions to find an answer.

Sometimes, the method will break down and not be able to
converge or find an answer.

Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation whereby the
two groups are perfectly separated by the scores on one or
more independent variables.

The clue that we have numerical problems and should not
interpret the results is a standard error larger than 2.0 for
one or more independent variables.

Slide 14

Strength of logistic regression relationship

While logistic regression does compute correlation measures to
estimate the strength of the relationship (pseudo R square
measures, such as Nagelkerke's R square), these correlation
measures do not really tell us much about the accuracy or errors
associated with the model.

A more useful measure to assess the utility of a logistic
regression model is classification accuracy, which compares
predicted group membership based on the logistic model to the
actual, known group membership, which is the value for the
dependent variable.

Slide 15

Evaluating usefulness for logistic models

The benchmark that we will use to characterize a logistic
regression model as useful is a 25% improvement over the rate
of accuracy achievable by chance alone.

Even if the independent variables had no relationship to the
groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.

The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by summing
the squared proportion of cases in each group.

Slide 16

Comparing accuracy rates

To characterize our model as useful, we compare the overall
percentage accuracy rate produced by SPSS at the last step in which
variables are entered to 25% more than the proportional by chance
accuracy. (Note: SPSS does not compute a cross-validated accuracy
rate for logistic regression.)

Classification Table (a)

                                     Predicted
                                     EXPECT U.S. IN WORLD
                                     WAR IN 10 YEARS        Percentage
Observed                             YES        NO          Correct
Step 1  EXPECT U.S. IN WORLD   YES   20         34          37.0
        WAR IN 10 YEARS        NO    10         72          87.8
        Overall Percentage                                  67.6

a. The cut value is .500

SPSS reports the overall accuracy rate in the Overall Percentage
row of the table "Classification Table." The overall accuracy rate
computed by SPSS was 67.6%.
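
The percentages in the Step 1 classification table can be reproduced
from its four cell counts. The following Python sketch is only a check
on the arithmetic; the dictionary layout and function name are ours.

```python
# Cell counts from the Step 1 classification table
# (outer key = observed group, inner key = predicted group).
table = {
    "YES": {"YES": 20, "NO": 34},
    "NO": {"YES": 10, "NO": 72},
}

def percent_correct(table):
    """Return per-group and overall percentage of correct predictions."""
    per_group = {}
    correct = 0
    total = 0
    for group, row in table.items():
        n = sum(row.values())
        per_group[group] = 100.0 * row[group] / n  # diagonal cell / row total
        correct += row[group]
        total += n
    return per_group, 100.0 * correct / total

per_group, overall = percent_correct(table)
print({g: round(v, 1) for g, v in per_group.items()}, round(overall, 1))
```

The overall rate is the number of correct predictions on the diagonal
(20 + 72) divided by all 136 cases.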

Slide 17

Computing by chance accuracy


The number of cases in each group is found in the Classification Table at
Step 0 (before any independent variables are included). The proportion
of cases in the largest group is equal to the overall percentage (60.3%).

Classification Table (a,b)

                                     Predicted
                                     EXPECT U.S. IN WORLD
                                     WAR IN 10 YEARS        Percentage
Observed                             YES        NO          Correct
Step 0  EXPECT U.S. IN WORLD   YES   0          54          .0
        WAR IN 10 YEARS        NO    0          82          100.0
        Overall Percentage                                  60.3

a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on the
number of cases in each group in the classification table at Step
0, and then squaring and summing the proportion of cases in
each group (0.397² + 0.603² = 0.521).
The proportional by chance accuracy criterion is 65.2% (1.25 x
52.1% = 65.2%).

Slide 18

Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews]
were useful predictors for distinguishing between groups based on responses to "seen x-rated
movie in last year" [xmovie]. These predictors differentiate survey respondents who have not
seen an x-rated movie from survey respondents who have seen an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one
unit increase in age increased the odds that survey respondents have not seen an x-rated movie
by 3.9%. Survey respondents who were female were approximately six and three quarters times
more likely to have not seen an x-rated movie. Survey respondents who were more
conservative were more likely to have not seen an x-rated movie. A one unit increase in liberal
or conservative political views increased the odds that survey respondents have not seen an
x-rated movie by approximately one and a quarter times.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 19

Dissecting problem 1 - 1
For these problems, we will assume that there is no problem
with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the
logistic regression.

Slide 20

Dissecting problem 1 - 2
The variables listed first in the problem statement are the
independent variables (IVs): "age" [age], "sex" [sex], and "liberal
or conservative political views" [polviews].

The variable used to define groups is the dependent
variable (DV): "seen x-rated movie in last year" [xmovie].

When a problem states that a list of independent variables can
distinguish among groups and does not identify control variables
or an order of importance for the variables, we do a
logistic regression entering all of the variables simultaneously.

Slide 21
Dissecting problem 1 - 3
SPSS logistic regression models the relationship by computing
the changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to seeing an x-rated movie were coded:
1 = Yes and 2 = No.

The SPSS output will model the changes in the likelihood of
not seeing an x-rated movie because the code for No is 2.

The statements of the specific relationships between
independent variables and the dependent variable are all
phrased in terms of impact on not seeing an x-rated movie.

Slide 22
Dissecting problem 1 - 4
The specific relationships for the independent variables
listed in the problem indicate the direction of the
relationship, increasing or decreasing the likelihood of
falling in the modeled group, and the amount of change in
the odds associated with a one-unit change in the
independent variable.

In order for the logistic regression question to be
true, the overall relationship must be statistically
significant, there must be no evidence of a flawed
numerical analysis, the classification accuracy
rate must be substantially better than could be
obtained by chance alone, and each significant
relationship must be interpreted correctly.

Slide 23
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent
variable be non-metric and the independent
variables be metric or dichotomous. "Seen x-rated
movie in last year" [xmovie] is a dichotomous
variable, which satisfies the level of measurement
requirement.

It contains two categories: survey respondents
who had seen an x-rated movie in the last year
and survey respondents who had not seen an
x-rated movie in the last year.

Slide 24
LEVEL OF MEASUREMENT - 2
"Age" [age] is an interval level variable, which satisfies
the level of measurement requirements for logistic
regression analysis.

"Sex" [sex] is a dichotomous or dummy-coded nominal
variable which may be included in logistic regression.

"Liberal or conservative political views" [polviews] is an
ordinal level variable. If we follow the convention of
treating ordinal level variables as metric variables, the
level of measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not
agree with this convention, a note of caution should be
included in our interpretation.

Slide 25
Request simultaneous logistic regression

Select the Regression | Binary Logistic command from the
Analyze menu.

Slide 26
Selecting the dependent variable

First, highlight the dependent variable xmovie in the list
of variables.

Second, click on the right arrow button to move the
dependent variable to the Dependent text box.

Slide 27
Selecting the independent variables

Move the independent variables listed in the problem to the
Covariates list box.

Slide 28
Specifying the method for including variables


SPSS provides us with two methods for including
variables: to enter all of the independent variables
at one time, and a stepwise method for selecting
variables using a statistical test to determine the
order in which variables are included.
SPSS also supports the specification of "Blocks" of
variables for testing hierarchical models.

Since the problem states that there is a relationship without
requesting the best predictors, we specify Enter as the method
for including variables.

Slide 29
Completing the logistic regression request

Click on the OK button to request the output for the
logistic regression.

The logistic procedure supports the selection of subsets of
cases, automatic recoding of nominal variables, saving
diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However,
none of these are needed for this analysis.

Slide 30
Sample size ratio of cases to variables


Case Processing Summary

Unweighted Cases                            N      Percent
Selected Cases     Included in Analysis     177    65.6
                   Missing Cases            93     34.4
                   Total                    270    100.0
Unselected Cases                            0      .0
Total                                       270    100.0

a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to
independent variables for logistic regression is
10 to 1, with a preferred ratio of 20 to 1. In this
analysis, there are 177 valid cases and 3
independent variables. The ratio of cases to
independent variables is 59.0 to 1, which
satisfies the minimum requirement. In addition,
the ratio of 59.0 to 1 satisfies the preferred
ratio of 20 to 1.
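
The ratio check is simple arithmetic, sketched here in Python with the
counts from the Case Processing Summary. The constant names are ours;
the 10 to 1 and 20 to 1 thresholds are the ones stated above for
simultaneous entry.

```python
valid_cases = 177            # "Included in Analysis" in the Case Processing Summary
independent_variables = 3    # age, sex, polviews

MINIMUM_RATIO = 10           # minimum cases per independent variable
PREFERRED_RATIO = 20         # preferred ratio for simultaneous entry

ratio = valid_cases / independent_variables
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO

print(f"{ratio:.1f} to 1", meets_minimum, meets_preferred)
```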

Slide 31

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Omnibus Tests of Model Coefficients

                  Chi-square    df    Sig.
Step 1   Step     39.668        3     .000
         Block    39.668        3     .000
         Model    39.668        3     .000

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the model
chi-square at step 1 after the independent variables have
been added to the analysis.
In this analysis, the probability of the model chi-square
(39.668) was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that there is
no difference between the model with only a constant
and the model with independent variables was rejected.
The existence of a relationship between the
independent variables and the dependent variable was
supported.
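
The reported significance can be checked without SPSS. The sketch
below uses the closed-form upper-tail probability of the chi-square
distribution that holds for exactly 3 degrees of freedom, which is the
df in this table; it is not a general chi-square routine, and the
function name is ours.

```python
import math

def chi_square_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variate with
    exactly 3 degrees of freedom (closed form valid only for df = 3)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

model_chi_square = 39.668   # from the Omnibus Tests table, df = 3
alpha = 0.05                # level of significance stated in the problem

p_value = chi_square_sf_df3(model_chi_square)
print(p_value < 0.001, p_value <= alpha)
```

SPSS displays this probability as .000 because it rounds to three
decimal places.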

Slide 32

NUMERICAL PROBLEMS
Variables in the Equation

                        B         S.E.     Wald      df    Sig.    Exp(B)
Step 1 (a)   AGE        .038      .014     7.629     1     .006    1.039
             SEX        1.901     .410     21.452    1     .000    6.689
             POLVIEWS   .306      .135     5.110     1     .024    1.358
             Constant   -4.590    1.045    19.302    1     .000    .010

a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

Multicollinearity in the logistic regression solution is detected
by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, and
'complete separation' whereby the two groups in the
dependent event variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error
for the Constant.)
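
The standard error check can be written as a short Python sketch using
the S.E. column from the table above. The 2.0 threshold is the rule of
thumb stated in the text, and the Constant is left out as noted.

```python
# Standard errors from "Variables in the Equation" (Constant excluded).
standard_errors = {"AGE": 0.014, "SEX": 0.410, "POLVIEWS": 0.135}

THRESHOLD = 2.0  # rule of thumb from the text for flagging numerical problems

flagged = [name for name, se in standard_errors.items() if se > THRESHOLD]

if flagged:
    print("possible numerical problems:", flagged)
else:
    print("no standard error exceeds", THRESHOLD)
```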

Slide 33

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
The probability of the Wald statistic for the variable age
was 0.006, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for age
was equal to zero was rejected. This supports the
relationship that "survey respondents who were older
were more likely to have not seen an x-rated movie."

Variables in the Equation

                        B         S.E.     Wald      df    Sig.    Exp(B)
Step 1 (a)   AGE        .038      .014     7.629     1     .006    1.039
             SEX        1.901     .410     21.452    1     .000    6.689
             POLVIEWS   .306      .135     5.110     1     .024    1.358
             Constant   -4.590    1.045    19.302    1     .000    .010

a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

The value of Exp(B) was 1.039 which implies that a
one unit increase in age increased the odds that
survey respondents have not seen an x-rated movie
by 3.9%. This confirms the statement of the amount
of change in the likelihood of belonging to the
modeled group of the dependent variable associated
with a one unit change in the independent variable,
age.
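
The step from B to Exp(B) to a percentage change in the odds can be
verified in Python for age. Note that the table shows B rounded to
three decimals, so recomputing Exp(B) from the displayed B may differ
slightly in the last digit for other predictors; for age it agrees.

```python
import math

b_age = 0.038                      # B for AGE, as displayed in the table

exp_b = math.exp(b_age)            # Exp(B): multiplicative change in the odds
percent_change = (exp_b - 1.0) * 100.0

print(round(exp_b, 3), round(percent_change, 1))
```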

Slide 34

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2
The probability of the Wald statistic for the variable sex
was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that the b
coefficient for sex was equal to zero was rejected. This
supports the relationship that "survey respondents who
were female were approximately six and three quarters
times more likely to have not seen an x-rated movie."

Variables in the Equation

                        B         S.E.     Wald      df    Sig.    Exp(B)
Step 1 (a)   AGE        .038      .014     7.629     1     .006    1.039
             SEX        1.901     .410     21.452    1     .000    6.689
             POLVIEWS   .306      .135     5.110     1     .024    1.358
             Constant   -4.590    1.045    19.302    1     .000    .010

a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

The value of Exp(B) was 6.689 which implies
that a one unit increase in sex increased the
odds that survey respondents have not seen
an x-rated movie by approximately six and
three quarters times.

Slide 35

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3
The probability of the Wald statistic for the variable liberal or
conservative political views was 0.024, less than or equal to the
level of significance of 0.05. The null hypothesis that the b
coefficient for liberal or conservative political views was equal to
zero was rejected. This supports the relationship that "survey
respondents who were more conservative were more likely to have
not seen an x-rated movie." Liberal or conservative political views is
an ordinal variable that is coded so that higher numeric values are
associated with survey respondents who were more conservative.

Variables in the Equation

                        B         S.E.     Wald      df    Sig.    Exp(B)
Step 1 (a)   AGE        .038      .014     7.629     1     .006    1.039
             SEX        1.901     .410     21.452    1     .000    6.689
             POLVIEWS   .306      .135     5.110     1     .024    1.358
             Constant   -4.590    1.045    19.302    1     .000    .010

a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

The value of Exp(B) was 1.358 which implies that
a one unit increase in liberal or conservative
political views increased the odds that survey
respondents have not seen an x-rated movie by
approximately one and a quarter times.

Slide 36

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have not
seen an x-rated movie from survey respondents who have
seen an x-rated movie if the classification accuracy rate was
substantially higher than the accuracy attainable by chance
alone. Operationally, the classification accuracy rate should
be at least 25% higher than the proportional by chance
accuracy rate (that is, at least 1.25 times that rate).
Classification Table(a,b)

Step 0
                                      Predicted: SEEN X-RATED MOVIE IN LAST YEAR
Observed                              YES     NO      Percentage Correct
SEEN X-RATED MOVIE IN LAST YEAR  YES  0       45      .0
                                 NO   0       132     100.0
Overall Percentage                                    74.6

a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by first
calculating the proportion of cases for each group based on the number
of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the
"NO" group is 132/177 = 0.746.
Then, we square and sum the proportion of cases in each group (0.254²
+ 0.746² = 0.621). 0.621 is the proportional by chance accuracy rate.
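
The calculation above can be sketched in a few lines. This is a minimal illustration, and the function name is hypothetical; the group counts are the Step 0 counts from the classification table.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions, given the number of cases per group."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


# 45 respondents in the "YES" group, 132 in the "NO" group.
rate = proportional_by_chance_accuracy([45, 132])
print(f"{rate:.3f}")
```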

Slide 37

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table(a)

Step 1
                                      Predicted: SEEN X-RATED MOVIE IN LAST YEAR
Observed                              YES     NO      Percentage Correct
SEEN X-RATED MOVIE IN LAST YEAR  YES  19      26      42.2
                                 NO   9       123     93.2
Overall Percentage                                    80.2

a. The cut value is .500

The accuracy rate computed by SPSS was 80.2%, which was greater than or equal to
the proportional by chance accuracy criterion of 77.6% (1.25 x 62.1% = 77.6%).
The criterion for classification accuracy is satisfied.
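
As a rough check, the overall accuracy and the 1.25 criterion can both be computed from the four cell counts of the Step 1 table. The function names and the `margin` parameter below are assumptions made for illustration, not part of SPSS.

```python
def classification_accuracy(table):
    """table[i][j] = cases observed in group i and predicted in group j."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


def meets_by_chance_criterion(table, margin=1.25):
    """True if accuracy is at least `margin` times the by chance rate."""
    total = sum(sum(row) for row in table)
    # Observed group sizes are the row totals of the classification table.
    chance = sum((sum(row) / total) ** 2 for row in table)
    return classification_accuracy(table) >= margin * chance


# Rows: observed YES, observed NO. Columns: predicted YES, predicted NO.
step1 = [[19, 26], [9, 123]]
print(f"accuracy = {classification_accuracy(step1):.3f}")
print("criterion satisfied:", meets_by_chance_criterion(step1))
```

The slides round intermediate values to three decimals before comparing; this sketch does not, so a borderline case could come out differently.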

Slide 38

Answering the question in problem 1 - 1


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
The variables "age" [age], "sex" [sex], and "liberal or conservative political views"
[polviews] were useful predictors for distinguishing between groups based on responses to
"seen x-rated movie in last year" [xmovie]. These predictors differentiate survey
respondents who have not seen an x-rated movie from survey respondents who have seen
an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one
unit increase in age increased the odds that survey respondents have not seen an x-rated movie
by 3.9%. Survey respondents who were female were approximately six and three quarters times
more likely to have not seen an x-rated movie. Survey respondents who were more
conservative were more likely to have not seen an x-rated movie. A one unit increase in liberal
or conservative political views increased the odds that survey respondents have not seen an
x-rated movie by approximately one and a quarter times.

We found a statistically significant overall relationship between the combination of
independent variables and the dependent variable.

There was no evidence of numerical problems in the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.

Slide 39

Answering the question in problem 1 - 2


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews]
were useful predictors for distinguishing between groups based on responses to "seen x-rated
movie in last year" [xmovie]. These predictors differentiate survey respondents who have not
seen an x-rated movie from survey respondents who have seen an x-rated movie.

Survey respondents who were older were more likely to have not seen an x-rated movie. A
one unit increase in age increased the odds that survey respondents have not seen an x-rated
movie by 3.9%. Survey respondents who were female were approximately six and
three quarters times more likely to have not seen an x-rated movie. Survey respondents
who were more conservative were more likely to have not seen an x-rated movie. A one
unit increase in liberal or conservative political views increased the odds that survey
respondents have not seen an x-rated movie by approximately one and a quarter times.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable.

The answer to the question is true with caution.
A caution is added because of the inclusion of ordinal level variables.

Slide 40

Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 41

Dissecting problem 2 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level
of significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.

Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the logistic regression.

Slide 42

Dissecting problem 2 - 2

The variables listed first in the problem statement are
the independent variables (IVs): "sex" [sex], "general
happiness" [happy], and "confidence in the executive
branch of the federal government" [confed].
Sex is a control variable and general happiness and
confidence in the executive branch are predictors.

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made
legal" [grass], the variable "general happiness" [happy] and "confidence in the executive
branch of the federal government" [confed] were useful predictors for distinguishing
between groups based on responses to "should marijuana be made legal" [grass]. These
predictors differentiate survey respondents who have been less supportive that the use of
marijuana should be made legal from survey respondents who have been more supportive that
the use of marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of marijuana
should be made legal by 66.9%. Survey respondents who had less confidence in the executive
branch of the federal government were less likely to have been less supportive that the use of
marijuana should be made legal. A one unit increase in confidence in the executive branch of
the federal government decreased the odds that survey respondents have been less supportive
that the use of marijuana should be made legal by 42.8%.

The variable used to define groups is the dependent variable (DV):
"should marijuana be made legal" [grass].

When a problem identifies control variables, we do a hierarchical
logistic regression entering the variables in SPSS in blocks.

Slide 43

Dissecting problem 2 - 3

SPSS logistic regression models the relationship by computing
the changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to "should marijuana be made legal" were coded:
1 = Legal and 2 = Not Legal.

The SPSS output will model the changes in the likelihood of
being less supportive of legalizing marijuana because 2
corresponds to not legalizing marijuana.

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate
survey respondents who have been less supportive that the use of marijuana should be
made legal from survey respondents who have been more supportive that the use of
marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.

The statements of the specific relationships between
independent variables and the dependent variable are all
phrased in terms of impact on being less supportive of
legalizing marijuana.

Slide 44

Dissecting problem 2 - 4
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
The specific relationships for the independent
variables listed in the problem indicate the direction
of the relationship, increasing or decreasing the
likelihood of falling in the modeled group, and the
amount of change in the odds associated with a
one-unit change in the independent variable.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.

Survey respondents who were less happy overall were less likely to have been less
supportive that the use of marijuana should be made legal. A one unit increase in general
happiness decreased the odds that survey respondents have been less supportive that the
use of marijuana should be made legal by 66.9%. Survey respondents who had less
confidence in the executive branch of the federal government were less likely to have been
less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made
legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

In order for the logistic regression question to be true, the
relationship between the predictors and the dependent variable
must be statistically significant after entering the control
variables in a previous stage, there must be no evidence of a
flawed numerical analysis, the classification accuracy rate must
be substantially better than could be obtained by chance alone,
and each significant relationship must be interpreted correctly.

Slide 45

LEVEL OF MEASUREMENT - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate
survey respondents who have been less supportive that the use of marijuana should be
made legal from survey respondents who have been more supportive that the use of
marijuana should be made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Logistic regression analysis requires that the dependent
variable be dichotomous and the independent variables
be metric or dichotomous. "Should marijuana be made
legal" [grass] is a dichotomous variable, which satisfies
the level of measurement requirement for the dependent
variable.
It contains two categories:
survey respondents who have been less supportive that
the use of marijuana should be made legal
survey respondents who have been more supportive
that the use of marijuana should be made legal

Slide 46

LEVEL OF MEASUREMENT - 2

"Sex" [sex] is a dichotomous or
dummy-coded nominal variable which
may be included in logistic regression
analysis.

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made
legal" [grass], the variable "general happiness" [happy] and "confidence in the executive
branch of the federal government" [confed] were useful predictors for distinguishing
between groups based on responses to "should marijuana be made legal" [grass]. These
predictors differentiate survey respondents who have been less supportive that the use of
marijuana should be made legal from survey respondents who have been more supportive that
the use of marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.

"General happiness" [happy] and "confidence in the
executive branch of the federal government"
[confed] are ordinal level variables. If we follow the
convention of treating ordinal level variables as
metric variables, the level of measurement
requirement for logistic regression analysis is
satisfied. Since some data analysts do not agree with
this convention, a note of caution should be included
in our interpretation.

Slide 47

Request hierarchical logistic regression

Select the Regression | Binary Logistic
command from the Analyze menu.

Slide 48

Selecting the dependent variable

First, highlight the
dependent variable
grass in the list of
variables.

Second, click on the right
arrow button to move the
dependent variable to the
Dependent text box.

Slide 49

Selecting the control independent variables

First, move the control
independent variable,
sex, listed in the
problem to the
Covariates list box.

Second, click on the
Next button to add the
new block that will
contain the predictors.

Slide 50

Adding the predictor independent variables

First, move the
predictors to the
Covariates list box.

Slide 51

Specifying the method for including variables

In our hierarchical
regression, we will specify
that all of the variables in
each block be entered
simultaneously when the
block is entered.

Slide 52

Completing the logistic regression request

Click on the OK
button to request
the output for the
logistic regression.

The logistic procedure supports the selection of subsets of
cases, automatic recoding of nominal variables, saving
diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However,
none of these are needed for this analysis.

Slide 53

Sample size ratio of cases to variables


Case Processing Summary

Unweighted Cases                          N      Percent
Selected Cases    Included in Analysis    163    60.4
                  Missing Cases           107    39.6
                  Total                   270    100.0
Unselected Cases                          0      .0
Total                                     270    100.0

a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to
independent variables for logistic regression is
10 to 1, with a preferred ratio of 20 to 1. In this
analysis, there are 163 valid cases and 3
independent variables. The ratio of cases to
independent variables is 54.33 to 1, which
satisfies the minimum requirement. In addition,
the ratio of 54.33 to 1 satisfies the preferred
ratio of 20 to 1.

Slide 54

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

In a hierarchical logistic regression, the presence of a relationship
between the dependent variable and the combination of independent
variables entered after the control variables have been included is
based on the statistical significance of the block chi-square for
the second block of variables in which the predictor independent
variables are included.
In this analysis, the probability of the block chi-square (17.467)
was <0.001, less than or equal to the level of significance of
0.05. The null hypothesis that there is no difference between the
model with only a constant and the control variables versus the
model with the predictor independent variables was rejected. The
contribution of the relationship between the predictor
independent variables and the dependent variable was supported.
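
The reported probability can be checked by hand. The sketch below assumes the block chi-square has 2 degrees of freedom (one for each of the two predictors added in the second block). The slide does not state the degrees of freedom, so this is an assumption. With 2 degrees of freedom the upper-tail probability of a chi-square statistic has the closed form exp(-x/2).

```python
import math


def chi_square_sf_df2(x):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


block_chi_square = 17.467  # block chi-square reported on the slide
alpha = 0.05

p_value = chi_square_sf_df2(block_chi_square)
print(f"p = {p_value:.6f}, reject null: {p_value <= alpha}")
```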

Slide 55

NUMERICAL PROBLEMS
Variables in the Equation

Step 1(a)    B        S.E.    Wald     df   Sig.   Exp(B)
SEX          .154     .351    .194     1    .660   1.167
HAPPY        -1.104   .354    9.739    1    .002   .331
CONFED       -.559    .270    4.290    1    .038   .572
Constant     3.721    1.066   12.195   1    .000   41.308

a. Variable(s) entered on step 1: HAPPY, CONFED.

Multicollinearity in the logistic regression solution is detected
by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, and
'complete separation' whereby the two groups in the
dependent event variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error
for the Constant.)
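
The standard error check described above is easy to automate. A minimal sketch follows, with a hypothetical function name; the standard errors are copied from the table on this slide.

```python
# Standard errors copied from the "Variables in the Equation" table.
standard_errors = {"SEX": 0.351, "HAPPY": 0.354, "CONFED": 0.270, "Constant": 1.066}


def flag_numerical_problems(standard_errors, threshold=2.0, exclude=("Constant",)):
    """Return the variables whose standard error exceeds the threshold.

    The Constant is excluded, following the rule stated on the slide.
    """
    return [name for name, se in standard_errors.items()
            if name not in exclude and se > threshold]


flagged = flag_numerical_problems(standard_errors)
print("variables with S.E. > 2.0:", flagged)
```

An empty list means no independent variable tripped the 2.0 threshold, which matches the conclusion on the slide.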

Slide 56

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
The probability of the Wald statistic for the variable general
happiness was 0.002, less than or equal to the level of
significance of 0.05. The null hypothesis that the b coefficient
for general happiness was equal to zero was rejected. This
supports the relationship that "survey respondents who were
less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal." General
happiness is an ordinal variable that is coded so that lower
numeric values are associated with survey respondents who
were happier overall.

Variables in the Equation

Step 1(a)    B        S.E.    Wald     df   Sig.   Exp(B)
SEX          .154     .351    .194     1    .660   1.167
HAPPY        -1.104   .354    9.739    1    .002   .331
CONFED       -.559    .270    4.290    1    .038   .572
Constant     3.721    1.066   12.195   1    .000   41.308

a. Variable(s) entered on step 1: HAPPY, CONFED.

The value of Exp(B) was 0.331, which implies that a
one unit increase in general happiness decreased the
odds that survey respondents have been less
supportive that the use of marijuana should be made
legal by 66.9%.
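
For readers who want to see where the Wald and Sig. columns come from: the Wald statistic is the squared ratio of B to its standard error, and with 1 degree of freedom its upper-tail chi-square probability can be written with the complementary error function. The sketch below uses only the Python standard library; the function names are hypothetical, and because the inputs are the rounded table values, the results match the SPSS output only approximately.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (B / S.E.) squared."""
    return (b / se) ** 2


def chi_square_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# (B, S.E.) pairs copied from the table above.
rows = {"SEX": (0.154, 0.351), "HAPPY": (-1.104, 0.354), "CONFED": (-0.559, 0.270)}

for name, (b, se) in rows.items():
    w = wald_statistic(b, se)
    print(f"{name}: Wald = {w:.3f}, Sig. = {chi_square_sf_df1(w):.3f}")
```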

Slide 57

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2
The probability of the Wald statistic for the variable confidence in the
executive branch of the federal government was 0.038, less than or
equal to the level of significance of 0.05. The null hypothesis that the
b coefficient for confidence in the executive branch of the federal
government was equal to zero was rejected. This supports the
relationship that "survey respondents who had less confidence in the
executive branch of the federal government were less likely to have
been less supportive that the use of marijuana should be made legal."
Confidence in the executive branch of the federal government is an
ordinal variable that is coded so that lower numeric values are
associated with survey respondents who had more confidence in the
executive branch of the federal government.
Variables in the Equation

Step 1(a)    B        S.E.    Wald     df   Sig.   Exp(B)
SEX          .154     .351    .194     1    .660   1.167
HAPPY        -1.104   .354    9.739    1    .002   .331
CONFED       -.559    .270    4.290    1    .038   .572
Constant     3.721    1.066   12.195   1    .000   41.308

a. Variable(s) entered on step 1: HAPPY, CONFED.

The value of Exp(B) was 0.572, which implies
that a one unit increase in confidence in the
executive branch of the federal government
decreased the odds that survey respondents
have been less supportive that the use of
marijuana should be made legal by 42.8%.
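
To connect the coefficients to the classification step that follows, the sketch below turns the B column into a predicted probability of the modeled event (code 2, "NOT LEGAL") and applies the default cut value of .500. The two cases are invented purely for illustration and do not come from GSS2000.sav; the function names are also hypothetical.

```python
import math

# B coefficients and constant copied from the table above.
coefficients = {"SEX": 0.154, "HAPPY": -1.104, "CONFED": -0.559}
constant = 3.721


def predicted_probability(case):
    """Probability of the modeled event (NOT LEGAL) for one case."""
    logit = constant + sum(coefficients[name] * case[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-logit))


def predicted_group(case, cut=0.5):
    """Assign the modeled group when the probability is above the cut value."""
    return "NOT LEGAL" if predicted_probability(case) > cut else "LEGAL"


# Hypothetical cases, using the numeric codes of the three variables.
case_a = {"SEX": 1, "HAPPY": 2, "CONFED": 2}
case_b = {"SEX": 1, "HAPPY": 3, "CONFED": 3}

for case in (case_a, case_b):
    print(case, f"p = {predicted_probability(case):.3f}", predicted_group(case))
```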

Slide 58

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have been
less supportive that the use of marijuana should be made
legal from survey respondents who have been more
supportive that the use of marijuana should be made legal if
the classification accuracy rate was substantially higher than
the accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be at least 25% higher
than the proportional by chance accuracy rate (that is, at least 1.25 times that rate).
Classification Table(a,b)

Step 0
                                           Predicted: SHOULD MARIJUANA BE MADE LEGAL
Observed                                   LEGAL   NOT LEGAL   Percentage Correct
SHOULD MARIJUANA BE MADE LEGAL  LEGAL      0       57          .0
                                NOT LEGAL  0       106         100.0
Overall Percentage                                             65.0

a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the classification table
at Step 0, and then squaring and summing the proportion of
cases in each group (0.350² + 0.650² = 0.545).

Slide 59

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table(a)

Step 1
                                           Predicted: SHOULD MARIJUANA BE MADE LEGAL
Observed                                   LEGAL   NOT LEGAL   Percentage Correct
SHOULD MARIJUANA BE MADE LEGAL  LEGAL      18      39          31.6
                                NOT LEGAL  13      93          87.7
Overall Percentage                                             68.1

a. The cut value is .500

The accuracy rate computed by SPSS was 68.1%, which was greater than or equal to
the proportional by chance accuracy criterion of 68.1% (1.25 x 54.5% = 68.1%).
The criterion for classification accuracy is satisfied.

Slide 60

Answering the question in problem 2 - 1


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variable "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based on
responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal from
survey respondents who have been more supportive that the use of marijuana should be made
legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of marijuana
should be made legal by 66.9%. Survey respondents who had less confidence in the executive
branch of the federal government were less likely to have been less supportive that the use of
marijuana should be made legal. A one unit increase in confidence in the executive branch of
the federal government decreased the odds that survey respondents have been less supportive
that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall relationship between the predictor independent
variables and the dependent variable.

There was no evidence of numerical problems in the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.

Slide 61

Answering the question in problem 2 - 2


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variables "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.

We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable.

Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The answer to the question is true with caution.

A caution is added because of the inclusion of ordinal level variables.

Slide 62

Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A
one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 63

Dissecting Problem 3 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level
of significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.

For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the logistic regression.

Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A
one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 64

Dissecting Problem 3 - 2
The variables listed first in the problem statement are the independent variables (IVs): "highest
academic degree" [degree], "total family income" [income98], and "satisfaction with financial
situation" [satfin].

The variable used to define groups is the dependent variable (DV): "expect u.s. in world war
in 10 years" [uswary].

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world war
within the next ten years from survey respondents who have been more positive that the United
States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.

Since the problem identifies the most useful or important predictor, we do a stepwise logistic
regression.

Slide 65

Dissecting Problem 3 - 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98],
and "satisfaction with financial situation" [satfin], the most useful predictor for distinguishing
between groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total
family income" [income98]. These predictors differentiate survey respondents who have been
less positive that the United States would fight in another world war within the next ten
years from survey respondents who have been more positive that the United States would
fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.

SPSS logistic regression models the relationship by computing the changes in the likelihood of
falling in the category of the dependent variable which had the highest numerical code. The
responses to "expect u.s. in world war in 10 years" were coded: 1 = Yes and 2 = No.

The SPSS output will model the changes in the likelihood of being less positive that the United
States would fight in another world war within the next ten years.

Slide 66

Dissecting Problem 3 - 4
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.

The statements of the specific relationships between independent variables and the dependent
variable are all phrased in terms of impact on being less positive that the United States would
fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less
positive that the United States would fight in another world war within the next ten years
was total family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years.
A one unit increase in total family income increased the odds that survey respondents have
been less positive that the United States would fight in another world war within the next
ten years by 10.0%.

Slide 67

Dissecting Problem 3 - 5
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.

The specific relationships for the independent variables listed in the problem indicate the
direction of the relationship, increasing or decreasing the likelihood of falling in the modeled
group, and the amount of change associated with a one-unit change in the independent variable.

The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.

Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years.
A one unit increase in total family income increased the odds that survey respondents have
been less positive that the United States would fight in another world war within the next
ten years by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

In order for the logistic regression question to be true, the relationship between the predictors
selected for inclusion and the dependent variable must be statistically significant, there must be
no evidence of a flawed numerical analysis, the classification accuracy rate must be substantially
better than could be obtained by chance alone, and the order of entry and each significant
relationship must be interpreted correctly.

Slide 68

LEVEL OF MEASUREMENT - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98],
and "satisfaction with financial situation" [satfin], the most useful predictor for distinguishing
between groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total
family income" [income98]. These predictors differentiate survey respondents who have been
less positive that the United States would fight in another world war within the next ten years
from survey respondents who have been more positive that the United States would fight in
another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Logistic regression analysis requires that the dependent variable be dichotomous and the
independent variables be metric or dichotomous. "Expect u.s. in world war in 10 years" [uswary]
is a dichotomous variable, which satisfies the level of measurement requirement for the
dependent variable.

It contains two categories: survey respondents who have been less positive that the United
States would fight in another world war within the next ten years, and survey respondents who
have been more positive that the United States would fight in another world war within the next
ten years.

Slide 69

LEVEL OF MEASUREMENT - 2
"Highest academic degree" [degree], "total family income" [income98], and "satisfaction with
financial situation" [satfin] are ordinal level variables. If we follow the convention of treating
ordinal level variables as metric variables, the level of measurement requirement for logistic
regression analysis is satisfied. Since some data analysts do not agree with this convention, a
note of caution should be included in our interpretation.

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98],
and "satisfaction with financial situation" [satfin], the most useful predictor for distinguishing
between groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total
family income" [income98]. These predictors differentiate survey respondents who have been
less positive that the United States would fight in another world war within the next ten years
from survey respondents who have been more positive that the United States would fight in
another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.

Slide 70

Request stepwise logistic regression

Select the Regression | Binary Logistic command from the Analyze menu.

Slide 71

Selecting the dependent variable

First, highlight the dependent variable uswary in the list of variables.

Second, click on the right arrow button to move the dependent variable to the Dependent text box.

Slide 72

Adding the independent variables

First, move the predictors to the Covariates list box.

Slide 73

Specifying the method for including variables

In our stepwise logistic regression, we specify the Forward Conditional method for adding variables.

Slide 74

Adding options to the output

To add a summary of steps at the end of the analysis and to set the specifications for the stepwise method, click on the Options button.

Slide 75

Including a summary of steps

To obtain a summary of the steps on which variables were added or removed from the analysis, mark the option button At last step in the Display panel.

Slide 76

Specifications for stepwise method

We can change the criteria for adding and removing variables from the analysis by changing the probability for entry and removal. We will use the default level of significance of 0.05 for entry and 0.10 for removal.

Click on the Continue button to close the dialog box.

Slide 77

Completing the logistic regression request

Click on the OK button to request the output for the logistic regression.

Slide 78

Sample size ratio of cases to variables


Case Processing Summary

Unweighted Cases                           N    Percent
Selected Cases    Included in Analysis   136       50.4
                  Missing Cases          134       49.6
                  Total                  270      100.0
Unselected Cases                           0         .0
Total                                    270      100.0

a. If weight is in effect, see classification table for the total number of cases.

The minimum ratio of valid cases to independent variables for stepwise logistic regression is 10 to 1, with a preferred ratio of 50 to 1. In this analysis, there are 136 valid cases and 3 independent variables. The ratio of cases to independent variables is 45.33 to 1, which satisfies the minimum requirement. However, the ratio of 45.33 to 1 does not satisfy the preferred ratio of 50 to 1. A caution should be added to the interpretation of the analysis and a split sample validation should be conducted.
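The sample size check above is simple arithmetic. A minimal sketch follows; the function name and the threshold constants are illustrative labels for the 10 to 1 and 50 to 1 rules stated on this slide, not part of SPSS.

```python
# Thresholds for stepwise logistic regression, as stated on the slide.
MINIMUM_RATIO = 10
PREFERRED_RATIO_STEPWISE = 50


def cases_per_predictor(valid_cases, n_predictors):
    """Return the ratio of valid cases to independent variables."""
    return valid_cases / n_predictors


# Values from the Case Processing Summary: 136 valid cases, 3 predictors.
ratio = cases_per_predictor(136, 3)
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO_STEPWISE

print(f"{ratio:.2f} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
```

Meeting the minimum but not the preferred ratio is what leads to the caution in the final answer.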

Slide 79

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

The presence of a relationship between the dependent variable and combination of independent variables is based on the statistical significance of the model chi-square.

In this analysis, the probability of the model chi-square (9.001) was 0.003, less than or equal to the level of significance of 0.05. The null hypothesis that there is no difference between the model with only a constant and the model with independent variables was rejected. The existence of a relationship between the independent variables and the dependent variable was supported.
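The probability reported for the model chi-square can be checked by hand. The sketch below assumes the test has 1 degree of freedom (only one predictor entered the model); in that special case the upper-tail chi-square probability reduces to the complementary error function, so only the standard library is needed. The function name is illustrative.

```python
import math


def chi_square_upper_tail_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


model_chi_square = 9.001  # from the SPSS output
alpha = 0.05

p_value = chi_square_upper_tail_df1(model_chi_square)
reject_null = p_value <= alpha

print(f"p = {p_value:.3f}, reject null hypothesis: {reject_null}")
```

For models with more than one degree of freedom this shortcut does not apply, and a general chi-square distribution function would be needed.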

Slide 80

NUMERICAL PROBLEMS
Variables in the Equation

                         B     S.E.    Wald   df   Sig.   Exp(B)
Step 1(a)  INCOME98    .095    .033   8.436    1   .004    1.100
           Constant  -1.033    .527   3.847    1   .050     .356

a. Variable(s) entered on step 1: INCOME98.

Multicollinearity in the logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems, such as multicollinearity among the independent variables, zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable, and 'complete separation' whereby the two groups in the dependent event variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard error larger than 2.0. (The check for standard errors larger than 2.0 does not include the standard error for the Constant.)
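The standard error screen described above can be written as a one-line filter. This is a sketch only: the dictionary simply restates the S.E. column of the output, and the Constant is left out on purpose, as the slide specifies.

```python
# Standard errors of the b coefficients from "Variables in the Equation".
# The Constant (S.E. = .527) is deliberately excluded from the check.
standard_errors = {"INCOME98": 0.033}

SE_LIMIT = 2.0  # rule of thumb used in these slides

# Collect any predictor whose standard error exceeds the limit.
flagged = [name for name, se in standard_errors.items() if se > SE_LIMIT]
numerical_problems = len(flagged) > 0

print("Predictors flagged:", flagged)
```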

Slide 81

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE
The probability of the Wald statistic for the variable total family
income was 0.004, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for total family
income was equal to zero was rejected. This supports the
relationship that "survey respondents who had higher total
family incomes were more likely to have been less positive that
the United States would fight in another world war within the
next ten years." Total family income is an ordinal variable that is
coded so that higher numeric values are associated with survey
respondents who had higher total family incomes.
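As a rough check on the Wald test, the sketch below converts the Wald statistic reported for total family income (8.436, 1 degree of freedom) into a probability, and also recomputes the statistic as (B / S.E.) squared from the rounded values B = .095 and S.E. = .033. The recomputed value comes out near 8.29 rather than 8.436, presumably because SPSS works from unrounded estimates; that explanation is an assumption, not something the output states.

```python
import math


def wald_p_value_df1(wald):
    """Upper-tail chi-square probability for a Wald statistic with df = 1."""
    return math.erfc(math.sqrt(wald / 2.0))


b, se = 0.095, 0.033        # rounded values printed in the output
reported_wald = 8.436       # Wald statistic printed in the output

wald_from_rounded = (b / se) ** 2
p_value = wald_p_value_df1(reported_wald)
significant = p_value <= 0.05

print(f"Wald from rounded B and S.E.: {wald_from_rounded:.3f}")
print(f"p for reported Wald: {p_value:.3f}")
```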

Variables in the Equation

                         B     S.E.    Wald   df   Sig.   Exp(B)
Step 1(a)  INCOME98    .095    .033   8.436    1   .004    1.100
           Constant  -1.033    .527   3.847    1   .050     .356

a. Variable(s) entered on step 1: INCOME98.

The value of Exp(B) was 1.100 which implies that a one unit increase in total family income increased the odds that survey respondents have been less positive that the United States would fight in another world war within the next ten years by 10.0%.
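The step from Exp(B) to a percentage change in the odds is (Exp(B) - 1) x 100. A minimal sketch, using the B coefficient for INCOME98 from the output; the function name is illustrative.

```python
import math


def percent_change_in_odds(b):
    """Percent change in the odds of the modeled event per one-unit increase."""
    return (math.exp(b) - 1.0) * 100.0


b_income98 = 0.095  # B coefficient from the SPSS output

odds_ratio = math.exp(b_income98)            # this is Exp(B)
change = percent_change_in_odds(b_income98)

print(f"Exp(B) = {odds_ratio:.3f}, change in odds = {change:.1f}%")
```

A negative B gives an Exp(B) below 1.0 and a negative percentage, which is read as a decrease in the odds.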

Slide 82

IMPORTANCE OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE

The order of importance is based on the entry order of the variables included in the stepwise logistic regression. The entry order is summarized in the Step Summary table, in which we see which variable was added or removed at each step.

Step Summary(a,b)

             Improvement                  Model              Correct
Step    Chi-square   df   Sig.    Chi-square   df   Sig.     Class %    Variable
1            9.001    1   .003         9.001    1   .003       67.6%    IN: INCOME98

a. No more variables can be deleted from or added to the current model.
b. End block: 1

The most important predictor for identifying survey respondents who have been less positive that the United States would fight in another world war within the next ten years was total family income [INCOME98].

The importance of the predictors stated in the problem is correct.
Slide 83

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have been
less positive that the United States would fight in another
world war within the next ten years from survey respondents
who have been more positive that the United States would
fight in another world war within the next ten years if the
classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
Classification Table(a,b)

                                                     Predicted
                                        EXPECT U.S. IN WORLD
                                          WAR IN 10 YEARS        Percentage
        Observed                           YES        NO           Correct
Step 0  EXPECT U.S. IN WORLD     YES         0        54               .0
        WAR IN 10 YEARS          NO          0        82            100.0
        Overall Percentage                                           60.3

a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the classification table at Step 0, and then squaring and summing the proportion of cases in each group (0.397² + 0.603² = 0.521).
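The proportional by chance accuracy rate is the sum of the squared group proportions. The sketch below uses the Step 0 group counts from the classification table (54 Yes, 82 No); the function name is illustrative.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions for the observed groups."""
    total = sum(group_counts)
    return sum((count / total) ** 2 for count in group_counts)


step0_counts = [54, 82]  # observed YES and NO cases at Step 0

proportions = [count / sum(step0_counts) for count in step0_counts]
chance_rate = proportional_by_chance_accuracy(step0_counts)

print([f"{p:.3f}" for p in proportions], f"{chance_rate:.3f}")
```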

Slide 84

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table(a)

                                                     Predicted
                                        EXPECT U.S. IN WORLD
                                          WAR IN 10 YEARS        Percentage
        Observed                           YES        NO           Correct
Step 1  EXPECT U.S. IN WORLD     YES        20        34             37.0
        WAR IN 10 YEARS          NO         10        72             87.8
        Overall Percentage                                           67.6

a. The cut value is .500

The accuracy rate computed by SPSS was 67.6% which was greater than or equal to the proportional by chance accuracy criterion of 65.2% (1.25 x 52.1% = 65.2%).

The criterion for classification accuracy is satisfied.
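The overall accuracy rate and the comparison against the by chance criterion can be reproduced from the Step 1 classification table. In this sketch the dictionary keys are (observed, predicted) pairs; that layout is a choice made for the example. Computed from the unrounded group proportions, the criterion comes out near 65.1% rather than the 65.2% quoted on the slide, which appears to come from rounded intermediate values. The conclusion is the same either way.

```python
# Step 1 classification table: (observed, predicted) -> number of cases.
table = {
    ("YES", "YES"): 20,
    ("YES", "NO"): 34,
    ("NO", "YES"): 10,
    ("NO", "NO"): 72,
}

# Cases on the diagonal are classified correctly.
correct = sum(n for (observed, predicted), n in table.items() if observed == predicted)
total = sum(table.values())
accuracy = correct / total

# Proportional by chance accuracy from the observed group sizes (54 and 82).
chance_rate = (54 / total) ** 2 + (82 / total) ** 2
criterion = 1.25 * chance_rate  # 25% higher than chance

criterion_met = accuracy >= criterion
print(f"accuracy = {accuracy:.3f}, criterion met: {criterion_met}")
```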

Slide 85

Answering the question in problem 3 - 1


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income" [income98],
and "satisfaction with financial situation" [satfin], the most useful predictor for distinguishing
between groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total
family income" [income98]. These predictors differentiate survey respondents who have been
less positive that the United States would fight in another world war within the next ten years
from survey respondents who have been more positive that the United States would fight in
another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A one
unit increase in total family income increased the odds that survey respondents have been less
positive that the United States would fight in another world war within the next ten years by
10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall relationship between the predictor independent
variables and the dependent variable.

There was no evidence of numerical problems in the solution.

Moreover, the classification accuracy surpassed the proportional by chance accuracy criteria,
supporting the utility of the model.

Slide 86

Answering the question in problem 3 - 2


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.

We verified that each statement about the relationship between an independent variable and
the dependent variable was correct in both direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable.

We also verified the order of importance for the independent variables included in the stepwise
analysis.

The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A
one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The answer to the question is true with caution.

A caution is added to the findings because of the inclusion of ordinal level independent
variables. A caution is also added to the findings because the preferred sample size is not met.

Slide 87

Steps in binary logistic regression: level of measurement and initial sample size

The following is a guide to the decision process for answering problems about the basic relationships in logistic regression:

Dependent dichotomous? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: continue

Ratio of cases to independent variables at least 10 to 1?
  No: Inappropriate application of a statistic
  Yes: Run logistic regression, using method for including variables identified in the research question.
Slide 88

Steps in logistic regression: overall relationship and numerical problems

Hierarchical method of entry used to include independent variables?
  No: Presence of relationship confirmed by test of model chi-square?
    No: False
    Yes: continue
  Yes: Presence of relationship confirmed by test of block chi-square?
    No: False
    Yes: continue

Standard errors of coefficients indicate presence of numerical problems (s.e. > 2.0)?
  Yes: False
  No: continue

Slide 89

Steps in logistic regression: relationships between IV's and DV

Stepwise method of entry used to include independent variables?
  Yes: Entry order of variables interpreted correctly?
    No: False
    Yes: continue
  No: continue

Relationships between individual IVs and DV groups interpreted correctly?
  No: False
  Yes: continue

Slide 90

Steps in logistic regression: classification accuracy and adding cautions

Overall accuracy rate is 25% or more higher than the proportional by chance accuracy rate?
  No: False
  Yes: continue

Satisfies preferred ratio of cases to IV's of 20 to 1 (50 to 1 for stepwise)?
  No: True with caution
  Yes: continue

One or more IV's are ordinal level variables?
  No: True
  Yes: True with caution
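The final stage of the decision process can be sketched as a small function. This covers only the three checks on this last slide and assumes every earlier check (level of measurement, minimum sample size, overall relationship, numerical problems, interpretation) has already passed. The function and parameter names are illustrative.

```python
def final_answer(accuracy, chance_rate, valid_cases, n_ivs, stepwise, has_ordinal_iv):
    """Apply the classification accuracy and caution checks, in slide order."""
    # Accuracy must be at least 25% higher than the by chance rate.
    if accuracy < 1.25 * chance_rate:
        return "False"
    # Preferred ratio of cases to IVs: 50 to 1 for stepwise, otherwise 20 to 1.
    preferred_ratio = 50 if stepwise else 20
    if valid_cases / n_ivs < preferred_ratio:
        return "True with caution"
    # Ordinal IVs treated as metric also call for a caution.
    if has_ordinal_iv:
        return "True with caution"
    return "True"


# Problem 3: 67.6% accuracy, 52.1% chance rate, 136 cases, 3 IVs, stepwise, ordinal IVs.
answer = final_answer(0.676, 0.521, 136, 3, stepwise=True, has_ordinal_iv=True)
print(answer)
```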
