LogisticRegression BasicRelationships
Slide 1
Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Computers II
Logistic regression
Slide 2
Slide 3
Slide 4
Assumptions
Slide 5
Slide 6
Slide 7
Computational method
Slide 8
The overall measure of how well the model fits is given by the
likelihood value, which is similar to the residual or error sum of
squares value for multiple regression. The value reported is -2 times
the log of the likelihood (-2LL). A model that fits the data well will
have a small -2LL value. A perfect model would have a -2LL value of zero.
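As a rough illustration of this fit measure, the sketch below computes -2 times the log of the likelihood (often written -2LL) for a set of predicted probabilities. The function name neg2_log_likelihood and the toy outcomes and probabilities are assumptions made for illustration only; they are not taken from the slides or from SPSS output.

```python
import math

def neg2_log_likelihood(outcomes, probabilities):
    """Return -2 times the log likelihood of 0/1 outcomes given predicted P(y = 1)."""
    log_lik = 0.0
    for y, p in zip(outcomes, probabilities):
        # A case contributes log(p) if the event occurred, log(1 - p) otherwise.
        log_lik += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_lik

outcomes = [1, 0, 1, 1, 0]                           # hypothetical observed groups
near_perfect = [0.999, 0.001, 0.999, 0.999, 0.001]   # hypothetical well-fitting model
poor = [0.5, 0.5, 0.5, 0.5, 0.5]                     # hypothetical uninformative model

fit_good = neg2_log_likelihood(outcomes, near_perfect)
fit_poor = neg2_log_likelihood(outcomes, poor)
print(fit_good, fit_poor)
```

The near-perfect probabilities give a value close to zero, while the uninformative model gives a much larger value.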
Slide 9
Slide 10
Slide 11
Model chi-square is
33.625, significant at
p < 0.001.
Slide 12
Slide 13
Numerical problems
Slide 14
Slide 15
Slide 16
Step 1
Observed and predicted variable: EXPECT U.S. IN WORLD WAR IN 10 YEARS

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                        20              34              37.0
NO                         10              72              87.8
Overall Percentage                                         67.6

a. The cut value is .500
Slide 17
Step 0
Observed and predicted variable: EXPECT U.S. IN WORLD WAR IN 10 YEARS

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                         0              54                .0
NO                          0              82             100.0
Overall Percentage                                         60.3
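The percentages in the Step 1 and Step 0 classification tables can be reproduced from the cell counts. The sketch below is a minimal illustration; the function name classification_rates and its argument order are assumptions, while the counts are the ones shown in the two tables.

```python
def classification_rates(yes_as_yes, yes_as_no, no_as_yes, no_as_no):
    """Return (percent correct for YES, percent correct for NO, overall percent)."""
    yes_total = yes_as_yes + yes_as_no
    no_total = no_as_yes + no_as_no
    pct_yes = 100.0 * yes_as_yes / yes_total
    pct_no = 100.0 * no_as_no / no_total
    overall = 100.0 * (yes_as_yes + no_as_no) / (yes_total + no_total)
    return pct_yes, pct_no, overall

step1 = classification_rates(20, 34, 10, 72)   # counts from the Step 1 table
step0 = classification_rates(0, 54, 0, 82)     # counts from the Step 0 table
print([round(v, 1) for v in step1])  # [37.0, 87.8, 67.6]
print([round(v, 1) for v in step0])  # [0.0, 100.0, 60.3]
```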
Slide 18
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews]
were useful predictors for distinguishing between groups based on responses to "seen x-rated
movie in last year" [xmovie]. These predictors differentiate survey respondents who have not
seen an x-rated movie from survey respondents who have seen an x-rated movie.
Survey respondents who were older were more likely to have not seen an x-rated movie. A one
unit increase in age increased the odds that survey respondents have not seen an x-rated movie
by 3.9%. Survey respondents who were female were approximately six and three quarters times
more likely to have not seen an x-rated movie. Survey respondents who were more
conservative were more likely to have not seen an x-rated movie. A one unit increase in liberal
or conservative political views increased the odds that survey respondents have not seen an x-rated movie by approximately one and a quarter times.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 19
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Slide 20
Dissecting problem 1 - 2
The variables listed first in the problem statement are the independent variables (IVs): "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews].
The variable used to define groups is the dependent variable (DV): "seen x-rated movie in last year" [xmovie].
When a problem states that a list of independent variables can distinguish among groups and does not identify a control variable or an order of importance for the variables, we do a logistic regression entering all of the variables simultaneously.
Slide 21
Dissecting problem 1 - 3
SPSS logistic regression models the relationship by computing the changes in the likelihood of falling in the category of the dependent variable which had the highest numerical code. The responses to seeing an x-rated movie were coded: 1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of not seeing an x-rated movie because the code for No is 2.
The statements of the specific relationships between independent variables and the dependent variable are all phrased in terms of impact on not seeing an x-rated movie.
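To make the coding rule concrete, the sketch below marks the category with the highest numerical code as the modeled group. It only mimics the behavior described on this slide and is not SPSS's own implementation; the function name modeled_event_indicator and the sample responses are hypothetical.

```python
def modeled_event_indicator(codes):
    """Map category codes to 1 for the highest code (the modeled group), else 0."""
    highest = max(codes)
    return [1 if c == highest else 0 for c in codes]

xmovie = [1, 2, 2, 1, 2]   # hypothetical responses: 1 = Yes, 2 = No
indicator = modeled_event_indicator(xmovie)
print(indicator)  # [0, 1, 1, 0, 1]
```

With this coding, a value of 1 stands for not having seen an x-rated movie, which is why the relationships are read in terms of that group.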
Slide 22
Dissecting problem 1 - 4
The specific relationships for the independent variables listed in the problem indicate the direction of the relationship, increasing or decreasing the likelihood of falling in the modeled group, and the amount of change in the odds associated with a one-unit change in the independent variable.
Slide 23
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent variable be non-metric and the independent variables be metric or dichotomous. "Seen x-rated movie in last year" [xmovie] is a dichotomous variable, which satisfies the level of measurement requirement.
Slide 24
LEVEL OF MEASUREMENT - 2
"Age" [age] is an interval level variable, which satisfies the level of measurement requirements for logistic regression analysis.
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in logistic regression.
"Liberal or conservative political views" [polviews] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for logistic regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
Slide 25
Slide 26
Slide 27
Slide 28
Slide 29
Click on the OK
button to request
the output for the
logistic regression.
Slide 30
                         N      Percent
Included in Analysis     177     65.6
Missing Cases             93     34.4
Total                    270    100.0
Unselected Cases           0       .0
Total                    270    100.0
Slide 31
          Chi-square    df    Sig.
Step        39.668       3    .000
Block       39.668       3    .000
Model       39.668       3    .000
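A Sig. value printed as .000 means the probability is below .0005, not exactly zero. As a hedged check using only the Python standard library, the sketch below evaluates the upper-tail probability of a chi-square statistic with 3 degrees of freedom using the closed form that holds for that specific case. The function name chi2_sf_df3 is an assumption, and the formula is not valid for other degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with exactly 3 degrees of freedom."""
    # Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p_value = chi2_sf_df3(39.668)   # model chi-square from the table, df = 3
print(p_value < 0.05)           # True: significant at the 0.05 level used here
```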
Slide 32
NUMERICAL PROBLEMS
Variables in the Equation
Step 1 (a)    B         S.E.     Wald      df    Sig.    Exp(B)
AGE           .038      .014     7.629     1     .006    1.039
SEX           1.901     .410     21.452    1     .000    6.689
POLVIEWS      .306      .135     5.110     1     .024    1.358
Constant      -4.590    1.045    19.302    1     .000    .010
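The Exp(B) column is e raised to the B coefficient, and subtracting 1 gives the proportional change in the odds for a one-unit increase in the predictor. The sketch below recomputes these from the rounded B values in the table, so small differences from the printed Exp(B) values are expected; the dictionary names are illustrative assumptions.

```python
import math

coefficients = {"AGE": 0.038, "SEX": 1.901, "POLVIEWS": 0.306}   # B column, as printed

odds_ratios = {}
for name, b in coefficients.items():
    odds_ratios[name] = math.exp(b)                       # Exp(B)
    pct_change = (odds_ratios[name] - 1.0) * 100.0        # percent change in odds per unit
    print(f"{name}: Exp(B) = {odds_ratios[name]:.3f}, change in odds = {pct_change:+.1f}%")
```

For age, this reproduces the 3.9% increase in the odds quoted in the problem statement.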
Slide 33
Slide 34
Slide 35
Slide 36
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have not
seen an x-rated movie from survey respondents who have
seen an x-rated movie if the classification accuracy rate was
substantially higher than the accuracy attainable by chance
alone. Operationally, the classification accuracy rate should
be 25% or more higher than the proportional by chance
accuracy rate.
Classification Table (a, b)
Step 0
Observed and predicted variable: SEEN X-RATED MOVIE IN LAST YEAR

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                         0              45                .0
NO                          0             132             100.0
Overall Percentage                                         74.6

b. The cut value is .500

The proportional by chance accuracy rate was computed by first
calculating the proportion of cases for each group based on the number
of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the
"NO" group is 132/177 = 0.746.
Then, we square and sum the proportion of cases in each group (0.254²
+ 0.746² = 0.621). 0.621 is the proportional by chance accuracy rate.
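The arithmetic can be sketched as follows. The function name proportional_by_chance_accuracy is an assumption, and the 1.25 multiplier is one reading of "25% or more higher"; the counts 19 and 123 are the correctly classified cases from the Step 1 classification table for this problem.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions from the Step 0 classification table."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)

by_chance = proportional_by_chance_accuracy([45, 132])   # YES and NO group sizes
criterion = 1.25 * by_chance                             # 25% higher than by chance
model_accuracy = (19 + 123) / 177                        # Step 1 correct classifications

print(round(by_chance, 3), round(criterion, 3), round(model_accuracy, 3))
print(model_accuracy >= criterion)
```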
Slide 37
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table (a)
Step 1
Observed and predicted variable: SEEN X-RATED MOVIE IN LAST YEAR

Observed              Predicted YES   Predicted NO   Percentage Correct
YES                        19              26              42.2
NO                          9             123              93.2
Overall Percentage                                         80.2

a. The cut value is .500
Slide 38
Slide 39
Slide 40
Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal"
[grass], the variables "general happiness" [happy] and "confidence in the executive branch of the
federal government" [confed] were useful predictors for distinguishing between groups based
on responses to "should marijuana be made legal" [grass]. These predictors differentiate survey
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 41
Dissecting problem 2 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Slide 42
Dissecting problem 2 - 2
The variables listed first in the problem statement are the independent variables (IVs): "sex" [sex], "general happiness" [happy], and "confidence in the executive branch of the federal government" [confed].
Sex is a control variable and general happiness and confidence in the executive branch are predictors.
Slide 43
Dissecting problem 2 - 3
The SPSS output will model the changes in the likelihood of being less supportive of legalizing marijuana because 2 corresponds to not legalizing marijuana.
The statements of the specific relationships between independent variables and the dependent variable are all phrased in terms of impact on being less supportive of legalizing marijuana.
Slide 44
Dissecting problem 2 - 4
The specific relationships for the independent variables listed in the problem indicate the direction of the relationship, increasing or decreasing the likelihood of falling in the modeled group, and the amount of change in the odds associated with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the relationship between the predictors and the dependent variable
Slide 45
LEVEL OF MEASUREMENT - 1
Logistic regression analysis requires that the dependent variable be dichotomous and the independent variables be metric or dichotomous. "Should marijuana be made legal" [grass] is a dichotomous variable, which satisfies the level of measurement requirement for the dependent variable.
It contains two categories:
survey respondents who have been less supportive that the use of marijuana should be made legal
survey respondents who have been more supportive that the use of marijuana should be made legal
Slide 46
LEVEL OF MEASUREMENT - 2
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in logistic regression analysis.
"General happiness" [happy] and "confidence in the executive branch of the federal government" [confed] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for logistic regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
Slide 47
Slide 48
Slide 49
Slide 50
Slide 51
In our hierarchical
regression, we will specify
that all of the variables in
each block be entered
simultaneously when the
block is entered.
Slide 52
Click on the OK
button to request
the output for the
logistic regression.
Slide 53
                         N      Percent
Included in Analysis     163     60.4
Missing Cases            107     39.6
Total                    270    100.0
Unselected Cases           0       .0
Total                    270    100.0
Slide 54
Slide 55
NUMERICAL PROBLEMS
Variables in the Equation
Step 1 (a)    B         S.E.     Wald      df    Sig.    Exp(B)
SEX           .154      .351     .194      1     .660    1.167
HAPPY         -1.104    .354     9.739     1     .002    .331
CONFED        -.559     .270     4.290     1     .038    .572
Constant      3.721     1.066    12.195    1     .000    41.308
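When Exp(B) is below 1, the odds decrease, and the size of the decrease is 1 minus Exp(B). The sketch below applies this to the printed Exp(B) and Sig. columns, skipping any variable that is not significant at the 0.05 level stated in the problem. The variable names rows, alpha, and changes are illustrative assumptions.

```python
# (name, Exp(B), Sig.) as printed in the Variables in the Equation table
rows = [("SEX", 1.167, 0.660), ("HAPPY", 0.331, 0.002), ("CONFED", 0.572, 0.038)]
alpha = 0.05   # level of significance given in the problem

changes = {}
for name, odds_ratio, sig in rows:
    if sig >= alpha:
        print(f"{name}: not significant at {alpha}, change in odds not interpreted")
        continue
    changes[name] = (odds_ratio - 1.0) * 100.0   # negative values are decreases
    direction = "increase" if changes[name] > 0 else "decrease"
    print(f"{name}: {abs(changes[name]):.1f}% {direction} in the odds per one-unit increase")
```

The results for general happiness and confidence match the 66.9% and 42.8% decreases quoted in the problem statement.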
Slide 56
Slide 57
Slide 58
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have been
less supportive that the use of marijuana should be made
legal from survey respondents who have been more
supportive that the use of marijuana should be made legal if
the classification accuracy rate was substantially higher than
the accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
Classification Tablea,b
Step 0
Observed
SHOULD MARIJUANA
BE MADE LEGAL
LEGAL
NOT LEGAL
Predicted
SHOULD MARIJUANA BE
MADE LEGAL
LEGAL
NOT LEGAL
0
57
0
106
Overall Percentage
a. Constant is included in the model.
b. The cut value is .500
Percentage
Correct
.0
100.0
65.0
Slide 59
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table(a): Step 1
SHOULD MARIJUANA BE MADE LEGAL
Observed              Predicted LEGAL   Predicted NOT LEGAL   Percentage Correct
LEGAL                 18                39                    31.6
NOT LEGAL             13                93                    87.7
Overall Percentage                                            68.1
a. The cut value is .500
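The Percentage Correct column of the classification table can be reproduced from the four cell counts. The sketch below is only an illustration of that arithmetic; the function name `percent_correct` and the dictionary layout are assumptions, not part of the SPSS output.

```python
# Cell counts from the Step 1 classification table:
# outer keys are observed groups, inner keys are predicted groups.
step1_table = {
    "LEGAL": {"LEGAL": 18, "NOT LEGAL": 39},
    "NOT LEGAL": {"LEGAL": 13, "NOT LEGAL": 93},
}


def percent_correct(table):
    """Return (per-group percent correct, overall percent correct)."""
    per_group = {}
    correct = 0
    total = 0
    for observed, predicted in table.items():
        group_n = sum(predicted.values())
        per_group[observed] = round(100 * predicted[observed] / group_n, 1)
        correct += predicted[observed]   # diagonal cell: predicted == observed
        total += group_n
    overall = round(100 * correct / total, 1)
    return per_group, overall


per_group, overall = percent_correct(step1_table)
print(per_group, overall)
```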
Slide 60
True
True with caution
False
Inappropriate application of a statistic
Slide 61
respondents who have been less supportive that the use of marijuana should be made legal
from survey respondents who have been more supportive that the use of marijuana should be
made legal.
Survey respondents who were less happy overall were less likely to have been less supportive
that the use of marijuana should be made legal. A one unit increase in general happiness
decreased the odds that survey respondents have been less supportive that the use of
marijuana should be made legal by 66.9%. Survey respondents who had less confidence in the
executive branch of the federal government were less likely to have been less supportive that
the use of marijuana should be made legal. A one unit increase in confidence in the executive
branch of the federal government decreased the odds that survey respondents have been less
supportive that the use of marijuana should be made legal by 42.8%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 62
Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationship.
From the list of variables "highest academic degree" [degree], "total family income"
[income98], and "satisfaction with financial situation" [satfin], the most useful predictor for
distinguishing between groups based on responses to "expect u.s. in world war in 10 years"
[uswary] was "total family income" [income98]. These predictors differentiate survey
respondents who have been less positive that the United States would fight in another world
war within the next ten years from survey respondents who have been more positive that the
United States would fight in another world war within the next ten years.
The most important predictor for identifying survey respondents who have been less positive
that the United States would fight in another world war within the next ten years was total
family income.
Survey respondents who had higher total family incomes were more likely to have been less
positive that the United States would fight in another world war within the next ten years. A
one unit increase in total family income increased the odds that survey respondents have been
less positive that the United States would fight in another world war within the next ten years
by 10.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 63
Dissecting Problem 3 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the logistic regression.
Slide 64
Dissecting Problem 3 - 2
The variables listed first in the problem statement are the independent variables (IVs): "highest academic degree" [degree], "total family income" [income98], and "satisfaction with financial situation" [satfin].
The variable used to define groups is the dependent variable (DV): "expect u.s. in world war in 10 years" [uswary].
Since the problem identifies the most useful or important predictor, we do a stepwise logistic regression.
Slide 65
Dissecting Problem 3 - 3
SPSS logistic regression models the relationship by computing the changes in the likelihood of falling in the category of the dependent variable which had the highest numerical code.
The responses to expect u.s. in world war in 10 years were coded: 1 = Yes and 2 = No.
The SPSS output will model the changes in the likelihood of being less positive that the United States would fight in another world war within the next ten years.
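The coding rule on this slide can be illustrated with a small sketch: the modeled group is the category with the highest numerical code, so with 1 = Yes and 2 = No the modeled group is "No". The helper names `modeled_category` and `to_indicator` are hypothetical and only mimic the recoding idea; they are not SPSS functions.

```python
def modeled_category(value_labels):
    """Return the (code, label) pair with the highest numerical code."""
    code = max(value_labels)
    return code, value_labels[code]


def to_indicator(codes, value_labels):
    """Recode raw responses to 1 for the modeled category and 0 otherwise."""
    target, _ = modeled_category(value_labels)
    return [1 if code == target else 0 for code in codes]


# Coding stated on the slide for [uswary].
uswary_labels = {1: "Yes", 2: "No"}

print(modeled_category(uswary_labels))          # the group whose likelihood is modeled
print(to_indicator([1, 2, 2, 1], uswary_labels))
```

This is why every interpretive statement in the problem is phrased in terms of being less positive that the United States would fight in another world war.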
Slide 66
Dissecting Problem 3 - 4
The statements of the specific relationships between independent variables and the dependent variable are all phrased in terms of impact on being less positive that the United States would fight in another world war within the next ten years.
Slide 67
Dissecting Problem 3 - 5
The specific relationships for the independent variables listed in the problem indicate the direction of the relationship, increasing or decreasing the likelihood of falling in the modeled group, and the amount of change associated with a one-unit change in the independent variable.
In order for the logistic regression question to be true, the relationship between the predictors selected for inclusion and the dependent variable must be statistically significant, there must be no evidence of a flawed numerical analysis, the classification accuracy rate must be substantially better than could be obtained by chance alone, and the order of entry and each significant relationship must be interpreted correctly.
Slide 68
LEVEL OF MEASUREMENT - 1
Logistic regression analysis requires that the dependent variable be dichotomous. "Expect u.s. in world war in 10 years" [uswary] contains two categories: survey respondents who have been less positive that the United States would fight in another world war within the next ten years, and survey respondents who have been more positive that the United States would fight in another world war within the next ten years.
Slide 69
LEVEL OF MEASUREMENT - 2
"Highest academic degree" [degree], "total family income" [income98], and "satisfaction with financial situation" [satfin] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for logistic regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
Slide 76
Click on the Continue button to close the dialog box.
Slide 77
Click on the OK button to request the output for the logistic regression.
Slide 78
                        N      Percent
Included in Analysis    136    50.4
Missing Cases           134    49.6
Total                   270    100.0
Unselected Cases        0      .0
Total                   270    100.0
Slide 80
NUMERICAL PROBLEMS
Variables in the Equation (Step 1a)
            B        S.E.     Wald     df   Sig.   Exp(B)
INCOME98    .095     .033     8.436    1    .004   1.100
Constant    -1.033   .527     3.847    1    .050   .356
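To see how these two coefficients turn into predictions, the sketch below plugs the rounded B values from the table into the logistic equation and applies a .500 cut value. It assumes a case is assigned to the modeled group ("NO", the highest code) when its predicted probability exceeds the cut value; the income98 codes passed in are illustrative inputs, not values taken from the dataset, and the function names are hypothetical.

```python
import math

B_INCOME98 = 0.095    # B for INCOME98 from the table
CONSTANT = -1.033     # B for the constant from the table


def predicted_probability(income98):
    """Probability of falling in the modeled group for a given income98 code."""
    logit = CONSTANT + B_INCOME98 * income98
    return 1.0 / (1.0 + math.exp(-logit))


def odds(income98):
    """Odds of the modeled group: p / (1 - p)."""
    p = predicted_probability(income98)
    return p / (1.0 - p)


def predicted_group(income98, cut=0.5):
    """Assumed rule: modeled group when the probability exceeds the cut value."""
    return "NO" if predicted_probability(income98) > cut else "YES"


# A one-unit increase multiplies the odds by Exp(B), about 1.10 (a 10% increase).
print(odds(11) / odds(10))
print(predicted_group(10), predicted_group(11))
```

The ratio of odds between any two adjacent income98 codes is the same, which is what the 10.0% figure in the problem statement describes.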
Slide 82
Step 1
Improvement: Chi-square 9.001, Sig. .003
Model: Chi-square 9.001, df 1, Sig. .003
Correct Class %: 67.6%
Variable: IN: INCOME98
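The significance values reported for one-degree-of-freedom chi-square statistics can be checked with the standard library alone, because for 1 df the upper-tail probability equals erfc(sqrt(x / 2)). The sketch below applies this to the model chi-square of 9.001 and compares the result with the 0.05 level of significance named in the problem; the function name is hypothetical, and the shortcut holds only for df = 1.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


ALPHA = 0.05                                  # level of significance in the problem

model_p = chi_square_p_value_df1(9.001)       # model chi-square at Step 1
wald_p = chi_square_p_value_df1(8.436)        # Wald statistic for INCOME98

print(round(model_p, 3), model_p < ALPHA)
print(round(wald_p, 3), wald_p < ALPHA)
```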
Slide 83
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
The independent variables could be characterized as useful
predictors distinguishing survey respondents who have been
less positive that the United States would fight in another
world war within the next ten years from survey respondents
who have been more positive that the United States would
fight in another world war within the next ten years if the
classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.
Classification Table(a,b): Step 0
EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed              Predicted YES   Predicted NO   Percentage Correct
YES                   0               54             .0
NO                    0               82             100.0
Overall Percentage                                   60.3
a. Constant is included in the model.
b. The cut value is .500
The proportional by chance accuracy rate was computed by
Slide 84
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
Classification Table(a): Step 1
EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed              Predicted YES   Predicted NO   Percentage Correct
YES                   20              34             37.0
NO                    10              72             87.8
Overall Percentage                                   67.6
a. The cut value is .500
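The 25% criterion can be worked through with the counts in the two classification tables. The slide text describing the by chance computation is cut off, so this sketch assumes the common definition of the proportional by chance accuracy rate: the sum of the squared proportions of cases in each observed group. The function name is hypothetical.

```python
def proportional_by_chance_rate(group_sizes):
    """Sum of squared group proportions (assumed definition, see lead-in)."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


# Observed group sizes from the Step 0 table: 54 YES and 82 NO.
group_sizes = [54, 82]

chance_rate = proportional_by_chance_rate(group_sizes)
criterion = 1.25 * chance_rate               # "25% or more higher" than chance

# Correctly classified cases at Step 1: 20 YES and 72 NO.
accuracy = (20 + 72) / sum(group_sizes)
meets_criterion = accuracy >= criterion

print(f"by chance = {chance_rate:.3f}, criterion = {criterion:.3f}, "
      f"accuracy = {accuracy:.3f}, meets criterion = {meets_criterion}")
```

Under that assumption the criterion is about 65.1%, and the Step 1 accuracy of 67.6% exceeds it.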
Slide 85
There was no evidence of numerical problems in the solution.
Slide 86
We also verified the order of importance for the independent variables included in the stepwise analysis.
The answer to the question is true with caution.
Slide 87
Ratio of cases to independent variables at least 10 to 1?
  No: Inappropriate application of a statistic
  Yes: Run logistic regression, using method for including variables identified in the research question.
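The sample size check in this flowchart is simple arithmetic. As a minimal sketch, the functions below (hypothetical names) divide the number of cases included in the analysis by the number of independent variables and compare the result with the 10 to 1 minimum; the example uses the 136 included cases and the three independent variables of Problem 3.

```python
def cases_per_predictor(n_cases, n_predictors):
    """Ratio of valid cases to independent variables."""
    return n_cases / n_predictors


def meets_minimum_ratio(n_cases, n_predictors, minimum=10.0):
    """True when there are at least `minimum` cases per independent variable."""
    return cases_per_predictor(n_cases, n_predictors) >= minimum


# Problem 3: 136 cases included in the analysis; degree, income98, satfin.
print(round(cases_per_predictor(136, 3), 1), meets_minimum_ratio(136, 3))
```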
Slide 88
Presence of relationship confirmed by test of model chi-square?
  No: False
  Yes: Hierarchical method of entry used to include independent variables?
    Yes: Presence of relationship confirmed by test of block chi-square?
      No: False
      Yes: go to the standard error check
    No: go to the standard error check
Standard errors of coefficients indicate presence of numerical problems (s.e. > 2.0)?
  Yes: False
  No: continue
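The numerical problems check in this flowchart amounts to scanning the S.E. column for values above 2.0. A minimal sketch, with a hypothetical function name and the standard errors reported for Problem 3:

```python
def numerical_problem_flags(standard_errors, threshold=2.0):
    """Return the names of terms whose standard error exceeds the threshold."""
    return [name for name, se in standard_errors.items() if se > threshold]


# Standard errors from the Variables in the Equation table for Problem 3.
problem3_se = {"INCOME98": 0.033, "Constant": 0.527}

flags = numerical_problem_flags(problem3_se)
print(flags if flags else "no standard error above 2.0")
```

An empty list corresponds to the "No" branch of the flowchart, that is, no evidence of numerical problems in the solution.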
Slide 89
Entry order of variables interpreted correctly?
  No: False
  Yes: continue
Relationships between individual IVs and DV groups interpreted correctly?
  No: False
  Yes: continue