
Logistic Regression

Logistic Regression and Odds Ratios

Example of Odds Ratio Using Relationship between Death Penalty and Race
Probability and Odds
• We begin with a frequency distribution for the variable “Death Penalty
for Crime”
• The probability of receiving a death sentence is 0.34 or 34% (50/147)
• The odds of receiving a death sentence = death sentence/not death
sentence = 50/97 = 0.5155
Interpreting Odds
• The odds of 0.5155 can be stated in different ways:
– Defendants can expect to receive about one death sentence
for every two sentences of life imprisonment
– Receiving a death sentence is about half as likely as
receiving a sentence of life imprisonment

• Or, inverting the odds,
– Receiving a life imprisonment sentence is about twice as
likely as receiving the death penalty.
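
The arithmetic behind the probability and the odds can be reproduced in a few lines. The following is a minimal Python sketch that uses only the counts quoted above (50 death sentences and 97 other sentences); the variable and function names are illustrative.

```python
# Counts quoted in the text: 50 death sentences out of 147 cases.
death = 50
not_death = 97
total = death + not_death

probability = death / total        # share of all cases that ended in a death sentence
odds = death / not_death           # death sentences per non-death sentence
inverse_odds = not_death / death   # non-death sentences per death sentence


def odds_to_probability(o):
    """Convert odds back to a probability: p = o / (1 + o)."""
    return o / (1 + o)


print(f"probability  = {probability:.4f}")
print(f"odds         = {odds:.4f}")
print(f"inverse odds = {inverse_odds:.2f}")
print(f"back to p    = {odds_to_probability(odds):.4f}")
```

Converting the odds back with o / (1 + o) returns the original probability, which is a quick check that the two quantities describe the same split of cases.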
Impact of an Independent Variable
• If an independent variable impacts or has a relationship to a dependent
variable, it will change the odds of being in the key dependent variable
group, e.g. death sentence.

• The following table shows the relationship between race and sentence:
Odds for Independent Variable Groups

• We can compute the odds of receiving a death penalty for each of the
groups:

• The odds of receiving a death sentence if the defendant was Black =
28/45 = 0.6222
• The odds of receiving a death sentence if the defendant was not Black =
22/52 = 0.4231
The Odds Ratio Measures the Effect
• The impact of being Black on receiving a death penalty is measured by the odds
ratio which equals:
= the odds if Black ÷ the odds if not Black
= 0.6222 ÷ 0.4231 = 1.47

• Which we interpret as:
• Stated loosely, Black defendants are 1.47 times as likely to receive a death
sentence as non-Black defendants (strictly, 1.47 is a ratio of odds, not of
probabilities)
• The risk of receiving a death sentence, measured as odds, is 1.47 times as
great for Black defendants as for non-Black defendants
• The odds of a death sentence for Black defendants are 47% higher than the odds
of a death sentence for non-Black defendants. (1.47 - 1.00)
• The predicted odds for Black defendants are 1.47 times the odds for
non-Black defendants.
• A one unit change in the independent variable race (non-Black to Black)
increases the odds of receiving a death penalty by a factor of 1.47.
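
The group odds and the odds ratio can be checked directly from the counts given on the preceding slides. This is a small Python sketch under that assumption; the dictionary layout and names are illustrative only.

```python
# Counts quoted in the text for each group of defendants.
black = {"death": 28, "other": 45}
non_black = {"death": 22, "other": 52}


def odds(group):
    """Odds of a death sentence within one group."""
    return group["death"] / group["other"]


odds_black = odds(black)
odds_non_black = odds(non_black)

# The odds ratio compares the two sets of odds.
odds_ratio = odds_black / odds_non_black

# Percentage by which the odds are higher in the first group.
percent_higher = (odds_ratio - 1.0) * 100.0

print(f"odds if Black     = {odds_black:.4f}")
print(f"odds if not Black = {odds_non_black:.4f}")
print(f"odds ratio        = {odds_ratio:.2f}")
print(f"percent higher    = {percent_higher:.0f}%")
```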
SPSS Output for this Relationship

SPSS reports the odds ratio in the column labeled Exp(B).

Exp(B) is the factor by which the odds change for a one-unit
change in the independent variable.

Variables in the Equation

                       B      S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)  BLACKD      .386   .350   1.213    1    .271   1.471
           Constant    -.860  .254   11.439   1    .001   .423
a. Variable(s) entered on step 1: BLACKD.
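
The link between the B column and the Exp(B) column can be verified by exponentiating the coefficients. The sketch below does this in Python and compares the results with the hand-computed odds from the earlier slides. It assumes that BLACKD is a dummy variable coded 1 for Black defendants and 0 otherwise, which the output does not state explicitly.

```python
import math

# Coefficients copied from the SPSS table (rounded to three decimals).
b_blackd = 0.386
b_constant = -0.860

# Exp(B): e raised to the power of each coefficient.
exp_b_blackd = math.exp(b_blackd)
exp_b_constant = math.exp(b_constant)

# Hand-computed values from the frequency counts.
odds_non_black = 22 / 52                  # odds when BLACKD = 0 (assumed coding)
odds_ratio = (28 / 45) / odds_non_black   # odds if Black / odds if not Black

print(f"Exp(B) for BLACKD   = {exp_b_blackd:.3f}  (odds ratio by hand: {odds_ratio:.3f})")
print(f"Exp(B) for Constant = {exp_b_constant:.3f}  (baseline odds by hand: {odds_non_black:.3f})")
```

Under the assumed coding, Exp(B) for the constant matches the odds for the group coded 0, and Exp(B) for BLACKD matches the odds ratio, up to rounding of the printed coefficients.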
Logistic Regression – Basic Relationships

Logistic Regression

Describing Relationships

Classification Accuracy

Sample Problems
Logistic regression
• Logistic regression is used to analyze relationships between a
dichotomous dependent variable and metric or dichotomous
independent variables. (SPSS now supports Multinomial Logistic
Regression that can be used with more than two groups, but our
focus here is on binary logistic regression for two groups.)
• Logistic regression combines the independent variables to estimate
the probability that a particular event will occur, i.e. a subject will
be a member of one of the groups defined by the dichotomous
dependent variable. In SPSS, the model is always constructed to
predict the group with higher numeric code. If responses are
coded 1 for Yes and 2 for No, SPSS will predict membership in the
No category. If responses are coded 1 for No and 2 for Yes, SPSS will
predict membership in the Yes category. We will refer to the
predicted event for a particular analysis as the modeled event.
• This will create some awkward wording in our problems. Our only
option for changing this is to recode the variable.
What logistic regression predicts
• The variate or value produced by logistic regression is a probability
value between 0.0 and 1.0.

• If the probability for group membership in the modeled category is
above some cut point (the default is 0.50), the subject is predicted to
be a member of the modeled group. If the probability is below the
cut point, the subject is predicted to be a member of the other group.

• For any given case, logistic regression computes the probability that a
case with a particular set of values for the independent variable is a
member of the modeled category.
• SW388R7
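
The prediction step described above can be sketched as follows: the coefficients are combined into a log-odds value, converted to a probability, and compared with the cut point. This Python example reuses the coefficients from the death penalty output (constant -0.860, BLACKD 0.386) and assumes BLACKD is coded 1 or 0. The function names are illustrative, and treating a probability exactly equal to the cut point as "other group" is an assumption, since the text only describes values above and below it.

```python
import math


def modeled_probability(constant, coefficients, values):
    """Probability of the modeled event for one case."""
    log_odds = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_group(probability, cut_point=0.5):
    """Assign a case to a group by comparing its probability with the cut point."""
    # Assumption: a probability exactly at the cut point goes to the other group.
    return "modeled group" if probability > cut_point else "other group"


# Coefficients from the death penalty example (BLACKD assumed coded 1/0).
constant = -0.860
coefficients = [0.386]

p_black = modeled_probability(constant, coefficients, [1])
p_non_black = modeled_probability(constant, coefficients, [0])

print(f"P(death sentence | BLACKD = 1) = {p_black:.4f} -> {predict_group(p_black)}")
print(f"P(death sentence | BLACKD = 0) = {p_non_black:.4f} -> {predict_group(p_non_black)}")
```

Both probabilities fall below 0.50, so with the default cut point every case in this small example would be predicted to be in the other group, even though the two groups have different odds.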
Level of measurement requirements
• Logistic regression analysis requires that the
dependent variable be dichotomous.

• If an independent variable is nominal level and not
dichotomous, the logistic regression procedure in SPSS
has an option to dummy code the variable for you.

• If an independent variable is ordinal, we will attach
the usual caution.
Assumptions
• Logistic regression does not make any
assumptions of normality, linearity, or
homogeneity of variance for the independent
variables.

• Because it does not impose these
requirements, it is preferred to discriminant
analysis when the data do not satisfy these
assumptions.
Sample size requirements
• The minimum number of cases per independent
variable is 10, using a guideline provided by Hosmer
and Lemeshow, authors of Applied Logistic
Regression, one of the main resources for Logistic
Regression.

• For preferred case-to-variable ratios, we will use 20
to 1 for simultaneous and hierarchical logistic
regression and 50 to 1 for stepwise logistic
regression.
Methods for including variables
• There are three methods available for including
variables in the regression equation:
– The simultaneous method, in which all independents are included at the same time.
– The hierarchical method, in which control variables are entered in the analysis before
the predictors whose effects we are primarily concerned with.
– The stepwise method (forward conditional in SPSS), in which variables are selected in
the order in which they maximize the statistically significant contribution to the model.

• For all methods, the contribution to the model is
measured by the model chi-square, a statistical measure of
the fit between the dependent and independent
variables, like R².
Computational method
• Multiple regression uses the least-squares method to find the
coefficients for the independent variables in the regression equation,
i.e. it computes coefficients that minimize the squared residuals
summed over all cases.
• Logistic regression uses maximum-likelihood estimation to compute
the coefficients for the logistic regression equation. This method
attempts to find coefficients that match the breakdown of cases on
the dependent variable.
• The overall measure of how well the model fits is given by the
likelihood value (reported by SPSS as -2 log likelihood), which is
similar to the residual or error sum of squares value for multiple
regression. A model that fits the data well will have a small likelihood
value. A perfect model would have a likelihood value of zero.
• Maximum-likelihood estimation is an iterative procedure that
successively works to get closer and closer to the correct answer.
When SPSS reports the "iterations," it is telling us how many cycles it
took to get the answer.
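
To make the idea of an iterative fit concrete, the sketch below estimates the two coefficients of the death penalty example by Newton-Raphson iteration and records the -2 log likelihood after each cycle. Newton-Raphson is used here only as one common way to maximize the likelihood; the slides do not say which algorithm SPSS uses. The grouped counts (73 Black defendants with 28 death sentences, 74 non-Black defendants with 22) come from the earlier odds slides, and the 1/0 coding of the predictor is an assumption.

```python
import math

# Grouped data: (x, number of cases, number of death sentences).
# x = 1 for Black defendants, 0 otherwise (assumed coding).
groups = [(1, 73, 28), (0, 74, 22)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neg2_log_likelihood(b0, b1):
    """-2 log likelihood of the grouped data for given coefficients."""
    total = 0.0
    for x, n, d in groups:
        p = sigmoid(b0 + b1 * x)
        total += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return -2.0 * total


def fit(max_iter=25, tol=1e-8):
    """Newton-Raphson iterations for an intercept and one slope."""
    b0, b1 = 0.0, 0.0
    history = [neg2_log_likelihood(b0, b1)]
    for iteration in range(1, max_iter + 1):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, n, d in groups:
            p = sigmoid(b0 + b1 * x)
            resid = d - n * p          # observed minus expected events
            w = n * p * (1.0 - p)      # weight for the information matrix
            u0 += resid
            u1 += x * resid
            i00 += w
            i01 += x * w
            i11 += x * x * w
        # Solve the 2x2 system (information matrix) * step = score.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        history.append(neg2_log_likelihood(b0, b1))
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1, iteration, history


b0, b1, iterations, history = fit()
print(f"constant = {b0:.3f}, slope = {b1:.3f}, iterations = {iterations}")
print("-2 log likelihood by cycle:", [round(v, 3) for v in history])
```

The fitted values agree with the B column of the SPSS output shown earlier (-.860 and .386), and the -2 log likelihood ends lower than where it started, which is the sense in which each cycle gets closer to the answer.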
Overall test of relationship
• The overall test of relationship among the independent variables
and groups defined by the dependent is based on the reduction in
the likelihood values for a model which does not contain any
independent variables and the model that contains the
independent variables.

• This difference in likelihood follows a chi-square distribution, and
is referred to as the model chi-square.

• The significance test for the model chi-square is our statistical
evidence of the presence of a relationship between the dependent
variable and the combination of the independent variables.
Beginning logistic regression model

• The SPSS output for logistic regression begins with output for a
model that contains no independent variables. It labels this output
"Block 0: Beginning Block" and (if we request the optional iteration
history) reports the initial -2 Log Likelihood, which we can think of
as a measure of the error associated with trying to predict the
dependent variable without using any information from the
independent variables.

The initial -2 log likelihood is 213.891.

We will not routinely request the iteration history because
it does not usually yield us additional useful information.
Ending logistic regression model
• After the independent variables
are entered in Block 1, the -2
log likelihood is again measured
(180.267 in this problem).
• The difference between ending
and beginning -2 log likelihood
is the model chi-square that is
used in the test of overall
statistical significance.
• In this problem, the model chi-square is 33.625 (213.891 – 180.267),
which is statistically significant at p<0.001.
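
The subtraction that produces the model chi-square can be written out as a short Python sketch using the two -2 log likelihood values quoted on these slides. The displayed values subtract to 33.624 rather than 33.625; the small gap presumably comes from SPSS working with unrounded values, which is an assumption and not something the output states.

```python
# -2 log likelihood values quoted in the text.
initial_neg2ll = 213.891   # Block 0: no independent variables
ending_neg2ll = 180.267    # Block 1: independent variables entered

# Model chi-square: the reduction in -2 log likelihood.
model_chi_square = initial_neg2ll - ending_neg2ll

print(f"model chi-square = {model_chi_square:.3f}")
```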
Relationship of Individual Independent
Variables and Dependent Variable
• There is a test of significance for the relationship between an individual
independent variable and the dependent variable, a significance test of the
Wald statistic.

• The individual coefficients represent change in the likelihood of being a
member of the modeled category. Individual coefficients are expressed in log
units (log odds) and are not directly interpretable. However, if the b
coefficient is used as the power to which the base of the natural logarithm
(2.71828) is raised, the result represents the change in the odds of the
modeled event associated with a one-unit change in the independent variable.
• If a coefficient is positive, its transformed log value will be greater than one,
meaning that the modeled event is more likely to occur. If a coefficient is
negative, its transformed log value will be less than one, and the odds of the
event occurring decrease. A coefficient of zero (0) has a transformed log value
of 1.0, meaning that this coefficient does not change the odds of the event one
way or the other.
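
The rule about positive, negative, and zero coefficients can be illustrated with a short Python sketch. The three coefficient values below are hypothetical and chosen only to show one case of each sign; the function name is illustrative.

```python
import math


def describe_coefficient(b):
    """Return the odds factor, percent change in odds, and direction for b."""
    factor = math.exp(b)                  # e raised to the power b
    percent_change = (factor - 1.0) * 100.0
    if factor > 1.0:
        direction = "increase"
    elif factor < 1.0:
        direction = "decrease"
    else:
        direction = "no change"
    return factor, percent_change, direction


# Hypothetical coefficients: one positive, one zero, one negative.
for b in (0.5, 0.0, -0.5):
    factor, percent_change, direction = describe_coefficient(b)
    print(f"b = {b:+.1f}: odds x {factor:.3f} ({percent_change:+.1f}%), {direction}")
```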
Numerical problems
• The maximum likelihood method used to calculate logistic regression is
an iterative fitting process that attempts to cycle through repetitions to
find an answer.
• Sometimes, the method will break down and not be able to converge or
find an answer.
• Sometimes the method will produce wildly improbable results, reporting
that a one-unit change in an independent variable increases the odds of
the modeled event by hundreds of thousands or millions. These
implausible results can be produced by multicollinearity, categories of
predictors having no cases or zero cells, and complete separation
whereby the two groups are perfectly separated by the scores on one or
more independent variables.
• The clue that we have numerical problems and should not interpret the
results is a standard error larger than 2.0 for one or more independent
variables.
Strength of logistic regression relationship
• While logistic regression does compute correlation
measures to estimate the strength of the
relationship (pseudo R square measures, such as
Nagelkerke's R²), these correlation measures do
not really tell us much about the accuracy or errors
associated with the model.

• A more useful measure to assess the utility of a
logistic regression model is classification accuracy,
which compares predicted group membership
based on the logistic model to the actual, known
group membership, which is the value for the
dependent variable.
Evaluating usefulness for logistic models
• The benchmark that we will use to characterize a logistic regression
model as useful is a 25% improvement over the rate of accuracy
achievable by chance alone.

• Even if the independent variables had no relationship to the groups
defined by the dependent variable, we would still expect to be
correct in our predictions of group membership some percentage
of the time. This is referred to as by chance accuracy.

• The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by summing the
squared proportion of cases in each group.
Comparing accuracy rates
• To characterize our model as useful, we compare the overall percentage
accuracy rate produced by SPSS at the last step in which variables are entered
to 25% more than the proportional by chance accuracy. (Note: SPSS does not
compute a cross-validated accuracy rate for logistic regression.)

Classification Table(a)

                                                Predicted
                                     EXPECT U.S. IN WORLD WAR
                                     IN 10 YEARS                Percentage
Observed                             YES        NO              Correct
Step 1  EXPECT U.S. IN WORLD   YES   20         34              37.0
        WAR IN 10 YEARS        NO    10         72              87.8
        Overall Percentage                                      67.6
a. The cut value is .500

SPSS reports the overall accuracy rate in the Overall Percentage row
of the table "Classification Table." The overall accuracy rate
computed by SPSS was 67.6%.
Computing by chance accuracy
The number of cases in each group is found in the Classification Table at Step 0
(before any independent variables are included). The proportion of cases in the
largest group is equal to the overall percentage (60.3%).

Classification Table(a,b)

                                                Predicted
                                     EXPECT U.S. IN WORLD WAR
                                     IN 10 YEARS                Percentage
Observed                             YES        NO              Correct
Step 0  EXPECT U.S. IN WORLD   YES   0          54              .0
        WAR IN 10 YEARS        NO    0          82              100.0
        Overall Percentage                                      60.3
a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on the
number of cases in each group in the classification table at Step
0, and then squaring and summing the proportion of cases in
each group (0.397² + 0.603² = 0.521).

The proportional by chance accuracy criterion is 65.1% (1.25 x
52.1% = 65.1%).
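
The by chance accuracy calculation and the comparison with the model's accuracy can be sketched in Python from the counts in the two classification tables. The variable names are illustrative; the 25% improvement benchmark is the one stated in the text.

```python
# Observed group sizes from the Step 0 classification table.
yes_cases = 54
no_cases = 82
total = yes_cases + no_cases

# Correct predictions from the Step 1 classification table.
correct_yes = 20
correct_no = 72

# Proportional by chance accuracy: sum of squared group proportions.
proportions = [yes_cases / total, no_cases / total]
by_chance_accuracy = sum(p ** 2 for p in proportions)

# Benchmark: 25% improvement over by chance accuracy.
criterion = 1.25 * by_chance_accuracy

# Overall accuracy of the model at Step 1.
overall_accuracy = (correct_yes + correct_no) / total

useful = overall_accuracy > criterion

print(f"group proportions  = {proportions[0]:.3f}, {proportions[1]:.3f}")
print(f"by chance accuracy = {by_chance_accuracy:.3f}")
print(f"criterion (x 1.25) = {criterion:.3f}")
print(f"overall accuracy   = {overall_accuracy:.3f}")
print(f"meets criterion    = {useful}")
```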
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews] were useful
predictors for distinguishing between groups based on responses to "seen x-rated movie in last year"
[xmovie]. These predictors differentiate survey respondents who have not seen an x-rated movie from
survey respondents who have seen an x-rated movie.

Survey respondents who were older were more likely to have not seen an x-rated movie. A one unit
increase in age increased the odds that survey respondents have not seen an x-rated movie by 3.9%.
Survey respondents who were female were approximately six and three quarters times more likely to have
not seen an x-rated movie. Survey respondents who were more conservative were more likely to have not
seen an x-rated movie. A one unit increase in liberal or conservative political views increased the odds that
survey respondents have not seen an x-rated movie by approximately one and a quarter times.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem with missing
data, outliers, or influential cases, and that the validation analysis will
confirm the generalizability of the results.

In this problem, we are told to use 0.05 as alpha for the logistic
regression.
SW388R7
Data Analysis & Computers II
Dissecting problem 1 - 2
The variables listed first in the problem statement are the independent
variables (IVs): "age" [age], "sex" [sex], and "liberal or conservative
political views" [polviews].

The variable used to define groups is the dependent variable (DV): "seen
x-rated movie in last year" [xmovie].

When a problem states that a list of independent variables can distinguish
among groups and does not identify a control variable or an order of
importance for the variables, we do a logistic regression entering all of
the variables simultaneously.
Dissecting problem 1 - 3
SPSS logistic regression models the relationship by computing
the changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.

The responses to seeing an x-rated movie were coded:
1 = Yes and 2 = No.

The SPSS output will model the changes in the likelihood of
not seeing an x-rated movie because the code for No is 2.

The statements of the specific relationships between independent
variables and the dependent variable are all phrased in terms
of impact on not seeing an x-rated movie.
Dissecting problem 1 - 4
The specific relationships for the independent variables listed in the
problem indicate the direction of the relationship, increasing or
decreasing the likelihood of falling in the modeled group, and the
amount of change in the odds associated with a one-unit change in the
independent variable.

In order for the logistic regression question to be true, the overall
relationship must be statistically significant, there must be no evidence
of a flawed numerical analysis, the classification accuracy rate must be
substantially better than could be obtained by chance alone, and each
significant relationship must be interpreted correctly.
LEVEL OF MEASUREMENT - 1
Logistic regression requires that the dependent variable be non-metric
and the independent variables be metric or dichotomous. "seen x-rated
movie in last year" [xmovie] is a dichotomous variable, which satisfies
the level of measurement requirement.

It contains two categories: survey respondents who had seen an x-rated
movie in the last year and survey respondents who had not seen an
x-rated movie in the last year.
LEVEL OF MEASUREMENT - 2
"Age" [age] is an interval level variable, which satisfies the level of
measurement requirements for logistic regression analysis.

"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may
be included in logistic regression.

"Liberal or conservative political views" [polviews] is an ordinal level
variable. If we follow the convention of treating ordinal level variables
as metric variables, the level of measurement requirement for logistic
regression analysis is satisfied. Since some data analysts do not agree
with this convention, a note of caution should be included in our
interpretation.
Request simultaneous logistic regression

Select the Regression | Binary Logistic… command from the
Analyze menu.
Selecting the dependent variable

First, highlight the dependent variable xmovie in the list
of variables.

Second, click on the right arrow button to move the
dependent variable to the Dependent text box.
Selecting the independent variables

Move the independent variables listed in the problem to the
Covariates list box.
Specifying the method for including variables

SPSS provides us with two methods for including
variables: to enter all of the independent variables
at one time, and a stepwise method for selecting
variables using a statistical test to determine the
order in which variables are included.

SPSS also supports the specification of "Blocks" of
variables for testing hierarchical models.

Since the problem states that there is a relationship without
requesting the best predictors, we specify Enter as the method for
including variables.
Completing the logistic regression request

Click on the OK button to request the output for the
logistic regression.

The logistic procedure supports the selection of subsets of
cases, automatic recoding of nominal variables, saving
diagnostic statistics like standardized residuals and Cook's
distance, and options for additional statistics. However,
none of these are needed for this analysis.
Sample size – ratio of cases to variables
Case Processing Summary

Unweighted Cases(a)                         N      Percent
Selected Cases     Included in Analysis     177    65.6
                   Missing Cases            93     34.4
                   Total                    270    100.0
Unselected Cases                            0      .0
Total                                       270    100.0
a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to independent
variables for logistic regression is 10 to 1, with a
preferred ratio of 20 to 1. In this analysis, there
are 177 valid cases and 3 independent
variables. The ratio of cases to independent
variables is 59.0 to 1, which satisfies the
minimum requirement. In addition, the ratio of
59.0 to 1 satisfies the preferred ratio of 20 to 1.
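
The sample size check amounts to one division and two comparisons. A minimal Python sketch, using the case counts from the Case Processing Summary and the 10 to 1 and 20 to 1 guidelines stated earlier (variable names are illustrative):

```python
# Values from the Case Processing Summary and the problem statement.
valid_cases = 177
total_cases = 270
independent_variables = 3   # age, sex, polviews

# Ratio of valid cases to independent variables.
ratio = valid_cases / independent_variables

meets_minimum = ratio >= 10     # minimum guideline: 10 to 1
meets_preferred = ratio >= 20   # preferred guideline for simultaneous entry: 20 to 1

# Share of cases that were usable in the analysis.
percent_included = 100.0 * valid_cases / total_cases

print(f"ratio = {ratio:.1f} to 1")
print(f"meets minimum (10 to 1):   {meets_minimum}")
print(f"meets preferred (20 to 1): {meets_preferred}")
print(f"percent of cases included: {percent_included:.1f}")
```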

OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
Step 1   Step     39.668       3    .000
         Block    39.668       3    .000
         Model    39.668       3    .000

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the model chi-
square at step 1 after the independent variables have
been added to the analysis.

In this analysis, the probability of the model chi-square
(39.668) was <0.001, less than or equal to the level of
significance of 0.05. The null hypothesis that there is no
difference between the model with only a constant and
the model with independent variables was rejected. The
existence of a relationship between the independent
variables and the dependent variable was supported.
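
The probability attached to the model chi-square can be reproduced without a statistics package. The sketch below computes the upper-tail chi-square probability for integer degrees of freedom using only Python's standard library; it is an illustration of the test, not a description of how SPSS computes the value. The helper name chi2_sf is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic with integer df."""
    z = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(z))   # exact result for df = 1
        a = 0.5
    else:
        q = math.exp(-z)              # exact result for df = 2
        a = 1.0
    # Step up two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


model_chi_square = 39.668   # from the Omnibus Tests table
df = 3                      # three independent variables
alpha = 0.05

p_value = chi2_sf(model_chi_square, df)
print(f"p = {p_value:.2e}")
print("reject null hypothesis:", p_value <= alpha)
```

SPSS displays this probability as .000 because it rounds to three decimals, which is why the text reports it as <0.001.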

NUMERICAL PROBLEMS
Variables in the Equation

                       B       S.E.    Wald     df   Sig.   Exp(B)
Step 1(a)  AGE         .038    .014    7.629    1    .006   1.039
           SEX         1.901   .410    21.452   1    .000   6.689
           POLVIEWS    .306    .135    5.110    1    .024   1.358
           Constant    -4.590  1.045   19.302   1    .000   .010
a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

Multicollinearity in the logistic regression solution is detected
by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, or
'complete separation' whereby the two groups in the
dependent event variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.

None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error for
the Constant.)
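
The standard error check described above can be expressed as a simple filter. This Python sketch uses the S.E. column from the Variables in the Equation table; the function name and the second, made-up set of values are hypothetical and included only to show what a flagged result looks like.

```python
# Standard errors copied from the S.E. column of the SPSS table.
standard_errors = {
    "AGE": 0.014,
    "SEX": 0.410,
    "POLVIEWS": 0.135,
    "Constant": 1.045,
}


def flag_numerical_problems(errors, threshold=2.0, skip=("Constant",)):
    """List variables whose standard error exceeds the threshold.

    The Constant is skipped, following the rule stated in the text.
    """
    return [name for name, se in errors.items()
            if name not in skip and se > threshold]


flagged = flag_numerical_problems(standard_errors)
print("variables with S.E. > 2.0:", flagged if flagged else "none")

# Hypothetical values showing a flagged predictor.
example = {"X1": 2.5, "X2": 0.3, "Constant": 3.0}
print("hypothetical example:", flag_numerical_problems(example))
```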
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1

The probability of the Wald statistic for the variable age
was 0.006, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for age
was equal to zero was rejected. This supports the
relationship that "survey respondents who were older
were more likely to have not seen an x-rated movie."


The value of Exp(B) was 1.039 which implies that a
one unit increase in age increased the odds that
survey respondents have not seen an x-rated movie
by 3.9%. This confirms the statement of the amount
of change in the likelihood of belonging to the modeled
group of the dependent variable associated with a one
unit change in the independent variable, age.
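
The Sig. and Exp(B) columns can be approximately reproduced from the B and Wald columns of the Variables in the Equation output for this problem. The Python sketch below treats each Wald statistic as a chi-square value with 1 degree of freedom, as the df column indicates, and exponentiates each coefficient. Because the printed coefficients are rounded to three decimals, the recomputed Exp(B) values can differ slightly in the last digit.

```python
import math

# (B, Wald) for each predictor, copied from the SPSS output for this problem.
rows = {
    "AGE": (0.038, 7.629),
    "SEX": (1.901, 21.452),
    "POLVIEWS": (0.306, 5.110),
}


def wald_p_value(wald):
    """Upper-tail chi-square probability for a Wald statistic with 1 df."""
    return math.erfc(math.sqrt(wald / 2.0))


results = {}
for name, (b, wald) in rows.items():
    exp_b = math.exp(b)                    # change in odds per one-unit increase
    percent_change = (exp_b - 1.0) * 100.0
    results[name] = (exp_b, percent_change, wald_p_value(wald))

for name, (exp_b, percent_change, p) in results.items():
    print(f"{name:9s} Exp(B) = {exp_b:.3f}  "
          f"change in odds = {percent_change:+.1f}%  p = {p:.4f}")
```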
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2

The probability of the Wald statistic for the variable sex was <0.001,
less than or equal to the level of significance of 0.05. The null
hypothesis that the b coefficient for sex was equal to zero was rejected.
This supports the relationship that "survey respondents who were female
were approximately six and three quarters times more likely to have not
seen an x-rated movie."

Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
Step 1a   AGE         .038     .014    7.629     1   .006    1.039
          SEX        1.901     .410   21.452     1   .000    6.689
          POLVIEWS    .306     .135    5.110     1   .024    1.358
          Constant  -4.590    1.045   19.302     1   .000     .010
a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

The value of Exp(B) was 6.689 which implies that a one unit increase in
sex increased the odds that survey respondents have not seen an x-rated
movie by approximately six and three quarters times.

RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
The probability of the Wald statistic for the variable liberal or
conservative political views was 0.024, less than or equal to the
level of significance of 0.05. The null hypothesis that the b
coefficient for liberal or conservative political views was equal to
zero was rejected. This supports the relationship that "survey
respondents who were more conservative were more likely to have
not seen an x-rated movie." Liberal or conservative political views is
an ordinal variable that is coded so that higher numeric values are
associated with survey respondents who were more conservative.

Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
Step 1a   AGE         .038     .014    7.629     1   .006    1.039
          SEX        1.901     .410   21.452     1   .000    6.689
          POLVIEWS    .306     .135    5.110     1   .024    1.358
          Constant  -4.590    1.045   19.302     1   .000     .010
a. Variable(s) entered on step 1: AGE, SEX, POLVIEWS.

The value of Exp(B) was 1.358 which implies that a one unit increase in
liberal or conservative political views increased the odds that survey
respondents have not seen an x-rated movie by approximately one and a
quarter times (more precisely, by 35.8%).
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
by chance accuracy rate

The independent variables could be characterized as useful predictors
distinguishing survey respondents who have not seen an x-rated movie from
survey respondents who have seen an x-rated movie if the classification
accuracy rate was substantially higher than the accuracy attainable by
chance alone. Operationally, the classification accuracy rate should be
at least 25% higher than the proportional by chance accuracy rate.

Classification Table (a, b)

                                          Predicted
                                     SEEN X-RATED MOVIE
                                        IN LAST YEAR       Percentage
Observed                               YES        NO        Correct
Step 0   SEEN X-RATED MOVIE   YES        0        45            .0
         IN LAST YEAR         NO         0       132         100.0
         Overall Percentage                                   74.6
a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by first
calculating the proportion of cases for each group based on the number
of cases in each group in the classification table at Step 0. The
proportion in the "YES" group is 45/177 = 0.254. The proportion in the
"NO" group is 132/177 = 0.746.

Then, we square and sum the proportion of cases in each group (0.254²
+ 0.746² = 0.621). 0.621 is the proportional by chance accuracy rate.
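The by chance calculation above can be reproduced from the Step 0 group counts. This is a minimal sketch; the function name `proportional_chance_accuracy` is hypothetical.

```python
def proportional_chance_accuracy(group_counts):
    """Sum of squared group proportions, computed from the observed group sizes."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


# Observed group sizes at Step 0: 45 "YES" and 132 "NO".
chance_rate = proportional_chance_accuracy([45, 132])
print(round(chance_rate, 3))
```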
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy

Classification Table (a)

                                          Predicted
                                     SEEN X-RATED MOVIE
                                        IN LAST YEAR       Percentage
Observed                               YES        NO        Correct
Step 1   SEEN X-RATED MOVIE   YES       19        26          42.2
         IN LAST YEAR         NO         9       123          93.2
         Overall Percentage                                   80.2
a. The cut value is .500

The accuracy rate computed by SPSS was 80.2% which was greater than or
equal to the proportional by chance accuracy criterion of 77.6%
(1.25 x 62.1% = 77.6%).

The criterion for classification accuracy is satisfied.
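The comparison on this slide can be sketched from the Step 1 classification table. The function name `classification_accuracy` is hypothetical, and the by chance rate of 0.621 is taken as given from the preceding slide rather than recomputed.

```python
def classification_accuracy(table):
    """Overall accuracy, where table[i][j] counts cases observed in group i
    and predicted to be in group j."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


step1_table = [
    [19, 26],   # observed YES: predicted YES, predicted NO
    [9, 123],   # observed NO:  predicted YES, predicted NO
]

accuracy = classification_accuracy(step1_table)
chance_rate = 0.621              # proportional by chance accuracy rate
criterion = 1.25 * chance_rate   # 25% higher than the by chance rate
meets_criterion = accuracy >= criterion
print(round(accuracy, 3), meets_criterion)
```

The slides make this comparison on percentages rounded to one decimal place; when the two values are close, the rounded and unrounded comparisons can differ.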

Answering the question in problem 1 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

The variables "age" [age], "sex" [sex], and "liberal or conservative political views" [polviews] were
useful predictors for distinguishing between groups based on responses to "seen x-rated movie in last
year" [xmovie]. These predictors differentiate survey respondents who have not seen an x-rated movie
from survey respondents who have seen an x-rated movie.

Survey respondents who were older were more likely to have not seen an x-rated movie. A one unit
increase in age increased the odds that survey respondents have not seen an x-rated movie by 3.9%.
Survey respondents who were female were approximately six and three quarters times more likely to have
not seen an x-rated movie. Survey respondents who were more conservative were more likely to have not
seen an x-rated movie. A one unit increase in liberal or conservative political views increased the odds that
survey respondents have not seen an x-rated movie by approximately one and a quarter times.

We found a statistically significant overall relationship between the
combination of independent variables and the dependent variable.

There was no evidence of numerical problems in the solution.

Moreover, the classification accuracy surpassed the proportional by
chance accuracy criteria, supporting the utility of the model.
Answering the question in problem 1 - 2
We verified that each statement about the relationship between an
independent variable and the dependent variable was correct in both
direction of the relationship and the change in likelihood associated
with a one-unit change of the independent variable.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The answer to the question is true with caution.

A caution is added because of the inclusion of ordinal level variables.
Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Dissecting problem 2 - 1
For these problems, we will assume that there is no problem with missing
data, outliers, or influential cases, and that the validation analysis
will confirm the generalizability of the results.

In this problem, we are told to use 0.05 as alpha for the logistic
regression.

Dissecting problem 2 - 2
The variables listed first in the problem statement are the independent
variables (IVs): "sex" [sex], "general happiness" [happy], and
"confidence in the executive branch of the federal government" [confed].

Sex is a control variable and general happiness and confidence in the
executive branch are predictors.

The variable used to define groups is the dependent variable (DV):
"should marijuana be made legal" [grass].

When a problem identifies control variables, we do a hierarchical
logistic regression entering the variables in SPSS blocks.
Dissecting problem 2 - 3
SPSS logistic regression models the relationship by computing the changes
in the likelihood of falling in the category of the dependent variable
which had the highest numerical code.

The responses to "should marijuana be made legal" were coded:
1 = Legal and 2 = Not Legal.

The SPSS output will model the changes in the likelihood of being less
supportive of legalizing marijuana because 2 corresponds to not
legalizing marijuana.

The statements of the specific relationships between independent
variables and the dependent variable are all phrased in terms of impact
on being less supportive of legalizing marijuana.
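The coding rule on this slide can be illustrated outside SPSS with a short Python sketch. The helper names and the sample responses are hypothetical and are not taken from GSS2000.sav; only the code labels (1 = Legal, 2 = Not Legal) come from the slide.

```python
# Codes for "should marijuana be made legal" [grass] as given on the slide.
GRASS_LABELS = {1: "Legal", 2: "Not Legal"}


def modeled_category(code_labels):
    """Return (code, label) for the highest numeric code, the group that is modeled."""
    code = max(code_labels)
    return code, code_labels[code]


def to_indicator(values, code_labels):
    """Recode raw responses to 1 for the modeled group and 0 for the other group."""
    target, _ = modeled_category(code_labels)
    return [1 if v == target else 0 for v in values]


sample_responses = [1, 2, 2, 1, 2]  # hypothetical responses, for illustration only
print(modeled_category(GRASS_LABELS))
print(to_indicator(sample_responses, GRASS_LABELS))
```

With this coding, a positive b coefficient raises the odds of answering "Not Legal", which is why every relationship in the problem is worded in terms of being less supportive.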
Dissecting problem 2 - 4
The specific relationships for the independent variables listed in the
problem indicate the direction of the relationship, increasing or
decreasing the likelihood of falling in the modeled group, and the amount
of change in the odds associated with a one-unit change in the
independent variable.

In order for the logistic regression question to be true, the
relationship between the predictors and the dependent variable must be
statistically significant after entering the control variables in a
previous stage, there must be no evidence of a flawed numerical analysis,
the classification accuracy rate must be substantially better than could
be obtained by chance alone, and each significant relationship must be
interpreted correctly.
LEVEL OF MEASUREMENT - 1
Logistic regression analysis requires that the dependent variable be
dichotomous and independent variables be metric or dichotomous. "Should
marijuana be made legal" [grass] is a dichotomous variable, which
satisfies the level of measurement requirement for the dependent
variable.

It contains two categories:
• survey respondents who have been less supportive that the use of
  marijuana should be made legal
• survey respondents who have been more supportive that the use of
  marijuana should be made legal
LEVEL OF MEASUREMENT - 2
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be
included in logistic regression analysis.

"General happiness" [happy] and "confidence in the executive branch of
the federal government" [confed] are ordinal level variables. If we
follow the convention of treating ordinal level variables as metric
variables, the level of measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not agree with this
convention, a note of caution should be included in our interpretation.

Request hierarchical logistic regression

Select the Regression | Binary Logistic… command from the Analyze menu.

Selecting the dependent variable
First, highlight the
dependent variable
grass in the list of
variables.

Second, click on the right arrow button to move the dependent variable
to the Dependent text box.

Selecting the control independent variables

First, move the control independent variable, sex, listed in the problem
to the Covariates list box.

Second, click on the Next button to add the new block that will contain
the predictors.

Adding the predictor independent variables

First, move the predictors to the Covariates list box.

Specifying the method for including variables

In our hierarchical
regression, we will specify
that all of the variables in
each block be entered
simultaneously when the
block is entered.

Completing the logistic regression request

Click on the OK
button to request
the output for the
logistic regression.

The logistic procedure supports the selection of subsets of cases,
automatic recoding of nominal variables, saving diagnostic statistics
like standardized residuals and Cook's distance, and options for
additional statistics. However, none of these are needed for this
analysis.
Sample size – ratio of cases to variables
Case Processing Summary

Unweighted Cases (a)                          N    Percent
Selected Cases     Included in Analysis     163      60.4
                   Missing Cases            107      39.6
                   Total                    270     100.0
Unselected Cases                              0        .0
Total                                       270     100.0
a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to independent variables for logistic
regression is 10 to 1, with a preferred ratio of 20 to 1. In this
analysis, there are 163 valid cases and 3 independent variables. The
ratio of cases to independent variables is 54.33 to 1, which satisfies
the minimum requirement. In addition, the ratio of 54.33 to 1 satisfies
the preferred ratio of 20 to 1.
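The sample size check is simple arithmetic and can be sketched as follows. The function name is hypothetical; the thresholds of 10 and 20 cases per variable are the ones stated on the slide.

```python
def cases_per_variable(n_valid_cases, n_independent_vars):
    """Ratio of valid cases to independent variables."""
    return n_valid_cases / n_independent_vars


MINIMUM_RATIO = 10    # minimum ratio stated on the slide
PREFERRED_RATIO = 20  # preferred ratio stated on the slide

ratio = cases_per_variable(163, 3)  # 163 valid cases, 3 independent variables
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO
print(round(ratio, 2), meets_minimum, meets_preferred)
```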

OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES

In a hierarchical logistic regression, the presence of a relationship
between the dependent variable and combination of independent variables
entered after the control variables have been included is based on the
statistical significance of the block chi-square for the second block of
variables in which the predictor independent variables are included.

In this analysis, the probability of the block chi-square (17.467) was
<0.001, less than or equal to the level of significance of 0.05. The null
hypothesis that there is no difference between the model with only a
constant and the control variables versus the model with the predictor
independent variables was rejected. The contribution of the relationship
between the predictor independent variables and the dependent variable
was supported.
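The probability of the block chi-square can be checked by hand. The sketch below assumes 2 degrees of freedom, on the reasoning that the second block adds two predictors (happy and confed), each entered as a single covariate; the slide text does not state the degrees of freedom, so treat that value as an assumption.

```python
import math


def chi_square_sf_df2(x):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    # For df = 2 the upper-tail probability has the closed form exp(-x / 2).
    return math.exp(-x / 2.0)


ALPHA = 0.05              # level of significance given in the problem
block_chi_square = 17.467  # block chi-square reported on the slide

p_block = chi_square_sf_df2(block_chi_square)  # df = 2 is an assumption
print(p_block < 0.001, p_block <= ALPHA)
```

For other degrees of freedom there is no such simple closed form, and a statistics library would be the more practical choice.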
NUMERICAL PROBLEMS
Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
Step 1a   SEX         .154     .351     .194     1   .660    1.167
          HAPPY     -1.104     .354    9.739     1   .002     .331
          CONFED     -.559     .270    4.290     1   .038     .572
          Constant   3.721    1.066   12.195     1   .000   41.308
a. Variable(s) entered on step 1: HAPPY, CONFED.

Multicollinearity in the logistic regression solution is detected by
examining the standard errors for the b coefficients. A standard error
larger than 2.0 indicates numerical problems, such as multicollinearity
among the independent variables, zero cells for a dummy-coded independent
variable because all of the subjects have the same value for the variable,
and 'complete separation' whereby the two groups in the dependent event
variable can be perfectly separated by scores on one of the independent
variables. Analyses that indicate numerical problems should not be
interpreted.

None of the independent variables in this analysis had a standard error
larger than 2.0. (The check for standard errors larger than 2.0 does not
include the standard error for the Constant.)
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1

The probability of the Wald statistic for the variable general happiness
was 0.002, less than or equal to the level of significance of 0.05. The
null hypothesis that the b coefficient for general happiness was equal to
zero was rejected. This supports the relationship that "survey
respondents who were less happy overall were less likely to have been
less supportive that the use of marijuana should be made legal." General
happiness is an ordinal variable that is coded so that lower numeric
values are associated with survey respondents who were happier overall.

Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
Step 1a   SEX         .154     .351     .194     1   .660    1.167
          HAPPY     -1.104     .354    9.739     1   .002     .331
          CONFED     -.559     .270    4.290     1   .038     .572
          Constant   3.721    1.066   12.195     1   .000   41.308
a. Variable(s) entered on step 1: HAPPY, CONFED.

The value of Exp(B) was 0.331 which implies that a one unit increase in
general happiness decreased the odds that survey respondents have been
less supportive that the use of marijuana should be made legal by 66.9%.
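When Exp(B) is below 1, the percentage change in the odds is negative, which is where the 66.9% decrease comes from. A minimal sketch, with a hypothetical function name and the Exp(B) values taken from the table for this problem:

```python
def percent_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase in the predictor.

    Negative values are decreases; positive values are increases.
    """
    return (exp_b - 1.0) * 100.0


happy_change = percent_change_in_odds(0.331)   # Exp(B) for HAPPY
confed_change = percent_change_in_odds(0.572)  # Exp(B) for CONFED
print(round(happy_change, 1), round(confed_change, 1))
```

Because both variables are coded so that lower values mean more happiness or more confidence, a one unit increase in the code corresponds to being less happy or less confident.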
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2

The probability of the Wald statistic for the variable confidence in the
executive branch of the federal government was 0.038, less than or
equal to the level of significance of 0.05. The null hypothesis that the
b coefficient for confidence in the executive branch of the federal
government was equal to zero was rejected. This supports the
relationship that "survey respondents who had less confidence in the
executive branch of the federal government were less likely to have
been less supportive that the use of marijuana should be made legal."
Confidence in the executive branch of the federal government is an
ordinal variable that is coded so that lower numeric values are
associated with survey respondents who had more confidence in the
executive branch of the federal government.
Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
Step 1a   SEX         .154     .351     .194     1   .660    1.167
          HAPPY     -1.104     .354    9.739     1   .002     .331
          CONFED     -.559     .270    4.290     1   .038     .572
          Constant   3.721    1.066   12.195     1   .000   41.308
a. Variable(s) entered on step 1: HAPPY, CONFED.

The value of Exp(B) was 0.572 which implies that a one unit increase in
confidence in the executive branch of the federal government decreased
the odds that survey respondents have been less supportive that the use
of marijuana should be made legal by 42.8%.
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
by chance accuracy rate

The independent variables could be characterized as useful predictors
distinguishing survey respondents who have been less supportive that the
use of marijuana should be made legal from survey respondents who have
been more supportive that the use of marijuana should be made legal if
the classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the classification
accuracy rate should be at least 25% higher than the proportional by
chance accuracy rate.

Classification Table (a, b)

                                             Predicted
                                       SHOULD MARIJUANA BE
                                           MADE LEGAL         Percentage
Observed                               LEGAL    NOT LEGAL      Correct
Step 0   SHOULD MARIJUANA  LEGAL          0         57            .0
         BE MADE LEGAL     NOT LEGAL      0        106         100.0
         Overall Percentage                                     65.0
a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by calculating the
proportion of cases for each group based on the number of cases in each
group in the classification table at Step 0, and then squaring and
summing the proportion of cases in each group (0.350² + 0.650² = 0.545).
CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy

Classification Table (a)

                                             Predicted
                                       SHOULD MARIJUANA BE
                                           MADE LEGAL         Percentage
Observed                               LEGAL    NOT LEGAL      Correct
Step 1   SHOULD MARIJUANA  LEGAL         18         39          31.6
         BE MADE LEGAL     NOT LEGAL     13         93          87.7
         Overall Percentage                                     68.1
a. The cut value is .500

The accuracy rate computed by SPSS was 68.1% which was greater than or
equal to the proportional by chance accuracy criterion of 68.1%
(1.25 x 54.5% = 68.1%).

The criterion for classification accuracy is satisfied.

Answering the question in problem 2 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall
relationship between the predictor independent
variables and the dependent variable.

There was no evidence of numerical problems in
the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.

Answering the question in problem 2 - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

After controlling for the effect of the variable "sex" [sex] on "should marijuana be made legal" [grass], the
variable "general happiness" [happy] and "confidence in the executive branch of the federal government"
[confed] were useful predictors for distinguishing between groups based on responses to "should
marijuana be made legal" [grass]. These predictors differentiate survey respondents who have been less
supportive that the use of marijuana should be made legal from survey respondents who have been more
supportive that the use of marijuana should be made legal.

Survey respondents who were less happy overall were less likely to have been less supportive that the use
of marijuana should be made legal. A one unit increase in general happiness decreased the odds that
survey respondents have been less supportive that the use of marijuana should be made legal by 66.9%.
Survey respondents who had less confidence in the executive branch of the federal government were less
likely to have been less supportive that the use of marijuana should be made legal. A one unit increase in
confidence in the executive branch of the federal government decreased the odds that survey respondents
have been less supportive that the use of marijuana should be made legal by 42.8%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We verified that each statement about the
relationship between an independent variable and
the dependent variable was correct in both
direction of the relationship and the change in
likelihood associated with a one-unit change of the
independent variable.

The answer to the question is true
with caution.

A caution is added because of the
inclusion of ordinal level variables.

Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Dissecting Problem 3 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating
the statistical relationship.

For these problems, we will assume that there is no problem
with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the
logistic regression.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Dissecting Problem 3 - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

The variables listed first in the problem statement are the
independent variables (IVs): "highest academic degree" [degree],
"total family income" [income98], and "satisfaction with
financial situation" [satfin].

The variable used to define groups is the dependent variable (DV):
"expect u.s. in world war in 10 years" [uswary].

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between
groups based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the
United States would fight in another world war within the next ten years from survey respondents who
have been more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

Since the problem identifies the most useful or
important predictor, we do a stepwise logistic
regression.

Dissecting Problem 3 - 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the
United States would fight in another world war within the next ten years from survey respondents who
have been more positive that the United States would fight in another world war within the next ten
years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

SPSS logistic regression models the relationship by computing the
changes in the likelihood of falling in the category of the
dependent variable which had the highest numerical code.
The responses to “expect u.s. in world war in 10 years” were
coded: 1= Yes and 2 = No.

The SPSS output will model the changes in the likelihood of being
less positive that the United States would fight in another world
war within the next ten years.
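
The coding rule described above can be sketched in Python. This is a minimal illustration, not SPSS output: the function name `modeled_event` and the sample responses are hypothetical, and only the codes 1 = Yes and 2 = No come from the slide.

```python
def modeled_event(codes):
    """Recode a dichotomous variable to 0/1, with 1 marking the highest code.

    This mirrors the convention described above, in which the logistic
    regression models membership in the category with the highest
    numerical code.
    """
    categories = sorted(set(codes))
    if len(categories) != 2:
        raise ValueError("the dependent variable must have exactly two categories")
    highest = categories[-1]
    return [1 if code == highest else 0 for code in codes]


# Hypothetical responses to uswary, coded 1 = Yes and 2 = No.
uswary = [1, 2, 2, 1, 2]
target = modeled_event(uswary)
print(target)  # [0, 1, 1, 0, 1]
```

With this recoding, a value of 1 stands for "No", that is, being less positive that the United States would fight in another world war, which is the group the coefficients describe.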
Dissecting Problem 3 - 4
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.

The statements of the specific relationships between independent
variables and the dependent variable are all phrased in terms of
impact on being less positive that the United States would
fight in another world war within the next ten years.

Dissecting Problem 3 - 5
From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive
that the United States would fight in another world war within the next ten years. A one unit increase in
total family income increased the odds that survey respondents have been less positive that the United
States would fight in another world war within the next ten years by 10.0%.

The specific relationships for the independent
variables listed in the problem indicate the direction
of the relationship, increasing or decreasing the
likelihood of falling in the modeled group, and the
amount of change associated with a one-unit
change in the independent variable.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

In order for the logistic regression question to be true, the
relationship between the predictors selected for inclusion and the
dependent variable must be statistically significant, there must be
no evidence of a flawed numerical analysis, the classification
accuracy rate must be substantially better than could be obtained
by chance alone, and the order of entry and each significant
relationship must be interpreted correctly.

LEVEL OF MEASUREMENT - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Logistic regression analysis requires that the dependent variable
be dichotomous and the independent variables be metric or
dichotomous. "Expect u.s. in world war in 10 years" [uswary] is a
dichotomous variable, which satisfies the level of measurement
requirement for the dependent variable.

It contains two categories:
• survey respondents who have been less positive that the United
  States would fight in another world war within the next ten years
• survey respondents who have been more positive that the United
  States would fight in another world war within the next ten years.
LEVEL OF MEASUREMENT - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

"Highest academic degree" [degree], "total family
income" [income98], and "satisfaction with financial
situation" [satfin] are ordinal level variables. If we
follow the convention of treating ordinal level
variables as metric variables, the level of
measurement requirement for logistic regression
analysis is satisfied. Since some data analysts do not
agree with this convention, a note of caution should
be included in our interpretation.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.
Request stepwise logistic regression

Select the Regression | Binary Logistic…
command from the Analyze menu.

Selecting the dependent variable

First, highlight the dependent variable
uswary in the list of variables.

Second, click on the right
arrow button to move the
dependent variable to the
Dependent text box.

Adding the independent variables

First, move the predictors to the
Covariates list box.

Specifying the method for including variables

In our stepwise logistic regression, we specify
the Forward Conditional method for
adding variables.

Adding options to the output

To add a summary of steps at the end of the analysis
and specifications for the stepwise method, click on
the Options… button.

Including a summary of steps

To obtain a summary of the steps
on which variables were added or
removed from the analysis, mark
the option button At last step in
the Display panel.

Specifications for stepwise method

We can change the criteria for adding and
removing variables from the analysis by
changing the probability for entry and removal.
We will use the default level of significance of
0.05 for entry and 0.10 for removal.

Click on the Continue button to
close the dialog box.

Completing the logistic regression request

Click on the OK button to request
the output for the logistic regression.

Sample size – ratio of cases to variables
Case Processing Summary

Unweighted Cases (a)                          N    Percent
Selected Cases    Included in Analysis      136       50.4
                  Missing Cases             134       49.6
                  Total                     270      100.0
Unselected Cases                              0         .0
Total                                       270      100.0
a. If weight is in effect, see classification table for the total
number of cases.

The minimum ratio of valid cases to independent
variables for stepwise logistic regression is 10 to
1, with a preferred ratio of 50 to 1. In this
analysis, there are 136 valid cases and 3
independent variables. The ratio of cases to
independent variables is 45.33 to 1, which
satisfies the minimum requirement. However,
the ratio of 45.33 to 1 does not satisfy the
preferred ratio of 50 to 1. A caution should be
added to the interpretation of the analysis and a
split sample validation should be conducted.

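The ratio check can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `case_ratio_check` is hypothetical, the 10 to 1 and 50 to 1 thresholds come from this slide, and the 20 to 1 preferred ratio for non-stepwise analyses comes from the decision chart at the end of the deck.

```python
def case_ratio_check(valid_cases, n_predictors, stepwise=True):
    """Compare the ratio of valid cases to independent variables with the thresholds."""
    ratio = valid_cases / n_predictors
    minimum = 10.0                            # minimum ratio stated on the slide
    preferred = 50.0 if stepwise else 20.0    # preferred ratio depends on the method
    return {
        "ratio": round(ratio, 2),
        "meets_minimum": ratio >= minimum,
        "meets_preferred": ratio >= preferred,
    }


# 136 valid cases and 3 independent variables, as in the Case Processing Summary.
result = case_ratio_check(valid_cases=136, n_predictors=3, stepwise=True)
print(result)
# {'ratio': 45.33, 'meets_minimum': True, 'meets_preferred': False}
```

Meeting the minimum but not the preferred ratio is what leads to a caution rather than to rejecting the analysis.
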
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES

The presence of a relationship between the dependent variable
and combination of independent variables is based on the
statistical significance of the model chi-square.

In this analysis, the probability of the model chi-square (9.001)
was 0.003, less than or equal to the level of significance of 0.05.
The null hypothesis that there is no difference between the model
with only a constant and the model with independent variables
was rejected. The existence of a relationship between the
independent variables and the dependent variable was supported.
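
The probability reported for the model chi-square can be checked by hand. The sketch below assumes 1 degree of freedom, as shown for the model chi-square in the Step Summary table, and uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal. The function name `chi_square_p_value_df1` is hypothetical.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), because X is the square of a
    standard normal variable.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


alpha = 0.05
p_model = chi_square_p_value_df1(9.001)
print(round(p_model, 3))  # 0.003
print(p_model <= alpha)   # True
```

This shortcut only holds for 1 degree of freedom; a model with more predictors entered would need a general chi-square distribution function.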

NUMERICAL PROBLEMS
Variables in the Equation

                            B     S.E.     Wald    df    Sig.    Exp(B)
Step 1 (a)   INCOME98     .095    .033    8.436     1    .004     1.100
             Constant   -1.033    .527    3.847     1    .050      .356
a. Variable(s) entered on step 1: INCOME98.

Multicollinearity in the logistic regression solution is detected
by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems,
such as multicollinearity among the independent variables,
zero cells for a dummy-coded independent variable because
all of the subjects have the same value for the variable, and
'complete separation' whereby the two groups in the
dependent event variable can be perfectly separated by
scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.

None of the independent variables in this analysis had a
standard error larger than 2.0. (The check for standard
errors larger than 2.0 does not include the standard error for
the Constant.)
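
The standard error screen described above is simple enough to sketch in Python. The function name `numerical_problem_flags` is hypothetical; the 2.0 threshold, the exclusion of the Constant, and the standard errors are taken from the slide and its Variables in the Equation table.

```python
def numerical_problem_flags(standard_errors, threshold=2.0, skip=("Constant",)):
    """Return the predictors whose standard error exceeds the threshold.

    The Constant is skipped, following the note that the check does not
    include its standard error.
    """
    return [
        name
        for name, se in standard_errors.items()
        if name not in skip and se > threshold
    ]


standard_errors = {"INCOME98": 0.033, "Constant": 0.527}
flagged = numerical_problem_flags(standard_errors)
print(flagged)  # []
```

An empty list corresponds to "no evidence of numerical problems"; any name in the list would mean the analysis should not be interpreted.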
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE

The probability of the Wald statistic for the variable total family
income was 0.004, less than or equal to the level of significance
of 0.05. The null hypothesis that the b coefficient for total family
income was equal to zero was rejected. This supports the
relationship that "survey respondents who had higher total family
incomes were more likely to have been less positive that the
United States would fight in another world war within the next
ten years." Total family income is an ordinal variable that is
coded so that higher numeric values are associated with survey
respondents who had higher total family incomes.

Variables in the Equation

                            B     S.E.     Wald    df    Sig.    Exp(B)
Step 1 (a)   INCOME98     .095    .033    8.436     1    .004     1.100
             Constant   -1.033    .527    3.847     1    .050      .356
a. Variable(s) entered on step 1: INCOME98.

The value of Exp(B) was 1.100, which implies that a
one unit increase in total family income increased the
odds that survey respondents have been less positive
that the United States would fight in another world war
within the next ten years by 10.0%.

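The link between the B coefficient, Exp(B), and the percentage change in the odds can be verified directly. This is a small sketch, with the hypothetical helper `odds_change_percent`; the coefficient .095 is the rounded value from the Variables in the Equation table, so results may differ from SPSS in later decimals.

```python
import math


def odds_change_percent(b, units=1.0):
    """Percent change in the odds for a change of `units` in the predictor."""
    return (math.exp(b * units) - 1.0) * 100.0


b_income98 = 0.095  # B for INCOME98, as printed in the table

exp_b = math.exp(b_income98)
print(round(exp_b, 3))                            # 1.1
print(round(odds_change_percent(b_income98), 1))  # 10.0
```

Because the model is linear in the log odds, the effect of several units compounds multiplicatively: a change of k units multiplies the odds by Exp(B) raised to the power k, not by k times the one-unit percentage.
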
IMPORTANCE OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE

The order of importance is based on the entry
order of the variables included in the stepwise
logistic regression. The entry order is
summarized in the Step Summary table, in
which we see which variable was added or
removed at each step.

Step Summary (a, b)

        Improvement                 Model                       Correct
Step    Chi-square   df   Sig.      Chi-square   df   Sig.      Class %    Variable
1            9.001    1   .003           9.001    1   .003        67.6%    IN: INCOME98
a. No more variables can be deleted from or added to the current model.
b. End block: 1

The most important predictor for identifying
survey respondents who have been less
positive that the United States would fight in
another world war within the next ten years
was total family income [INCOME98].

The importance of the predictors stated in
the problem is correct.

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
by chance accuracy rate

The independent variables could be characterized as useful
predictors distinguishing survey respondents who have been
less positive that the United States would fight in another
world war within the next ten years from survey respondents
who have been more positive that the United States would
fight in another world war within the next ten years if the
classification accuracy rate was substantially higher than the
accuracy attainable by chance alone. Operationally, the
classification accuracy rate should be 25% or more higher
than the proportional by chance accuracy rate.

Classification Table (a, b)

                                       Predicted: EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed                               YES        NO        Percentage Correct
Step 0   EXPECT U.S. IN WORLD   YES      0        54                        .0
         WAR IN 10 YEARS        NO       0        82                     100.0
         Overall Percentage                                               60.3
a. Constant is included in the model.
b. The cut value is .500

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the classification table
at Step 0, and then squaring and summing the proportion of
cases in each group (0.397² + 0.603² = 0.521).

CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL:
criteria for classification accuracy

Classification Table (a)

                                       Predicted: EXPECT U.S. IN WORLD WAR IN 10 YEARS
Observed                               YES        NO        Percentage Correct
Step 1   EXPECT U.S. IN WORLD   YES     20        34                      37.0
         WAR IN 10 YEARS        NO      10        72                      87.8
         Overall Percentage                                               67.6
a. The cut value is .500

The accuracy rate computed by SPSS was
67.6%, which was greater than or equal to the
proportional by chance accuracy criterion of
65.1% (1.25 x 52.1% = 65.1%).

The criterion for classification accuracy is
satisfied.
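
The two classification slides can be tied together in a short Python sketch that starts from the raw counts in the Step 0 and Step 1 tables. The function names are hypothetical. Because the code squares unrounded proportions, the last digit of the criterion can differ slightly from a hand calculation that rounds the proportions to three decimals first.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions, from the observed group sizes."""
    total = sum(group_counts)
    return sum((count / total) ** 2 for count in group_counts)


def overall_accuracy(table):
    """Share of cases on the diagonal; table[i][j] is observed i, predicted j."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


observed_counts = [54, 82]        # YES and NO at Step 0
step1_table = [[20, 34],          # observed YES: predicted YES, predicted NO
               [10, 72]]          # observed NO:  predicted YES, predicted NO

chance = proportional_by_chance_accuracy(observed_counts)
criterion = 1.25 * chance         # accuracy should be 25% or more higher than chance
accuracy = overall_accuracy(step1_table)

print(f"chance {chance:.3f}, criterion {criterion:.3f}, accuracy {accuracy:.3f}")
# chance 0.521, criterion 0.651, accuracy 0.676
print(accuracy >= criterion)      # True
```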

Answering the question in problem 3 - 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall
relationship between the predictor independent
variables and the dependent variable.

There was no evidence of numerical problems in
the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criteria,
supporting the utility of the model.

Answering the question in problem 3 - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problem with missing data, outliers, or influential cases, and that the validation
analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationship.

From the list of variables "highest academic degree" [degree], "total family income" [income98], and
"satisfaction with financial situation" [satfin], the most useful predictor for distinguishing between groups
based on responses to "expect u.s. in world war in 10 years" [uswary] was "total family income"
[income98]. These predictors differentiate survey respondents who have been less positive that the United
States would fight in another world war within the next ten years from survey respondents who have been
more positive that the United States would fight in another world war within the next ten years.

The most important predictor for identifying survey respondents who have been less positive that the
United States would fight in another world war within the next ten years was total family income.

Survey respondents who had higher total family incomes were more likely to have been less positive that
the United States would fight in another world war within the next ten years. A one unit increase in total
family income increased the odds that survey respondents have been less positive that the United States
would fight in another world war within the next ten years by 10.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We verified that each statement about the
relationship between an independent variable and
the dependent variable was correct in both the
direction of the relationship and the change in
likelihood associated with a one-unit change of the
independent variable.

We also verified the order of importance for the
independent variables included in the stepwise
analysis.

The answer to the question is true
with caution.

A caution is added to the findings
because of the inclusion of ordinal
level independent variables. A
caution is also added to the findings
because the preferred sample
size is not met.

Steps in binary logistic regression:
level of measurement and initial sample size

The following is a guide to the decision process for answering
problems about the basic relationships in logistic regression:

Dependent dichotomous? Independent variables metric or dichotomous?
    No:  Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 10 to 1?
    No:  Inappropriate application of a statistic
    Yes: continue

Run logistic regression, using method for including
variables identified in the research question.

Steps in logistic regression:
overall relationship and numerical problems

Hierarchical method of entry used to include independent variables?
    No:  Presence of relationship confirmed by test of model chi-square?
    Yes: Presence of relationship confirmed by test of block chi-square?
         No (either test):  False
         Yes (either test): continue

Standard errors of coefficients indicate presence of numerical
problems (s.e. > 2.0)?
    Yes: False
    No:  continue

Steps in logistic regression:
relationships between IV's and DV

Stepwise method of entry used to include independent variables?
    No:  continue
    Yes: Entry order of variables interpreted correctly?
         No:  False
         Yes: continue

Relationships between individual IVs and DV groups
interpreted correctly?
    No:  False
    Yes: continue

Steps in logistic regression:
classification accuracy and adding cautions

Overall accuracy rate is 25% or more higher than the proportional
by chance accuracy rate?
    No:  False
    Yes: continue

Satisfies preferred ratio of cases to IV's of 20 to 1
(50 to 1 for stepwise)?
    No:  True with caution
    Yes: continue

One or more IV's are ordinal level variables?
    Yes: True with caution
    No:  True
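
The decision process in the four "Steps in logistic regression" charts can be summarized as one Python function. This is only a sketch of the charts as read here: the function name and every parameter name are hypothetical, and the caller is assumed to have already judged significance (model chi-square, or block chi-square for hierarchical entry) and the correctness of the interpretations.

```python
def evaluate_logistic_problem(
    dv_dichotomous,
    ivs_metric_or_dichotomous,
    valid_cases,
    n_predictors,
    relationship_significant,
    max_standard_error,
    interpretations_correct,
    accuracy,
    chance_accuracy,
    stepwise=False,
    entry_order_correct=True,
    any_ordinal_ivs=False,
):
    """Return the answer category suggested by the decision charts."""
    inappropriate = "Inappropriate application of a statistic"

    # Level of measurement and minimum sample size.
    if not (dv_dichotomous and ivs_metric_or_dichotomous):
        return inappropriate
    ratio = valid_cases / n_predictors
    if ratio < 10:
        return inappropriate

    # Overall relationship and numerical problems.
    if not relationship_significant:
        return "False"
    if max_standard_error > 2.0:
        return "False"

    # Entry order (stepwise only) and individual relationships.
    if stepwise and not entry_order_correct:
        return "False"
    if not interpretations_correct:
        return "False"

    # Classification accuracy: 25% or more higher than chance.
    if accuracy < 1.25 * chance_accuracy:
        return "False"

    # Cautions: preferred sample size and ordinal predictors.
    preferred_ratio = 50 if stepwise else 20
    if ratio < preferred_ratio or any_ordinal_ivs:
        return "True with caution"
    return "True"


# Problem 3, using the values reported on the earlier slides.
answer = evaluate_logistic_problem(
    dv_dichotomous=True,
    ivs_metric_or_dichotomous=True,  # ordinal IVs treated as metric by convention
    valid_cases=136,
    n_predictors=3,
    relationship_significant=True,   # model chi-square p = 0.003
    max_standard_error=0.033,
    interpretations_correct=True,
    accuracy=0.676,
    chance_accuracy=0.521,
    stepwise=True,
    entry_order_correct=True,
    any_ordinal_ivs=True,
)
print(answer)  # True with caution
```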
