
Gregg Lee Carter

Bryant College

Harcourt College Publishers

Chapter 6. Special Research Topic II: Race & Gender Inequality

We continue our investigation of the status attainment model as sketched previously. We will begin by
setting up a brief set of interpretations to explain inequality among the races and between the sexes,
including the role that education plays in explaining these inequalities. We will then continue testing the status attainment model, focusing on its race and gender aspects, and learn some new statistical techniques to assist in our tests. Many of the race/gender relationships have been confirmed with data gathered
in other industrialized countries; thus, your findings and interpretations will have applicability well beyond
the United States.

Explaining Sex and Race Differences in Status Attainment

Race Differences

The plight of many African Americans in the United States has worsened in the past twenty years, despite
tremendous reductions in discrimination. This fact helps us to understand the underlying causes of the
cataclysmic rioting in Los Angeles in the spring of 1992, as well as other inner-city riots that occurred
sporadically in the United States throughout the 1990s. Although there are many causes for the
impoverishment and forlornness of inner-city residents, a major focus of social science research has been on
the mismatch between the low education of many inner-city blacks and the high educational demands of
current urban jobs, which are concentrated in white-collar service sectors. Loss of employment opportunities,
in turn, has had devastating effects on African-American families, generating desertions, separations, out-of-wedlock births, single-parent homes, welfare dependency, and crime.

In short, a key explanation of African-American poverty is the argument that African-Americans have, on
average, lower educational attainment than whites and other groups. Indeed, we did a partial test of this idea in the last chapter. In this chapter, we will refine our testing and learn more about SPSS and how researchers think through their models and theories.

Sex Differences

In 1999, the median income for American males working year round and full-time was approximately $32,000; for females, it was about $24,700. Women's income is about three-quarters of men's income, and this ratio has risen only slightly over the past 25 years, despite the massive influx of women into the world of full-time work. Such gender inequality is found in virtually all nations, with women in Western Europe faring best by earning 85 percent of what men make, and those in Latin America and Eastern Europe faring worst, where the same figure is about 65 percent; the percentage for African and Asian nations falls in between, at about 75.1

The growing labor-force participation of women, in virtually all nations, brings into the limelight the problem of economic inequality between the sexes. In days gone by, when a woman's income was seen as a mere supplement to her husband's or to the overall family income, one might have used a convoluted model of fairness to justify men being paid more than women for the same work or job.

However, such a model is clearly inappropriate today, when it is common for women to be heads of families (that is, raising children with no husband present), to live alone, or, if married, to be contributing heavily to family income and significantly defining the standard of living.

In this chapter, we will use SPSS to discover the degree to which sex differences in status attainment can be accounted for by sex differences in educational attainment. If, controlling for education, we still find significant differences in status attainment between the sexes, then we must confront the high probability that what can account for such differences is, in part, one of the many faces of sex discrimination.2

More Data Analysis Techniques

In what follows, we will carry out further univariate analyses of our GSS educational attainment variable (educ). We will then proceed to test more fundamental pieces of the status attainment model, introducing new, powerful data-analysis tools as we go.

One-Variable Analysis (Measuring Skewness and Kurtosis)

As we have noted several times along the way in this workbook, you should embellish your analyses with graphical representations of your findings. Good one-variable graphical representations help us to get a feel for the variable's distribution, its central tendency, and its potential usefulness. Earlier, you were introduced to the histogram. Let's refine our use of this graphical tool, delving back into the concepts of the normal curve, skewness, and (a new concept for you) kurtosis. We will re-examine educ to see how we can better understand the ways in which it fits, and does not fit, the normal curve.

Let's use SPSS to further our understanding. Open your Gss_1998_Std_Ver.sav file.

Click on Graphs / Histogram


From the variable list on your left, click on educ and move it over to the Variable box
Next click on the small box next to Display normal curve
Finally, click on OK.

If you have done everything correctly, your Output should look like the screen shot below:


Inspect this excellent graphic, and see how the normal-curve overlay helps us to detect how much the distribution deviates from normality. In your mind, what stands out?
(thinking ... thinking)

Among the answers you might have given are that (a) it is somewhat normal looking; however, it (b) seems a little too peaked in the middle; and (c) it appears to be negatively (left) skewed. We can verify whether these assessments are correct by examining statistics that give formal measurements of skewness and kurtosis.

The formula for skewness (available in any standard introductory statistics textbook) provides us with a numeric index of how far the distribution of cases deviates from normality regarding its "tails" (the left and right ends of the curve as they come closer and closer to the X axis). When skewness equals zero (0), the distribution is not skewed (though this determination must always be made in conjunction with a graphic display of the data, as weirdly shaped distributions can wreak havoc on univariate statistical formulas). As skewness tends toward plus or minus one (1), plus or minus two (2), and plus or minus three (3), the curve becomes increasingly skewed (left, if the skewness sign is negative; right, if the sign is positive). When skewness exceeds an absolute value of two, you can reject the idea that the distribution you are examining approaches anything resembling "normality."

Similarly, the formula for kurtosis (again, available in any standard introductory statistics textbook) provides us with a numeric index of how far the distribution of cases deviates from normality regarding its height (if it is overly peaked, the curve is labeled leptokurtic; if it is overly flat, the curve is deemed platykurtic). As with skewness, when kurtosis equals zero (0), the distribution does not deviate from normality; that is, it is neither too peaked nor too flat (though, once again, this determination must always be made in conjunction with a graphic display of the data, as weirdly shaped distributions can wreak havoc on univariate statistical formulas). As kurtosis tends toward plus or minus one (1), plus or minus two (2), and plus or minus three (3), the curve becomes increasingly lepto- (if the sign is positive) or platy- (if the sign is negative) kurtic. When kurtosis exceeds an absolute value of two, you would reject the notion that the distribution you are examining closely resembles the normal curve.
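(For readers who like to double-check such numbers outside SPSS, here is a minimal Python sketch using the scipy library. The simulated years-of-schooling data are our own stand-ins, not the GSS file, and scipy's default formulas differ slightly from SPSS's sample-adjusted versions, so expect small discrepancies.)

```python
# A minimal sketch (Python/scipy, not SPSS) of computing skewness and kurtosis.
# The data here are simulated stand-ins for a years-of-schooling variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
educ = rng.normal(loc=13, scale=3, size=1500)   # hypothetical, roughly normal data

print(f"skewness = {stats.skew(educ):.3f}")     # near 0 if the tails are balanced
# fisher=True returns "excess" kurtosis, the zero-centered convention, so
# positive values are leptokurtic and negative values are platykurtic
print(f"kurtosis = {stats.kurtosis(educ, fisher=True):.3f}")
```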

Okay, let's see if our visual assessment of the amount and type of skewness and of kurtosis in our histogram coheres with the formal statistical estimates.

Click on Analyze / Descriptive Statistics / Explore


From the variable list on your left, click on educ and move it over to the Dependent List box
Next, under Display, click on the small circle next to Statistics
Finally, click on OK.

Voila! Notice how the formal statistical estimates match closely what we estimated by using only our eyes and artistic sensibilities. Indeed, the distribution is fairly normal-looking, but is skewed slightly to the left (skewness = -.106), and is a bit too peaked (kurtosis = 1.039).

Two-Variable Analysis (The t-test; Anova)

The t-test

The impact of sex on status attainment is a combination of direct and indirect effects. An example of an indirect effect would be that women face more discrimination in the work world than men do, and, in turn, discrimination decreases the odds of their achieving the highest levels of status attainment. Another indirect effect would operate through education: although younger women are as likely to graduate from college as younger men, when we take the entire adult population, women are less likely to have received extra schooling (due to the constraints of their gender roles, e.g., marrying young, having children, and assuming that much of their adult lives would be spent as homemakers instead of planning on a career). This lack of schooling, in turn, decreases women's odds of reaching the higher levels of status attainment. Because discrimination is almost always "unmeasured," its effect is usually taken as the direct effect of sex on status attainment after controlling for education (and other intervening variables). Considering both direct and indirect effects, we might sketch the relationships among sex, education, and status attainment as follows:

If this model is true, we would expect the following: (a) women should have a significantly lower average on the "years-of-schooling" variable compared to men; (b) similarly, women should have a lower average on the "level-of-status-attainment" variable compared to men; and (c) the relationship between sex and status attainment should diminish significantly when we control for education. Let's check out one of these predictions now, using a very common test for comparing means between two groups: the t-test. More specifically, let us see whether women have a significantly different mean on status attainment compared to men.

The t-test is best understood as a formal hypothesis test. Formal hypothesis testing begins with the notion that there are always two possible hypotheses when testing for the existence of a statistically significant relationship. First, there is the null hypothesis, stating that there is no significant relationship; more specifically, in the case of the t-test, that the population means (on Y) of the groups being tested (X) are the same. Thus, H0 (the formal abbreviation for the null hypothesis) is that the mean of group 1 is equal to the mean of group 2 (in the case at hand, that the average level of prestg80 for women is equal to the average level for men). If we reject the null hypothesis, that is, find that the probability that the two sample means were drawn from the same population is very low, then we are, by default, willing to accept our research hypothesis, usually denoted as H1: that there is indeed a strong probability that the means actually differ (and, in the case at hand, that the average level of prestg80 for women is not equal to the average level for men).

Some researchers place special importance on whether their conclusions regarding a particular relationship are correct (for example, whether a new cancer drug really works). These researchers see two possible errors in a particular conclusion: first, that they reject H0 when it is true (they label this a Type I Error, that is, "a type one error"); and second, that they accept H0 when it is false (labeled a Type II Error, that is, "a type two error"). The significance level (e.g., .01, or .05, or .10; recall that scientists usually require a relationship to be at the .05 level or lower to be considered "statistically significant") is the probability of rejecting H0 when it is true. Thus, if a Type I error would have very serious ("bad") consequences, the researcher sets the significance level very low (e.g., .01, or even .001). In contrast, if a Type II error would have the more detrimental consequences, the significance level is usually raised, even to .10 or .15. (Note: this is very abstract stuff, so you will probably have to read this paragraph more than once!)

Okay, let's use one of the most popular statistical techniques for doing formal hypothesis testing, the t-test, to assess whether the means for men and women on prestg80 really differ.

Open your Gss_1998_Std_Ver.sav data file. And then:

Click on Analyze / Compare Means / Independent-Samples T Test

This will bring you to the Independent-Samples T Test box, where you will move prestg80 over to the
Test Variable(s) box
Then move sex over to the Grouping Variable box
Next, just below the Grouping Variable box, click on the Define Groups tab
Under Use specified values, key in the numeral 1 for Group 1, then 2 for Group 2 (recall that 1 = Male and 2 = Female for our GSS sex variable); then click on Continue
If for some reason you want to change the level of significance from the conventional standard of .05 (a 95% confidence level, or, conversely, a 5% chance that the difference between the means is due to chance alone), you could click Options and do so; however, let's leave our significance level as it is already set and simply click on OK.

If you have done everything correctly, you should have gotten the output displayed below:


Interesting, huh? If you inspect the Group Statistics box, you can see that the Male/Female means are nearly identical (45.25 vs. 46.27) and, more surprisingly, that the Female mean is larger! Next, note in the Independent Samples Test box that the value of t is well below the minimum value at which one begins to see significant relationships, that is, 1.96 (usually rounded, in most researchers' minds, to simply 2). Finally, note that the 2-tailed significance level is .182, a value far above the usually required .05.
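(If you ever want to replicate such a test outside SPSS, the sketch below shows an independent-samples t-test in Python with scipy. The prestige scores are simulated stand-ins whose means and spreads are our assumptions, chosen merely to echo the output above; they are not the actual GSS data.)

```python
# A minimal sketch (Python/scipy, not SPSS) of an independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# simulated stand-ins for prestg80 scores, centered near the SPSS output above
prestige_men = rng.normal(loc=45.25, scale=14, size=600)
prestige_women = rng.normal(loc=46.27, scale=14, size=700)

# equal_var=True matches the "equal variances assumed" row of the SPSS table
t, p = stats.ttest_ind(prestige_men, prestige_women, equal_var=True)
print(f"t = {t:.3f}, two-tailed p = {p:.3f}")
if p >= 0.05:
    print("Fail to reject H0: no significant difference between the means.")
```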

Anova

The impact of race on status attainment, like that of sex, is a combination of direct and indirect effects. An example of an indirect effect would be that African-Americans face more discrimination in the work world than whites and, in turn, discrimination decreases the odds of reaching the highest levels of status attainment. Another indirect effect would operate through education: African-Americans are less likely to get extra schooling, which, in turn, decreases their odds of being "high" on measures of status attainment. Because discrimination is usually "unmeasured," its effect on status attainment is usually taken as the effect of race on status attainment after controlling for education (and other intervening variables). Considering both direct and indirect effects, we might sketch the relationships among race, education, and status attainment as follows:

If this model is true, we would expect the following: (a) blacks should have a significantly lower average on the "years-of-schooling" variable compared to whites (or to other races); (b) similarly, blacks should have a lower average on the "level-of-status-attainment" variable; and (c) the relationship between race and status attainment should diminish significantly when we control for education. Let's check out one of these predictions now, using a common test for comparing means among three or more groups: analysis of variance (Anova).

Anova is similar to the t-test in that it is best understood as a formal hypothesis test. And, as with the t-test, the researcher sets up the null hypothesis, implying no significant relationship between X and Y (that is, the population means on Y among the various categories of X are the same). If we reject the null hypothesis, that is, find that the probability that the means were all drawn from the same population is very low, then we are, by default, willing to accept our research hypothesis that the means actually differ. Although we need not delve into the mathematics of how this is done, intuitively Anova calculates the value of the F statistic and determines the probability of a particular value of F having occurred by chance alone. If this probability is low, then we feel comfortable in concluding that, indeed, X is a predictor of Y. As the variability among the means of the various categories of X increases, so does the size of F, and as this occurs, the likelihood of F being due to chance alone decreases. Similarly, as the variability among the cases within a category of X decreases, the size of F increases, and again the probability of its being due to chance alone decreases. Examined intuitively, the formula for F is:

F = (variability among the means of the categories of X) / (variability among the cases within the categories of X)


To make this concrete: if the three categories of race (White, Black, Other) have very different means on educ, while at the same time the variability among the individuals who are Black is very small, as is the variability among the individuals who are White and the variability among the individuals who are "Other," then we will obtain a very high value of F and a very low probability that the differences among the means are due to chance alone. (Give yourself some time to absorb this paragraph! Reread it more than a few times!)

Let's go ahead and do an Anova of educ (dependent variable) and race (independent variable): Open your Gss_1998_Std_Ver.sav data file. Then:

Click on Analyze / Compare Means / One-Way ANOVA

This will bring you to the One-Way ANOVA box, where you will move educ over to the Dependent
List box
Then move race over to the Factor box
Next, click on the Options tab; and in the new box, click on Descriptives and Means plot
Finally, click on Continue and then on OK.

If you have done everything correctly, you should have output that, after some minor editing (see what we did with our scatterplot in Chapter 5), looks similar to this (sans the "F = ..." annotation):


Unlike what happened earlier with our t-test, our analysis-of-variance findings confirm our model: there are, in fact, significant differences among Blacks, Whites, and Others in their average level of education (with Whites having the highest mean, Blacks the lowest, and Others falling in between).
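(As an aside, the same one-way analysis of variance can be run outside SPSS. Here is a minimal Python sketch using scipy's f_oneway on simulated years-of-schooling samples standing in for the three GSS race categories; the group means and sizes are our assumptions, not the GSS values.)

```python
# A minimal sketch (Python/scipy, not SPSS) of a one-way Anova.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# simulated stand-ins for educ within each category of race
educ_white = rng.normal(loc=13.5, scale=2.8, size=800)
educ_black = rng.normal(loc=12.4, scale=2.6, size=150)
educ_other = rng.normal(loc=13.0, scale=3.2, size=80)

# F grows as the category means spread apart and within-category spread shrinks
F, p = stats.f_oneway(educ_white, educ_black, educ_other)
print(f"F = {F:.2f}, p = {p:.4f}")
```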

Multivariable Analysis Using Dummy Variables

In the preceding chapter, we learned some of the rudiments of regression analysis. We learned that it is a means for establishing the effect of X on Y when the relationship is linear. We learned that this effect is equal to the slope of X. We learned that multiple regression is a basic extension of bivariate regression, and that the partial slope of any particular X in the equation is the effect of that X on Y holding constant the other Xs (independent variables). We learned that regression is a parametric technique that assumes all of the variables, both the dependent and the independent(s), are at the interval or ratio level of measurement. And, finally, we learned that nominal variables with two values can be treated as interval-level variables because the distance between the two values meets the equivalence standard (since there is but one distance, say the distance between males and females, it, of course, equals itself). Researchers call dichotomous nominal variables, usually coded as 0s (zeros) and 1s (ones) (e.g., Male = 0; Female = 1), dummy variables. Virtually any variable, at any measurement level, can be converted into a set of dummy variables. Let us show how this can be done when there are more than two values, such as we have with race.

Let's start by converting race into a set of dummy variables. Instead of considering it as a single variable, we will treat each value as a variable, with those possessing it coded as 1s and those not possessing it coded as 0s. For mathematical reasons that need not concern us here, whenever we convert a multi-valued original variable into a set of dummy variables, we must have one less dummy variable than the number of values of the original variable. The effect of the remaining value is absorbed into the intercept of the basic regression equation:

Yi = a + b1X1i + b2X2i + ... + bnXni + ei

where: Yi = the dependent variable for case i

a = the intercept

b = the slope of the independent variable X

n = the total number of independent variables

ei = the error term for Yi3

Dummy-variable analysis is a natural extension of both Anova and ordinary regression analysis, and indeed
allows us to combine the two in a way that leads to a better understanding of the true effect of a particular X
on Y, as well as to readily see whether there are any interaction effects between or among the Xs (you were
introduced to the idea of interaction in Chapter 4, see p. 155 ff.).

Let's make some of this concrete. First, we will convert race into a set of dummy variables. Open your Gss_1998_Std_Ver.sav file, then:

Click on Transform / Recode / Into Different Variable

Highlight race and move it over to the center box

In the Output Variable box on the right, key in white under Name, then click on Change

Click on the Old and New Values tab, and insert 1 under Old Value (because this was the value for "White" in our original race variable) in the top left box, where you should find Value checked off by default

Then in the box on the right side, under New Value, key in 1 and then click on the Add tab

Return to the left side of the screen, click on Range, and key in 2 in the left box (before the word "through") and 3 in the right box (these are the values for Black and Other)

Next, as before, in the box on the right side, under New Value, key in 0 and then click on the Add
tab; if you have done everything right up to now, your screen should look like the image
below:

If it does, then click on Continue, then on OK.

Repeat the above command sequence (clicking on the Reset tab when you are first brought to the Recode into
Different Variables box). However, this time around, you will label your new variable black, set the original
variable value of race=2 to the new value of 1, and finally let Whites (race=1) and Others (race=3) equal 0. If
you have done this transformation successfully, your final screen shot (just before clicking on Continue and
OK) should look like this:


(If it does, then save your file so that you can have access to your new dummy variables later on.)

Recall that we do not create a dummy variable for the final value of our original variable (its effect on Y is absorbed into the intercept, the a term, in our multiple regression equation).
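(For the curious, the same recode takes only a few lines outside SPSS. The Python/pandas sketch below is a hypothetical illustration, assuming the data sit in a pandas DataFrame with race coded 1 = White, 2 = Black, 3 = Other, as in our GSS file.)

```python
# A minimal sketch (Python/pandas, not SPSS) of building the white/black dummies.
import pandas as pd

df = pd.DataFrame({"race": [1, 2, 3, 1, 2, 1]})   # hypothetical stand-in data

df["white"] = (df["race"] == 1).astype(int)  # 1 if White, 0 if Black or Other
df["black"] = (df["race"] == 2).astype(int)  # 1 if Black, 0 if White or Other
# No dummy is built for Other: the omitted category is absorbed into the intercept.
print(df)
```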
Our new set of dummy variables, white and black, may be used by themselves in a new equation, or in combination with other independent variables. As we discovered in Chapter 5, maeduc (mother's education) correlates with educ (respondent's education), so let us work our way through two regression equations:

(I) educ regressed on our new dummy variables white and black;
(II) educ regressed on our new dummy variables, but adding maeduc to the mix.

Regression Model (I)

Let us see how we can better understand our original Anova analysis by the use of dummy variables:

Click on Analyze / Regression / Linear

Move educ over to the Dependent box, then black and white over to the Independent(s) box

Finally, click on OK.

Your Output screen should look like the one below:


Notice how the F-test for the regression equation is identical to our earlier results using Anova; thus, the regression equation as a whole is statistically significant (Sig. = .007). However, also notice in the Coefficients table that the effects of neither White nor Black are individually powerful enough to achieve statistical significance at the .05 level, even though the signs (positive for White; negative for Black) are in the expected directions.

Regression Model (II)

Let's build on this initial equation and see how the addition of maeduc may produce a better model with a higher value of R2.

Click on Analyze / Regression / Linear


Your Dependent box should already contain educ, while black and white should already be in the
Independent(s) box; thus, you need only to move maeduc over to the Independent(s) box

Finally, click on OK.

The bottom of your Output screen should look like the following:

Several findings jump out at us here. First, the overall regression equation is highly significant, and the value of R2 increased dramatically. Second, the coefficient for black maintains itself quite well and, indeed, sees its significance level coming closer to the .05 standard; however, the coefficient for white and its significance level show that, in the context of maeduc and black, white has little impact on educ.
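(Again for the curious, both regression models can be reproduced outside SPSS. The Python sketch below, using the statsmodels library on simulated stand-in data whose effect sizes are our assumptions rather than the GSS estimates, runs Model I and then Model II so you can watch R2 change when maeduc enters the equation.)

```python
# A minimal sketch (Python/statsmodels, not SPSS) of Models I and II.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
race = rng.choice([1, 2, 3], size=n, p=[0.80, 0.14, 0.06])  # hypothetical mix
white = (race == 1).astype(float)
black = (race == 2).astype(float)
maeduc = rng.normal(11, 3, n)
educ = 11 + 0.4 * white - 0.7 * black + 0.35 * maeduc + rng.normal(0, 2.5, n)

# Model I: educ regressed on the race dummies alone (equivalent to the Anova)
model1 = sm.OLS(educ, sm.add_constant(np.column_stack([white, black]))).fit()
print(f"Model I:  R2 = {model1.rsquared:.3f}, F p-value = {model1.f_pvalue:.4f}")

# Model II: add maeduc to the mix; R2 should rise sharply
X2 = sm.add_constant(np.column_stack([white, black, maeduc]))
model2 = sm.OLS(educ, X2).fit()
print(f"Model II: R2 = {model2.rsquared:.3f}, F p-value = {model2.f_pvalue:.4f}")
```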

There is a split in the social and behavioral sciences over the preference for Anova versus dummy-variable regression analysis. As we saw above, converting race into dummy variables yields the exact same results as the Anova, where race was taken by itself. However, because one can refine the effects of race (or any nominal independent variable) on educ (or any dependent variable measured at the interval or ratio level) by converting it to a set of dummy variables, sociologists, economists, and political scientists tend to prefer dummy-variable analysis, while psychologists tend to prefer Anova. Because sociologists, economists, and political scientists are more likely to be analyzing survey data instead of experimental data, they are much more likely to be developing multivariable models, which can incorporate dummy variables quite easily; moreover, dummy variables allow for the refined investigation of interaction effects (see Chapter 4, p. 115 ff.). As opposed to nominal variables with three or more categories, two-category dummy variables can be readily used to develop interaction terms with other independent variables (usually by multiplying the dummy variable by another X in the equation).

Endnotes
1 For example, one "obstacle that women face in the occupational realm is that they tend to be saddled, more so than men, with familial obligations. Even when both spouses work, and even though men have taken on more responsibilities for these tasks in recent decades, women are still expected to take on more responsibility for raising the children, keeping up the home, and taking care of sick relatives"; see Thomas J. Sullivan, Introduction to Social Problems, 5th ed. (Boston: Allyn & Bacon, 2000), pp. 236-237. Dropping out of the workforce for one or more years to raise children or care for aging parents seriously hurts women economically when they return to their jobs; for example, they have lost seniority and opportunities for training and advancement that their male counterparts did not forfeit. If a society devotes very little of its resources to child-care facilities/support, then women are hurt more than men. Other faces of discrimination that haunt women more than men include being kept out of the "old-boy" network; tokenism; lack of "mentorism" (men are more likely to groom other men for promotion, likewise for women, but fewer women are in high-level positions); fewer opportunities for training and promotion because of actual or feared (by employers) child-care obligations; and sexual harassment.

2 See Sharlene Hesse-Biber and Gregg Lee Carter, Working Women in America: Split Dreams (NY: Oxford University Press, 2000), Figure 3.7 on p. 63. Parts of this and of the preceding section were excerpted from Gregg Lee Carter, Analyzing Contemporary Social Issues, 2nd edition (Boston: Allyn & Bacon, 2001), pp. 129, 130, and 157. Used by permission.

3 As you progress in your research-methods and data-analysis training, you will learn that the assumptions made concerning e (the error term) are the foundation of regression, Anova, correlation, and related statistics that are aimed at uncovering the linear relationships among variables; all of these fall under what has been deemed the general linear model. Recall from examining your scatterplot that our observed data points rarely fall exactly on the regression line. The vertical distance represents the "error" in our prediction. When we have multivariable models, we cannot produce a two-dimensional scatterplot, but the idea is the same: no matter how many independent variables we have, there will always be a gap (which we are labeling here as "error") between the predicted and actual values of Y. For each predicted value of Y, the general linear model assumes the following: (a) if we were to draw repeated samples and recalculate our equation for each sample, the e values for each value of Y would be normally distributed; and (b) the variances (the squared standard deviations) of the e values for each predicted Y would be equal. Furthermore, we assume that (c) the sample of data we are working with has been drawn randomly; and (d) the underlying relationship between Y and its predictor(s), that is, the independent variable(s), is linear (or can be "linearized" with a transformation, e.g., the log or square-root function).


Chapter 7. Special Research Topic III: Explaining Crime & Deviance

Our final special research topic examines explanations for crime and deviance. Along the way, we will learn more about research methods, data analysis, and SPSS, including how researchers transform their data to develop better models.

Anomie Theory

Robert K. Merton's classic essay on the causes of crime and deviance emphasizes that a society's cultural goals may not be accompanied by realistic means for attaining them, at least not for everyone.1 Those who want the accepted goals but do not have ready access to the accepted means to attain them may "innovate" and come up with their own means. Thus, for example, the inner-city adolescent whose family has few resources to send him to private school or to help him get a job or start a career, and who attends a school where the majority of students will drop out, may well find himself out on the streets as a young man with few prospects. From the popular culture (television, movies, magazines) and from his friends and family he has absorbed the cultural goal of material success. Unable to achieve it by conventional means, that is, by getting an education and using his networks to get a good first job, he "innovates" and tries to reach it through criminal means (he steals, sells drugs, or whatever).

We tend to associate crime with poor, minority-group neighborhoods. Given this association, Merton's argument has great intuitive appeal. However, it has not stood up well to all tests of empirical confirmation.2 The argument implies an inverse relationship between social class and crime. In studies of street crime (muggings, robberies, assaults, murders), the relationship can be confirmed; but when crime is defined more broadly, to include white-collar and business crimes, and when crime is measured by indicators other than official police reports (e.g., by self-report), the relationship is weak and inconsistent.

Control Theory

Merton's theory of crime and deviance has become known as "anomie theory" in sociological jargon. He adopted the concept of "anomie" from Durkheim's writings.3 Anomie means being without norms or in a state of normative confusion; it can be used to characterize individuals, groups, or societies. Whereas Merton viewed anomie as arising from blocked opportunity, Durkheim saw it more as a product of the weakening of the quality and quantity of social ties, which may be caused by rapid social change, divorce, and other threats to the solidarity of the groups to which individuals belong.

Following in the footsteps of Durkheim, Travis Hirschi (see endnote 2 below) demonstrated the importance of an individual's attachment to the group in keeping the individual in check, i.e., in line with normative expectations for behavior. Hirschi emphasizes that the key question in the study of deviance is not "Why do individuals commit deviant acts?" but rather, "Why don't individuals commit deviant acts?" His answer is founded on a fundamental principle of human reality: people like being liked and like being accepted; to achieve these aims they conform to the expectations of those whom they want to like and accept them. Thus, Hirschi found that adolescents who had strong ties to their families and to their schools were less likely to commit delinquent acts, because such acts are frowned upon by these two groups. Hirschi's interpretation of Durkheim has become known as "control theory" in social-science jargon, implying that individuals with strong attachments to mainstream groups (e.g., family, school, church) are held in check, or controlled, from committing deviant acts.

More Data Analysis Techniques

In what follows, we will carry out analyses of our state-level data file that are intended to assist us in building
better models for explaining crime and deviance.

As a prelude to the new techniques that we will be learning, let us test several hypotheses that stem from
anomie theory, then do the same for control theory.

Testing Three Hypotheses Based on Anomie Theory

As noted above, Merton's anomie theory of crime and deviance appears to work best for "street crime," as opposed to business/corporate/white-collar crime. Let us see how well hypotheses based on his theory are in harmony with the patterns in our data.

Our hypotheses to be tested are:

Poverty is positively related to violent crime.

Poverty is positively related to property crime.

Poverty is positively related to murder.

Load your States_Std_Ver.sav data file into SPSS, then execute your Analyze / Regression / Linear command sequence, letting vcrime be your first dependent variable and poverty be your independent variable. If you have done your SPSS work correctly, you should arrive at the following findings (here is the bottom half of your Output Viewer):

If it is not obvious to you, it should be after some more pondering of these findings: Our first hypothesis has
been strongly confirmed!

Repeat your regression command sequence, but this time replace vcrime with pcrime. Your findings should look like this:

Again, it should be obvious to you: Our second hypothesis has been strongly confirmed!

Finally, repeat your command sequence, but this time replace pcrime with murder. This last regression
analysis produces the output shown below:


Of course, by now, you are getting the drift here: Of the three hypotheses, this final one has received the
greatest degree of confirmation.

At this point, we are satisfied that Merton's anomie theory provides some explanatory power at the state level of data analysis.

Testing Three Hypotheses Based on Control Theory

As noted above, Hirschi's interpretation of Durkheim's anomie theory sees social problems like crime and deviance as stemming from the weakening of social ties. One common measurement of this concept is divorce. States where the divorce rate is high are filled with individuals who have weakened many of the basic bonds that keep them on a straight and narrow path. We might expect such people to be more prone to all types of deviance, including crime, drug abuse, and suicide (not that all divorced people head down such paths, but enough of them do that we will be able to observe differences in deviance rates between states with high versus low rates of divorce). Let us see how well hypotheses based on control theory are in harmony with the patterns in our data.
The hypotheses we will test are:

Divorce is positively related to violent crime.

Divorce is positively related to property crime.

Divorce is positively related to murder.

Once again, execute your Analyze / Regression / Linear command sequence, letting vcrime be your first dependent variable and divorce be your independent variable. Record your basic findings (r, R2, the value and significance of F for the entire regression, and the value and significance level of B). Repeat this process two more times: first replacing vcrime with pcrime, then replacing pcrime with murder. Examine your basic findings for all three analyses. What is your basic conclusion? (Say it aloud, or write it down.) Then check the footnote below to see if it is correct.4

At this point, our conclusion is that street crime is better predicted by poverty than by divorce, and that Merton's anomie theory fits the patterns in our data better than Hirschi's control theory. However, we need to spend more time with the variables that we have been working with; most importantly, we must check out the associated scatterplots to make sure that our regression/correlation assumptions have been met.

One-Variable Analysis (Sorting)

Perhaps we can get our regression line to better fit each of our analyses if we delete any "outliers." Recall that outliers are data points that fall far away from the trend in most of the other data points. Let's inspect one of our scatterplots to see if there are any outliers and how, by deleting them, we may be able to improve our predictive ability.

Let's use our murder/poverty regression as our basic example:

Click on Graphs / Scatter / Define

Then highlight poverty in the left box, and using the appropriate center arrow, move it over to the X
Axis box

Repeat the process, but this time move murder over to the Y Axis box; then click on OK

We want to improve the look of our scatterplot, so double-click on the graph; this will bring you to
the SPSS Chart Editor

Then click on Chart / Options, and check the box by Total under Fit Line; then click OK (you
should now see the regression line)
Next, click on Chart / Axis / X-scale / OK, and change Title Justification from Left/bottom to Center; then click on OK

Then click on Chart / Axis / Y-scale / OK, and change Title Justification from Left/bottom to Center; then click on OK

Click again on Chart / Inner Frame, which should remove the check mark next to it (you want no check mark next to either Outer or Inner Frame)

Finally, close your Chart Editor box (clicking on the x in the upper right corner), then click outside the picture area of the scatterplot; your screen display should now look like this:


Quite obviously, we have an egregious outlier: the free-standing data point in the upper-right area of the plot.

To repeat an important point made earlier in this workbook: parametric statistics, such as correlation and regression coefficients, are very sensitive to outliers. When these are detected, the researcher must delete them if an accurate description of the relationship is to be captured by a correlational or regression analysis. In the present case, the extreme high-murder/high-poverty outlier may be inflating our correlation and regression coefficients, as well as their statistical significance. (In other situations, we may find an outlier deflating the strength and significance of these coefficients.)

The problem now is: How do we find this outlier case so that we can get rid of it? There are many possible ways to do this in SPSS, but one of the easiest is by sorting our data. The sort command in SPSS will rank-order our cases by any given variable. After we have done such a sorting, we can easily identify the state involved (the one with the extremely high murder rate).

Follow this command sequence to sort our cases by murder:

In your Data View window, click on Data / Sort Cases in your menu bar, which will bring you to the
Sort Cases box

Scroll down your variable list until you find murder, then move it over to the right-hand box (Sort by)

Under Sort Order, click on Descending, then on OK.

Your screen should now look like the image below:


If you scroll to the right a little, until you hit murder, you will notice that the cases (states) have been rank-ordered by this variable. You should also notice that the District of Columbia is our outlier case for murder. Click on the gray cell containing the numeral 1 that is just to the left of "District of C." The row should now be highlighted. Right-click on your mouse and choose the Cut option to delete this row. Voila! We have gotten rid of our outlier!

Repeat your regression and scatterplot analyses from above. What did you find? Indeed, the outlier was exaggerating the strength of the relationship. Though still evident, all the key indicators of relationship strength have been reduced. Most importantly, notice that R2, which represents the amount of variation in Y that can be explained by X, has dropped from .268 to .192.
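(The same sort-and-delete maneuver takes a line or two outside SPSS. The Python/pandas sketch below uses three hypothetical rows, with made-up murder rates, merely to show the mechanics.)

```python
# A minimal sketch (Python/pandas, not SPSS) of sorting cases to spot an outlier.
import pandas as pd

states = pd.DataFrame({                       # hypothetical stand-in for the file
    "state": ["District of C.", "Louisiana", "Maine"],
    "murder": [60.0, 17.5, 2.0],              # made-up rates for illustration
})

# rank-order the cases by murder, highest first; the outlier floats to the top
print(states.sort_values("murder", ascending=False))

# drop the outlier in memory only, leaving the file on disk untouched
trimmed = states[states["state"] != "District of C."]
```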

One precaution: when you have used the Sort command to help identify and delete outliers, be sure not to save your data file under the same name, as you will want to retain all of your cases for other analyses. Indeed, so as not to confuse the analyses that follow, close your states data file without saving it, then re-open it so that you have the entire data file (including the District of Columbia!) at your disposal.

Two-Variable Analysis (Curve-Fitting)

Sometimes our scatterplot will reveal not only serious outliers but also an indication that the X-Y relationship is nonlinear (that is, it does not fit a straight-line pattern). When this is the case, we can often still use regression analysis to capture the relationship, but only after we have transformed the data. Common transformations include taking the log or square root of X, or mapping a quadratic relationship (either a U-shaped or inverted-U-shaped pattern in the scatterplot) by entering both X and X2 on the independent-variable side of the equation.

Let's show how this can be done by working with a hypothesis that would stem from the control theory of crime and deviance. Areas with high population growth contain many migrants who have left friends and family behind; when life becomes stressful, some of these individuals may suffer from their loss of social support and come to have a greater risk of deviant behavior. Using the previously given command sequence for producing a scatterplot, create one for angrowth (X) and pcrime (Y). If you have executed the command sequence correctly, you should end up with the plot below:

Quite obviously, in the upper-left corner, we can see an extreme outlier. Use the procedure taught to you earlier (via sorting) to identify and then delete this case. If you have done everything correctly, your scatterplot will now look like the screen image below:


As we can readily see, we have a much better fit to the data points after having deleted the District of Columbia (our outlier). However, if you examine the plot closely, you can see the rough contours of the beginnings of a concave parabola (an inverted U), which can be better captured by the quadratic equation Y = a + B1X + B2X2 than by the linear equation Y = a + B1X. (Identifying such subtle patterns in a scatterplot takes a fair amount of experience. Rest assured, though, that it is a skill that becomes easier and easier to acquire as you examine more and more scatterplots.)

Okay, let's compare our two regressions, the linear and the quadratic, and see which one provides better predictive power (working with our modified data file, which does not include the District of Columbia):

Click on Analyze / Regression / Curve Estimation


Move angrowth over to the independent variable box (in the middle of your screen), and pcrime over to the dependent variable box (at the top of your screen)
Under Models, you should find Linear already checked (if it isn't, check it!); to contrast the quadratic model with it, check the Quadratic option

Finally, click on OK.

Your output screen should now appear similar to the following screen image (the colored quadratic curve has been replaced with a thicker line to distinguish it from the thinner linear one):

Visually, we can discern that the quadratic curve provides a better fit to the observed points than the straight line. The change in R2 between the linear and the quadratic equations strongly confirms this conclusion: the value of R2 jumps from .395 for the linear specification of the regression equation to .479 for the quadratic specification, a gain of more than eight points of explained variance.
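(Here is how the same linear-versus-quadratic contest could be run outside SPSS, in a minimal Python/statsmodels sketch on simulated data; the coefficients generating the inverted-U are our assumptions, not the states-file estimates.)

```python
# A minimal sketch (Python/statsmodels, not SPSS) of linear vs. quadratic fits.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
angrowth = rng.uniform(0, 4, 50)                       # hypothetical growth rates
pcrime = (3000 + 900 * angrowth - 120 * angrowth**2    # built-in inverted-U shape
          + rng.normal(0, 350, 50))

# linear specification: Y = a + B1*X
linear = sm.OLS(pcrime, sm.add_constant(angrowth)).fit()

# quadratic specification: Y = a + B1*X + B2*X^2
Xq = sm.add_constant(np.column_stack([angrowth, angrowth**2]))
quadratic = sm.OLS(pcrime, Xq).fit()

print(f"linear    R2 = {linear.rsquared:.3f}")
print(f"quadratic R2 = {quadratic.rsquared:.3f}")      # higher when the curve bends
```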

We have gained significant predictive power. However, most of the time, researchers want not only to specify the correct model, but also to be able to interpret it. The question here is why the effects of angrowth occur in a positive, linear fashion until angrowth reaches its higher values, where it begins to produce a smaller effect (and thus the quadratic-like curve). When we have such a pattern, one common, generalized interpretation is "saturation." That is, the effects of X on Y begin to taper off at very high levels of X because we have surpassed the saturation point, the point at which increases in X will no longer produce increases in Y. Think about the volume control on your radio or television: as you increase it, the set becomes increasingly loud; however, at some point, increasing the volume provides no detectable increase in loudness. Thus, the interpretation for our quadratic specification could simply be put: increases in angrowth yield increases in pcrime until the highest levels of angrowth, where saturation has occurred, and from that point on, increased angrowth yields smaller and smaller effects.


Multivariable Analysis (Multivariable Effects)

As we have seen in the preceding two chapters, multiple regression is a powerful tool for testing causal models, which is at the heart of most quantitative social science research. One common use of the technique is to develop multivariable models where the various independent variables represent competing explanations of the dependent variable. Although not as straightforward as it first appears (you will be introduced to the relevant complications in more advanced courses), comparing the standardized regression coefficients allows the researcher to assess the relative effects of the independent variables, and thereby get a sense of which explanation does a better job in accounting for change in the dependent variable at hand.5 There is one word of caution we will mention here: only unstandardized regression coefficients can be compared across samples. Correlation and standardized coefficients cannot, because their magnitude depends upon the amount of variance in the independent variable. Thus, the functional relationship between a particular X and a particular Y may be the same across different samples (and the unstandardized slope allows us to verify this), but the values of r and of beta (the standardized slope) will vary, because we can be almost assured that the variance of the independent variable will differ between and among samples.
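(The conversion between the two kinds of coefficients is simple enough to show in a short sketch. Below is a hypothetical Python illustration of the relationship beta = b x (sd of X / sd of Y); the helper function and data are ours, for illustration only.)

```python
# A minimal sketch (Python, not SPSS) of converting an unstandardized slope b
# into a standardized beta: beta = b * (sd of X / sd of Y).
import numpy as np

def beta_from_b(b, x, y):
    """Standardize a slope using the sample standard deviations of X and Y."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)

rng = np.random.default_rng(2)
x = rng.uniform(8, 25, 50)                  # hypothetical poverty rates
y = 2500 + 40 * x + rng.normal(0, 300, 50)  # hypothetical property-crime rates

b = np.polyfit(x, y, 1)[0]                  # unstandardized bivariate slope
print(f"b = {b:.2f}, beta = {beta_from_b(b, x, y):.3f}")
```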

Okay, let's see how angrowth and poverty stack up against each other when put into the same equation to explain pcrime. Before examining his or her findings, a researcher always considers the possible outcomes and how they might be interpreted. Pause here for a moment and think about the possible findings that might arise from our multiple regression equation, pcrime = Beta1(poverty) + Beta2(angrowth) ...

(thinking ... thinking ... wait a minute or two before going any further; indeed, write down your answers or say them aloud ...)

Here are some of the more important possible outcomes:

the slope for poverty decreases significantly, while the slope for angrowth maintains itself (in which case we would lean toward control theory as providing us with the best understanding of why property crime rates vary so much among U.S. states);

the slope for angrowth decreases significantly, while the slope for poverty maintains itself (in which case we would lean toward Merton's anomie theory as providing us with the best understanding of why property crime rates vary so much among U.S. states);

the slopes for both poverty and angrowth maintain themselves fairly well, and we concomitantly witness a notable increase in the amount of explained variance in Y (that is, a rise in the value of R2); in this case, we would accept that both theories of crime and deviance have explanatory ability and that our best understanding of property crime is achieved when we combine the two theories.

There are two other possible outcomes that many researchers would take into consideration. First, there is the possibility of an interaction effect between our independent variables (recall our discussion of interaction in Chapter 4): here we would consider the possibility that, beyond the simple effects of poverty and angrowth, there may be an additional effect when the two variables are combined. The most common interaction term developed is the product of X1 and X2; thus, for example, there may be additional gains in explaining variability in Y when X1 and X2 are simultaneously both very high or both very low. That is, the ultimate impact of poverty may only be known when we know the value of angrowth, and poverty's greatest explanatory ability may come when angrowth is either very high or very low.
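(If you do want to pursue the interaction idea, the sketch below shows, in hypothetical Python/statsmodels form, how the product term is built and entered; the data and coefficients are simulated assumptions, not the states-file values.)

```python
# A minimal sketch (Python/statsmodels, not SPSS) of adding an interaction term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 50
poverty = rng.uniform(8, 25, n)             # hypothetical % living in poverty
angrowth = rng.uniform(0, 4, n)             # hypothetical annual growth rate
pcrime = (2000 + 60 * poverty + 500 * angrowth
          + 30 * poverty * angrowth         # a built-in interaction effect
          + rng.normal(0, 300, n))

# the interaction term is simply the product of the two Xs
X = np.column_stack([poverty, angrowth, poverty * angrowth])
fit = sm.OLS(pcrime, sm.add_constant(X)).fit()
print(fit.params)      # intercept, poverty, angrowth, interaction
print(fit.pvalues)
```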

A second consideration is whether the quadratic specification of the regression model we used when
angrowth was considered by itself needs to be carried over to our multiple regression equation that includes
poverty. We will leave both of these considerations for you to explore on your own (if and when the mood
strikes you!).

Okay, we will follow our usual command sequence in calculating our first multiple regression equation. But first we need to make sure that you have deleted the District of Columbia from our states data file, as we know that it represents an extreme outlier for property crime when we use annual growth rate as an independent variable. If you have your States_Std_Ver.sav opened in SPSS and have already deleted the District of Columbia, then proceed immediately to the command sequence below. If you still have the District of Columbia in your file, delete the entire row (but do not save this file, unless you do so under a different name), then continue on with your regression commands.

Click on Analyze / Regression / Linear


Move pcrime over to the Dependent box, then angrowth over to the Independent(s) box; click on OK
Either record your findings or print them out
Repeat the above three steps, but this time move angrowth out of the Independent(s) box and replace
it with poverty
Finally, do your multiple regression by making sure that both angrowth and poverty are in the
Independent(s) box.

The top section of your Output Viewer for the multiple regression analysis should look like the screen image
below:


And the bottom half of your Output Viewer should appear as follows (note: depending on how you entered the variables, poverty may be listed before angrowth in your Coefficients table, but all statistics will be the same):

Wow! Very interesting, huh? Note that the Betas for both poverty and angrowth have maintained themselves almost perfectly from the original bivariate regressions. And although angrowth is clearly the best "explainer" of pcrime, we do make a small gain in explaining it when we take poverty into consideration too. In short, we have developed a nice multivariable model in which each of our Xs truly makes an "independent" contribution to explaining Y, though clearly one is of much greater importance than the other.
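(To close the loop, here is a minimal Python/statsmodels sketch of the same two-predictor equation on simulated stand-in data; standardizing all three variables first makes the fitted slopes directly comparable Betas, mirroring the SPSS Coefficients table. The effect sizes are our assumptions.)

```python
# A minimal sketch (Python/statsmodels, not SPSS) of the multiple regression,
# run on standardized variables so the slopes are Betas.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 50
poverty = rng.uniform(8, 25, n)             # hypothetical predictors
angrowth = rng.uniform(0, 4, n)
pcrime = 2500 + 40 * poverty + 600 * angrowth + rng.normal(0, 300, n)

def z(v):
    """Convert a variable to mean 0, standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

X = sm.add_constant(np.column_stack([z(poverty), z(angrowth)]))
fit = sm.OLS(z(pcrime), X).fit()
print(fit.params[1:])       # the two Betas, directly comparable in size
print(f"R2 = {fit.rsquared:.3f}")
```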

Endnotes
1 Social Theory and Social Structure: Enlarged Edition (NY: Free Press, 1968), Chapter 6. Parts of this section and of the next were excerpted from Gregg Lee Carter, Analyzing Contemporary Social Issues, 2nd edition (Boston: Allyn & Bacon, 2001), Chapter 7. Used by permission.

2 See, for example, Travis Hirschi, Causes of Delinquency (Berkeley: University of California Press, 1969); Rodney Stark, Sociology, 5/e (Belmont, CA: Wadsworth Publishing Company, 1994), p. 189; Charles R. Tittle et al., "The Myth of Social Class and Criminality: An Empirical Assessment of the Empirical Evidence," American Sociological Review 43:5 (October 1978), pp. 643-656; and Charles R. Tittle and Robert F. Meier, "Specifying the SES/Delinquency Relationship," Criminology 28 (1990), pp. 271-299.

3 Émile Durkheim, Suicide: A Study in Sociology (New York: Free Press, 1951 [orig. 1897]).
4 Clearly, only our second hypothesis has received strong empirical support.

5 Recall from Chapter 5 that the purpose of regression analysis is to calculate the slope of the theoretical relationship between X and Y. The slope can be expressed in raw measurement units (for example, "percentage of the population living in poverty") or in standardized measurement units. A standardized regression coefficient (expressed as a beta) tells us how many standard deviations Y will change given a one-standard-deviation change in X. Any variable can be standardized by converting its values into standard-deviation units from the mean.
