
8

Multivariate analysis of crosstabs: Elaboration



Chapters 5-7 analyzed the relationship between two variables. In those chapters it was assumed that any association observed in the data between two variables is due to a simple and direct relationship. A strong association in a bivariate table, however, does not necessarily mean that a simple direct relationship in fact exists; this is only how we have interpreted the data. There may be more complex relationships buried in the data, but we have not dug deep enough to find them.

The simplest way of extending - elaborating - the relationship discovered in a crosstab is to look at the possible impact that a third variable has on the original bivariate association. Depending on the outcome of this elaboration we may have to adjust our model of the relationship between the original two variables to take into account the influence of the third variable. There are three possible conclusions we can reach when we introduce a third variable into the analysis:

1. a direct relationship still exists (the third variable has no effect); or

2. either a spurious or intervening relationship exists; or

3. a conditional relationship exists.

We will investigate these possible outcomes by looking at examples of each in turn.

Direct relationship

We begin with an example where the original bivariate relationship does not change when we introduce a third variable. When the introduction of a third variable does not alter the original bivariate relationship, this will provide evidence that the simple direct model is the appropriate way of characterizing the relationship.

For example, we may have data on income and TV watching. Our theoretical model argues that income directly affects the amount of TV someone watches by affording them more or less leisure time. To express this we arrange the data in a crosstab and calculate a measure of association such as gamma (Table 8.1). These descriptive statistics tell us that there is a moderate to strong, positive relationship.

Table 8.1 TV watching by income level

TV watching   Income
              Low    High   Total
Low           115    95     210
High          88     204    292
Total         203    299    502
Gamma = 0.47

When we argue that there is a direct relationship between two variables in this way we are effectively arguing that the relationship will be the same regardless of any other variable that may cause cases to vary from each other. In this example, we think income affects TV watching in the same way and to the same degree, regardless of any other variable that may cause cases to vary, such as sex, age, hair color, etc. This direct bivariate model, however, may appear to be overly simplistic. Surely there are other variables which impact on the amount of TV someone watches. Another researcher, for example, may feel that level of education also affects the amount of TV watched by individuals.

To assess the possible impact this new variable (level of education) has on the observed relationship between income and amount of TV watched, we divide the sample into two sub-groups: those who have no post-secondary education and those who have completed some post-secondary education. In technical terms education level is a control variable.

The effect of this control variable is to generate a separate crosstab for each of the sub-groups defined by the control variable. In this example, we first take only those cases with no post-secondary education and create a crosstab between their income and TV watching, ignoring those cases with some post-secondary education. We then take only cases with some post-secondary education and create a crosstab between their income and TV watching, ignoring people with no post-secondary education.

The resulting crosstabs are called partial tables and we generate as many partial tables as there are categories for the control variable (Table 8.2, Table 8.3). Here the control variable, 'Education level', only has two categories; we therefore generate two partial tables. (If we had three categories for the control variable, say 'no post-secondary', 'some post-secondary', 'a lot of post-secondary', we would generate three partial tables.)

Table 8.2 TV watching by income level: controlling for education level (no post-secondary education)

TV watching   Income
              Low    High   Total
Low           78     22     100
              57%    31%
High          58     48     106
              43%    69%
Total         136    70     206
Gamma = 0.49

Table 8.3 TV watching by income level: controlling for education level (post-secondary education)

TV watching   Income
              Low    High   Total
Low           37     73     110
              55%    31%
High          30     156    186
              45%    68%
Total         67     229    296
Gamma = 0.45

With this outcome we can see that the original relationship is reproduced almost exactly for each partial table. The value for gamma for each of the two partial tables is almost the same as that for the original table, before we controlled for education. In other words, regardless of the level of education, the relationship between income and TV watching still holds. The direct relationship we first observed is preserved even after controlling for the third variable. No matter how cases vary according to education level, the direct bivariate relationship remains basically the same, so we will not alter our initial model that characterized income and TV watching in a direct relationship.
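The gamma values quoted for this example can be checked with a short Python sketch. It is an illustration rather than part of the SPSS procedure: the function name `gamma_2x2` is hypothetical, the cell counts are copied from the two partial tables, and the combined table is assumed to be the cell-by-cell sum of the partials.

```python
def gamma_2x2(table):
    """Gamma for a 2 x 2 table [[a, b], [c, d]] whose rows and columns
    both run from low to high."""
    (a, b), (c, d) = table
    concordant = a * d   # pairs ranked the same way on both variables
    discordant = b * c   # pairs ranked in opposite ways
    return (concordant - discordant) / (concordant + discordant)


# Rows: TV watching (low, high); columns: income (low, high).
no_post_secondary = [[78, 22], [58, 48]]    # counts from Table 8.2
post_secondary = [[37, 73], [30, 156]]      # counts from Table 8.3

# Assumption: the unsegmented table is the sum of the two partial tables.
combined = [
    [x + y for x, y in zip(row_a, row_b)]
    for row_a, row_b in zip(no_post_secondary, post_secondary)
]

for label, table in [
    ("No post-secondary", no_post_secondary),
    ("Post-secondary", post_secondary),
    ("All cases", combined),
]:
    print(f"{label}: gamma = {gamma_2x2(table):.2f}")
```

The three values come out close to one another, which is the pattern the text describes as a direct relationship.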


Statistics for Research

Elaboration of crosstabs using SPSS

We can add control variables when generating a crosstab (Table 8.4, Figure 8.1) as part of the Analyze/Descriptive Statistics/Crosstabs command we introduced in Chapter 5. Note that Steps 8 and 9 are only optional when elaborating crosstabs, but the additional information they provide will help us interpret the results (Figure 8.2).

Table 8.4 Crosstabs with control variables on SPSS (file: Ch8.sav)

SPSS command/action

1  From the menu select Analyze/Descriptive Statistics/Crosstabs
   This brings up the Crosstabs dialog box

2  Click on TV watching
   This highlights TV watching

3  Click on the arrow pointing to the target list headed Row(s):
   This pastes TV watching into the Row(s): target list

4  Click on Income
   This highlights Income

5  Click on the arrow pointing to the target list headed Column(s):
   This pastes Income into the Column(s): target list

6  Click on Education level
   This highlights Education level

7  Click on the arrow pointing to the target list below Layer 1 of 1
   This pastes Education level into the target list that contains the control variable. A crosstab will be generated for each value of the variable in this list

8  Click on the Statistics button and select Gamma
   This will produce gamma for each partial table

9  Click on the Cells button and select Column percentages
   This will generate the relative frequencies for each partial table based on the column totals

10 Click on OK

Figure 8.1 The Crosstabs dialog box

The table in Figure 8.2 is actually two crosstabs combined into one. The first half of the table is the crosstab of income and TV watching for cases with no post-secondary education, and immediately below it is the crosstab for those cases with post-secondary education. Within each income category, the percentage of cases watching a given level of TV is almost the same regardless of education level.

This is reinforced by the values for gamma presented in the Symmetric Measures table.

These gamma values are very similar to the value calculated on the unsegmented data in Table 8.1. The relationship between income and TV watching retains its strength and direction for each of the partial tables.


[SPSS output. Crosstabs: 'TV watching * Income * Education level Crosstabulation', showing counts and % within Income for each education level. Symmetric Measures: gamma = .491 (no post-secondary) and gamma = .450 (post-secondary).]

Figure 8.2 SPSS Crosstabs command output with a control variable
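The layering that the Crosstabs command performs can also be sketched outside SPSS. The Python below is an illustration, not SPSS's own algorithm: it assumes case-level records of the form (education level, TV watching, income), rebuilt here from the cell counts of Tables 8.2 and 8.3, and the helper names `partial_tables` and `column_percentages` are hypothetical.

```python
from collections import Counter

# Cell counts copied from Tables 8.2 and 8.3; keys are
# (education level, TV watching, income).
CELL_COUNTS = {
    ("No post-secondary", "Low", "Low"): 78,
    ("No post-secondary", "Low", "High"): 22,
    ("No post-secondary", "High", "Low"): 58,
    ("No post-secondary", "High", "High"): 48,
    ("Post-secondary", "Low", "Low"): 37,
    ("Post-secondary", "Low", "High"): 73,
    ("Post-secondary", "High", "Low"): 30,
    ("Post-secondary", "High", "High"): 156,
}

# Expand the counts into one record per case, as a data file would hold them.
cases = [key for key, n in CELL_COUNTS.items() for _ in range(n)]


def partial_tables(records):
    """Build one crosstab per value of the control variable.

    Returns {control value: Counter({(row value, column value): count})}.
    """
    tables = {}
    for control, row, col in records:
        tables.setdefault(control, Counter())[(row, col)] += 1
    return tables


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    col_totals = Counter()
    for (_, col), n in table.items():
        col_totals[col] += n
    return {
        (row, col): round(100 * n / col_totals[col], 1)
        for (row, col), n in table.items()
    }


tables = partial_tables(cases)
for level, table in tables.items():
    print(level)
    for (tv, income), pct in sorted(column_percentages(table).items()):
        print(f"  TV {tv:<4} | income {income:<4} | {pct:5.1f}%")
```

Grouping on the control variable first and tabulating second is the same idea as placing Education level in the layer list: each sub-group gets its own crosstab with its own column totals.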

Partial gamma

Assume that when we introduce level of education into the analysis we instead obtain the following partial tables (Tables 8.5 and 8.6), rather than those in Tables 8.2 and 8.3.

Table 8.5 TV watching by income level: controlling for education level (no post-secondary education)

TV watching   Income
              Low    High   Total
Low           102    50     152
              75%    71%
High          34     20     54
              25%    29%
Total         136    70     206
Gamma = 0.09

Table 8.6 TV watching by income level: controlling for education level (post-secondary education)

TV watching   Income
              Low    High   Total
Low           13     45     58
              19%    20%
High          54     184    238
              81%    80%
Total         67     229    296
Gamma = -0.007

The relationship between income and TV watching that we observed in the original table has suddenly disappeared for each of the partial tables. It is clear to the naked eye that there is no association to speak of between income and TV watching, once we have controlled for education level. The original association we found has been 'washed out' by the introduction of the control variable. This impression is reinforced by the gamma values, which are now negligible in strength, unlike the combined gamma for the original bivariate table. In the original table, where the cases are not separated by level of education, gamma is 0.47. But the gamma values for each of the partial tables are very close to zero.

A more precise way of reaching this conclusion is to calculate the partial gamma for the data. The partial gamma is 'built up' from the relationships embodied in the partial tables, rather than being calculated directly from the unsegmented data in Table 8.1. As we discussed in Chapter 7, gamma is calculated on the basis of the number of concordant pairs and the number of discordant pairs. Concordant pairs, you remember, are pairs of cases that are ranked the same on each of the two variables, and thereby embody a positive relationship between the variables. Discordant pairs on the other hand are pairs of cases that are ranked differently on the two variables, reflecting a negative relationship between the variables.

If we add the concordant pairs across both partial tables and the discordant pairs across both partial tables we can calculate the partial gamma, which measures the direct relationship between the two variables we started with, controlling for the third variable. We still use all the cases in determining the partial gamma, but we are now doing it after separating the cases into two separate partial tables.
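The summation described above can be sketched in Python. This is an illustration under stated assumptions: the function names `pair_counts` and `partial_gamma` are hypothetical, the tables are the partial tables of Tables 8.5 and 8.6 with categories ordered from low to high, and the low/low count of 13 in the second table is derived from its column total (67 - 54) rather than read directly.

```python
def pair_counts(table):
    """Count concordant and discordant pairs in an ordered crosstab.

    `table` is a list of rows; rows and columns both run from low to high.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):        # cells in a higher row
                for m in range(n_cols):
                    if m > j:                     # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                   # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def partial_gamma(tables):
    """Gamma computed from pairs summed across all partial tables."""
    total_c = total_d = 0
    for table in tables:
        c, d = pair_counts(table)
        total_c += c
        total_d += d
    return (total_c - total_d) / (total_c + total_d)


# Rows: TV watching (low, high); columns: income (low, high).
no_post_secondary = [[102, 50], [34, 20]]   # Table 8.5
post_secondary = [[13, 45], [54, 184]]      # Table 8.6 (13 derived, see above)

print(pair_counts(no_post_secondary))
print(pair_counts(post_secondary))
print(round(partial_gamma([no_post_secondary, post_secondary]), 2))
```

Summing the pairs before taking the ratio, rather than averaging the two gammas, lets the larger partial table contribute more pairs and so carry more weight.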

The process of calculating the partial gamma for these data is presented in Table 8.7.

Table 8.7 Calculating partial gamma

Table                         Concordant pairs      Discordant pairs      Gamma
Original bivariate table      204 × 115 = 23460     88 × 95 = 8360        0.47
Partial table 1               20 × 102 = 2040       34 × 50 = 1700        0.09
Partial table 2               13 × 184 = 2392       54 × 45 = 2430        -0.007
Total across partial tables   2040 + 2392 = 4432    1700 + 2430 = 4130    0.04

The partial gamma value for these data is only 0.04, indicating that there is very little direct relationship between income and TV watching, once we add level of education as a control.

Spurious or intervening relationship?

When the partial gamma is much lower than the original gamma calculated on the combined crosstab we should conclude that there is either a spurious relationship or intervening relationship between the first two variables. Before explaining each of these types of relationship, we need to point out that deciding which one explains the results of the elaboration is a theoretical and not a statistical issue. Having found that the original relationship disappears after elaborating a crosstab, it is up to us to decide how the three variables fit together, based on our understanding of how the world operates.

We might, for example, believe that the model represented in Figure 8.3 best explains the results we just analyzed.

                  ----> Income
Education level
                  ----> TV watching

Figure 8.3 A spurious relationship

There is a spurious relationship between income and TV watching in that the relationship we originally observed between them (Table 8.1) does not exist; it is only a statistical outcome based on their respective relationships with the control variable. Education separately affects income and TV watching, but the latter two variables are not directly related to each other.

The classic example of a spurious relationship is the observed association between the presence of storks in an area and the birth rate (a reference to a study of this relationship appears in Chapter 6). Where there are many storks there is also a higher birth rate: the storks must be responsible for delivering babies! Of course this is a ridiculous argument and highlights the difference between a statistical relationship and a causal relationship. The observed relationship was explained by arguing that the same factors that caused the number of storks to vary across regions also caused the birth rate to vary. Specifically, rural areas attract storks, and they also attract people looking to start a family.

In other words, the relationship between the number of storks and the birth rate in a region is spurious. It does not really exist but is an artefact of two other relationships: the relationship between the type of region (rural, non-rural) and the number of storks, and the type of region and the birth rate.

Another researcher may look at the results of our elaboration of the crosstab between income and TV watching and instead characterize the relationship as in Figure 8.4.

Income ----> Education level ----> TV watching

Figure 8.4 An intervening relationship

This researcher could make the argument that higher income earners can afford to undertake post-secondary education and then this affects how much TV they watch. Whether you think this argument is a good one or not is a matter for theoretical debate. Whether it is a more appropriate explanation of the results of the elaboration than the model of spurious relationship is open to discussion, but the statistical analysis itself cannot decide the issue. The statistical analysis merely indicates that one of these models best explains the results.

Conditional relationship

Assume that a researcher is interested in the extent to which patients respond to a program of exercise aimed at improving their cardiovascular system. The researcher organizes patients into low exercise and high exercise groups and observes whether there is any improvement in their cardiovascular systems (Table 8.8).

A visual inspection of Table 8.8, looking particularly at the (shaded) modal cells for each column, suggests that there is a strong, positive relationship between the variables. The exercise program does seem to work. To reinforce this impression the researcher calculates gamma, which produces a value of 0.68.

Table 8.8 Cardiovascular improvement by exercise level

Improvement   Exercise level
              Low    High   Total
No            38     11     49
Yes           14     21     35
Total         52     32     84

The researcher could leave the results here, and conclude that a direct relationship has been observed between the independent variable (level of exercise) and the dependent variable (improvement level). However, the researcher believes that the actual relationship is more complex than this, and that there may be other factors left out of this analysis that may determine whether a patient's cardiovascular system improves. In particular, the researcher believes that whether a person has been a regular smoker will affect their chances of responding to the exercise program. The researcher therefore generates the crosstabulation, this time controlling for smoking level (Table 8.9 and Table 8.10).

Table 8.9 Cardiovascular improvement by exercise level: smokers only

Improvement   Exercise level
              Low    High   Total
No            28     7      35
              74%    70%
Yes           10     3      13
              26%    30%
Total         38     10     48
Gamma = 0.09

Table 8.10 Cardiovascular improvement by exercise level: non-smokers only

Improvement   Exercise level
              Low    High   Total
No            10     4      14
              71%    18%
Yes           4      18     22
              29%    82%
Total         14     22     36
Gamma = 0.84

When comparing these partial tables against the complete table we started with it is clear that the relationship works differently depending on smoking history. Regular smokers gained no improvement in their health levels as a result of the exercise program. But for non-smokers the relationship is even stronger than was evident in the complete table, a result that was 'diluted' by the inclusion of the smokers for whom the relationship does not seem to hold.

This is reinforced by the gamma values for each of these tables. For non-smokers, the value of gamma is 0.84, as opposed to 0.68 for the table as a whole. For smokers, though, there is practically no benefit from the exercise program. We can see that in gauging the effect of the control variable the measure of association is extremely useful, since it quantifies the changes that are brought about when the control variable is added.

As a result of this observation, the researcher changes the model which may tie the variables together. Instead of a simple one-way direct relationship, the researcher depicts the association in terms of a conditional relationship, as in Figure 8.5.

Smoker     ----> No improvement
Non-smoker ----> Strong improvement

Figure 8.5 A conditional relationship

A conditional relationship is sometimes called interaction. Interaction exists where the relationship between two variables depends on the particular values of a third variable. Sometimes we might find that the relationship is reversed depending on the value of the control variable; for one sub-group the relationship might be positive, whereas for another sub-group the relationship might be negative.

Example

We want to investigate the relationship between intelligence and income. Intelligence is measured by a standard IQ test and respondents are divided into low and high IQ. Respondents are also divided into low or high income groups, depending on whether they earn below or above the median national income level.

The combined results for all 1000 people surveyed are presented in Table 8.11. This table illustrates a moderate association between intelligence, as measured by IQ, and income, and might lead to an interpretation that variation in intelligence causes the variation in income levels. People's earning capacity is to some extent predetermined by their respective IQs.

Table 8.11 Income and intelligence

IQ       Income
         Low    High   Total
Low      165    95     260
         36%    18%
High     295    445    740
         64%    82%
Total    460    540    1000
Gamma = 0.45

In order to avoid such a conclusion, we might argue that the IQ test as a measure of intelligence is biased. In particular we may feel that IQ scores are themselves a reflection of social class background, and this variable is a key determinant of income. To assess this we construct two partial tables, dividing the 1000 respondents into high social class and low social class sub-groups, producing the results in Tables 8.12 and 8.13.

Table 8.12 Income and intelligence: high social class only

IQ       Income
         Low    High   Total
Low      20     60     80
         18%    14%
High     90     380    470
         82%    86%
Total    110    440    550
Gamma = 0.17

Table 8.13 Income and intelligence: low social class only

IQ       Income
         Low    High   Total
Low      145    35     180
         41%    35%
High     205    65     270
         59%    65%
Total    350    100    450
Gamma = 0.14

We can see that the strength of the bivariate relationship is greatly diminished once we control for social class. There is little difference in the pattern of relative frequencies across the two partial tables. In fact, the partial gamma calculated on the basis of the partial tables is only 0.15. We have either a spurious relationship or an intervening relationship.
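The arithmetic behind the partial gamma of 0.15 can be written out as a sketch, assuming that for a 2 × 2 partial table the concordant pairs are the product of the low/low and high/high cells and the discordant pairs are the product of the other two cells, as in Table 8.7. The symbols C and D (concordant and discordant pair counts), the subscripts, and the symbol for partial gamma are notation introduced here for illustration; the counts come from Tables 8.12 and 8.13.

```latex
\begin{aligned}
C_{\text{high}} &= 20 \times 380 = 7600, & D_{\text{high}} &= 60 \times 90 = 5400,\\
C_{\text{low}}  &= 145 \times 65 = 9425, & D_{\text{low}}  &= 35 \times 205 = 7175,\\
\gamma_{p} &= \frac{(7600 + 9425) - (5400 + 7175)}{(7600 + 9425) + (5400 + 7175)}
            = \frac{4450}{29600} \approx 0.15 .
\end{aligned}
```

The partial gamma sits between the two sub-group values and well below the gamma for the combined table, which is the pattern that points to a spurious or intervening relationship.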

Summary

We have looked at the way in which the introduction of a third variable may alter a relationship we had previously observed between two variables. Indeed, the story can get even more complex when we allow for the impact of even more variables on the original bivariate relationship. Taking into account the possible effects of other variables involves multivariate analysis, and we have only just skimmed the surface in this chapter.


To help in drawing conclusions from the elaboration of crosstabs, Table 8.14 provides a useful guide to decision making (adapted from J. Healey, 1993, Statistics: A Tool for Social Research, Belmont, CA: Wadsworth, p. 428).

Table 8.14 Possible results when controlling for a third variable

Partial tables when compared with crosstab show: Same relationship between X and Y
   Model: Direct relationship
   Implications for further analysis: Disregard control variable
   Likely next step in statistical analysis: Select another control variable to test further the directness of the relationship
   Theoretical implications: Model that X causes Y in a direct way is supported

Partial tables when compared with crosstab show: Weaker or no relationship between X and Y
   Model: Spurious relationship
   Implications for further analysis: Incorporate control variable
   Likely next step in statistical analysis: Focus on the relationship between these three variables
   Theoretical implications: Model that X causes Y is not supported
   or
   Model: Intervening relationship
   Implications for further analysis: Incorporate control variable
   Likely next step in statistical analysis: Focus on the relationship between these three variables
   Theoretical implications: Model that X causes Y is partially supported but must be revised to take control into account

Partial tables when compared with crosstab show: Mixed relationships
   Model: Interaction/conditional relationship
   Implications for further analysis: Incorporate control variable
   Likely next step in statistical analysis: Analyze sub-groups based on control variable separately
   Theoretical implications: Model that X causes Y partially supported but must be revised to take control into account

Exercises

8.1 A study finds a strong positive relationship between a child's shoe size and the child's skills at mathematical problem solving. Explain.

8.2 What conclusion would you draw about the relationship between X and Y based on the following elaboration?

All cases
Y        X
         1      2      Total
1        177    146    323
2        51     346    397
Total    228    492    720

Controlling for C(1)
Y        X
         1      2      Total
1        153    52     205
2        44     123    167
Total    197    175    372

Controlling for C(2)
Y        X
         1      2      Total
1        24     94     118
2        7      223    230
Total    31     317    348

8.3 An investigation of the relationship between age, concern for the environment, and political affiliation produces the following gamma values:

Gamma (age and concern for the environment): -0.57
Gamma (age and concern for the environment, liberals only): -0.22
Gamma (age and concern for the environment, conservatives only): -0.67
Partial gamma: -0.45

What conclusion should be drawn about the relationship, if any, between these three variables?

8.4 The following tables are based on a study of the likelihood of US courts to impose the death penalty, based on the racial characteristics of the victim and the defendant (M. Radelet, 1981, Racial characteristics and the imposition of the death penalty, American Sociological Review, 46, pp. 918-27).

All cases
Death penalty   Victim
                White   Black   Total
No              184     106     290
Yes             30      6       36
Total           214     112     326

White defendant only
Death penalty   Victim
                White   Black   Total
No              132     9       141
Yes             19      0       19
Total           151     9       160

Black defendant only
Death penalty   Victim
                White   Black   Total
No              52      97      149
Yes             11      6       17
Total           63      103     166

What conclusions can you draw about the relationship between the race of the victim, the race of the defendant, and likelihood to impose the death penalty?
