
Chapter 14

Preprocessing the Data, and Cross-Tabs
Figure 1: Histogram and Frequency Polygon of Incomes of Families in Car Ownership Study
(Figure not reproduced; horizontal axis: income, 0k to 105k.)
Figure 2: Cumulative Distribution of Incomes of Families in Car Ownership Study
(Figure not reproduced; horizontal axis: income, 0k to 105k.)
Family Income and Number of Cars Family Owns

                          Number of Cars
Income                1 or None   2 or More   Total
Less than $37,500            48           6      54
More than $37,500            27          19      46
TOTAL                        75          25     100
Number of Cars by Family Income

                          Number of Cars
Income                1 or None   2 or More   Total   # of Cases
Less than $37,500           89%         11%    100%           54
More than $37,500           59%         41%    100%           46
Family Income by Number of Cars

                          Number of Cars
Income                1 or None   2 or More
Less than $37,500           64%         24%
More than $37,500           36%         76%
Total                      100%        100%
(Number of Cases)          (75)        (25)
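The two percentage tables above come from the same counts, percentaged once across rows (number of cars by income) and once down columns (income by number of cars). A minimal sketch of that arithmetic, assuming the counts from the "Family Income and Number of Cars Family Owns" table; the function and variable names are illustrative and not from the slides:

```python
# Counts from the "Family Income and Number of Cars Family Owns" table.
counts = {
    "Less than $37,500": {"1 or None": 48, "2 or More": 6},
    "More than $37,500": {"1 or None": 27, "2 or More": 19},
}


def row_percentages(table):
    """Each cell as a whole-number percentage of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total) for col, n in cells.items()}
    return result


def column_percentages(table):
    """Each cell as a whole-number percentage of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: round(100 * n / col_totals[col]) for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)      # number of cars by income
col_pct = column_percentages(counts)   # income by number of cars
print(row_pct)
print(col_pct)
```

Which direction to percentage depends on which variable is treated as the explanation: row percentages here answer "given income, how many cars?".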


Number of Cars and Size of Family

                          Number of Cars
Size of Family        1 or None   2 or More   Total
4 or Less                    70           8      78
5 or More                     5          17      22
Total                        75          25     100
Number of Cars by Size of Family

                          Number of Cars
Size of Family        1 or None   2 or More   Total   # of Cases
4 or Less                   90%         10%    100%         (78)
5 or More                   23%         77%    100%         (22)
Number of Cars by Income and Size of Family

                           Four Members or Less             Five Members or More                    Total
                              Number of Cars                   Number of Cars                   Number of Cars
Income                1 or None  2 or More      Total  1 or None  2 or More      Total  1 or None  2 or More      Total
Less than $37,500            44          2         46          4          4          8         48          6         54
More than $37,500            26          6         32          1         13         14         27         19         46
TOTAL                        70          8         78          5         17         22         75         25        100
Number of Cars by Income and Size of Family

                           Four Members or Less             Five Members or More                    Total
                              Number of Cars                   Number of Cars                   Number of Cars
Income                1 or None  2 or More      Total  1 or None  2 or More      Total  1 or None  2 or More      Total
Less than $37,500           96%         4%  100% (46)        50%        50%   100% (8)        89%        11%  100% (54)
More than $37,500           81%        19%  100% (32)         7%        93%  100% (14)        59%        41%  100% (46)
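The effect of adding family size as a third variable can be checked by recomputing the share of families with two or more cars within each size group. This is a minimal sketch using the counts from the three-way table; the dictionary layout and the helper name are illustrative assumptions, not part of the slides:

```python
# Counts from the three-way table:
# (income, family size) -> (families with 1 or no car, families with 2 or more cars)
counts = {
    ("Less than $37,500", "4 or less"): (44, 2),
    ("Less than $37,500", "5 or more"): (4, 4),
    ("More than $37,500", "4 or less"): (26, 6),
    ("More than $37,500", "5 or more"): (1, 13),
}
sizes = ("4 or less", "5 or more")


def pct_two_or_more(cells):
    """Whole-number percentage of families owning two or more cars."""
    few = sum(cell[0] for cell in cells)
    many = sum(cell[1] for cell in cells)
    return round(100 * many / (few + many))


summary = {}
for income in ("Less than $37,500", "More than $37,500"):
    by_size = {size: pct_two_or_more([counts[(income, size)]]) for size in sizes}
    # "Total" pools both family sizes, reproducing the two-way result.
    by_size["Total"] = pct_two_or_more([counts[(income, size)] for size in sizes])
    summary[income] = by_size

for income, row in summary.items():
    print(income, row)
```

Within each size group the income effect remains, but it is much larger among families of five or more, which is the pattern the condensed table of two-or-more-car percentages summarizes.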
Car Ownership for Small, Below Average Income Families

                          Number of Cars
Income                1 or None   2 or More        Total
Less than $37,500           96%          4%    100% (46)


Percentage of Families Owning Two or More Cars by Income

                          Size of Family
Income                4 or Less   5 or More       Total
Less than $37,500            4%         50%     11% (6)
More than $37,500           19%         93%    41% (19)


Conditions That Can Arise with the Introduction of an Additional Variable into a Cross Tabulation

                        With the Additional Variable

Initial Situation       Change Conclusion                   Retain Conclusion

Some Relationship       I                                   II
                        A. Refine Explanation
                        B. Reveal Spurious Explanation
                        C. Provide Limiting Conditions

No Relationship         III                                 IV
The Researcher’s Dilemma

                                        True Situation

Researcher’s Conclusion     No Relationship          Some Relationship

No Relationship             Correct Decision         Spurious Noncorrelation

Some Relationship           Spurious Correlation     Correct Decision if Concluded
                                                     Relationship is of Proper Form
Appendix 14

Chi-Square Tests
Measures of Association for Nominal Data

Measures Appropriate for Nominal Data

* Contingency Table (Chi-Square)


* Contingency Coefficient
* Index of Predictive Association
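The slides name these measures without giving formulas. As a hedged illustration, the sketch below applies the commonly used textbook definitions to the family size by number of cars counts from earlier in the chapter: Pearson's contingency coefficient, C = sqrt(chi-square / (chi-square + n)), and Goodman and Kruskal's lambda as the index of predictive association. Both formulas and all function names are assumptions brought in for illustration; the slides may intend a different variant.

```python
import math

# Family size (rows) by number of cars (columns), counts from the chapter.
observed = [[70, 8],    # 4 or less: 0 or 1 car, 2+ cars
            [5, 17]]    # 5 or more


def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + n)) (assumed definition)."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def lambda_columns_given_rows(table):
    """Goodman-Kruskal lambda: proportional reduction in errors when
    predicting the column category from the row category (assumed definition)."""
    n = sum(sum(row) for row in table)
    best_without_rows = max(sum(col) for col in zip(*table))
    best_with_rows = sum(max(row) for row in table)
    return (best_with_rows - best_without_rows) / (n - best_without_rows)


c = contingency_coefficient(observed)
lam = lambda_columns_given_rows(observed)
print(round(c, 2), round(lam, 2))
```

Under these definitions, lambda reads as the share of prediction errors about number of cars that knowing family size removes.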
Cross Tabulations

                 #Cars:
Family Size:     0 or 1      2+   Total
4 or less            70       8      78
5 or more             5      17      22
Total                75      25     100

Frequencies of Combinations of Row (i) and Column (j)


Cross-Tabs & Chi-Squares

                 #Cars:
Family Size:     0 or 1      2+   Total
4 or less                            78    78%
5 or more                            22    22%
Total                75      25     100
                    75%     25%

H0: Row variable independent of column variable;


No association between family size & #cars
analogous to: “no correlation”
If Family Size & #Cars are Independent:

                 #Cars:
Family Size:     0 or 1      2+   Total
4 or less          58.5    19.5      78    78%
5 or more          16.5     5.5      22    22%
Total                75      25     100
                    75%     25%

We’d EXPECT frequencies to be distributed “randomly”;


i.e., in proportion to the margins
Using the Statistical Definition of “Independence” to Calculate the Expected Frequencies

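• If A & B are independent:

$$P(A_1 B_1) = P(A_1)\,P(B_1)$$

• Expected frequency for cell (1,1):

$$e_{11} = n\,P(A_1 B_1) = 100 \left(\frac{78}{100}\right) \left(\frac{75}{100}\right) = \frac{78 \times 75}{100} = 58.5$$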
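The same rule, row total times column total divided by n, gives every expected cell, not only e11. A minimal sketch, assuming the observed family size by number of cars counts from the chapter; the function name is illustrative:

```python
# Observed counts: family size (rows) by number of cars (columns).
observed = [[70, 8],    # 4 or less
            [5, 17]]    # 5 or more


def expected_frequencies(table):
    """Expected counts under independence: e_ij = (row_i total * col_j total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Each expected row and column still sums to the observed margin, which is why the expected table is said to be "in proportion to the margins".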
Chi-Square Formula

• Chi-square measures how much our data differ from what we’d expect (given the hypothesis of independence)
• Are the row and column variables associated?

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
Chi-Square for Our Data

$$\chi^2 = \frac{(70-58.5)^2}{58.5} + \frac{(8-19.5)^2}{19.5} + \frac{(5-16.5)^2}{16.5} + \frac{(17-5.5)^2}{5.5}$$

$$= 2.261 + 6.782 + 8.015 + 24.045 = 41.103$$

Is this large?

df = degrees of freedom = (r-1)(c-1)

For our 2x2 table, df = 1

Critical value for $\chi^2$ with 1 df = 3.84 (at the .05 level)

$\chi^2$ = 41.103 exceeds 3.84, so we reject H0: family size and #cars are associated.
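The whole test can be reproduced in a few lines. This is a minimal sketch using only the Python standard library: the 3.84 critical value is taken from the slide, while the p-value line is an addition that relies on the fact that a chi-square variable with 1 df is a squared standard normal, so it only applies to 2x2 tables. Function and constant names are illustrative.

```python
import math

# Observed counts: family size (rows) by number of cars (columns).
observed = [[70, 8],    # 4 or less
            [5, 17]]    # 5 or more

CRITICAL_VALUE_05_DF1 = 3.84  # from the slide: chi-square, 1 df, .05 level


def chi_square_test(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected under independence
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_test(observed)
reject_h0 = stat > CRITICAL_VALUE_05_DF1

# Upper-tail probability; this shortcut is valid for df = 1 only.
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes out at roughly 41.1, far above the critical value, so independence of family size and number of cars is rejected.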


Extension Beyond 2-Way Tables

• Three-way table:

Example: Family size x #Cars x household income

• Log Linear Models
