
Political Analysis

Comparing Two Sample Proportions


The same principles that apply to the comparison of two sample means also apply to the
comparison of two sample proportions. To illustrate the similarities, we consider another gender
gap hypothesis: In a comparison of individuals, women are more likely than men to approve of
the death penalty.
Table 1.1 Approval of the Death Penalty, by Gender

Approve of the death          Female        Male          Total
penalty for any reason?
Yes                           44.3% (302)   40.9% (235)   42.7% (537)
No                            55.7% (380)   59.1% (340)   57.3% (720)
Total                         100% (682)    100% (575)    100% (1,257)

Table 1.1 shows a cross-tabulation analysis of the dependent variable, whether or not
respondents approve of the death penalty for any reason, by the independent variable, gender.
Are the data consistent with the hypothesis? It may seem so.
Reading across the columns at the “Yes” value of the dependent variable, 44.3 percent of the
females are in favor, compared with 40.9 percent of the males, a difference of 3.4 percentage
points. Expressed in proportions, the difference would be .443 minus .409, or .034.
Conventional labeling for sample proportions.
The ordinary letter p represents the proportion of cases falling into one value of a variable, and
the letter q represents the proportion of cases falling into all other values of a variable. The
proportion q, which is called the complement of p, is equal to 1 − p. So that we don’t confuse the
female sample statistics with the male sample statistics, let’s use different subscripts: p1, q1, and
n1 refer to the female sample; p2, q2, and n2 refer to the male sample.
For females: p1 = .443, q1 = .557, and n1 = 682. For males: p2 = .409, q2 = .591, and n2 = 575.
Table 1.2 Proportions Approving the Death Penalty, by Gender

Gender         Sample          Complement of sample   Squared standard
(n)            proportion (p)  proportion (q)         error (pq/n)
Female (682)   .443            .557                   .000362
Male (575)     .409            .591                   .000420

Difference in proportions           .034
Sum of squared standard errors      .000782
Standard error of the difference    .028
To find the standard error of the difference between proportions, plug the six numbers into the
formula that follows:

Standard error of the difference in proportions = √(p1q1/n1 + p2q2/n2)
The formula implies the following steps:
1. Multiply each proportion (p) by its complement (q) and divide the result by the sample size.
For the female sample: (.443 × .557)/682 = .000362. For the male sample: (.409 × .591)/575
= .000420.
2. Sum the two numbers from step 1. Summing the female number and the male
number: .000362 + .000420 = .000782.
3. Take the square root of the number from step 2.
In the example: √.000782 ≈ .028
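The three steps can be sketched in Python. This is a minimal sketch using the Table 1.1 cell counts; the variable names are illustrative, not part of the original text:

```python
import math

# Sample statistics from Table 1.1 (group 1 = females, group 2 = males)
p1, n1 = 302 / 682, 682   # proportion of females approving (.443)
p2, n2 = 235 / 575, 575   # proportion of males approving (.409)
q1, q2 = 1 - p1, 1 - p2   # complements (q = 1 - p)

# Step 1: squared standard error of each proportion (pq/n)
se1_sq = p1 * q1 / n1     # ≈ .000362
se2_sq = p2 * q2 / n2     # ≈ .000420

# Step 2: sum the squared standard errors
sum_sq = se1_sq + se2_sq  # ≈ .000782

# Step 3: take the square root
se_diff = math.sqrt(sum_sq)  # ≈ .028

print(round(p1 - p2, 3), round(se_diff, 3))  # 0.034 0.028
```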
Table 1.2 presents in tabular format the relevant calculations for determining the standard error
of the difference between the proportions of females and males who favor the death penalty.
Notice that the right-most column is labeled “Squared standard error (pq/n).” By multiplying p
by q, and then dividing by the sample size, we are in fact deriving the squared standard error of
each proportion.
Thus, the derivation of the standard error of the difference in proportions is directly analogous to
the derivation of the standard error of the difference in means: the square root of the summation
of squared standard errors. In the current example, we found a difference in the sample
proportions of .034, with a standard error equal to .028.

Measures of Association
Measures of association are an additional resource for the investigator. A measure of association
communicates the strength of the relationship between an independent variable and a dependent
variable. Although statistical significance is usually the main thing you want to know about a
relationship, measures of association always add depth of interpretation—and they can be of
central importance in testing hypotheses.
Statisticians have developed a large number of measures of association. Are some preferred to
others? A preferred measure of association has two main features:
 First, it uses a proportional reduction in error (PRE) approach for gauging the strength
of a relationship. A PRE measure is a prediction-based metric that varies in magnitude
between 0 and 1. The precise value of the measure tells you how much better you can
predict the dependent variable by knowing the independent variable than by not knowing
the independent variable.
-If knowledge of the independent variable does not provide any help in predicting the
dependent variable, then a PRE statistic will assume a value of 0.
-If knowledge of the independent variable permits perfect prediction of the dependent
variable, then a PRE statistic will assume a magnitude of 1.
 Second, asymmetric measures are preferred to symmetric measures. A symmetric
measure of association takes on the same value, regardless of whether the independent
variable is used to predict the dependent variable or the dependent variable is used to
predict the independent variable. An asymmetric measure of association, by contrast,
models the independent variable as the causal variable and the dependent variable as the
effect. Because asymmetric measures are better suited to the enterprise of testing causal
hypotheses, they are preferred to symmetric measures, which are agnostic on the question
of cause and effect.
Two asymmetric PRE measures:
•Lambda is designed to measure the strength of a relationship between two categorical
variables, at least one of which is measured at the nominal level.
•Somers’ D is appropriate for gauging the strength of ordinal-level relationships.

Lambda
Table 2.1 Opinions on Choosing Presidential Candidates, by Gender

Opinion on choosing            Female    Male      Total
presidential candidates

Without knowledge of the independent variable
More difficult                 ?         ?         2,726
Easier/Same                    ?         ?         3,170
Total                          ?         ?         5,896

With knowledge of the independent variable
More difficult                 1,664     1,062     2,726
Easier/Same                    1,403     1,767     3,170
Total                          3,067     2,829     5,896
 Prediction error without knowledge of the independent variable
Suppose you were presented with the upper half of Table 2.1 and asked to summarize the
relationship between gender and opinions on choosing presidential candidates. Obviously, this
cross-tabulation provides no information about the independent variable. You can, however,
identify the distribution of the dependent variable: the responses of all 5,896 respondents.
Assume that, based on only this knowledge, you randomly drew respondents one at a time and
tried to guess their opinions. What is your better guess, “more difficult” or “easier/same”?
Because more respondents fall into the “easier/same” category, your better guess is the modal
response, “easier/same.”
In the long run, by guessing “easier/same” for each randomly chosen case, you are guaranteed
3,170 correct guesses. But you will record a missed guess for every case that is not in the modal
category: the 2,726 respondents who said “more difficult.”
In the lambda logic, this number, 2,726, measures prediction error without knowledge of the
independent variable.
Prediction error without knowledge of the independent variable is the number of errors made
when using the overall distribution of the dependent variable as a predictive instrument.

 Prediction error with knowledge of the independent variable
For each randomly picked female, the better bet is to guess “more difficult.” This stratagem will
guarantee you 1,664 correct hits. You will have some missed guesses, too: the 1,403 female
respondents who said “easier/same.”
For each randomly chosen male, your better guess is “easier/same.” Based on what you know
about the distribution of opinion among men, this approach will give you 1,767 hits, along with
1,062 misses.
All told, how many errors will there be? Adding the 1,403 errors for females and the 1,062
misses for males: 1,403 + 1,062 = 2,465.
In the lambda logic, this number, 2,465, measures prediction error with knowledge of the
independent variable.
Prediction error with knowledge of the independent variable is the number of errors made
when using the distribution of the dependent variable within each category of the
independent variable as a predictive instrument.
Now consider the formula for lambda:

Lambda = (Prediction error without knowledge of the independent variable − Prediction error
with knowledge of the independent variable) / Prediction error without knowledge of the
independent variable

In the Table 2.1 example, there were 2,726 errors without knowledge of the independent variable
and 2,465 missed guesses when the independent variable was taken into account, improving the
prediction by 2,726 minus 2,465, or 261. The denominator, prediction error without knowledge
of the independent variable, translates error reduction into a ratio, providing the proportional part
of PRE.
For the Table 2.1 example: Lambda = (2,726 − 2,465)/2,726 = 261/2,726 ≈ .096, or about .10.
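The two error counts and the lambda value can be reproduced directly from the Table 2.1 cell counts. This is a minimal Python sketch; the dictionary layout and names are illustrative:

```python
# Cell counts from Table 2.1 (with knowledge of the independent variable)
counts = {
    "female": {"more difficult": 1664, "easier/same": 1403},
    "male":   {"more difficult": 1062, "easier/same": 1767},
}

# Prediction error WITHOUT knowledge of the independent variable:
# guess the overall modal category for every case; every non-modal case is a miss
totals = {
    opinion: sum(counts[g][opinion] for g in counts)
    for opinion in ("more difficult", "easier/same")
}
error_without = sum(totals.values()) - max(totals.values())  # 5,896 - 3,170 = 2,726

# Prediction error WITH knowledge of the independent variable:
# guess the modal category within each gender; the smaller cell is the miss count
error_with = sum(min(counts[g].values()) for g in counts)    # 1,403 + 1,062 = 2,465

lam = (error_without - error_with) / error_without
print(error_without, error_with, round(lam, 4))  # 2726 2465 0.0957
```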
Is this relationship weak? Or is it moderate, or strong? In the analysis of social science data,
especially individual-level survey data, large PRE magnitudes (of, say, .5 or above) are
uncommon. Lesser values (of about .3 or lower) are more frequent.

In conclusion, according to this guideline, the lambda we obtained from Table 2.1 (lambda ≈
.096) indicates a weak relationship.

Somers’ D
 The values of an ordinal variable communicate the direction of a difference between two
cases.
 Somers’ d communicates the direction of the association that exists between an ordinal
dependent variable and an ordinal independent variable.
(The nominal measure of choice, lambda, tells you only whether knowing a case’s value on the
independent variable helps predict its value on the dependent variable.)
You could use Somers’ d to gauge whether there is an association between voter satisfaction and
a presidential candidate’s opinion on recent issues (e.g., vote-buying). Here the ordinal
dependent variable is voter satisfaction, measured on a five-point scale from “very satisfied” to
“very dissatisfied,” and the ordinal independent variable is the candidate’s opinion on recent
issues, measured on a three-point scale from “above average” to “below average.”
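The text does not spell out how Somers’ d is computed. One standard definition of the asymmetric version, d_yx (with y as the dependent variable), is (concordant pairs − discordant pairs) divided by all pairs not tied on the independent variable. A minimal sketch, assuming the ordinal categories are coded as integers:

```python
from itertools import combinations

def somers_d_yx(x, y):
    """Somers' d with y as the dependent variable (asymmetric).

    x, y: equal-length sequences of integer-coded ordinal values.
    """
    concordant = discordant = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue                 # pairs tied on x are excluded entirely
        if y1 == y2:
            tied_y_only += 1         # tied on y only: counts in the denominator
        elif (x1 - x2) * (y1 - y2) > 0:
            concordant += 1          # both variables move in the same direction
        else:
            discordant += 1          # the variables move in opposite directions
    return (concordant - discordant) / (concordant + discordant + tied_y_only)

# A perfect positive ordinal relationship yields +1, a perfect negative one -1
print(somers_d_yx([1, 2, 3], [1, 2, 3]), somers_d_yx([1, 2, 3], [3, 2, 1]))  # 1.0 -1.0
```

Like lambda, the statistic is asymmetric: swapping which variable plays the dependent role generally changes the value.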
