Professional Documents
Culture Documents
Ebook Categorical Data Analysis For The Behavioral and Social Sciences 2Nd Edition Razia Azen Online PDF All Chapter
Ebook Categorical Data Analysis For The Behavioral and Social Sciences 2Nd Edition Razia Azen Online PDF All Chapter
Ebook Categorical Data Analysis For The Behavioral and Social Sciences 2Nd Edition Razia Azen Online PDF All Chapter
https://ebookmeta.com/product/data-analysis-for-the-social-
sciences-integrating-theory-and-practice-1st-edition-douglas-
bors/
https://ebookmeta.com/product/statistics-volume-3-categorical-
and-time-dependent-data-analysis-3rd-edition-kunihiro-suzuki/
https://ebookmeta.com/product/data-analysis-for-social-science-
elena-llaudet/
Behavioral Data Analysis with R and Python: Customer-
Driven Data for Real Business Results 1st Edition
Buisson
https://ebookmeta.com/product/behavioral-data-analysis-with-r-
and-python-customer-driven-data-for-real-business-results-1st-
edition-buisson/
https://ebookmeta.com/product/international-encyclopedia-of-the-
social-behavioral-sciences-second-edition-james-d-wright/
https://ebookmeta.com/product/applied-categorical-and-count-data-
analysis-second-edition-wan-tang-hua-he-xin-m-tu/
https://ebookmeta.com/product/data-analytics-for-the-social-
sciences-applications-in-r-1st-edition-garson/
“This book fills an important need for a practitioner-oriented book on categorical data analyses.
It not only could serve as an excellent resource for researchers working with categorical data,
but would also make an excellent text for a graduate course in categorical data analysis.”
—Terry Ackerman, University of North Carolina–Greensboro, USA
“A much needed book . . . it fills a significant gap in the market for a user-friendly categorical
data analysis book. . . . Anyone wishing to learn categorical data analysis can read this book . . .
The integration of both SPSS and SAS . . . increases the usability of this book.”
—Sara Templin, University of Alabama, USA
“An accessible treatment of an important topic. . . . Through practical examples, data, and . . .
SPSS and SAS code, the Azen and Walker text promises to put these topics within reach of
a much wider range of students. . . . The applied nature of the book promises to be quite
attractive for classroom use.”
—Scott L. Thomas, Claremont Graduate University, USA
“Many social science students do not have the mathematical background to tackle the material
covered in categorical data analysis books. What these students need is an understanding of
what method to apply to what data, how to use software to analyze the data, and most
importantly, how to interpret the results. . . . The book would be very useful for a graduate
level course . . . for Sociologists and Psychologists. It might also be appropriate for Survey
Methodologists.”
—Timothy D. Johnson, University of Michigan, USA
“This book does an admirable job of combining an intuitive approach to categorical data
analysis with statistical rigor. It has definitely set a new standard for textbooks on the topic for
the behavioral and social sciences.”
—Wim J. van der Linden, Chief Research Scientist, CTB/McGraw-Hill
Categorical Data Analysis for the
Behavioral and Social Sciences
Featuring a practical approach with numerous examples, the second edition of Categorical
Data Analysis for the Behavioral and Social Sciences focuses on helping the reader develop a
conceptual understanding of categorical methods, making it a much more accessible text than
others on the market. The authors cover common categorical analysis methods and emphasize
specific research questions that can be addressed by each analytic procedure, including how to
obtain results using SPSS, SAS, and R, so that readers are able to address the research questions
they wish to answer.
Each chapter begins with a “Look Ahead” section to highlight key content. This is followed
by an in-depth focus and explanation of the relationship between the initial research question,
the use of software to perform the analyses, and how to interpret the output substantively.
Included at the end of each chapter are a range of software examples and questions to test
knowledge.
New to the second edition:
• The addition of R syntax for all analyses and an update of SPSS and SAS syntax.
• The addition of a new chapter on GLMMs.
• Clarification of concepts and ideas that graduate students found confusing, including
revised problems at the end of the chapters.
Written for those without an extensive mathematical background, this book is ideal for a
graduate course in categorical data analysis taught in departments of psychology, educational
psychology, human development and family studies, sociology, public health, and business.
Researchers in these disciplines interested in applying these procedures will also appreciate
this book’s accessible approach.
Razia Azen is Professor at the University of Wisconsin–Milwaukee, USA, where she teaches
basic and advanced statistics courses. Her research focuses on methods that compare the
relative importance of predictors in linear models. She received her MS in statistics and PhD
in quantitative psychology from the University of Illinois, USA.
Cindy M. Walker is President and CEO of Research Analytics Consulting, LLC. Previously,
she was a professor at the University of Wisconsin–Milwaukee, where she taught basic
and advanced measurement and statistics courses. Her research focuses on applied issues
in psychometrics. She received her PhD in quantitative research methodologies from the
University of Illinois at Urbana–Champaign, USA.
Categorical Data Analysis for the
Behavioral and Social Sciences
Second Edition
Second edition published 2021
by Routledge
52 Vanderbilt Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2021 Razia Azen and Cindy M. Walker
The right of Razia Azen and Cindy M. Walker to be identified as authors of this work has
been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any
form or by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying and recording, or in any information storage or retrieval system,
without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
First edition published by Routledge 2011
Library of Congress Cataloging-in-Publication Data
Names: Azen, Razia, 1969– author. | Walker, Cindy M., 1965– author.
Title: Categorical data analysis for the behavioral and social sciences / Razia Azen and Cindy
M. Walker.
Description: Second edition. | New York, NY : Routledge, 2021. | Includes bibliographical
references and index. |
Identifiers: LCCN 2020051576 (print) | LCCN 2020051577 (ebook) | ISBN
9780367352745 (hardback) | ISBN 9780367352769 (paperback) | ISBN 9780429330308
(ebook)
Subjects: LCSH: Social sciences—Statistical methods.
Classification: LCC HA29 .A94 2021 (print) | LCC HA29 (ebook) | DDC 300.72/7—dc23
LC record available at https://lccn.loc.gov/2020051576
LC ebook record available at https://lccn.loc.gov/2020051577
ISBN: 978-0-367-35274-5 (hbk)
ISBN: 978-0-367-35276-9 (pbk)
ISBN: 978-0-429-33030-8 (ebk)
Typeset in Bembo
by Apex CoVantage, LLC
Access the Support Material: www.routledge.com/9780367352769
To our students: past, present, and future
Contents
Prefacex
2 Probability Distributions 7
Appendix305
References307
Index309
Preface
A Look Ahead
There are many statistical procedures that can be used to address research questions in
the social sciences, yet the required courses in most social science programs often pay lit-
tle to no attention to procedures that can be used with categorical data. There are good
reasons for this, both pedagogical and conceptual. Statistical procedures for categorical
data require a new set of “tools”, or a new way of thinking, because of the different
distributional assumptions that must be made for these types of variables. Therefore,
these procedures differ conceptually from those that may already be familiar to readers
of this book, such as multiple regression and analysis of variance (ANOVA). Nonethe-
less, acquiring a conceptual understanding of the procedures that can be used to analyze
categorical data opens a whole new world of possibilities to social science researchers. In
this chapter, we introduce the reader to these possibilities, explain categorical variables,
and give a brief overview of the history of the methods for their analysis.
measurement scales that have the property of equal intervals and those that have the property
of having an absolute zero.
A measurement scale has the characteristic of having an absolute zero if assigning a score of
0 to persons or objects indicates an absence of the attribute being measured. For example, if
a score of 0 represents no spelling errors on a spelling exam, then number of spelling errors
would be measured in a manner that had the characteristic of having an absolute zero.
Table 1.1 indicates the four levels of measurement in terms of the four characteristics just
described. Nominal measurement possesses only the characteristic of distinctiveness and can
be thought of as the least precise form of measurement in the social sciences. Ordinal meas-
urement possesses the characteristics of distinctiveness and magnitude and is a more precise
form of measurement than nominal measurement. Interval measurement possesses the char-
acteristics of distinctiveness, magnitude, and equal intervals and is a more precise form of
measurement than ordinal measurement. Ratio measurement, which is rarely attained in the
social sciences, possesses all four characteristics of distinctiveness, magnitude, equal intervals,
and having an absolute zero and is the most precise form of measurement.
In categorical data analysis, the dependent or response variable, which represents the char-
acteristic or phenomenon that we are trying to explain or predict in the population, is meas-
ured using either a nominal scale or an ordinal scale. Methods designed for ordinal variables
make use of the natural ordering of the measurement categories, although the way in which
we order the categories (i.e., from highest to lowest or from lowest to highest) is usually irrel-
evant. Methods designed for ordinal variables cannot be used for nominal variables. Methods
designed for nominal variables will give the same results regardless of the order in which the
categories are listed. While these methods can also be used for ordinal variables, doing so will
result in a loss of information (and usually loss of statistical power) because the information
about the ordering is lost.
In this book we will focus on methods designed for nominal variables. The analyses we will
discuss can be used when all variables are categorical or when just the dependent variable is
categorical. The independent or predictor variables (used to predict the dependent variable)
can usually be measured using any of the four scales of measurement.
Problems
1.1 Indicate the scale of measurement used for each of the following variables and explain
your answer by describing the probable scale:
a. Sense of belongingness, as measured by a 20-item scale.
b. Satisfaction with life, as measured by a 1-item scale.
c. Level of education, as measured by a demographic question with five categories.
1.2 Indicate the scale of measurement used for each of the following variables and explain
your answer by describing the probable scale:
a. Self-efficacy, as measured by a 10-item scale.
b. Race, as measured by a demographic question with six categories.
c. Income, as measured by yearly gross income.
1.3 For each of the following research scenarios, identify the dependent and independent
variables (or indicate if not applicable) as well as the scale of measurement used for each
variable. Explain your answers by describing the scale that might have been used to meas-
ure each variable.
a. A researcher would like to determine if boys are more likely than girls to be profi-
cient in mathematics.
b. A researcher would like to determine if people in a committed relationship are more
likely to be satisfied with life than those who are not in a committed relationship.
c. A researcher is interested in whether females tend to have lower self-esteem, in terms
of body image, than males.
d. A researcher is interested in predicting religious affiliation from level of education.
1.4 For each of the following research scenarios, identify the dependent and independent
variables (or indicate if not applicable) as well as the scale of measurement used for each
variable. Explain your answers by describing the scale that might have been used to meas-
ure each variable.
a. A researcher would like to determine if people living in the United States are more
likely to be obese than people living in France.
b. A researcher would like to determine if the cholesterol levels of men who suffered
a heart attack are higher than the cholesterol levels of women who suffered a heart
attack.
c. A researcher is interested in whether gender is related to political party affiliation.
d. A researcher is interested in the relationship between amount of sleep and grade
point average for high school students.
1.5 Determine whether procedures for analyzing categorical data are needed to address each
of the following research questions. Provide a rationale for each of your answers by
6 Introduction and Overview
identifying the dependent and independent variables as well as the scale that might have
been used to measure each variable.
a. A researcher would like to determine whether a respondent will vote for the Republican
or Democratic candidate in the US presidential election based on the respondent’s
annual income.
b. A researcher would like to determine whether respondents who vote for the Republican
candidate in the US presidential election have a different annual income than those
who vote for the Democratic candidate.
c. A researcher would like to determine whether males who have suffered a heart attack
have higher fat content in their diets than males who have not suffered a heart attack
in the past six months.
d. A researcher would like to predict whether a man will suffer a heart attack in the next
six months based on the fat content in his diet.
1.6 Determine whether procedures for analyzing categorical data are needed to address each
of the following research questions. Provide a rationale for each of your answers by iden-
tifying the dependent and independent variables as well as the scale that might have been
used to measure each variable.
a. A researcher would like to determine whether a student will complete high school
based on the student’s grade point average.
b. A researcher would like to determine whether students who complete high school have
a different grade point average than students who do not complete high school.
c. A researcher would like to determine whether the families of students who attend
college have a higher annual income than the families of students who do not attend
college.
d. A researcher would like to determine whether a student will attend college based on
his or her family’s annual income.
1.7 Determine whether procedures for analyzing categorical data are needed to address each
of the following research questions. Indicate what analytic procedure (e.g., ANOVA,
regression) you would use for those cases that do not require categorical methods, and
provide a rationale for each of your answers.
a. A researcher would like to determine if scores on the verbal section of the SAT can
be used to predict whether students are proficient in reading on a state-mandated test
administered in 12th grade.
b. A researcher is interested in whether income differs by gender.
c. A researcher is interested in whether level of education can be used to predict income.
d. A researcher is interested in the relationship between political party affiliation and
gender.
1.8 Provide a substantive research question that would need to be addressed using procedures
for categorical data analysis. Be sure to specify how the dependent and independent vari-
ables would be measured, and identify the scales of measurement used for these variables.
2 Probability Distributions
A Look Ahead
The ultimate goal of most inferential statistics endeavors is to determine how likely it
is to have obtained a particular sample given certain assumptions about the popula-
tion from which the sample was drawn. For example, suppose that in a random sample
of students, obtained from a large school district, 40% of the students are classified as
minority students. Based on this result, what can we infer about the proportion of
students who are minority students in the school district as a whole? This type of infer-
ence is accomplished by determining the probability of randomly selecting a particu-
lar sample from a specific population, and this probability is obtained from a sampling
distribution. In this chapter, we introduce several probability and sampling distributions
that are appropriate for categorical variables.
The properties of a sampling distribution typically depend on the properties of the underlying
distribution of the random variable of interest, a distribution that also provides the probability
of randomly selecting a particular observation, or value of the variable, from a specific popu-
lation. The most familiar example of this concept involves the sampling distribution of the
mean, often used to determine the probability of obtaining a sample with a particular sample
mean value. The properties of the sampling distribution depend on the properties of the prob-
ability distribution of the random variable in the population (e.g., its mean and variance). In
the case of continuous variables, for sufficient sample sizes the sampling distribution of the
mean and the probabilities obtained from it are based on the normal distribution. In general,
probability distributions form the basis for inferential procedures. In this chapter, we discuss
these distributions as they apply to categorical variables.
A frequency distribution table such as the one depicted in Table 2.1 summarizes the data
obtained so that a researcher can easily determine, for example, that respondents were most
likely to consider their political affiliation to be moderate and least likely to consider them-
selves to be extremely liberal. Yet, how might these data be summarized more succinctly?
Descriptive statistics, such as the mean and standard deviation, can be used to summarize
discrete variables just as they can for continuous variables, although the manner in which these
descriptive statistics are computed differs with categorical data. In addition, because it is no longer
appropriate to assume that a normal distribution is the underlying random mechanism that pro-
duces the categorical responses in a population, distributions appropriate to categorical data must
be used for inferential statistics with these data (just as the normal distribution is commonly the
appropriate distribution used for inferential statistics with continuous data). The two most com-
mon probability distributions assumed to underlie responses in the population when data are cat-
egorical are the binomial distribution and the Poisson distribution, although there are also other
distributions that are appropriate for categorical data. The remainder of this chapter is devoted
to introducing and describing common probability distributions appropriate for categorical data.
m
m! m (m − 1)(m − 2)(1)
k = = .
k ! (m − k )! k (k − 1)(k − 2)(1) (m − k )(m − k − 1)(m − k − 2)(1)
Note that 1! and 0! (1 factorial and 0 factorial, respectively) are both defined to be
equal to 1, and any other factorial is the product of all integers from the number in
the factorial to 1. For example, m! = m(m − 1)(m − 2) . . . 1.
The general formulation in Equation 2.1 can be conceptualized using the theory of com-
binatorics. The number of ways to select or choose n objects (e.g., applicants) from the total
number of objects, N, is N . Similarly, the number of ways to choose k minority applicants
n
from the total number of minority applicants, m, is m . For each possible manner of choos-
k N − m
ing k minority applicants from the total of m minority applicants, there are � possible
n − k
ways to select the remaining number of nonminority applicants to fulfill the requirement of
admitting n applicants. Therefore, the number of ways to form a sample consisting of exactly
k minority applicants is
m N − m
.
k n − k
Further, because each of the samples is equally likely, the probability of selecting exactly k
minority applicants is
m N − m
k n − k
.
N
n
Using the numerical example provided earlier, with m = 3 minority applicants, k = 1
minority admission, N = 15 qualified applicants, and n = 2 total admissions, the probability of
admitting exactly one minority student if applicants are selected at random can be computed
as follows:
We could also use this procedure to compute the probability of admitting no minority stu-
dents, but it is easier to make use of the laws of probability to determine this; that is, because only
two applicants are to be admitted, either none, one, or both of those admitted could be minority
students. Therefore, these three probabilities must sum to 1.0, and, because the probability of one
or two minority student admissions was shown to be 0.37, the probability that no minority stu-
dents are admitted can be obtained by 1.0 − 0.37 = 0.63. In this case, if applicants are randomly
selected from the qualified applicant pool, it is most likely (with 63% probability) that none of the
applicants selected by the admissions committee will be minority students, so to increase diversity
at this particular medical college, the committee should select applicants in a nonrandom manner.
Most any distribution can be described by its mean and variance. In the case of the hypergeometric
distribution, the mean (μ) and variance (σ2) can be computed using the following formulas:
nm
µ=
N
and
n (m / N )(1 − m / N )(N − n )
σ2 = .
N −1
3 (2) 6
µ= = = 0. 4
15 15
n (n −k )
P (Y = k ) = πk (1 − π ) . (2.3)
k
3 (3−3)
P (Y = 3) = 0.7 3 (0.3) = 0.7 3 = 0.343.
3
14 Probability Distributions
Further, the probabilities of all possible outcomes can be computed to determine the most
likely number of students that will be found to be proficient. Specifically:
1
2
Therefore, it is most likely that the number of mathematically proficient students will be two.
Note that, since the preceding computations exhausted all possible outcomes, their probabili-
ties should and do sum to one.
The mean and variance of the binomial distribution are expre.sed by μ = n π and σ2 = n π(1 − π),
respectively. For our example, the mean of the distribution is μ = 3(0.7) = 2.1, so the expected
number of students found to be proficient in mathematics is 2.1, which is comparable to the
information that was obtained when the probabilities were calculated directly. Moreover,
the variance of the distribution is σ2 = 3(0.7)(0.3) = 0.63, and the standard deviation is
σ = 0.63 = 0.794.
can be expressed by π1, π2, . . . , πΙ such that the sum of all probabilities is ∑π i
= 1. Therefore,
i =1
while the binomial distribution depicts the probability of obtaining a specific outcome pattern
across two categories in n trials, the multinomial distribution depicts the probability of
obtaining a specific outcome pattern across I categories in n trials. In this sense, the binomial
distribution is a special case of the multinomial distribution with I = 2.
In the illustrative example provided for the binomial distribution, students were classified
as either proficient or not proficient, and the probability of observing a particular number of
proficient students was of interest, so the binomial distribution was appropriate. However, if
students were instead categorized into one of four proficiency classifications, such as minimal,
basic, proficient, and advanced, and the probability of observing a particular number of
students in each of the four proficiency classifications was of interest, then the multinomial
distribution would be appropriate.
Probability Distributions 15
In general, for n trials with I possible outcomes, where πi = the probability of the ith out-
come, the (joint) probability that Y1 = k1, Y2 = k2, . . . , and YI = kI can be expressed by
n!
(
P Y1 = k1,Y2 = k2 ,, and YI = kI = ) k
k1 ! k2 ! kI ! 1 2
k k
π 1 π 2 πI I . (2.4)
∑π i
=1
i =1
∑k i
= n.
i =1
For example, suppose that the probability of being a minimal reader is 0.12, the probability
of being a basic reader is 0.23, the probability of being a proficient reader is 0.47, and the
probability of being an advanced reader is 0.18. If five students are chosen at random from a
particular classroom, the multinomial distribution can be used to determine the probability
that one student selected at random is a minimal reader, one student is a basic reader, two
students are proficient readers, and one student is an advanced reader:
(
P Y1 = 1,Y2 = 1,Y3 = 2, and Y4 = 1 )
5!
(0.12) (0.23) (0.47) (0.18)
1 1 2 1
=
(1!)(1!)(2 !)(1!)
5 (4)(3)(2)(1)
=
2
(0.12)(0.23)(0.47)(0.47)(0.18) = 0.066.
There are 120 different permutations in which the proficiency classification of students can
be randomly selected in this example (e.g., the probability that all five students were advanced,
the probability that one student is advanced and the other four students are proficient, and
so on), thus we do not enumerate all possible outcomes here. Nonetheless, the multinomial
distribution could be used to determine the probability of any of the 120 different permuta-
tions (i.e., combinations of outcomes) in a similar manner.
There are I means and variances for the multinomial distribution, each dealing with a par-
ticular outcome, i. Specifically, for the ith outcome, the mean can be expressed by μi = n πi and
the variance by σi2 = n π i(1 − πi). Table 2.2 depicts these descriptive statistics for each of the
four proficiency classifications in the previous example.
Table 2.2 Descriptive statistics for proficiency classifications following a multinomial distribution
e −λλk
P (Y = k ) = . (2.5)
k!
For example, suppose that over the last 50 years the average number of suicide attempts
that occurred at a particular correctional facility each year is approximately 2.5. The Poisson
distribution can be used to determine the probability that five suicide attempts will occur at
this particular correctional facility in the next year. Specifically,
(0.082)(2.5)
5
e −λλ 5 e −2.5 2.55
P (Y = 5) = = = 0.067.
5! 5! 5 (4)(3)(2)(1)
Similarly, the probabilities for a variety of outcomes can be examined to determine the
most likely number of suicides that will occur at this particular correctional facility in the next
year. For example,
(0.082)(2.5)
2
e −2.5 2.52
(
P 2 suicide attempts will occur = ) 2!
=
2 (1)
= 0.256,
(0.082)(2.5)
3
e −2.5 2.53
(
P 3 suicide attempts will occur = ) 3!
=
3 (2)(1)
= 0.214,
Probability Distributions 17
(0.082)(2.5)
4
e −2.5 2.54
(
P 4 suicide attempts will occur = )4!
=
4 (3)(2)(1)
= 0.138,
(0.082)(2.5)
5
e −2.5 2.55
(
P 5 suicide attempts will occur = )5!
=
5 (4)(3)(2)(1)
= 0.067,
(0.082)(2.5)
6
e −2.5 2.56
(
P 6 suicide attempts will occur = )6!
=
6 (5)(4)(3)(2)(1)
= 0.028,
(0.082)(2.5)
7
e −2.5 2.57
(
P 7 suicide attempts will occur = )7!
=
7 (6)(5)(4)(3)(2)(1)
= 0.010,
(0.082)(2.5)
8
e −2.5 2.58
(
P 8 suicide attempts will occur = )8!
=
8 (7 )(6)(5)(4)(3)(2)(1)
= 0.003, and
(0.082)(2.5)
8
e −2.5 2.59
(
P 9 suicide attempts will occur = )9!
=
9 (8)(7 )(6)(5)(4)(3)(2)(1)
= < 0.001.
Figure 2.1 depicts this distribution graphically, and Figure 2.2 depicts a comparable distribu-
tion if the average number of suicide attempts in a year had only been equal to 1. Comparing
the two figures, note that the expected (i.e., average or most likely) number of suicides in any
given year at this particular correctional facility, assuming that the number of suicides follows
the Poisson distribution, is greater in Figure 2.1 than in Figure 2.2, as would be expected.
Moreover, the likelihood of having a high number of suicides (e.g., four or more) is much
lower in Figure 2.2 than in Figure 2.1, as would be expected.
Figure 2.3 illustrates a comparable distribution where the average number of suicide
attempts in a year is equal to 5. Notice that Figure 2.3 is somewhat reminiscent of a normal
distribution. In fact, as λ gets larger, the Poisson distribution tends to more closely resemble
the normal distribution and, for sufficiently large λ, the normal distribution is an excellent
approximation to the Poisson distribution (Cheng, 1949).
The mean and variance of the Poisson distribution are both expressed by μ = σ2 = λ.
This implies that as the average number of times an event is expected to occur gets larger,
so does the expected variability of that event occurring, which also makes sense intuitively.
However, with real count data the variance often exceeds the mean, a phenomenon known
as overdispersion.
2.8 Summary
In this chapter, we discussed various probability distributions that are often used to model
discrete random variables. The distribution that is most appropriate for a given situation
depends on the random process that is modeled and the parameters that are needed or
available in that situation. A summary of the distributions we discussed is provided in
Table 2.3.
In the next chapter, we make use of some of these distributions for inferential procedures;
specifically, we will discuss how to estimate and test hypotheses about a population proportion
based on the information obtained from a random sample.
Probability Distributions 19
Table 2.3 Summary of common probability distributions for categorical variables
Problems
2.1 A divorce lawyer must choose 5 out of 25 people to sit on the jury that is to help decide
how much alimony should be paid to his client, the ex-husband of a wealthy business
woman. As luck would have it, 12 of the possible candidates are very bitter about hav-
ing to pay alimony to their ex-spouses. If the lawyer were to choose jury members at
random, what is the probability that none of the five jury members he chooses are bitter
about having to pay alimony?
2.2 At Learn More School, 15 of the 20 students in second grade are proficient in reading.
a. If the principal of the school were to randomly select two second-grade students to
represent the school in a poetry reading contest, what is the probability that both of
the students chosen will be proficient in reading?
b. What is the probability that only one of the two students selected will be proficient
in reading?
c. If two students are selected, what is the expected number of students that are pro-
ficient in reading?
2.3 Suppose there are 48 Republican senators and 52 Democrat senators in the United States
Senate and the president of the United States must appoint a special committee of 6 sena-
tors to study the issues related to poverty in the United States. If the special committee is
appointed by randomly selecting senators, what is the probability that half of the committee
consists of Republican senators and half of the committee consists of Democrat senators?
2.4 The CEO of a toy company would like to hire a vice president of sales and marketing.
Only 2 of the 5 qualified applicants are female, and the CEO would really like to hire
a female VP if at all possible to increase the diversity of his administrative cabinet. If he
randomly chooses an applicant from the pool, what is the probability that the applicant
chosen will be a female?
2.5 Suppose that the principal of Learn More School from Problem 2.2 is only able to choose
one second-grade student to represent the school in a poetry contest. If she randomly
selects a student, what is the probability that the student will be proficient in reading?
2.6 Researchers at the Food Institute have determined that 67% of women tend to crave
sweets over other alternatives. If 10 women are randomly sampled from across the
20 Probability Distributions
country, what is the probability that only 3 of the women sampled will report craving
sweets over other alternatives?
2.7 For a multiple-choice test item with four response options, the probability of obtaining
the correct answer by simply guessing is 0.25. If a student simply guessed on all 20 items
in a multiple-choice test:
a. What is the probability that the student would obtain the correct answers to 15 of
the 20 items?
b. What is the expected number of items the student would answer correctly?
2.8 The probability that an entering college freshman will obtain his or her degree in four
years is 0.4. What is the probability that at least one out of five admitted freshmen will
graduate in four years?
2.9 An owner of a boutique store knows that 45% of the customers who enter her store
will make purchases that total less than $200, 15% of the customers will make purchases
that total more than $200, and 40% of the customers will simply be browsing. If five
customers enter her store on a particular afternoon, what is the probability that exactly
two customers will make a purchase that totals less than $200 and exactly one customer
will make a purchase that totals more than $200?
2.10 On average, 10 people enter a particular bookstore every 5 minutes.
a. What is the probability that only four people enter the bookstore in a 5-minute
interval?
b. What is the probability that eight people enter the bookstore in a 5-minute interval?
2.11 Telephone calls are received by a college switchboard at the rate of four calls every
3 minutes. What is the probability of obtaining five calls in a 3-minute interval?
2.12 Provide a substantive illustration of a situation that would require the use of each of the
five probability distributions described in this chapter.
3 Proportions, Estimation, and
Goodness-of-Fit
A Look Ahead
n (n −k )
P (Y = k ) = πk (1 − π ) . (3.1)
k
Using Equation 3.1, suppose that in our example 4 of the 10 students were proficient in
mathematics. The probability would thus be computed by substituting n = 10 and k = 4 into
Equation 3.1, so
10 (10−4)
P (Y = 4) = π 4 (1 − π ) .
4
We can now evaluate this probability using different values of π, and the maximum likeli-
hood estimate is the value of π at which the probability (likelihood) is highest (maximized).
For example, if π = 0.3, the probability of 4 (out of the 10) students being proficient is
10 (10−4)
P (Y = 4) = (0.3) (1 − 0.3)
4
4
(10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 ×1) 4
(0.3) (0.7)
6
=
(4 × 3 × 2 ×1)(6 × 5 × 4 × 3 × 2 ×1)
= 210 (0.0081)(0.1176) = 0.20.
10 (10−4)
P (Y = 4) = (0.4) (1 − 0.4) = 210 (0.4) (0.6) = 0.25.
4 4 6
4
The probabilities for the full range of possible π values are shown in Figure 3.1, which
demonstrates that the value of π that maximizes the probability (or likelihood) in our example
Figure 3.1 Binomial probabilities for various values of π with k = 4 successes and n = 10 trials
Proportions, Estimation, Goodness-of-Fit 23
is 0.40. This means that the value of 0.40 is an ideal estimate of π in the sense that it is most
probable, or likely, given the observed data. In fact, the maximum likelihood estimate
of a proportion is equal to the sample proportion, computed as p = k / n = 4 / 10 = 0.40.
In general, the maximum likelihood estimation method is an approach to obtaining sample
estimates that is useful in a variety of contexts as well as in cases where a simple computation
does not necessarily provide an ideal estimate. We will use the concept of maximum likeli-
hood estimation throughout this book.
So far, we have discussed the concept of maximum likelihood estimation and shown that we
can use the sample proportion, p = k / n, to obtain the maximum likelihood estimate (MLE)
of the population proportion, π. This is akin to what is done with more familiar parameters,
such as the population mean, where the MLE is the sample mean and it is assumed that
responses follow an underlying normal distribution. The inferential step, in the case of the
population mean, involves testing whether the sample mean differs from what it is hypoth-
esized to be in the population and constructing a confidence interval for the value of the
population mean based on its sample estimate. Similarly, in our example we can infer whether
the proportion of students found to be proficient in mathematics in the sample differs from
the proportion of students hypothesized to be proficient in mathematics in the population.
We can also construct a confidence interval for the proportion of students proficient in math-
ematics in the population based on the estimate obtained from the sample. We now turn to a
discussion of inferential procedures for a proportion.
10 (10−k )
P (Y = k ) = 0.8k (1 − 0.8) .
k
24 Proportions, Estimation, Goodness-of-Fit
Table 3.1 Probabilities of the binomial distribution with n = 10 and π = 0.8
0 0.0000 0.0000
1 0.0000 0.0000
2 0.0001 0.0001
3 0.0008 0.0009
4 0.0055 0.0064
5 0.0264 0.0328
6 0.0881 0.1209
7 0.2013 0.3222
8 0.3020 0.6242
9 0.2684 0.8926
10 0.1074 1.0000
The resulting probabilities (which make up the null distribution) are shown in Table 3.1
and Figure 3.2. Using the conventional significance level of α = 0.05, any result that is in
the lowest 5% of the null distribution would lead to rejection of H0. From the cumulative
probabilities in Table 3.1, which indicate the sum of the probabilities up to and including a
given value of k, we can see that the lowest 5% of the distribution consists of the k values 0
through 5. For values of k above 5, the cumulative probability is greater than 5%. Because our
sample result of p = 0.7 translates to k = 7 when n = 10, we can see that this result is not in
the lowest 5% of the distribution and does not provide sufficient evidence for rejecting H0. In
other words, our result is not sufficiently unusual under the null distribution and we cannot
reject the null hypothesis that π = 0.8. To put it another way, the sample result of p = 0.7 is
sufficiently consistent (or not inconsistent) with the notion that 80% of the students in the
population (represented by the sample) are indeed proficient in mathematics. On the other
hand, if we had obtained a sample proportion of p = 0.5 (i.e., k = 5), our result would have
been in the lowest 5% of the distribution and we would have rejected the null hypothesis that
Proportions, Estimation, Goodness-of-Fit 25
π = 0.8. In this case we would have concluded that, based on our sample proportion, it would
be unlikely that 80% of the students in the population are proficient in mathematics.
We can also compute p-values for these tests using the null distribution probabilities and
observing that, if the null hypothesis were true, the lower-tailed probability of obtaining a
result at least as extreme as the sample result of p = 0.7 (or k = 7) is
P (Y = 0) + P (Y = 1) + P (Y = 2) + + P (Y = 7 ) = 0.322,
which is also the cumulative probability (see Table 3.1) corresponding to k = 7. To conduct a
two-tailed test, in which the alternative hypothesis is H1: π ≠ 0.8, the one-tailed p-value would
typically be doubled. In our example, the two-tailed p-value would thus be 2(0.322) = 0.644.
Note that if only 50% of the students in our sample were found to be proficient in math-
ematics, then the lower-tailed probability of obtaining a result at least as extreme as the sample
result of p = 0.5 (or k = 5) would be
P (Y = 0) + P (Y = 1) + P (Y = 2) + + P (Y = 5) = 0.032,
which is also the cumulative probability (see Table 3.1) corresponding to k = 5. Alternatively,
the two-tailed p-value for this result would be 2(0.032) = 0.064.
There are two main drawbacks to using this method for hypothesis testing. First, if the
number of observations (or trials) is large, the procedure requires computing and summing
a large number of probabilities. In such a case, approximate methods work just as well, and
these are discussed in the following sections. Second, the p-values obtained from this method
are typically a bit too high, and the test is thus overly conservative; this means that when the
significance level is set at 0.05 and the null hypothesis is true, it is not rejected 5% of the time
(as would be expected) but less than 5% of the time (Agresti, 2007). Methods that adjust
the p-value so that it is more accurate are beyond the scope of this book but are discussed in
Agresti (2007) as well as Agresti and Coull (1998).
Estimate − Parameter
z= .
Standard error
In testing a mean, the estimate is the sample mean, the parameter is the population mean
specified under the null hypothesis, and the standard error is the standard deviation of the
appropriate sampling distribution. In our case, the estimate is the sample proportion (p) and
the parameter is the population proportion specified under the null hypothesis (denoted
by π0).
26 Proportions, Estimation, Goodness-of-Fit
To compute the standard error, recall from Chapter 2 (Section 2.5) that the variance for the
distribution of frequencies that follow the binomial distribution is σ2 = n π(1 − π). Because
a proportion is a frequency divided by the sample size, n, we can use the properties of linear
transformations to determine that the variance of the distribution of proportions is equal to
the variance of the distribution of frequencies divided by n2, or σ2 / n2 = n π(1 − π) / n2 =
π(1 − π) / n.
We can use the sample to estimate the variance by p(1 − p) / n. The standard error can thus
be estimated from sample information using p (1 − p ) / n, and the test statistic for testing
the null hypothesis H0: π = π0 is
p − π0
z= .
p (1 − p ) / n
This test statistic follows a standard normal distribution when the sample size is large. For our
example, to test H0: π = 0.8,
We can use the standard normal distribution to find the p-value for this test; that is
P (z ≤−
0.69) = 0.245,
so the two-tailed p-value is 2(0.245) = 0.49. Thus, the test statistic in our example does not
lead to rejection of H0 at the 0.05 significance level. Therefore, as we do not have sufficient
evidence to reject the null hypothesis, we conclude that the sample estimate of 0.7 is consist-
ent with the notion that 80% of the students in the population are proficient in math. This
procedure is called the Wald test.
One of the drawbacks to the Wald test is that it relies on the estimate of the population
proportion (i.e., p) to compute the standard error, and this could lead to unreliable values for
the standard error, especially when the estimate is based on a small sample. A variation on this
test, which uses the null hypothesis proportion π0 (instead of p) to compute the standard error,
is called the score test. Using the score test, the test statistic becomes
p − π0
z=
π0 (1 − π0 ) / n
and it too follows a standard normal distribution when the sample size is large. For our
example,
P (z ≤−
0.79) = 0.215,
Proportions, Estimation, Goodness-of-Fit 27
so the two-tailed p-value is 2(0.215) = 0.43 and our conclusions do not change: We do not
reject the null hypothesis based on this result.
A drawback to the score test is that, by using the null hypothesis value (i.e., π0) for the
standard error, it presumes that this is the value of the population proportion; yet, this may not
be a valid assumption and might even seem somewhat counterintuitive when we reject the
null hypothesis. Thus, this method is accurate only to the extent that π0 is a good estimate of
the true population proportion (just as the Wald test is accurate only when the sample propor-
tion, p, is a good estimate of the true population proportion).
In general, and thus applicable to both the Wald and score tests, squaring the value of z pro-
duces a test statistic that follows the χ2 (chi-squared) distribution with 1 degree of freedom.
That is, the p-value from the χ2 test (with 1 degree of freedom) is equivalent to that from the
two-tailed z-test. Another general hypothesis testing method that utilizes the χ2 distribution,
the likelihood ratio method, will be used in different contexts throughout the book and is
introduced next.
L
G 2 = −2ln 0 = −2 ln (L0 ) − ln (L1 ) . (3.2)
L1
Figure 3.3 illustrates the natural logarithm function by plotting values of a random vari-
able, X, on the horizontal axis against values of its natural logarithm, ln(X), on the vertical
axis. Note that the natural log of X will be negative when the value of X is less than 1, positive
when the value of X is greater than 1, and 0 when the value of X is equal to 1. Therefore,
when the two likelihoods (L0 and L1) are equivalent, the likelihood ratio will be one and the
G2 test statistic will be 0. As the likelihood computed from the data (L1) becomes larger rela-
tive to the likelihood under the null hypothesis (L0), the likelihood ratio will become smaller
than 1, its (natural) log will become more negative, and the test statistic will become more
positive. Thus, a larger (more positive) G2 test statistic indicates stronger evidence against H0,
as is typically the case with test statistics.
In fact, under H0 and with reasonably large samples, the G2 test statistic follows a χ2 distri-
bution with degrees of freedom (df) equal to the number of parameters restricted under H0
(i.e., df = 1 in the case of a single proportion). Because the χ2 distribution consists of squared
(i.e., positive) values, it can only be used to test two-tailed hypotheses. In other words, the
p-value obtained from this test is based on a two-tailed alternative.
28 Proportions, Estimation, Goodness-of-Fit
Figure 3.3 Illustration of the natural logarithm function for values of the variable X
For our example, with n = 10 and k = 7, L0 is the likelihood of the observed data, P(Y = 7),
computed using the binomial distribution with the probability parameter (π) specified under
the null hypothesis (H0: π = 0.8):
10 (10−7)
L0 = P (Y = 7 ) = 0.87 (1 − 0.8) = 0.201.
7
Similarly, L1 is the likelihood of the observed data given the data-based estimate (of 0.7) for
the probability parameter:
10 (10−7)
L1 = P (Y = 7 ) = 0.7 7 (1 − 0.7 ) = 0.267.
7
L
G 2 = −2ln 0 = −2ln (0.201 / 0.267 ) = −2ln (0.753) = 0.567.
L1
The critical value of a χ2 distribution with 1 degree of freedom at the 0.05 significance level
is 3.84 (see Appendix), so this test statistic does not exceed the critical value and the null
hypothesis is not rejected using this two-tailed test. We can also obtain the p-value of this test
using a χ2 calculator or various software programs: P( χ12 ≥ 0.567 ) = 0.45.
Proportions, Estimation, Goodness-of-Fit 29
Table 3.2 Summary of various approaches for testing H0: π = 0.8 with n = 10 and k = 7
where the statistic is the sample estimate of the parameter, the critical value depends on the
level of confidence desired and is obtained from the sampling distribution of the statistic, and
the standard error is the standard deviation of the sampling distribution of the statistic. For
example, in constructing a confidence interval for the population mean, one would typically
use the sample mean as the statistic, a value from the t-distribution (with n − 1 degrees of
freedom at the desired confidence level) as the critical value, and s / n for the standard error
(where s is the sample standard deviation).
Generalizing this to a confidence interval for a proportion, the sample proportion, p, is
used as the statistic and the critical value is obtained from the standard normal distribution
(e.g., z = 1.96 for a 95% confidence interval). To compute the standard error, we refer back
to the Wald approach, which uses p (1 − p ) / n as the standard error of a proportion. There-
fore, the confidence interval for a proportion is computed using
p ± z α/2 p (1 − p ) / n,
where zα/2 is the critical value from the standard normal distribution for a (1 ‒ α)% confidence
level.
30 Proportions, Estimation, Goodness-of-Fit
For example, to construct a 95% confidence interval using our sample (where n = 10 and
k = 7), we have p = 0.7, α = 0.05 so zα/2 = 1.96, and p (1 − p ) / n = 0.145. Therefore, the
95% confidence interval for our example is
Based on this result, we can be 95% confident that the proportion of students in the popula-
tion (from which the sample was obtained) who are proficient in mathematics is somewhere
between (approximately) 42% and 98%. This is a very large range due to the fact that we have
a very small sample (and, thus, a relatively large standard error). As we mentioned previously
in our discussion of the Wald test, it is somewhat unreliable to compute the standard error
based on the sample proportion, especially when the sample size is small.
The value added to and subtracted from the sample proportion (e.g., 0.284) is called the
margin of error. The larger it is, the wider the confidence interval and the less precise our
estimate. In our example earlier, the margin of error is over 28% due to our small sample.
When we are designing a study, if wish to aim for a certain margin of error for our estimate
(as is done, for example, in polling research) we can “work backward” and solve for the sample
size needed. That is, the sample size needed for a given margin of error, ME, is:
p (1 − p )
n=
(ME / z )
2
where n is the sample size, p is the sample proportion, ME is the desired margin of error, and
z is the critical value corresponding to the desired confidence level. For example, suppose that
we wanted our estimate to be accurate to within 2%, with 95% confidence. The sample size
needed to achieve this, given a proportion of 0.7, would be
p (1 − p ) 0. 7 ( 0. 3 )
n= = = 2017.
(ME / z ) (0.02 / 1.96)
2 2
Note that for a confidence interval we do not have the option of replacing p with π0 for
estimating the standard error (as we did with the score test) because a confidence interval does
not involve the specification of a null hypothesis. Although there are other available methods
for computing confidence intervals for a proportion, they are beyond the scope of this book.
We refer the interested reader to Agresti (2007), who suggests using the hypothesis testing
formula to essentially “work backward” and solve for a confidence interval. Other alternatives
include the Agresti–Coull confidence interval, which is an approximation of the method that
uses the hypothesis testing formula (Agresti & Coull, 1998), and the F distribution method
(Collett, 1991; Leemis & Trivedi, 1996), which provides exact confidence limits for the bino-
mial proportion. The latter (confidence interval computed by the F distribution method) can
be obtained from SAS.
The Pearson chi-squared test statistic for comparing two frequency distributions is
∑
2
X =
all categories
expected frequency
(O − Ei )
2
c
=∑
i
,
i =1
Ei
where Oi represents the observed frequency in the ith category and Ei represents the expected
frequency in the ith category. This X2 test statistic follows a χ2 distrution with c ‒ 1 degrees
of freedom, where c is the total number of categories. The reason for this is that only c ‒ 1
category frequencies can vary “freely” for a given total sample size, n, because the frequen-
cies across the c categories must add up to the total sample size. Therefore, once c ‒ 1 fre-
quencies are known, the last category frequency can be determined by subtracting those
frequencies from the total sample size.
The expected frequencies are specified by the null hypothesis; that is, they are the
frequencies one expects to observe if the null hypothesis is true. In our example, the null
hypothesis would state that there is no change in frequencies between 2005 and 2006,
Table 3.3 Distribution of Wisconsin 10th-g rade students according to four proficiency classification levels
(n = 71,709)
2
X = + + +
10756.35 28683.6 21512.7 10756.35
= 5784.027 + 448.1688 + 6119.445 + 0 = 12351.64.
observed frequency c O
G 2 = 2 ∑ (observed frequency) ln = 2∑Oi ln i
quency
expected freq i =1
Ei
and it also follows a χ2 distribution with c ‒ 1 degrees of freedom. For our example,
18644 × ln 18644 + 32269 × ln 32269 + 10039
c O 10756.35 28683.60
G 2 = 2∑Oi ln i = 2
E 10039 10757
×ln
21512.70 + 10757 × ln 10756.35
i =1 i
= 2 (10254.72 + 3800.69 + 7651.38 + 0.65) = 12809.36.
Another random document with
no related content on Scribd:
pared estaba edificado un castillo,
assimesmo de las mesmas
piedras, entretexidas con otras
verdes y azules, enlazadas con
unos remates de oro que hacían
una tan excelente obra que más
divina que humana parecía,
porque con los rayos del sol que
en él daban, resplandeciendo,
apenas de mis ojos mirarse
consentía. Estaba tan bien
torreado y fortalecido con tantos
cubos y barbacanas, que
cualquiera que lo viera, demás de
su gran riqueza, lo juzgara por un
castillo de fortaleza inespugnable.
Tenía encima del arco de la
puerta principal una letra que
decía: «Morada de la Fortuna, á
quien por permissión divina
muchas de las cosas corporales
son subjetas».
La otra pared del otro ángulo que
cabe éste estaba era toda hecha
de una piedra tan negra y escura,
que ninguna lo podía ser más en
el mundo; y de la mesma manera
en el medio della estaba
edeficado un castillo, que en
mirarlo ponía gran tristeza y
temor. Tenía también unas letras
blancas que claramente se
dexaban leer, las cuales decían:
«Reposo de la Muerte, de adonde
executaba sus poderosas fuerzas
contra todos los mortales».
La otra pared era de un christalino
muro transparente, en que como
en un espejo muy claro todas las
cosas que había en el mundo, así
las pasadas como las presentes,
se podían mirar y ver, con una
noticia confusa de las venideras.
Tenía en el medio otro castillo de
tan claro cristal que, con
reverberar en él los rayos
resplandecientes del sol, quitaban
la luz á mis ojos, que
contemplando estaban una obra
de tan gran perfición; pero no
tanto que por ello dexase el ver
unas letras que de un muy fino
rosicler estaban en la mesma
puerta esculpidas, que decían:
«Aquí habita el Tiempo, que todas
las cosas que se hacen en sí las
acaba y consume».
En el medio deste circuito estaba
edificado otro castillo, que era en
el hermoso y verde prado cerca
de adonde yo me hallaba, el cual
me puso en mayor admiración y
espanto que todo lo que había
visto, porque demás de la gran
fortaleza de su edificio era
cercado de una muy ancha y tan
honda cava, que casi parecía
llegar á los abismos. Las paredes
eran hechas de unas piedras
amarillas, y todas pintadas de
pincel, y otras de talla con figuras
que gran lástima ponían en mi
corazón, que contemplándolas
estaba, porque allí se vían
muchas bestias fieras, que con
gran crueldad despedazaban los
cuerpos humanos de muchos
hombres y mujeres; otras que
después de despedazados,
satisfaciendo su rabiosa hambre,
á bocados los estaban comiendo.
Había también muchos hombres
que por casos desastrados
mataban á otros, y otros que sin
ocasión ninguna, por sólo su
voluntad, eran causa de muchas
muertes; allí se mostraban
muchos padres que dieron la
muerte á sus hijos y muchos hijos
que mataron á sus padres. Había
también muchas maneras y
invenciones de tormentos con que
muchas personas morían, que
contarlos particularmente sería
para no acabar tan presto de
decirlos. Tenía unas letras
entretalladas de color leonado
que decían: «Aposento de la
Crueldad, que toda compasión,
amor y lástima aborrece como la
mayor enemiga suya».
Tan maravillado me tenía la
novedad destas cosas que
mirando estaba, que juzgando
aquel circuito por otro nuevo
mundo, y con voluntad de salirme
dél si pudiesse, tendí la vista por
todas partes para ver si hallaría
alguna salida adonde mi camino
enderezase, que sin temer el
trabajo á la hora lo comenzara,
porque todas las cosas que allí
había para dar contentamiento,
con la soledad me causaban
tristeza, deseando verme con mi
ganado en libertad de poderlo
menear de unos pastos buenos
en otros mejores y volverme con
él á la aldea cuando á la voluntad
me viniera; y no hallando remedio
para que mi deseo se cumpliese,
tomando á la paciencia y
sufrimiento por escudo y
compañía para todo lo que
sucederme pudiese, me fuí á una
fuente que cerca de mí había
visto, la cual estando cubierta de
un cielo azul, relevado todo con
muy hermosas labores de oro,
que cuatro pilares de pórfido,
labrados con follajes al romano,
sostenían, despedía de sí un gran
chorro de agua que, discurriendo
por las limpias y blancas piedras y
menuda arena, pusieran sed á
cualquiera que no la tuviera,
convidando para que della
bebiesen con hacer compañía á
las ninfas que de aquella hermosa
fuente debían gozar el mayor
tiempo del año; y assí, lavando
mis manos y gesto, limpiándolo
del polvo y sudor que en el
camino tan largo había cogido,
echado de bruces y otras veces
juntando mis manos y tomando
con ellas el agua, por no tener
otra vasija, no hacía sino beber;
pero cuantas más veces bebía,
tanto la sed en mayor grado me
fatigaba, creciéndome más cada
hora con cuidado de la mi Belisia,
que el agua me parecía
convertirse en llamas de fuego
dentro de mi abrasado y
encendido pecho, y maravillado
desta novedad me acordaba de la
fuente del olivo, donde agora
estamos, deseando poder beber
desta dulce agua y sabrosa con
que el ardor matar pudiese que
tanta fatiga me daba; y estando
con este deseo muy congojado,
comencé á oír un estruendo y
ruido tan grande, que atronando
mis oídos me tenían casi fuera de
mi juicio, y volviendo los ojos para
ver lo que podía causarlo, vi que
el castillo de la Fortuna se había
abierto por medio, dexando un
gran trecho descubierto, del que
salía un carro tan grande, que
mayor que el mesmo castillo
parecía; de los pretiles y almenas
comenzaron á disparar grandes
truenos de artillería, y tras ellos
una música tan acordada de
menestriles altos con otros
muchos y diversos instrumentos,
que más parecía cosa del cielo
que no que en la tierra pudiese
oirse; y aunque no me faltaba
atención para escucharla, mis
ojos se empleaban en mirar aquel
poderoso carro, con las maravillas
que en él vía que venían, que no
sé si seré bastante para poder
contar algunas dellas, pues que
todo sería imposible á mi
pequeño juicio hacerlo. El carro
era todo de muy fino oro, con
muchas labores extremadas
hechas de piedras preciosas, en
las cuales había grande
abundancia de diamantes,
esmeraldas y rubís y carbunclos,
sin otras de más baxa suerte. Las
ruedas eran doce, todas de un
blanco marfil, asimesmo con
muchas labores de oro y piedras
preciosas, labradas con una arte
tan sutil y delicada que no hubiera
pintor en el mundo que así
supiera hacerlo. Venían uncidos
veinte y cuatro unicornios blancos
y muy grandes y poderosos, que
lo traían; encima del carro estaba
hecho un trono muy alto con doce
gradas, que por cada parte lo
cercaban, todas cubiertas con un
muy rico brocado bandeado con
una tela de plata, con unas
lazadas de perlas, que lo uno con
lo otro entretexía; encima del
trono estaba una silla toda de fino
diamante, con los remates de
unos carbunclos que daban de sí
tan gran claridad y resplandor que
no hiciera falta la luz con que el
día les ayudaba, porque en medio
de la noche pudiera todo muy
claramente verse. En esta silla
venía sentada una mujer, cuya
majestad sobrepuja á la de todas
las cosas visibles; sus vestidos
eran de inestimable valor y de
manera que sería imposible poder
contar la manera y riqueza dellos;
traía en su compañía cuatro
doncellas; las dos que de una
excelente hermosura eran
dotadas venían muy pobremente
aderezadas, los vestidos todos
rotos, que por muchas partes sus
carnes se parecían; estaban
echadas en el suelo. Y aquella
mujer, á quien ya yo por las
señales había conocido ser la
Fortuna, tenía sus pies encima de
sus cervices, fatigándolas, sin que
pudiesen hacer otra cosa sino
mostrar con muchas lágrimas y
sospiros el agravio que padecían;
traían consigo sus nombres
escritos, que decían: el de la una,
«Razón», y el de la otra,
«Justicia». Las otras dos
doncellas, vestidas de la mesma
librea de la Fortuna, como
privadas suyas, tenían los gestos
muy feos y aborrecibles para
quien bien los entendiesse,
conociendo el daño de sus obras;
traían en las manos dos estoques
desnudos, con que á la Razón y á
la Justicia amenazaban, y en
medio de sus pechos dos rótulos
que decían: el de la una,
«Antojo», y el de la otra, «Libre
voluntad». Con grande espanto
me tenían estas cosas; pero
mayor me lo ponía el gesto de la
Fortuna, que algunas veces muy
risueño y halagüeño se mostraba
y otras tan espantable y medroso
que apenas mirarse consentía.
Estaba en esto con tan poca
firmeza, que en una hora mill
veces se mudaba; pero lo que en
mayor admiración me puso fué
ver una rueda que la Fortuna
traía, volviendo sin cesar con sus
manos el exe della; y
comenzando los unicornios á
mover el carro hacia adonde yo
estaba tendido junto á la fuente,
cuanto más á mí se acercaba
tanto mayor me iba pareciendo la
rueda, en la cual se mostraban
tan grandes y admirables
misterios, que ningún juicio
humano sin haberlos visto es
bastante á comprenderlos en su
entendimiento; porque en ella se
vían subir y baxar tan gran
número de gentes, assí hombres
como mujeres, con tantos trajes y
atavíos diferentes los unos de los
otros, que ningún estado grande
ni pequeño desde el principio del
mundo en él ha habido que allí no
se conociese, con las personas
que dél próspera ó
desdichadamente habían gozado,
y como cuerpos fantásticos y
incorpóreos los unos baxaban y
los otros subían sin hacerse
impedimiento ninguno; muchos
dellos estaban en la cumbre más
alta desta rueda, y por más
veloce que el curso della
anduviese, jamás se mudaban,
aunque éstos eran muy pocos;
otros iban subiendo poniendo
todas sus fuerzas, pero hallaban
la rueda tan deleznable que
ninguna cosa le aprovechaba su
diligencia, y otros venían cabeza
abaxo, agraviándose de la súpita
caída, con que vían derrocarse;
pero la Fortuna, dándose poco
por ello, no dexaba de proseguir
en su comenzado officio. Yo que
estaba mirando con grande
atención lo que en la rueda se me
mostraba, vime á mí mesmo que
debaxo della estaba tendido,
gemiendo por la grande caída con
que Fortuna me había derribado;
y con dolor de verme tan mal
tratado, comencé á mirar la
Fortuna con unos ojos piadosos y
llenos de lágrimas, queriéndole
mover con ellas á que de mis
trabajos se compadeciese. Y á
este tiempo, cesando la música
del castillo y parando los
unicornios el carro, la Fortuna,
mirándome con el gesto algo
airado y con una voz para mí
desabrida, por lo que sus
palabras mostraron, con una gran
majestad me comenzó á decir
desta manera: