
Educational Evaluation and Policy Analysis

Winter 2006, Vol. 28, No. 4, pp. 287-313

Effects of Class Size and Instruction on Kindergarten Achievement

Carolina Milesi
University of Wisconsin-Madison

Adam Gamoran
University of Wisconsin-Madison

Although experimental results indicate that smaller classes promote higher achievement in early
elementary school, the broader literature on class-size effects is inconclusive. This seeming
contradiction raises questions about the generalizability of experimental evidence, an issue that this
article addresses by examining the effects of class size on achievement in kindergarten with data from
a nationwide survey, the Early Childhood Longitudinal Study-Kindergarten Class of 1998-99. To
distinguish class-level from individual-level effects, this analysis utilizes hierarchical linear models.
In response to concerns about selectivity, teacher fixed-effects models are also estimated. In an effort
to understand the inconsistent findings of the past, the authors examine classroom conditions that may
affect the link between class size and academic achievement, and also consider whether class size has
different effects for different groups of students. The authors find no evidence of class-size effects on
student achievement in either reading or mathematics, and results indicate that class size is equally
insignificant for students from different race/ethnic, economic, and academic backgrounds. Teacher
fixed-effects analyses also yield null findings for class size. Instructional activities offer significant
boosts to achievement, but the effects of instruction do not differ between small and large classes.
The authors discuss why the small class size advantage evidenced by experimental research might not
generalize to nonexperimental, naturally occurring settings throughout the nation.

Keywords: class size, ECLS-K, instruction, student achievement

In the last two decades, class-size reduction (CSR) has become a popular, yet controversial,
educational initiative. An array of researchers, teachers' unions, policymakers, and politicians have
debated the benefits and costs of reducing class size. Despite mixed evidence, school districts
throughout the nation have conducted CSR programs while hoping to improve student achievement. In
1999, the federal government awarded $1.2 billion to the states for schools to hire "fully qualified
teachers" for primary-grade classrooms to cut down class size in the early grades. In 2000, federal
funding for CSR programs increased to $1.3 billion. The goal of this program was: "over seven years,
to hire 100,000 new teachers and reduce kindergarten through grade 3 classrooms across the country
to an average of 18 children" (U.S. Department of Education, 2000, p. 7). When announcing this
initiative, U.S. Secretary of Education Richard W. Riley commented that, "any parent or teacher will
tell you that class size really makes a difference. [...] Smaller classes mean more individual attention
for students, more orderly classrooms for teachers, and a better learning

Earlier versions of this article were presented at the annual meeting of the American Sociological Association, Atlanta, GA,
August 2003, and at seminars at the University of Wisconsin-Madison and the University of Chicago. The authors appreciate the
valuable advice they received in those forums and are further grateful for helpful comments from the EEPA editors and reviewers.


environment for everyone" (U.S. Department of Education, 1998).

Under the Bush administration, the CSR program was incorporated into Title II of the 2001
reauthorization of the Elementary and Secondary Education Act (ESEA), known as the No Child
Left Behind Act (NCLB). Although no longer a separate federal program, CSR remained an allowable
use of funds under Title II, Part A (U.S. Department of Education, 2004). This program provides
funds to school districts to promote teacher quality by addressing such challenges as teacher
preparation and qualifications of new teachers, recruitment and hiring, instruction, professional
development, teacher retention, and principals' leadership. Within this new framework, school
districts can hire new teachers to reduce class size only in accordance with the provision to:
"prepare, train, and recruit high-quality teachers and principals capable of ensuring that all children
achieve to high standards" (U.S. Department of Education, 2005). By embedding CSR into this
provision, NCLB establishes that aiming at teacher quality supersedes the intent to reduce class size.
If, as some have argued, the demand for increased teacher quantity that results from CSR conflicts
with the aim to improve teacher quality, it may be more difficult for school districts to reduce class
size in this new context. However, if teacher quality is more important than class size for
achievement, the emphasis on quality may be warranted.

Even as federal policy has deemphasized CSR, several states continue to battle over CSR policies.
Florida, for example, has made national headlines since November 2002 when voters approved a
constitutional amendment to limit class size to 18 students in kindergarten through third grade,
22 in fourth through eighth grades, and 25 in high school, all to be fulfilled by 2010. The Florida
Association of District School Superintendents and Florida's Governor Jeb Bush, among others,
have contested this citizen-proposed amendment because of its cost and the shortage of qualified
teachers (see Gewertz, 2003; Goodnough, 2003; Kennedy, 2003; Richard, 2004). However, currently,
amendment opponents have not gathered enough support in the legislature to place a referendum
on a statewide ballot asking voters to repeal or trim down the 2002 class size amendment.

Support for small classes has relied heavily on the positive results of a CSR experiment conducted
in Tennessee in the 1980s called Student/Teacher Achievement Ratio (STAR). As one of the few
large-scale randomized experiments in education, Project STAR has been given ample credibility.
Researchers have referred to STAR as: "one of the great experiments in education in U.S. history"
(Mosteller, Light, & Sachs, 1996), and CSR initiatives have praised STAR's experimental design as
authoritative proof that small classes are beneficial for student outcomes.

Furthermore, the current policy environment regards randomized field trials as the "gold standard"
for establishing the effectiveness of an intervention. Within this framework, STAR constitutes
"strong" evidence that CSR "works" to raise student achievement. Indeed, STAR's evidence
appropriately warrants causal claims. Because the Project STAR's research design entailed the
random assignment of teachers and students to treatment and control conditions, one can be
confident that observed differences in student achievement result from solely the intervention,
namely a reduction in class size. Random assignment created groups of "equal" students and "equal"
teachers at the outset. Thus, differences between the treatment and control groups were not affected
by selection bias or by the presence of confounding variables that also improve student achievement,
as may occur in a nonexperimental study of naturally occurring variation.

Although crucial, the causal evidence that a randomized trial provides is not sufficient to
extrapolate experimental findings into policies. From a policy perspective, it is also necessary to
confirm that experimental findings are generalizable to "real-world" settings. The generalizability
of Project STAR's findings to other locations, populations, and scales of treatment remains open
to question, especially in light of mixed results of CSR in other contexts. For example, partly on the
basis of STAR's findings, the state of California embarked on a statewide CSR initiative aimed at
reducing class size to 20 students in all kindergarten through third-grade classrooms over a
3-year period, starting in fall 1996. However, the California initiative failed to live up to STAR's
promise, probably because whereas STAR was fully funded by the state and implemented with
an ample supply of facilities and qualified teachers, the CSR initiative in California dealt with
limited financial resources, a shortage of qualified teachers, insufficient classroom space, and a

substantively more diverse student population (Stecher & Bohrnstedt, 2002).

The comparison of CSR initiatives in Tennessee and California shows that the results of CSR in
one context may not hold in another. More broadly, given the prominence of CSR proposals in many
districts throughout the country, the comparison illustrates the importance of understanding the
effects of class size in the nation as a whole under the prevailing schooling conditions. Until
recently, the lack of national data on class size and student achievement at the elementary level
prevented such a nationwide analysis. The release of the Early Childhood Longitudinal Study, a
nationally representative survey that began with kindergarteners in 1998, makes it possible to
examine the effects of class size throughout the country, in the naturally occurring conditions of
schools.

This article focuses on the kindergarten year to pose three questions that address the scope and
generalizability of class-size effects. First, what is the impact of class size on reading and
mathematics achievement, given the natural conditions in which class-size variation occurs? Of
course, addressing this question in the absence of an experiment requires serious attention to causal
inference, which is addressed not only with rigorous controls for other class and student conditions
but also with teacher fixed-effects models in which the authors examine class size variation for
teachers who instruct more than one kindergarten class during the same year. Second, what
classroom conditions are relevant for class-size effects on achievement? Assessment of conditions
inside classrooms is rarely possible in experimental situations and thus is an important facet of a
study of nonexperimental, naturally occurring variation such as ours. Third, does class size have
different effects for different students? Although class size may not be consequential on average, it
may still matter for some students. By addressing these questions, we aim to further the debate on
class size and to illuminate differences between experimental studies and those of naturally
occurring variation.

Class-Size Effects in Experimental and Nonexperimental Studies

What Are the Effects of Class Size?

Unlike most research in education, evidence on the effect of class size on achievement encompasses
an array of experimental, quasi-experimental, and nonexperimental studies. Among the first type of
studies, the best known findings come from Project STAR, a large-scale randomized experiment
conducted in Tennessee from 1985 to 1989. STAR followed a between-subjects, within-school
design. In each participating school, students entering kindergarten were randomly assigned to one
of three types of classes: a small class (13-17 students), a regular class (22-25 students), or a regular
class (22-25 students) with a full-time teacher aide. Students were to remain in their treatment
condition for four years and then return to their regular-size classroom. Also within schools,
teachers were randomly assigned to one of the three treatments each year. Other than the change
in classroom configuration, no other intervention, such as teacher training or special curricula,
occurred.

STAR results revealed that students in small classes increased their reading and mathematics
achievement by 0.15-0.27 SD, compared to their peers in regular classes with and without a
full-time aide (Finn & Achilles, 1999). The small class-size advantage widened between kindergarten
and first grade but remained constant or even narrowed in second and third grade (Finn & Achilles,
1999; Grissmer, 1999).1

The interpretation of STAR's findings has raised issues of internal and external validity. Internal
validity refers to the correct measurement and interpretation of the causal relationship being
studied. Among the threats to STAR's internal validity are the attrition of approximately 50% of the
initial participants, switching among class types ("treatment crossover"), and differential exposure
to experimental conditions resulting from attrition or addition of new experimental subjects
(Ehrenberg, Brewer, Gamoran, & Willms, 2001; Hanushek, 1999; Nye, Hedges, & Konstantopoulos,
1999). External validity, on the other hand, pertains to the ability to generalize the effects of small
classes on achievement to conditions beyond the experimental setting. There are three main
concerns in this respect. First, schools were not randomly selected, because schools volunteered to
participate and had to be large enough so that all three types of classes could be established in each
grade (Ehrenberg et al., 2001; Grissmer, 1999; Hanushek, 1999). Second, there were well-supplied
experimental

conditions, with no shortages of teachers or facilities, as has been documented in the
implementation of the CSR program in California (Ehrenberg et al., 2001; Stecher & Bohrnstedt,
2002).2 Third, actors in the experiment were not blind to the treatment condition to which they were
assigned. Because the Tennessee legislature conceived STAR as a "demonstration" (Ritter & Boruch,
1999), teachers and school administrators could have expected the experiment's results to influence
the adoption of a statewide CSR policy (Ehrenberg et al., 2001; Hanushek, 1999). If this were the
case, the incentive conditions would have been altered such that schools would have made a better
use of small classes in Project STAR than if CSR were fully enacted (Hoxby, 2000).

Nonexperimental studies have both supported and contradicted Project STAR's results. In a review
of 277 different estimates of the effect of class size and teacher-pupil ratios on student outcomes,
Hanushek (1999) found that estimates were almost equally divided between those suggesting that
small classes are better and those suggesting that they are worse.3 Hanushek obtained similar results
when he separated estimates taken from elementary and secondary schools, when he restricted the
universe of estimates to those with a value-added research design, and also when he solely
considered value-added studies within a single state, a strategy that attempts to correct for
differences in state-school policies. Based on this evidence, Hanushek concluded that, given the
current organization and incentives some schools have, CSR "is a very ineffective educational
policy" (2002).

Other writers have criticized Hanushek's work for its methodological approach and for its policy
recommendations. Krueger (2002) objected to how Hanushek summarized the estimates, counting
each estimate as a separate result, even though some studies provided multiple, nonindependent
estimates. Based on the same pool of estimates as Hanushek (1999), Krueger used alternative
procedures to weigh the estimates and found that the relationship between small classes and student
achievement was more positive and consistent than Hanushek claimed. In particular, the ratio of
estimates showing a positive versus a negative relationship between smaller classes and student
achievement was 1.07 when Krueger weighted each estimate equally (as Hanushek had done) and
1.57 when he weighted each study equally (Krueger, 2002, Table 1-2, p. 14). Krueger argued that
this evidence, by itself, does not mean that CSR is worth the investment. To answer this question,
he asserted, it is necessary to have: "knowledge of the strength of the relationships between class
size and economic and social benefits, knowledge of how these relationships vary across groups of
students, and information on the cost of class size reduction" (Krueger, 2002, p. 21).

Hedges and Greenwald (1996) highlighted other limitations of Hanushek's strategy. They advocated
the use of a formal statistical procedure (meta-analysis) that combines statistical significance levels
(p-values) from different studies, each of which tests the same conceptual hypothesis. In contrast,
Hanushek considered the p-value of each estimate separately, emphasizing the large number of
statistically insignificant coefficients. Hedges and Greenwald (1996) argued that because individual
studies may have low statistical power, the proportion of statistically insignificant estimates might
be large even in the presence of small-to-moderate effects in each study. Greenwald, Hedges, and
Laine (1996) used this combined significance testing to review 60 studies pertaining to the effect of
different school resources on student achievement, including 29 of the 38 studies Hanushek (1989)
summarized. They found positive and negative effects of small teacher-pupil ratios, but the positive
coefficients were more consistent across different subsets of studies.

In addition to testing the direction of the effect of teacher-pupil ratio, Greenwald, Hedges, and
Laine (1996) summarized the magnitude of this effect across different studies. The median
standardized effect of teacher-pupil ratio was between 0.027 and 0.047, meaning that 1 SD reduction
in teacher-pupil ratio raises student achievement by 0.027-0.047 SD. Compared to other school
resources, the effect of teacher-pupil ratio was the smallest. According to the authors' calculations,
if the expenditure per student increased by $500 (approximately 10% of the national average of per
pupil expenditure in 1994) and if that amount was devoted to reducing teacher-pupil ratios, the
increase in student achievement would be 0.04 SD. If the same $500 per student were used in
teacher education, teacher experience, or teacher salary, the increase in student achievement would
be 0.22, 0.18, and 0.16 SD, respectively (Greenwald, Hedges, and Laine, 1996, Table 7, p. 378).

The debate among Hanushek, Krueger, and Hedges and associates prompts the question of why
nonexperimental studies draw inconsistent conclusions regarding the direction and magnitude of
class-size effects. Ehrenberg et al. (2001) and Grissmer (1999) suggested that model misspecification
and violation of model assumptions may explain the differences among these analyses. In particular,
selection bias is a pervasive problem in studies of naturally occurring variation that rely on survey
data. In the case of class size, principals may selectively allocate teachers to different-sized classes,
and teachers and parents may selectively allocate students to different-sized classes. Varied
selectivity patterns may account for inconsistent results among nonexperimental studies, as well as
for differences between the results of experiments and those of nonexperimental research.
Class-size researchers have developed several strategies to address, at least partially, the issue of
selection bias in nonexperimental studies (see Angrist & Levy, 1999; Betts & Shkolnik, 1999;
Hoxby, 2000). Because our analysis relies on survey data, we seriously consider the problem of
selectivity later in this article. Our evidence does not allow us to make causal claims that are as
strong as those generated by experimental evidence, but our data are more representative of current
schooling conditions than an experiment such as STAR or a state initiative such as CSR in
California may be.

What Classroom Conditions Matter for Class-Size Effects?

The studies we have reviewed so far are guided by the question: "Do small classes result in
improved academic achievement in elementary grades?" (Finn & Achilles, 1999). Attempts to
respond to this question are at the core of studies of educational production functions (Coleman
et al., 1966). However, as Barr and Dreeben (1983) stated more than two decades ago, understanding
the relation between class size and achievement as an input-output problem leaves the schooling
process as a black box. In this section, we review aspects of classroom conditions that may link
class size with student achievement. We focus on processes that occur within the classroom, because
it is the most direct context of student learning. Because the "business of schooling" is mainly
instruction (Barr & Dreeben, 1983), what occurs within the classroom may affect the association
between class size and student achievement. Thus, these classroom processes may help us understand
why class size benefits student achievement in some contexts and not in others.

1. Teachers may teach differently in larger and smaller classes. The most common explanation of
class-size effects refers to a change in teachers' behaviors. The expectation is that teachers in
smaller classes alter their instructional strategies in a way that benefits student learning, therefore
raising achievement. Although this explanation is intuitively appealing, it has almost no empirical
support.

Evidence from survey analyses reveals weak associations between class size and instructional
practices. Using data on 2,170 math classes from the Longitudinal Study of American Youth
(LSAY), Betts and Shkolnik (1999) reported that differences in class size provoked only slight
changes in teachers' allocation of time. According to Betts and Shkolnik's estimates, if class size
increases from 20 to 40, the largest reallocations of time would be a 3% decrease in time devoted
to review and a 2.5% increase in time devoted to disciplining students. An interesting feature of
this sample is that many teachers taught more than one math class, and teachers filled out a separate
questionnaire for each class they taught. Thus, Betts and Shkolnik were able to conduct a regression
with fixed effects for teachers. This type of regression model removes all the variation among
teachers in the dependent and independent variables, therefore clearing out any bias resulting from
unobserved teacher traits. In this case, the fixed effects yielded the same conclusions as the least
squares regressions: class size had little impact on instructional practice.

In a similar vein, Rice (1999) analyzed eighth-grade math and science classes from the National
Educational Longitudinal Study (NELS). She reported that in the case of math, class size was
negatively associated with the amount of time spent working with small groups, the amount of
time devoted to innovative instructional practices, and the amount of time devoted to whole-group
discussions. Even though these effects were statistically significant, their magnitude was small

and class size did not affect instruction in science classes.

Like the U.S. national surveys, an international survey of mathematics achievement yielded no
evidence that curriculum coverage or instructional practices provided a link between class size
and achievement (Pong & Pallas, 2001). Comparable findings emerged from the CSR initiative
in Grades 1 through 3 in California: Stasz and Stecher (2000) reported that teachers in reduced
and nonreduced classes (with maximums of 20 and 33 students, respectively) covered the same
general topics in mathematics and language arts and did so for similar amounts of time. Some
differences appeared in teaching practices, particularly teachers in small classes spent less time
disciplining students and taught to the whole class less often. Also, students in small classes carried
out more activities that are consistent with curriculum reforms in reading and mathematics, such
as writing narrative pieces in language arts and playing mathematics games and using patterns to
find relationships in mathematics.

Evidence from an experimental study conducted in the early 1970s confirms the tenuous link
between class size and instructional practices (Shapson, Wright, Eason, & Fitzgerald, 1980). In
three school districts in metropolitan Toronto, teachers and students were randomly allocated to
fourth- and fifth-grade classes of four sizes: 16, 23, 30, and 37. In responses to questionnaires,
teachers indicated they made changes to adjust to classes of different size. Teachers who went from
a large to a small class between the two years of the study were significantly more likely to like
the current (small) class, report a higher level of personal energy, and believe their students
contributed more, paid more attention, and were more satisfied than students in larger classes.
However, observations of classroom processes revealed few effects of class size on instruction.
As researchers stated, "class size did not affect the amount of time teachers spent talking about
course content or classroom routines. Nor did it affect the choice of audience for teachers' verbal
interactions; that is, when they changed class sizes, teachers did not alter the proportion of their
time spent interacting with the whole class, with groups, or with individual pupils" (Shapson et al.,
1980, p. 150). These findings are important because, in contrast to the survey-based studies
mentioned, teachers were randomly allocated to smaller and larger classes. Therefore, the observed
similarity in teachers' methods of instruction cannot be attributed to selection bias. Students were
also randomly allocated to classes of different sizes, and contrary to Project STAR in the United
States, only one out of the six measures of student achievement (i.e., mathematical concepts scores)
exhibited a positive and statistically significant association with smaller classes. The authors
concluded that a small class: "makes a large difference to the teachers but little difference to the
students or to the instructional methods used" (Shapson et al., 1980, p. 151).

2. Some instructional practices may be more effective than others in small classes. If teachers do
not change their behaviors and practices in small classes, why do small classes still raise student
achievement in some contexts? One possibility is that small classes benefit student learning when
they occur in combination with particular methods of instruction (Ehrenberg et al., 2001).

The intuition behind this account is as follows. When the teacher carries out group instruction,
the time he or she devotes to each student does not change if there is a reduction in class size.
Thus, in classes where teachers allocate most of the instructional time to lecture, whole-class
recitation, and seatwork, CSR may not affect student achievement. In other words, this teaching
approach may be equally "effective" in large and small classes. In contrast, when the teacher carries
out small-group or individual instruction, the time he or she spends with each student is no longer
a constant but a function of class size: as the number of students in the class decreases, the teacher
devotes more time to each student. According to this account, a positive effect of small classes on
student achievement would arise from the interaction between small classes and a teaching
approach that emphasizes small-group or individual activities. As far as we know, no previous study
has tested this hypothesis.

3. Students may behave differently in larger and smaller classes. Student behavior may also provide
a context for the effect of class size on student achievement. Research consistently shows that
teachers in smaller classes spend less time maintaining order (Betts & Shkolnik, 1999; Rice, 1999;
Stasz & Stecher, 2000). One reason why student behavior is relevant to academic achievement is
that, given a fixed amount of class time, a reduction in noninstructional time may result in an
increase in time spent in instruction, thus

benefiting student learning. Another reason is that students' behavior reflects their engagement in
the classroom. In an extensive review of studies of class size and student engagement, Finn,
Pannozzo, and Achilles (2003) found that students in small classes are more engaged in learning
behavior and display less disruptive behavior compared to students in large classes. This pattern,
they pointed out, is consistent with sociological and psychological theories of group size and
member participation. According to these formulations, students in small classes are more "visible"
(thus more pressured to be attentive and to participate) and have a stronger sense of belonging,
which encourage them to become and remain engaged.

Does Class Size Matter More for Some Students Than for Others?

From a policy perspective, it is essential to investigate whether CSR has different effects for
different groups of students. Class-size effects may be larger for some students than for others,
and even when class size has no effect on average, it may still matter for some students. Such
evidence would support targeted, rather than universal, CSR policies. If class size has different
effects for different groups of students, targeted implementation may maximize its benefit, reduce
its cost, and help reduce the achievement gaps (Ferguson, 1998; Krueger & Whitmore, 2001;
Rice, 2002).

In Project STAR, the benefits of small classes were two to three times larger for minority students
(mainly Blacks) than for Whites (Finn & Achilles, 1999; Krueger & Whitmore, 2001). Likewise,
Wisconsin's Student Achievement Guarantee in Education (SAGE) program evidenced larger
benefits of small classes for Black students. Molnar et al. (1999) reported that although there was
still a statistically significant Black-White test score gap at the end of first grade in all schools,
the gap was smaller in SAGE than in comparison schools. The effect of class-size reduction for
academically disadvantaged students has received less attention, but using data from Project STAR,
Nye, Hedges, and Konstantopoulos (2002) reported no additional advantage of class size for
low-achieving students.4 Elsewhere, the CSR program in California found that class size did not
benefit English-language learners differently from the rest of students in the state, for whom class
size did not have an impact overall (Stecher & Bohrnstedt, 2002). The variety in these patterns of
findings makes it important to assess the effects of class size for students who differ by race/ethnic,
economic, and academic backgrounds in the nation as a whole.

Data and Methods

Data for this study come from the Early Childhood Longitudinal Study-Kindergarten Class of
1998-99 (ECLS-K), the only nationally representative study that provides data on children's
status at kindergarten entrance. The sample consists of 21,260 children enrolled in approximately
1,000 kindergarten programs during the 1998-1999 school year, including children from public
and private kindergartens, as well as from full-day and part-day kindergarten programs. The sample
includes children from different racial-ethnic and socioeconomic backgrounds, with oversamples of
Asian children, private kindergartens, and private-school kindergartners. The National Center for
Education Statistics (NCES) selected the sample in three stages. First, NCES selected 100 Primary
Sampling Units (counties or groups of counties). Second, within those Primary Sampling Units,
NCES selected schools using a public-school frame to select public schools and a private-school
frame to select private schools. Finally, in fall 1998, NCES selected 23 kindergartners within each of
the sampled schools (West, Denton, & Germino-Hausken, 2000).

NCES collected data twice during the base year of the study, in fall 1998 and spring 1999.
These two waves correspond to the beginning and end of children's kindergarten year. The data
NCES collected in fall 1998 provides information on children's status before exposure to their
first year of formal schooling. So far, ECLS-K has followed children through their fifth-grade year in
school. NCES will conduct additional follow-up surveys of the sample in the springs of 2007, 2009,
and 2011, corresponding to the years when most of the cohort will be in 8th, 10th, and 12th grade,
respectively. The focus of this article is the kindergarten year, so we only use data collected in fall
1998 and spring 1999.

Data on children's social background come from interviews with their parents or guardians
during the fall and spring of kindergarten. Data on students' cognitive achievement come from
children's assessments at these two waves. These
assessments consisted of untimed one-on-one computer-assisted personal interviews in three cognitive domains, namely reading, mathematics, and general knowledge.5 We use data on students' performance in the reading and mathematics assessments. Specifically, we use the reading and math item response theory (IRT) scores, which reflect children's overall performance in each cognitive domain. Data on children's classroom experience, including class size and teachers' instructional practices, stem from interviews with teachers during the fall and spring of kindergarten.

Data Limitations

For the purpose of this study, the data ECLS-K collected from teachers' questionnaires present two limitations. The first limitation is that information on instructional practices stems from teachers' questionnaires only. This strategy of data collection raises some validity concerns, because the pedagogical practices reported from the teachers' perspectives may capture teachers' intentions more than what teachers accomplish in the classroom (Porter, 1991). As Porter explained: "A teacher may, for example, intend to be teaching the concept of fraction when his or her pedagogical practices would be viewed by an external observer as drill-and-practice on naming fractions" (Porter, 1991, pp. 17-18). Although Porter recommended the use of external observers for cases in which instruction is linked to student achievement (1991, p. 18), this alternative would have been too costly for the national scope of ECLS-K. Nevertheless, encouraging findings come from one study that specifically examined the validity of teachers' self-reports to measure the percentage of time teachers use different mathematics instructional practices (Mayer, 1999). Using data from one large school district, this study found a strong correlation (r = .85) between a composite of classroom practices based on self-reported survey data and a parallel composite based on classroom observations. Both composites measured teaching practices that were consistent with the standards proposed by the National Council of Teachers of Mathematics (NCTM). The teacher survey did not adequately capture the quality with which teachers engage in the type of practices advanced by NCTM, but it was a valid instrument to measure the amount of time teachers use these practices in their classrooms.

The second limitation refers to the period of the school year when teachers reported information on their instructional practices. ECLS-K included most of the questions on instruction in the spring questionnaire at the end of the kindergarten year. Thus, we need to assume that teachers' accounts at this point in time are a good representation of teachers' instruction throughout the whole school year. Even though this validity concern is somewhat worrisome, it is common to research in education that uses NCES datasets (Brewer & Stasz, 1996). Indicators in this study may be less valid than those generated by classroom observation (usually developed in small-scale projects) but as valid as those in studies such as High School and Beyond, the National Educational Longitudinal Study (NELS), the Schools and Staffing Survey (SASS), and the National Assessment of Educational Progress (NAEP).

Variables and Measures

The two dependent variables in this analysis correspond to the IRT scores the child attained in reading and mathematics during spring 1999, that is, by the end of his or her kindergarten year. We included, as statistical controls, the IRT scores the child achieved in those two subjects at the beginning of kindergarten (fall 1998). In this study, controlling for prior achievement is a way to parcel out the possible selection of students with different levels of cognitive performance into classes of different size.

Class size is the main independent variable of interest. ECLS-K did not ask teachers directly about the size of the class they teach. However, it is possible to calculate class size using three different questions available in the fall kindergarten teacher questionnaire. One question asks teachers about the number of children in their classes at different age levels (from 3 to 9 years old). The sum of the number of children with different ages indicates class size. We obtained a second measure of class size by summing the number of children in each class belonging to different racial and ethnic groups. The last measure corresponds to the sum of girls and boys in each class. In the majority of cases, these three measures of class size were identical.6

Researchers and policymakers alike have documented the nonlinear nature of class-size effects. For instance, Rice (1999) found that the effect of class size on instruction varied across the class-
size distribution, such that class size had a stronger impact on teachers' use of instructional time when there were fewer than 20 students per class, the same limit that the CSR program in California implemented. Project STAR and the SAGE Program in Wisconsin set lower class-size thresholds. Small classes in STAR had between 13 and 17 students per class; SAGE classes had between 12 and 15 students per teacher. In this study, we use three categories of class size: small, regular, and large. To create these categories, we followed the same strategy as Pong and Pallas (2001). Based on the distribution of all classes in the sample, we divided classes into quintiles and subsequently specified three categories of classes: those in the lowest 20% of the distribution, those in the middle 60% of the distribution, and those in the highest 20% of the distribution. Classes in the first quintile, which we refer to as small classes, have 17 or fewer students. Classes in the second, third, and fourth quintiles, which we refer to as regular classes, have between 18 and 23 students. Classes in the fifth quintile, which we consider large classes, have 24 or more students.7 Interestingly, the upper limit of classes we define as small (17 students per class) is the same as Project STAR's limit for small classes. This limit is lower than the limit established by CSR programs in California and Florida (20 and 18, respectively) but higher than that of SAGE (15 students). As a check on our specification strategy, we computed several different measures of class size and estimated the effect of class size in models with different combinations of linear and nonlinear indicators of class size (see Table A1 in the appendix). Because we obtained comparable results in all these models, we are confident that the class-size effects we report do not depend on any particular categorization of class size. For ease of presentation, we show estimates of the effect of small and regular classes obtained from models in which large classes (those in the fifth quintile of the distribution) were the reference category.

To evaluate other classroom conditions that may relate to class size effects, we considered two aspects of instructional practices: class organization and teachers' time allocation, on one hand, and type of instructional activities on the other hand. ECLS-K measured class organization and teachers' time allocation as the amount of time per day (in a "typical day") in which teachers use teacher-directed whole-class activities, teacher-directed small-group activities, and teacher-directed individual activities.8 Since teachers did not report this information separately for language and mathematics, we introduced these variables in models predicting both reading and mathematics achievement.

Previous research has evaluated whether class size affects not only the distribution and allocation of time but also the type of activities teachers conduct in classes of different size (Rice, 1999; Stasz & Stecher, 2000). We constructed two reading and two mathematics scales of instructional activities. The spring teacher questionnaire contains a 23-item question regarding the frequency of teachers' practices in reading and language arts and a 17-item question regarding the frequency of instructional practices in math.9 We conducted an exploratory factor analysis using reading and mathematics items separately. We retained two factors in each subject. Each of the four factors had an eigenvalue larger than 1. Based on this evidence, we constructed four scales of instructional activities by summing the responses to the items with higher loadings on each of the factors. While the two reading scales include four items each, the two mathematics scales include three items each. Substantially, the scales represent extreme positions within the reading and mathematics "curriculum wars": Whole Language versus Phonics in reading and Teaching for Understanding versus Drill in mathematics.

The Whole Language scale sums the following four items: write with encouragement to use invented spellings, if needed; read books that children have chosen for themselves; compose and write stories or reports; and write stories in a journal. The four items in the Phonics scale are: work on phonics, work on learning the names of the letters, practice writing the letters of the alphabet, and work in a reading workbook or on a worksheet. Cronbach's alpha is 0.77 for the Whole Language scale and 0.44 for the Phonics scale. The Teaching for Understanding scale contains the following three items: work with counting manipulatives to learn basic operations, solve math problems in small groups or with a partner, and work on math problems that reflect real-life situations. Finally, the three items in the Drill scale are: do mathematics worksheets, do mathematics problems from children's textbooks, and complete mathematics problems on
the chalkboard. Cronbach's alpha is 0.63 for the Teaching for Understanding scale and 0.59 for the Drill scale.

To evaluate another class condition that may link class size with achievement, we considered students' behavior in class. ECLS-K collected this information in the fall teacher questionnaire, obtaining teacher ratings of children's behavior in each of the classes he or she teaches, on a 1 to 5 scale, with 5 indicating very well-behaved classes.10

To adjust for the potential nonrandom assignment of students to classes of different size, we included in the model several variables that may be simultaneously correlated with class size and student achievement. At the student level, we controlled for students' family socioeconomic background, race, gender, and age, in addition to their achievement at the beginning of the school year, as previously mentioned. Family socioeconomic status (SES) is a continuous variable that ranges from -4.75 to 2.75. NCES computed SES as an average composite of up to five measures: father/male guardian's education and occupation, mother/female guardian's education and occupation, and household income. NCES standardized these variables, imputing approximately 10% of each variable through a hot-deck procedure (West et al., 2000). With respect to the students' race and ethnicity, we included three dummy variables indicating whether the student is Asian, Black, or Hispanic, using non-Hispanic White students as the reference category. At the class level, we included three variables: (1) the percentage of Black students in the class, as reported by teachers; (2) the percentage of Hispanic students in the class, also reported by teachers; and (3) an indicator of whether the child attended a half-day kindergarten program (either a morning or afternoon class) versus a full-day program.

Analytic Strategies

Two methodological limitations abound in studies of the relationship between class size and achievement that rely on survey data reflecting variation in class size that occurs without experimental manipulation. The first is that class size is treated as an individual (student) attribute when, in fact, it is a contextual variable (Pong & Pallas, 2001). We use hierarchical linear models (HLM) (Bryk & Raudenbush, 1992) to properly account for the level of aggregation at which class size occurs. Thus, we treat student achievement as a function of both individual-student and aggregate-class characteristics. An important advantage of this two-level framework is that it allows us to test whether different individuals within the same context benefit more or less from the characteristics of that context. By modeling the interaction between these two levels (i.e., cross-level interactions), we address the question of whether class size has different effects for different groups of students within the same class.

The second problem that survey-based studies face is selection bias. Selective allocation to classes of different size can occur among students as well as among teachers. Students can be nonrandomly assigned to classes based on parents' or schools' choices. For instance, parents from higher socioeconomic backgrounds may be able to enroll their children in small classes, either because they select schools with small classes or because they manage to register their children into the school's smaller classes. Also, while some schools may assign low-achieving students to small classes as a compensatory strategy, other schools may allow high-achieving students or students from advantaged backgrounds to systematically secure placement in smaller classes. On the side of teachers, two contrasting scenarios are plausible. On the one hand, small classes may operate as rewards to more qualified and experienced teachers within schools. On the other hand, it may be the case that "good" teachers are assigned to large classes either to minimize students' misbehavior or to compensate students for any "learning disadvantage" large classes might yield.

This article deals with selection bias in two ways. First, we control, at both the student and class levels, for relevant confounding variables that are simultaneously correlated with students' achievement and their allocation to classes of different size. Most noticeable among these factors are students' family SES and prior achievement, characteristics of classes such as SES and racial composition, and indicators of their "remedial" character. Second, to account for the possible nonrandom assignment of teachers to classes of different size, we carry out a fixed-effects regression at the teacher level. Following Betts and Shkolnik's approach (1999), we take advantage of the fact that 16% of teachers in ECLS-K teach
two kindergarten classes in the same school, that is, two classes in the same school where there is at least one ECLS-K sampled student. In the fixed-effects model, estimates of class-size effects are based solely on the different class sizes that individual teachers encounter. This model draws inferences about class-size effects from within-teacher variation in class size rather than from between-teacher variation in class size. With the exception of class size, this model holds constant all teacher characteristics, both observed and unobserved. Even though this strategy decreases the statistical power and generalizability of results, it has the important advantage of eliminating possible bias resulting from unobserved heterogeneity among teachers. Comparing the estimates obtained from the HLM and fixed-effects regressions provides a robustness test for our results.

HLM Models

Our conceptualization established class size and instruction as the two crucial contextual predictors of student achievement. Because both class size and instruction take place at the classroom level, it is appropriate to carry out an analysis where students are nested within classes.11 The specific within-class model for kindergarten achievement $Y_{ij}$ of the $i$th student in the $j$th class is:

$$
\begin{aligned}
(\text{Spring Achievement})_{ij} = {} & \beta_{0j} + \beta_{1j}(\text{Fall Achievement})_{ij} \\
& + \beta_{2j}(\text{Family SES})_{ij} + \beta_{3j}(\text{Black})_{ij} \\
& + \beta_{4j}(\text{Hispanic})_{ij} + \beta_{5j}(\text{Asian})_{ij} + \beta_{6j}(\text{Male})_{ij} \\
& + \beta_{7j}(\text{Age})_{ij} + r_{ij}
\end{aligned}
$$

where $\beta_{0j}$ is the intercept, $\beta_{1j}$ through $\beta_{7j}$ are slopes, and $r_{ij}$ is the student-specific random error. We center fall achievement, family SES, and the indicator for Black at their class means, and we include the corresponding class means in the Level-2 equation for the intercept. The rest of the variables are in their original metric. In the first model we estimate, we allow the intercept $\beta_{0j}$ to vary from class to class, but we constrain the slopes associated to all student characteristics to be equal across classes. Thus, in this first model, only the equation for the intercept represents variation between classes. We set this intercept to be a function of the following classroom characteristics:

$$
\begin{aligned}
\beta_{0j} = {} & \gamma_{00} + \gamma_{01}(\text{Mean Fall Achievement})_j \\
& + \gamma_{02}(\text{Mean Family SES})_j \\
& + \gamma_{03}(\text{Small-Size Class})_j \\
& + \gamma_{04}(\text{Regular-Size Class})_j \\
& + \gamma_{05}(\text{Half-Day Program})_j \\
& + \gamma_{06}(\text{Percent Black})_j \\
& + \gamma_{07}(\text{Percent Hispanic})_j + \mu_{0j}
\end{aligned}
$$

This first model, and in particular the coefficient $\gamma_{03}$, allows us to address the first research question regarding whether class size is associated with student achievement. To answer the remaining research questions, we successively estimate three other models. Models 2 and 3 differ from the first only in that they include additional class-level covariates. In these two models, $\mu_{0j}$ is still the only Level-2 random effect. In the final model, we allow certain slopes to vary across classes. The Level-1 equation remains the same in all models.

The second model adds two of the class conditions that may matter for class-size effects, namely the instructional practices teachers carry out in classes and the behavior of students in the class:

$$
\begin{aligned}
\beta_{0j} = {} & \gamma_{00} + \gamma_{01}(\text{Mean Fall Achievement})_j + \cdots \\
& + \gamma_{08}(\text{Whole Language/Teaching for Understanding Scale})_j \\
& + \gamma_{09}(\text{Phonics/Drill Scale})_j \\
& + \gamma_{0,10}(\text{Whole-Class Activities})_j \\
& + \gamma_{0,11}(\text{Small-Group Activities})_j \\
& + \gamma_{0,12}(\text{Individual Activities})_j \\
& + \gamma_{0,13}(\text{Fall Class Behavior})_j + \mu_{0j}
\end{aligned}
$$

In the third model, we add to the intercept interaction terms between class size and instructional practices:

$$
\begin{aligned}
\beta_{0j} = {} & \gamma_{00} + \gamma_{01}(\text{Mean Fall Achievement})_j + \cdots \\
& + \gamma_{0,14}(\text{Small-Size Class} \times \text{Whole Language/Teaching for Understanding Scale})_j \\
& + \gamma_{0,15}(\text{Small-Size Class} \times \text{Phonics/Drill Scale})_j \\
& + \gamma_{0,16}(\text{Small-Size Class} \times \text{Whole-Class Activities})_j \\
& + \gamma_{0,17}(\text{Small-Size Class} \times \text{Small-Group Activities})_j \\
& + \gamma_{0,18}(\text{Small-Size Class} \times \text{Individual Activities})_j + \mu_{0j}
\end{aligned}
$$

Models 2 and 3 address our second research question regarding classroom conditions and their relation to class-size effects on achievement. In Model 2, the coefficients $\gamma_{08}$, $\gamma_{09}$, $\gamma_{0,10}$, $\gamma_{0,11}$, and $\gamma_{0,12}$ show the impact of instructional strategies on student achievement, and the coefficient $\gamma_{0,13}$ does the same with regard to student behavior. In Model 3, the coefficients $\gamma_{0,14}$, $\gamma_{0,15}$, $\gamma_{0,16}$, $\gamma_{0,17}$, and $\gamma_{0,18}$ assess whether instructional practices are particularly effective at raising student achievement when teachers carry them out in small classes.

To address our third research question, regarding whether class size has different effects for different students, we estimate cross-level interactions in Model 4. This model allows variation across classes to depend not only on a random intercept but also on random variables. We allow the slopes of three predictors to vary across classes: fall achievement, family SES, and the Black indicator. Thus, in Model 4, the Level-2 equation for the intercept $\beta_{0j}$ remains the same, but the specification of $\beta_{1j}$, $\beta_{2j}$, and $\beta_{3j}$ changes to:12

$$
\begin{aligned}
\beta_{1j} &= \gamma_{10} + \gamma_{11}(\text{Small-Size Class})_j + \mu_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21}(\text{Small-Size Class})_j + \mu_{2j} \\
\beta_{3j} &= \gamma_{30} + \gamma_{31}(\text{Small-Size Class})_j + \mu_{3j}
\end{aligned}
$$

where $\beta_{kj} = \gamma_{k0}$, $k = 4, 5, 6, 7$.

The coefficients $\gamma_{11}$, $\gamma_{21}$, and $\gamma_{31}$ test whether the association between student achievement, on one hand, and prior achievement, family SES, and being Black, on the other hand, is different in small classes from regular or large classes. In other words, these coefficients allow us to examine whether small classes contribute to reducing within-class achievement gaps among students with different academic levels, among students with different SES backgrounds, and among Black and White students.

ECLS-K provides files at the student, teacher, and school levels, so we had to construct a class-level file by matching students and teachers. We could not match students who had no teacher identification number associated with them or teachers who had no student identification number associated with them. Out of the 3,832 possible classes, 3,549 had student and teacher records that matched, meaning that we retained 92.6% of original classes. These 3,549 classes contain 18,173 sampled students.

As a result of missing data, we excluded approximately a third of student and class cases. At the student level, achievement scores constitute the main source of missing data. In the analysis of reading achievement, we excluded 17% of students because either their fall or their spring scores were unavailable. For the same reason, we excluded 12% of students from the math analysis. At the class level, we used variables from multiple ECLS-K instruments, which magnified the amount of missing cases. While 14% of classes do not have information on class size (a variable we draw from the teacher Fall A questionnaire), an additional 9% of classes have no information on the instructional activities we use to create the math and reading instructional factors (variables we draw from the teacher Spring A questionnaire). Because our analyses rely on the nested structure of students within classes, missing data in any of the class-level variables imply that no students in these classes have a complete record at Level 2.

To minimize missing data, we performed mean imputation in the case of four independent variables that had a substantive amount of item nonresponse, namely, students' age at kindergarten entrance, teachers' rating of class behavior, and percentage of Black and Hispanic students in the class. When carrying out the regression analyses, we used these four mean-substituted variables in addition to four dummies (one for each of these variables) indicating whether we had imputed a value for that particular student or class. For the rest of the variables, we applied listwise deletion. The HLM analyses have an average of 7.3 sampled children per class. Models that predict reading achievement use a sample of 11,567 students and 2,437 classes. The sample for the math achievement models includes 12,153 students and 2,556 classes. The slightly larger number of cases in the models of math achievement is due to the fact that Hispanic children who were only orally proficient in Spanish at kindergarten entrance took the math, but not the reading, assessment.13
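The missing-data handling described above (mean substitution paired with an indicator dummy, and listwise deletion for everything else) can be illustrated with a minimal sketch. The records, field names, and helper functions below are hypothetical and are not ECLS-K variables or the authors' code.

```python
from statistics import mean


def mean_impute_with_flag(values):
    """Fill missing entries (None) with the mean of the observed entries.

    Returns the filled list plus a 0/1 flag list marking imputed positions.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def listwise_delete(rows, required):
    """Keep only the rows that have a value for every required key."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


# Hypothetical toy records, not ECLS-K data.
students = [
    {"age_months": 64, "fall_score": 21.0},
    {"age_months": None, "fall_score": 19.5},
    {"age_months": 68, "fall_score": None},
    {"age_months": 66, "fall_score": 25.0},
]

# Age is mean-imputed and flagged; the flag would enter the model as a dummy.
ages, age_imputed = mean_impute_with_flag([s["age_months"] for s in students])
for s, age, flag in zip(students, ages, age_imputed):
    s["age_months"] = age
    s["age_imputed"] = flag

# Achievement scores are not imputed, so incomplete rows are dropped.
analytic = listwise_delete(students, ["age_months", "fall_score"])
```

In this toy example the second student keeps a place in the analytic sample with an imputed age and a flag of 1, while the third student is dropped for lacking a fall score.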
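The logic of the teacher fixed-effects strategy described under Analytic Strategies, which relies only on teachers who teach two classes, can be sketched as a within-teacher slope. This is a simplified class-level illustration under stated assumptions: the tuples, teacher labels, and function name are hypothetical, and the article's actual models are estimated on student-level data with additional controls.

```python
def within_teacher_slope(classes):
    """Estimate the class-size slope from within-teacher variation only.

    `classes` is a list of (teacher_id, class_size, mean_score) tuples.
    Demeaning by teacher removes every teacher-constant characteristic,
    observed or unobserved, before the slope is computed.
    """
    by_teacher = {}
    for teacher, size, score in classes:
        by_teacher.setdefault(teacher, []).append((size, score))

    num = 0.0
    den = 0.0
    for rows in by_teacher.values():
        if len(rows) < 2:
            continue  # one-class teachers carry no within-teacher variation
        mean_size = sum(s for s, _ in rows) / len(rows)
        mean_score = sum(y for _, y in rows) / len(rows)
        for size, score in rows:
            num += (size - mean_size) * (score - mean_score)
            den += (size - mean_size) ** 2

    if den == 0:
        raise ValueError("no within-teacher variation in class size")
    return num / den


# Hypothetical toy data: score = teacher effect - 0.5 * class size.
toy = [
    ("A", 16, 22.0), ("A", 24, 18.0),   # teacher effect 30
    ("B", 18, 26.0), ("B", 22, 24.0),   # teacher effect 35
    ("C", 20, 40.0),                    # single class, ignored
]
slope = within_teacher_slope(toy)
```

Because the teacher effects differ across A and B but cancel within each teacher, the toy slope recovers the value built into the data. The same cancellation is what costs statistical power: teacher C contributes nothing to the estimate.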
Results

Descriptive Results

Table 1 presents the descriptive statistics for all the student-level and class-level variables we use in this analysis. Table 2 presents these same descriptives but lists them separately for small, regular, and large classes. In both tables, we weighted the statistics at the student level but not at the class level (see note 11). In the case of the four variables we imputed, Tables 1 and 2 show the descriptive statistics for both the original and the mean-imputed variable.

Table 2 shows some evidence of the differential allocation of students to classes of different size. Although students in large classes were 0.03 SD below the mean of SES background, students in small classes were 0.03 SD above this mean. Racial minority students also seemed more likely to be in large classes. Within large classes, 17% of students were Black and 22% were Hispanic. In contrast, Black and Hispanic students represented 14% and 13% of students within small classes. Students in large classes were slightly younger (by less than half of a month) than their counterparts in small classes. Even though there are some differences in the type of students who end up in small, regular, or large classes, achievement averages are similar for students attending classes of different sizes. In math, students in small classes scored 0.36 points higher than students in large classes in the fall of kindergarten. In the spring of kindergarten, the difference was 0.33 points, meaning that during the school year students in small and large classes gained the same in math scores. In reading, fall scores were virtually identical for students in small and large classes, but students in large classes gained 0.27 points more than students in small classes between the beginning and end of the kindergarten year.

Some differences also appear at the class level. Large classes were more likely to be part of kindergarten programs that lasted longer. While 68% of large classes are in full-day kindergarten programs, 46% of small classes are in such programs. Teachers in large classes seem to spend the additional time they have carrying out more instruction, because they report a more frequent use of all types of instructional activities compared to teachers in small classes. Consistent with previous literature, teachers do not appear to be opting for particular methods of instruction according to the size of the class, since the relative frequency of different instructional practices is similar across small, regular, and large classes. Also in line with prior research, teacher ratings of class behavior slightly favor small classes. Finally, compared to large classes, small classes had a higher proportion of students reading below grade level (22 versus 16), a higher proportion of students with math skills below grade level (17 versus 14), and a higher proportion of students with diagnosed disabilities (13 versus 8). This suggests that at least some schools are using small classes as a compensatory strategy for academically disadvantaged students.

HLM Analysis

Tables 3 and 4 show the coefficients from the HLM analyses of reading and math achievement, respectively.

What is the effect of class size on student achievement?

The first model we estimate is a random-intercept model that examines whether class size is associated with student achievement. Results show no association between class size and student achievement in either reading or math (Model 1 in Tables 3 and 4). The estimate for small class is negative, indicating a small-class disadvantage compared to large classes, and the estimate for regular class is positive, indicating a regular-class advantage compared to large classes, but none of the coefficients is statistically different from 0.

What classroom conditions matter for class-size effects?

Research on class size usually understands instruction and student behavior as mechanisms that mediate the effects of class size on student achievement. Although our results show no initial association between class size and achievement (Model 1 in Tables 3 and 4), instruction and student behavior could still affect the association. For example, if less effective teachers or more unruly students are disproportionately assigned to small classes, then instruction and student behavior could suppress the positive effects of small classes. Model 2 allows for this possibility
TABLE 1
Descriptive Statistics
Variable Obs. Mean SD Min. Max.
Student-level variables
Fall reading IRT 11,567 22.11 8.27 10.08 68.66
Spring reading IRT 11,567 32.22 10.06 11.00 70.80
Fall math IRT 12,153 19.37 7.22 6.90 57.02
Spring math IRT 12,153 27.57 8.69 7.54 59.20
Male 12,153 0.50 0.50 0.00 1.00
Black 12,153 0.15 0.36 0.00 1.00
Hispanic 12,153 0.18 0.38 0.00 1.00
Asian 12,153 0.02 0.15 0.00 1.00
White non-Hispanic 12,153 0.60 0.49 0.00 1.00
Other race 12,153 0.05 0.21 0.00 1.00
Months at Kindergarten entry 11,784 65.84 4.03 50.00 84.00
Months at Kindergarten entry, mean imputed 12,153 65.83 3.97 50.00 84.00
Family SES (composite) 12,153 0.00 0.78 -4.75 2.69
Location in reading ability group
No ability grouping in class 11,074 0.60 0.49 0.00 1.00
Low-ability group 11,074 0.14 0.35 0.00 1.00
Middle-ability group 11,074 0.07 0.25 0.00 1.00
High-ability group 11,074 0.19 0.39 0.00 1.00
Class-level variables
Class size 2,556 20.38 4.77 2.00 52.00
Percent black 2,440 16.34 26.00 0.00 100.00
Percent black, mean imputed 2,556 16.35 25.41 0.00 100.00
Percent Hispanic 2,440 17.81 28.85 0.00 100.00
Percent Hispanic, mean imputed 2,556 17.87 28.19 0.00 100.00
Teacher rating class behavior fall 2,518 3.42 0.82 1.00 5.00
Teacher rating class behavior fall, mean imputed 2,556 3.42 0.81 1.00 5.00
All-day class 2,556 0.54 0.50 0.00 1.00
Half-day class 2,556 0.46 0.50 0.00 1.00
Morning class 2,556 0.26 0.44 0.00 1.00
Afternoon class 2,556 0.20 0.40 0.00 1.00
"Whole-Language" activities 2,437 17.17 4.74 4.00 24.00
"Phonics" activities 2,437 21.08 2.61 4.00 24.00
"Teaching for Understanding" activities 2,556 12.32 3.01 3.00 18.00
"Drill" activities 2,556 8.75 3.54 3.00 18.00
Frequency of
Whole-class activities 2,556 3.48 0.87 1.00 5.00
Small-group activities 2,556 2.97 0.84 1.00 5.00
Individual activities 2,556 2.35 0.68 1.00 5.00
"Remedial class" indicators
% children reading below grade level 2,489 0.18 0.15 0.00 1.00
% children w/math skills below grade level 2,489 0.15 0.13 0.00 1.00
% children with diagnosed disability 2,489 0.10 0.13 0.00 1.00
Notes. (1) Descriptive statistics are weighted at the student level (using BYCOMW0) but not at the class level. See text for details. (2) Because the regression analysis for mathematics achievement includes more cases than for reading, descriptive statistics reported here correspond to the variables used in the mathematics analysis. The exceptions are variables that are only relevant for the reading regression analysis (fall and spring reading scores, and reading instructional factors), as well as variables included in the additional analyses (students' locations in reading ability groups and indicators of "remedial classes"). See text for details.
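As the table notes state, the student-level means and standard deviations are weighted. A minimal sketch of a weighted mean and standard deviation follows. The values and weights are hypothetical, and the simple sum-of-weights denominator used for the variance is an assumption made for illustration, not necessarily the formula behind the published figures.

```python
import math


def weighted_mean(values, weights):
    """Mean of `values`, each counted in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_sd(values, weights):
    """Weighted standard deviation with the sum of weights as denominator."""
    m = weighted_mean(values, weights)
    total = sum(weights)
    variance = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


# Hypothetical scores and sampling weights, not ECLS-K data.
scores = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]

w_mean = weighted_mean(scores, weights)
w_sd = weighted_sd(scores, weights)
```

With these toy inputs the third score counts twice, pulling the weighted mean above the unweighted mean of 20. Class-level rows in the table would instead use equal weights, which reduces both functions to the ordinary mean and standard deviation.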

by adding the indicators of instructional practices and class behavior to the estimation of the random intercept.

Results show that teacher-directed whole-class and small-group activities have a positive association with student achievement, but the coefficients are statistically significant only in reading. All the scales of instructional activities (Whole Language and Phonics for reading and Teaching for Understanding and Drill for math) are positive and statistically significant predictors of achievement. Consistent with the literature that supports a balanced curriculum, these findings suggest that students benefit from both pedagogical approaches in reading and math (Kilpatrick, Swafford, & Findell, 2001; Snow, Burns, & Griffin, 1998). As in the baseline model, the coefficients for small and regular classes are not statistically significant in either the math or the reading analysis. Although our results indicate that instructional conditions affect achievement, class size is not involved.

Regarding student behavior in class, the estimates show a positive association between well-behaved classes and student achievement, but this association is not statistically significant and does not alter the association between class size and achievement. Thus, there is no indication that either instruction or student behavior is suppressing the effect of class size on student achievement.

Even in the absence of main effects of class size, certain instructional practices may still be more effective when teachers use them in smaller contexts. To examine this possibility, Model 3 adds to Model 2 five interaction terms between the small class indicator and the activities teachers carry out in the classroom. Among these estimates, the only statistically significant coefficient is the interaction between small class and teacher-directed small-group activities in the model that predicts reading achievement. The sign of the coefficient, however, is contrary to our expectations. The negative interaction indicates that in classes with fewer children, small-group activities are less effective for raising the class mean reading achievement. Overall, we detect strong effects of instruction, no average effects of class size, and little evidence that the effects of class size depend on instructional conditions.

Does class size matter more for some students than for others?

Irrespective of the relation between class size and achievement on average, class size may have different effects for different groups of students. Model 4 tests this hypothesis by allowing the SES, fall achievement, and Black slopes to vary across classes and including small class size as a predictor of between-class variation in these slopes. In both reading and math, Model 4 shows that none of the coefficients for the cross-level interactions is statistically significant. In other words, the effect of class size does not vary across racial/ethnic subgroups, and class size does not matter more for either economically or academically disadvantaged students.

Additional HLM analyses

To make sure the statistical nonsignificance of class size was robust, we carried out several additional HLM analyses (not shown). First, we analyzed half-day and full-day classes separately. None of these models, predicting reading and math achievement, presents a statistically significant coefficient for the small class-size indicator. In a second set of models, we included indicators of the student's location in reading ability groups within the class as student-level controls in the equation that predicts reading achievement. Using the "middle ability group" as the reference category, results show a significantly positive effect of being in a high-ability group and a significantly negative effect of being in a low-ability group. Being in a class with no ability groups does not exhibit a consistent coefficient. However, the inclusion of these indicators does not alter the class-size estimate. It is still small and not statistically significant.

Finally, we considered one possible criterion for student selection into classes of different size. In particular, we explored whether the lack of association between small classes and achievement may result from the "remedial" nature of small classes. Thus, in the Level-2 regression equation, we included the percentage of children reading below grade level, the percentage of children with math skills below grade level, and the percentage of children with diagnosed disability in each class, all reported by the teacher. As we mentioned, these three "remedial" characteristics are


more frequent in small than in large classes (see Table 2). By including them in the model, we explored whether the "remedial" nature of small classes is suppressing an association between small classes and student achievement. However, these models show the same lack of association between class size and achievement.

Fixed-Effects Regression

Perhaps the apparent lack of class-size effects reflects the nonrandom assignment of teachers to small and large classes. To counter this possibility, we estimated fixed-effects regressions for math and reading achievement. In this model, all between-teacher differences that might bias the estimates of class-size effects are ruled out. As we mentioned, because this model relies on the variation between the classes that individual teachers encounter, teachers who teach only one kindergarten class in the school effectively drop out of the analysis.

Table 5 indicates that in ECLS-K, teachers who teach two kindergarten classes usually teach smaller classes than teachers who teach only one kindergarten class. In this sense, results from the fixed-effects regression are less generalizable than those from the HLM models, which are estimated on all teachers, with a greater representation of large classes. Table 5 also shows that teachers who teach two kindergarten classes have a narrower range of variability in class size than other teachers (the standard deviation of class size is 9% smaller for teachers who teach two classes than for teachers who teach only one kindergarten class). Since the identification of class-size effects depends on this within-teacher variation in class size, this set of teachers provides limited identifying information and restricts the model's statistical power. Nevertheless, because the fixed-effects estimates account for observed and unobserved teachers' traits that affect student achievement, estimates from the fixed-effects regression provide a robustness test for our previous results.

Following Betts and Shkolnik's procedure (1999), we included a dummy variable for each teacher. Variables are the same as in the HLM analysis, except for the "natural elimination" of those characteristics that do not vary across teachers, such as the half-day class indicators (because all the "selected" teachers teach half-day classes) and instruction variables (because teachers reported their instructional practices in general, not with respect to each specific class they taught).¹⁴

Table 6 presents the results of the teacher fixed-effects analysis for reading and math achievement.¹⁵ The results are roughly equivalent to those for HLM. The coefficients for the small and regular class-size indicators are both positive but far from statistically significant. Although not shown, the same nonsignificance of class size appears in models where we specified class size as a linear term, in models where we included both linear and nonlinear class-size variables, in models where we accounted for the indicators of "remedial classes," and in models for reading achievement where we controlled for reading ability grouping.

Discussion and Conclusions

Our research on ECLS-K offers no evidence that class size affects reading or mathematics achievement in kindergarten. Class size does not affect achievement on average, nor does it affect the achievement of particular groups of students. Because ours was a nonexperimental study, we were especially concerned with selection bias, and we used two strategies to address this issue. First, HLM models control for a theoretically driven set of covariates at both the individual and the class levels that account, at least partially, for the association between unobserved selection
TABLE 5
Class Size Variation Among Teachers

                                          Class size     Small classes     Regular classes    Large classes
Teacher category                    N     Mean     SD    (≤ 17 students)   (18-23 students)   (24+ students)   Total
All teachers                    2,851    20.22   5.16         23%               55%                22%          100%
Teachers who teach one class    2,409    20.38   5.21         21%               56%                23%          100%
Teachers who teach two classes    442    19.35   4.74         32%               52%                16%          100%
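
As a quick arithmetic check on Table 5, the short Python sketch below recomputes the "9% smaller" standard deviation quoted in the text and confirms that the two teacher groups pool to the "All teachers" mean. The figures are transcribed from the table; the variable names are illustrative only and are not part of the original analysis.

```python
# Figures transcribed from Table 5 (N, mean class size, SD of class size).
one_class = {"n": 2409, "mean": 20.38, "sd": 5.21}
two_classes = {"n": 442, "mean": 19.35, "sd": 4.74}

# Relative reduction in the SD of class size for teachers with two classes.
sd_reduction = 1 - two_classes["sd"] / one_class["sd"]

# N-weighted mean of the two groups, to compare with the "All teachers" row.
total_n = one_class["n"] + two_classes["n"]
pooled_mean = (
    one_class["n"] * one_class["mean"] + two_classes["n"] * two_classes["mean"]
) / total_n

print(f"SD reduction: {sd_reduction:.1%}")   # SD reduction: 9.0%
print(f"Total N: {total_n}")                 # Total N: 2851
print(f"Pooled mean: {pooled_mean:.2f}")     # Pooled mean: 20.22
```

Both results agree with the table and the text: the group sizes sum to 2,851, the pooled mean matches 20.22, and the SD for two-class teachers is about 9% below that for one-class teachers.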

[Table 6, teacher fixed-effects results for reading and math achievement: table contents not recoverable from the extracted text.]
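
The logic of the teacher fixed-effects specification can be illustrated with a minimal Python sketch. It is not the authors' model: it uses one row per class rather than student-level data, a single predictor (class size), no weights, and entirely hypothetical numbers. Subtracting each teacher's own means is algebraically equivalent to including a dummy variable per teacher, and it shows why teachers with only one class contribute nothing to the estimate.

```python
from collections import defaultdict


def within_teacher_slope(rows):
    """Slope of achievement on class size using only within-teacher variation.

    rows: iterable of (teacher_id, class_size, mean_achievement), one per class.
    Demeaning by teacher gives the same slope as OLS with teacher dummies.
    """
    by_teacher = defaultdict(list)
    for teacher, size, score in rows:
        by_teacher[teacher].append((size, score))

    sxy = 0.0  # sum of cross-products of demeaned size and score
    sxx = 0.0  # sum of squares of demeaned size
    for classes in by_teacher.values():
        mean_size = sum(s for s, _ in classes) / len(classes)
        mean_score = sum(y for _, y in classes) / len(classes)
        for size, score in classes:
            dx = size - mean_size      # zero for a teacher with one class
            dy = score - mean_score
            sxy += dx * dy
            sxx += dx * dx
    if sxx == 0:
        raise ValueError("no within-teacher variation in class size")
    return sxy / sxx


# Hypothetical data: teachers differ in baseline achievement, but within
# each teacher one extra student lowers the class mean by half a point.
rows = [
    ("A", 16, 52.0), ("A", 20, 50.0),
    ("B", 18, 61.0), ("B", 24, 58.0),
    ("C", 22, 40.0),  # single-class teacher: drops out of the estimate
]
slope = within_teacher_slope(rows)
print(slope)  # -0.5
```

Removing teacher C from `rows` leaves the slope unchanged, which mirrors the point made in the text that single-class teachers effectively drop out and that identification rests on the limited within-teacher variation in class size.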

factors and students' subsequent achievement. Second, by using teacher fixed-effects regressions, we eliminated unobserved heterogeneity among teachers (in this case, a type of heterogeneity that could have caused the class-size effect to "disappear"). Our findings were robust to these alternative specifications.

A limitation of our study is that we do not have full information about why some classes are smaller than others. Lacking this information and in the absence of randomization, we cannot be certain that all selection bias has been eliminated, despite our efforts to rid the study of such bias. Consequently, our causal claims cannot be considered as strong as those of a randomized trial. Still, our findings stand up to numerous plausible specifications, and they have the advantage of greater generalizability as compared with experiments that are necessarily limited in scope.

Although we did not find effects of class size, we did find significant effects of classroom instruction on student achievement. In reading, we found that additional time spent on both whole-class and small-group activities boosted achievement, a result that is consistent with prior research showing the value of instructional time in early reading (e.g., Barr & Dreeben, 1983). Moreover, in both reading and mathematics, we observed that both of the instructional orientations we examined (whole language and phonics in reading, and "teaching for understanding" and drill in mathematics) contributed to cognitive performance. Our findings are thus consistent with the emerging consensus that a balanced instructional approach employing various teaching activities offers the best prospects for raising student achievement (e.g., Kilpatrick et al., 2001; Snow et al., 1998). What we did not find, however, is much involvement of class size in these instructional effects; on the whole, instructional effects appeared equally consequential in large and small classes. Our results thus resonate with other studies that indicate that what happens inside classrooms matters much more than the structure in which those activities occur (for a review, see Gamoran, Secada, & Marrett, 2000).

Interestingly, the finding that classroom instruction rather than classroom structure affects achievement may also be consistent with NCLB's emphasis on teacher quality. Whether NCLB will in fact raise achievement through the sorts of instructional activities we examined depends on the efficacy of the specific mechanisms favored by NCLB: teacher qualifications and subject-matter competence, and evidence-based practice. This is a matter for further research.

Our findings for class size clearly differ from those of Project STAR, a randomized field trial that revealed a positive effect of small classes for student achievement in kindergarten through Grade 3. STAR occurred under specific schooling conditions, including an ample supply of qualified teachers, adequate and sufficient classroom facilities, and a population of students that was not as diverse as the population in this study or in many others. It is important to examine the settings and incentives that characterized STAR because the generalizability of its findings is likely to depend on the conditions under which it occurred. The CSR program in California illustrates some of the "side effects" of reducing class size at a statewide scale, with a diverse population of students and with financial incentives that were equal for all districts, even though more disadvantaged districts required more resources to meet the CSR goal of 20 students per classroom in kindergarten through third grade (because of teacher scarcities and lack of space). One of the main side effects of CSR in California was a drop in teacher qualifications that disproportionately affected disadvantaged schools. In elementary schools serving the fewest low-income students, the proportion of fully credentialed kindergarten through third-grade teachers dropped by 2% from 1995-1996 to 1998-1999 (i.e., from the year before the intervention to the third year of the intervention); by contrast, schools serving the most low-income students experienced a 20% drop during that same period (Stecher & Bohrnstedt, 2002, Table B.18, p. B-9). Perhaps not surprisingly, the CSR Research Consortium summarized the effect of the California CSR on student achievement as being "inconclusive" (Stecher & Bohrnstedt, 2002).

The CSR initiative in California, as well as CSR programs in several states throughout the nation, have relied heavily on the evidence of Project STAR to justify the sizable investments they entail. However, expecting CSR programs in the "real world" to have outcomes similar to those of STAR requires the critical invariance assumption to hold. This assumption states that "the experimental version of a program must operate as would an actual program" (Manski, 1995, p. 53). The critical invariance assumption

is questionable in the case of CSR policies because most CSR programs have diverged from the implementation conditions of Project STAR (Mitchell & Mitchell, 2003). As Manski pointed out, violating this assumption does not imply that experimental research, such as Project STAR, is uninformative, but it does imply that "one should not expect the distribution of outcomes in a randomly selected treatment group to coincide with the outcomes that would be realized in an actual social program" (1995, p. 54). In fact, under the current, naturally occurring conditions of U.S. kindergartners, as represented in ECLS-K, reducing class size may have little impact on student achievement. Thus, rather than contradicting Project STAR, our results highlight that the schooling conditions under which class-size reduction occurs are relevant for the student outcomes we are interested in improving. More research is needed to identify the conditions under which experimental findings, such as those of Project STAR, can be generalized.

To increase external validity in randomized experiments, Cook suggested a heterogeneity-of-replication model, "one that emphasizes how consistently a causal relationship replicates across multiple sources of heterogeneity" (2002, p. 188). If this model were applied to the case of class size, then the question would be: Can the same causal relationship between class size and student achievement observed in Project STAR be observed in experiments throughout different settings, time periods, and regions of the country and with different ways of operationalizing the cause and effect? Claims based on such evidence would more properly infer, and more adequately generalize, the causal relation between class size and student achievement. However, to be fully informative, researchers must combine inquiries about whether class size affects student achievement with studies aimed at understanding the mechanisms by which class size affects student learning in some contexts. Further knowledge of how class size matters for student achievement may clarify why experimental and nonexperimental studies exhibit such inconsistent results.

Notes

¹ In kindergarten, the small-class advantage was 0.15 SD in mathematics and 0.18 SD in reading. In first grade, the small-class advantage was 0.27 and 0.24 in mathematics and reading, respectively. In Grades 2 and 3, the small-class advantage ranged from 0.20 to 0.26 in these two subjects (Finn & Achilles, 1999, Table 1).

² In STAR, participating schools received approximately $2.5 million per year. These funds had to be used only to hire the additional teachers and aides required by the project (Ritter & Boruch, 1999). Therefore, schools that required additional resources to carry out the experiment, such as extra classrooms, were probably unable to participate in STAR (Biddle & Berliner, 2002).

³ Class size and teacher-pupil ratios are different measures (Ehrenberg et al., 2001). Student-teacher ratios tend to underestimate true class sizes (Rice, 1999), because teacher-pupil ratios include teachers who do not necessarily work in mainstream classes, such as special-education teachers or teachers who spend all or part of the day as administrators, as librarians, or in functions outside of the classroom. Because pupil-teacher ratio is an aggregate measure, usually computed on units larger than the classroom (i.e., entire schools or school districts), it is an aspect of schooling not as proximal to student learning as class size is (Finn & Achilles, 1999).

⁴ Rice (1999), Betts and Shkolnik (1999), and Stasz and Stecher (2000) tested the interactive effects of class size and average class-level test scores on teachers' instructional practices but did not use student achievement as their ultimate outcome.

⁵ To avoid ceiling effects, the cognitive assessments included a two-stage battery. The first stage entailed a 12- to 20-item routing test for each domain. In the second stage, the child took different skill-level assessments depending on his or her performance in the routing test. The reading and mathematics assessments had three skill levels each, and the general knowledge assessment had two. The IRT scores that NCES calculated and that we use as dependent variables make possible the comparison of scores across children, regardless of which second-stage form students took (West et al., 2000).

⁶ In 92.2% of morning classes, 91.2% of afternoon classes, and 93.4% of all-day classes, these three measures of class size coincided. For the remaining cases, we followed this set of rules: (1) When three valid measures of class size were present and differed from each other in only one student per class (e.g., 17, 18, and 19 in the three measures), we assigned the median among them. (2) When the three valid measures were substantively different from each other, we recoded the value of class size as missing. We carried out this procedure in only two all-day classes. In these two cases, the actual measures of class size, as indicated by the sum of boys and girls, sum of children of different ages, and sum of children of different races/ethnicities, were 25, 6, and 3 in one case and 27, 13, and 15 in the other. (3) When there were three valid measures of


class size and two of the measures were equal to each other but different from the third one, we selected the class size indicated by the two equivalent measures. (4) When two measures were valid and different and the third one was a missing case, we randomly selected one of the two valid measures.

⁷ As presented in Table 2, the mean number of students in small, regular, and large classes is 14.3, 20.3, and 26.7, respectively.

⁸ The response categories of these three variables are: 1 = no time, 2 = half hour or less, 3 = approximately 1 hour, 4 = approximately 2 hours, and 5 = 3 hours or more.

⁹ For all items in reading and mathematics, the categories of response are: 1 = never, 2 = once a month or less, 3 = two to three times a month, 4 = one to two times a week, 5 = three to four times a week, and 6 = daily.

¹⁰ Categories of response are: 1 = Group misbehaves very frequently and is almost always difficult to handle, 2 = Group misbehaves frequently and is often difficult to handle, 3 = Group misbehaves occasionally, 4 = Group behaves well, and 5 = Group behaves exceptionally well.

¹¹ We selected students as Level-1 units and classes as Level-2 aggregates, because we are interested in the effects that class-level attributes (class size and instruction) have on student-level characteristics (academic achievement). Even though this is a well-justified conceptual decision, our HLM model does not address the dependency of observations within schools built into ECLS-K because of its sampling design. One way to perform a cluster correction is to add a third level of analysis pertaining to schools. Unfortunately, this is not a possible alternative for us because the number of classes per school (5.1 on average) is not large enough to allow a simultaneous specification of class and school levels. Because an HLM analysis would drop the aggregate units that contained fewer cases than the number of parameters estimated, the school-level model would have to be extremely simple to preserve the cases with a small number of classes per school.

A related issue pertains to sampling weights. At the student level, we included the weights that ECLS-K provides and recommends using when the analysis includes both rounds of child assessment, in conjunction with fall and spring teacher and parent data (BYCOMWO). However, ECLS-K does not provide class-level weights. We could compute these weights if we knew the total number of kindergarten classes per school, but this information is not available. In any case, we expect that the student-level weight accounts for the oversampling of Asian and private-school kindergartners, lessening the concern about the lack of class-level weights.

¹² We estimated different specifications of the random-effect model, introducing these three variable slopes (separately, in pairs, or as a set) in the baseline and in the different subsequent models. The results were consistent throughout models. In this article, we only report the estimates for the variable slopes for the case when we included them as a set in the full model.

¹³ Children had to demonstrate a certain level of proficiency in English, according to an English-language proficiency screener called Oral Language Development Scale (OLDS), to take any of the cognitive assessments. If a child failed to reach the cut score on the English OLDS but was proficient in Spanish, he or she could take the mathematics assessment in Spanish (5% of the sample was assessed in Spanish in the mathematics domain). Some of the children who did not take the reading assessment in the fall of kindergarten became English proficient during the school year and could take the reading assessment in the spring of kindergarten. However, because these students do not have a complete record (they lack a measure of fall achievement), we cannot include them in this analysis.

¹⁴ Teachers did not report their teaching practices, such as class organization and typical class activities, separately for each kindergarten class they teach. Rather, teachers reported the "typical" instructional activities they "often" carry out in class. Given this information, we cannot determine whether the same teacher uses different instructional practices in classes of different size.

¹⁵ In the fixed-effects regression, we adjust the standard errors using the same student-level weight (BYCOMWO) as in the HLM analyses (see note 11).

References

Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114, 533-575.
Barr, R., & Dreeben, R. (1983). How schools work. Chicago: The University of Chicago Press.
Betts, J. R., & Shkolnik, J. L. (1999). The behavioral effects of variations in class size: The case of math teachers. Educational Evaluation and Policy Analysis, 21, 193-213.
Biddle, B. J., & Berliner, D. C. (2002). Small class size and its effects. Educational Leadership, 59, 12-23.
Brewer, D. J., & Stasz, C. (1996). Enhancing opportunity to learn measures in NCES data (RAND Reprint 581). Santa Monica, CA: RAND.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Newbury Park, CA: Sage.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A., Weinfield, F. D., et al. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
Cook, T. D. (2002). Randomized experiments in educational policy research: A critical examination of the reasons the educational evaluation community has offered for not doing them. Educational Evaluation and Policy Analysis, 24, 175-199.

Ehrenberg, R. G., Brewer, D. J., Gamoran, A., & Willms, J. D. (2001). Class size and student achievement. Psychological Science in the Public Interest, 2, 265-294.
Ferguson, R. F. (1998). Can schools narrow the black-white test score gap? In C. Jencks & M. Phillips (Eds.), The Black-White test score gap (pp. 318-374). Washington, DC: The Brookings Institution.
Finn, J., & Achilles, C. M. (1999). Tennessee's class size study: Findings, implications, and misconceptions. Educational Evaluation and Policy Analysis, 21, 97-109.
Finn, J. D., Pannozzo, G. M., & Achilles, C. (2003). The "why's" of class size: Student behavior in small classes. Review of Educational Research, 73, 321-368.
Gamoran, A., Secada, W. G., & Marrett, C. B. (2000). The organizational context of teaching and learning: Changing theoretical perspectives. In Handbook of the sociology of education (pp. 37-63). New York: Kluwer Academic/Plenum Publishers.
Gewertz, C. (2003, September 3). Class conflict. Education Week, 23, 21.
Goodnough, A. (2003, August 20). Florida board backs retreat on class size. New York Times [online].
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66, 361-396.
Grissmer, D. (1999). Conclusion. Class size effects: Assessing the evidence, its policy implications, and future research agenda. Educational Evaluation and Policy Analysis, 21, 231-248.
Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18, 45-65.
Hanushek, E. A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21, 143-163.
Hanushek, E. A. (2002). Evidence, politics, and the class size debate. In L. Mishel & R. Rothstein (Eds.), The class size debate (pp. 37-65). Washington, DC: Economic Policy Institute.
Kilpatrick, J., Swafford, J., & Findell, B. (2001). Adding it up: Helping children learn mathematics. Washington, DC: National Academy Press.
Krueger, A. B. (2002). Understanding the magnitude and effect of class size on student achievement. In L. Mishel & R. Rothstein (Eds.), The class size debate (pp. 7-35). Washington, DC: Economic Policy Institute.
Krueger, A. B., & Whitmore, D. M. (2001). Would smaller classes help close the black-white achievement gap? Industrial Relations Section, Working Paper No. 451, Princeton University. Retrieved from http://www.irs.princeton.edu/pubs/pdfs/451.pdf
Manski, C. F. (1995). Identification problems in the social sciences. Cambridge, MA: Harvard University Press.
Mayer, D. (1999). Measuring instructional practice: Can policymakers trust survey data? Educational Evaluation and Policy Analysis, 21, 29-45.
Mitchell, D. E., & Mitchell, R. E. (2003). The political economy of education policy: The case of class size reduction. Peabody Journal of Education, 78, 120-152.
Molnar, A., Smith, P., Zahorik, J., Palmer, A., Halbach, A., & Ehrle, K. (1999). Evaluating the SAGE Program: A pilot program in targeted pupil-teacher reduction in Wisconsin. Educational Evaluation and Policy Analysis, 21, 165-177.
Mosteller, F., Light, R. J., & Sachs, J. A. (1996). Sustained inquiry in education: Lessons from skill grouping and class size. Harvard Educational Review, 66, 797-842.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (1999). The long-term effects of small classes: A five-year follow-up of the Tennessee class size experiment. Educational Evaluation and Policy Analysis, 21, 127-142.
Nye, B. A., Hedges, L. V., & Konstantopoulos, S. (2002). Do low-achieving students benefit more from small classes? Evidence from the Tennessee Class Size Experiment. Educational Evaluation and Policy Analysis, 24, 201-217.
Pong, S., & Pallas, A. (2001). Class size and eighth-grade math achievement in the United States and abroad. Educational Evaluation and Policy Analysis, 23, 251-273.
Hedges, L. V. & Greenwald, R. (1996). Have times Porter, A. C. (1991). Creating a system of school
changed? The relation between school resources process indicators. Educational Evaluation and
and student performance. In G. Burtless (Ed.), Does Policy Analysis, 13, 13-29.
money matter? The effect of school resources on Rice, J. K. (1999). The impact of class size on instruc-
student achievement and adult success (pp. 74-92). tional strategies and the use of time in high school
Washington, DC: Brookings Institution Press. mathematics and science courses. Educational
Hoxby, C. M. (2000). The effects of class size on stu- Evaluation and PolicyAnalysis, 21, 215-229.
dent achievement: New evidence from population Rice, J. K. (2002). Making the evidence matter: Im-
variation. The Quarterly Journal of Economics, plications of the class size research debate for pol-
115,1239-1285. icy makers. In L. Mishel & R. Rothstein (Eds.), The
Kennedy, M. (2003). Sizing Up Smaller Classes. class size debate (pp. 89-94). Washington, DC:
American School and University, 75, 16-20. Economic Policy Institute.


Richard, A. (2004, February 18). Class-size reduction is slow going in Fla. Education Week, 23, 30-31.
Ritter, G. W., & Boruch, R. F. (1999). The political and institutional origins of a randomized controlled trial on elementary school class size: Tennessee's Project STAR. Educational Evaluation and Policy Analysis, 21, 111-125.
Shapson, S. M., Wright, E. N., Eason, G., & Fitzgerald, J. (1980). An Experimental Study of the effects of class size. American Educational Research Journal, 17, 141-152.
Snow, C. E., Burns, M. S., & Griffin, P. (1998). Preventing reading difficulties in young children. Committee on the Prevention of Reading Difficulties in Young Children, National Research Council. Washington, DC: National Academy Press.
Stasz, C., & Stecher, B. M. (2000). Teaching mathematics and language arts in reduced size and non-reduced size classrooms. Educational Evaluation and Policy Analysis, 21, 313-329.
Stecher, B. M., & Bohrnstedt, G. W. (Eds.). (2002). Class size reduction in California: Findings from 1999-00 and 2000-01. Sacramento, CA: California Department of Education.
U.S. Department of Education. (1998, October 22). Press release: Vice President Gore Announces $1.2 Billion to Begin Hiring 100,000 Teachers in Local School Districts.
U.S. Department of Education. (2000). Class-size reduction program. Guidance for fiscal year 2000. Washington, DC: Office of Elementary and Secondary Education.
U.S. Department of Education. (2004). A descriptive evaluation of the federal class-size reduction program: Final report. Washington, DC: Office of the Deputy Secretary, Policy and Program Studies Service.
U.S. Department of Education. (2005, August 3). Highly qualified teachers: Improving teacher quality state grants. ESEA Title II, Part A. Non-regulatory guidance. Office of Elementary and Secondary Education, Academic Improvement and Teacher Quality Programs.
West, J., Denton, K., & Germino-Hausken, E. (2000). America's kindergartners: Findings from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, Fall 1998. NCES-2000-070. Washington, DC: National Center for Education Statistics.

Authors

CAROLINA MILESI is a PhD Candidate, Department of Sociology, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, WI 53706; cmilesi@ssc.wisc.edu. Her areas of specialization are stratification of educational outcomes, trajectories in postsecondary education, and the educational impact of early health disparities.

ADAM GAMORAN is Professor, Sociology and Educational Policy Studies, and Director, Wisconsin Center for Education Research, Department of Sociology, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, WI 53706; gamoran@ssc.wisc.edu. His areas of specialization are inequality in education, school and classroom organization, and school reform.

Manuscript received January 4, 2005
Final revision received August 14, 2006
Accepted August 22, 2006

Appendix

TABLE A1
Effect of Class Size on Spring Kindergarten Achievement With Different Specifications of Class Size

                                                    Reading             Math
Model                                             Est.     SE       Est.     SE

Model 1: Large classes (24 or more students) as reference categoryᵃ
  Small class (17 or fewer students)             -0.102   0.246     0.111   0.183
  Regular class (18-23 students)                  0.107   0.202    -0.231   0.149
Model 2: Classes of 18 or more students as reference category
  Small class (17 or fewer students)             -0.180   0.196    -0.279   0.148
Model 3: Regular classes (18 to 23 students) as reference category
  Small class (17 or fewer students)             -0.209   0.203    -0.341   0.153*
  Large class (24 or more students)              -0.107   0.202    -0.231   0.149
Model 4: Class size as a linear variable
  Class size                                      0.008   0.017    -0.005   0.012
Model 5: Class size as linear and quadratic terms
  Class size                                      0.122   0.063     0.167   0.046**
  Class size squared                             -0.003   0.001    -0.004   0.001**
Model 6: Natural log of class size
  Log(Class size)                                 0.356   0.320     0.156   0.241
Model 7: Classes of 14 or more students as reference category
  Small class (13 or fewer students)             -0.609   0.351    -0.586   0.268*
Model 8: Classes of 15 or more students as reference category
  Small class (14 or fewer students)             -0.501   0.290    -0.638   0.221**
Model 9: Classes of 16 or more students as reference category
  Small class (15 or fewer students)             -0.278   0.250    -0.588   0.191*

Note. All models include as level-1 variables: family SES, indicators for male, Black, Hispanic, Asian, fall achievement, and age. As level-2 variables, all models include mean fall achievement, mean family SES, percent Black, percent Hispanic, and indicator for half-day class.
ᵃ Equivalent to Model 1 in Table 3 for Reading and Model 1 in Table 4 for Math.
*p < .05; **p < .01; ***p < .001.
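
The specifications in Table A1 differ only in how the number of students in a class enters the model. As a minimal sketch of that coding, and not the authors' own procedure, the following Python function builds the class-size predictors for one classroom under each specification. The function name and the dictionary keys are hypothetical labels introduced here for illustration; the cutoffs (13, 14, 15, 17, 18-23, 24) are taken from the table.

```python
import math


def class_size_predictors(n_students: int) -> dict:
    """Build the class-size predictors implied by each row group of Table A1.

    Hypothetical helper: names are illustrative, not the authors' variables.
    """
    if n_students <= 0:
        raise ValueError("class size must be a positive integer")
    return {
        # Models 1-3: categorical coding; the omitted dummy is the reference.
        "small_17_or_fewer": int(n_students <= 17),
        "regular_18_to_23": int(18 <= n_students <= 23),
        "large_24_or_more": int(n_students >= 24),
        # Model 4 uses the linear term; Model 5 adds the squared term.
        "linear": n_students,
        "squared": n_students ** 2,
        # Model 6: natural log of class size.
        "log": math.log(n_students),
        # Models 7-9: alternative cutoffs for a "small" class.
        "small_13_or_fewer": int(n_students <= 13),
        "small_14_or_fewer": int(n_students <= 14),
        "small_15_or_fewer": int(n_students <= 15),
    }
```

For a class of 15 students, for example, the Model 9 indicator is 1 while the Model 7 and Model 8 indicators are 0, so the same classroom counts as "small" under some specifications and not under others.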

COPYRIGHT INFORMATION

TITLE: Effects of Class Size and Instruction on Kindergarten Achievement
SOURCE: Educational Evaluation & Policy Analysis 28 no4 Wint 2006
PAGE(S): 287-313
WN: 0634903465004

The magazine publisher is the copyright holder of this article and it is reproduced with permission. Further reproduction of this article in violation of the copyright is prohibited. To contact the publisher: http://www.aera.net/

Copyright 1982-2007 The H.W. Wilson Company. All rights reserved.
