
McDougal Littell

Evidence-Based Small-Scale Study Final Report:

7TH GRADE LIFE SCIENCE

Technical Report, 07-31-05

by

Catherine Callow-Heusser, Ph.D. (ABD), Principal Investigator and Director


cheusser@endvision.net

Douglas Allred, M.S., Project Coordinator


red@endvision.net

Daniel Robertson, Ph.D., Senior Research Analyst


danr@endvision.net

EndVision Research & Evaluation, LLC


41 E. University Blvd. #321
Logan, UT 84321
(435) 881-8811

Geoffrey D. Borman, Ph.D., Associate Professor


gborman@education.wisc.edu

Maritza Dowling, Research Assistant


mdowling@ccbc.education.wisc.edu

Educational Leadership and Policy Analysis


University of Wisconsin-Madison
1161D Educational Sciences Building
1025 West Johnson St.
Madison, WI 53706-1796
(608) 263-3688

EXECUTIVE SUMMARY

This small-scale feasibility study demonstrated that a randomized
experimental design could be conducted successfully in typical school settings. The study
initially included 31 7th grade classrooms whose teachers were randomly assigned either to the
treatment group, which used McDougal Littell Life Science, or to a comparison group using the
science curriculum that teachers had previously used. Mixed methods were used to gather data
through student assessment, classroom observations, surveys, interviews, and focus groups.
After approximately 18-20 weeks of implementation (i.e., one half of the school year), 29
classrooms completed the study and were represented in the final analysis of student outcomes.
While this relatively small number limits statistical power, it was sufficient to show differences
that consistently favor the treatment group and suggest that McDougal Littell Life Science was
more effective than the life science curricula used in comparison classrooms.
The impact analyses focused on attitudinal and achievement changes for over 700
students nested within 29 classrooms. The overall unconditional standardized mean difference
effect size between treatment and comparison groups for average student gains on a science
content test was δ = 0.27, a small but potentially important difference in gains.
gains. Hierarchical linear model analyses, which take into account both student- and classroom-
level sources of variability in the outcome, were conducted to estimate the classroom-level effect
of random assignment. This method was used because students randomized together within any
one classroom or school are more likely to respond in a similar manner than students randomized
from different clusters. The analysis identified the unique impact of McDougal Littell Life
Science while accounting for both pre-existing student and classroom differences, and any effect
of classroom-clustering, including the impact from differences in characteristics of teachers. The
analysis revealed no statistically significant classroom-level effects of assignment to implement
McDougal Littell Life Science on the science achievement outcome. The magnitude of the
effect, δ = .15, was of some practical significance, but it was not large enough to achieve
statistical significance given the current design, which included a total of only 29 classrooms.
For the average McDougal Littell classroom in the sample, this effect is equivalent to moving the
entire class from the 50th to the 56th percentile while the average comparison classroom remained
at the 50th percentile.
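
A minimal worked conversion, assuming the normal-distribution translation that standardly underlies such percentile statements (the report does not show this step):

$$\Phi(0.15) \approx 0.56,$$

where $\Phi$ denotes the standard normal cumulative distribution function; a classroom starting at the comparison-group median (50th percentile) that gains $\delta = 0.15$ standard deviations moves to roughly the 56th percentile.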
The hierarchical linear model analysis did indicate, however, that the impact of assignment on
the science attitudinal measure served to narrow the gap between students who began the
treatment with poorer and better attitudes toward science. In addition, there was a statistically
significant difference, with a magnitude of effect δ = 0.15, between treatment teachers using
McDougal Littell Life Science and comparison teachers with respect to their implementation of
high-quality, research-based classroom practices and curricula. Relative to the comparison
teachers, treatment teachers implemented research-based classroom practices and curricula with
greater quality and consistency.
Utilizing an instrumental variables approach for estimating the treatment effect,
additional analyses specifically addressed program implementation and how it influenced
students’ achievement and attitudinal outcomes. Assignment to McDougal Littell Life Science
had a positive and statistically significant effect on teachers’ uses of research-based instructional
practices and curricular materials, but the instrumental variables analyses suggested that these
benefits for teachers did not immediately translate into large and statistically significant effects
on the student outcomes. The most consistent predictors of student outcomes were student- and
classroom-level pretest scores on the achievement and attitude measures. That is, students and
classrooms beginning the study with higher achievement and more positive attitudes towards
science tended to end the study with better achievement and attitudinal outcomes. Exploratory
regression tree analyses revealed that McDougal Littell Life Science helped to counteract this
trend in the poorest-performing classrooms. The 75 treatment students experienced a positive
and statistically significant impact relative to the 47 comparison students from the five lowest-
performing classrooms. This result provided some suggestive evidence that McDougal Littell
Life Science could be particularly beneficial for high-needs students from low-performing
schools and classrooms.
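
As an illustration of the instrumental variables logic described above, in which random assignment serves as an instrument for implementation quality, the following is a minimal two-stage least squares sketch on synthetic data. The variable names and coefficients are hypothetical; this is not the authors' analysis code, which is not included in the report.

```python
# A minimal 2SLS sketch on synthetic data illustrating the
# instrumental-variables idea: assignment (the instrument) shifts
# implementation, and the second stage estimates implementation's
# effect on the outcome. Hypothetical values throughout.
import numpy as np

rng = np.random.default_rng(0)
n = 700
z = rng.integers(0, 2, n).astype(float)           # random assignment (instrument)
implementation = 0.5 * z + rng.normal(0, 1, n)    # endogenous mediator
y = 0.3 * implementation + rng.normal(0, 1, n)    # student outcome

# Stage 1: regress implementation on the instrument (plus intercept)
Z = np.column_stack([np.ones(n), z])
stage1 = np.linalg.lstsq(Z, implementation, rcond=None)[0]
impl_hat = Z @ stage1

# Stage 2: regress the outcome on the fitted implementation values
X = np.column_stack([np.ones(n), impl_hat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("2SLS estimate of implementation effect:", beta[1])
```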
Overall, teachers viewed McDougal Littell Life Science favorably. While they made
many suggestions for improving the program, the positive comments and attitudes towards the
program were quite strong. Teachers and students stated that the most effective and valuable
components were (a) the notetaking and vocabulary strategies, (b) the organization of the
materials—including “Big Ideas,” “Key Concepts,” and chapter/section headings, (c) the
resources supporting differentiated instruction, and (d) the technology components that
supported instruction.
Because of the many positive findings of this study, we recommend that funding for a
larger-scale study be sought to investigate the effectiveness of the McDougal Littell Science
Series in grades 6-8. Changes in teaching practices and student outcomes should be measured,
and the magnitude of the relationship between teaching practices and student outcomes should be
investigated, including mediators and moderators of these relationships. This study also
identifies factors that would need to be considered in a larger study and, as such, lays the
groundwork for a larger-scale study to be conducted successfully.


ACKNOWLEDGEMENTS

Many people were involved in helping us successfully conduct this study, and they deserve
recognition and thanks.

Carol Guziak, Manager of Educational Research at McDougal Littell, for working diligently to
grasp the nuances of educational research and for supporting an evidence-based study of
McDougal Littell Science.

Douglas Carnine, instructional designer for McDougal Littell Science, for providing advice to
design the study and support as we conducted it.

Kathy Zantal-Weiner, friend and colleague, for helping to lay the groundwork for the study.

Duncan Drummond and Dave Sutor at McDougal Littell, for making many more phone calls
than we did to request the participation of school districts and teachers. They have heard more
“No” responses than anyone should be subjected to in a lifetime!

Becky Canning, Hyrum Henderson, and Wendy Sanborn, for helping to develop and test
instruments, for uncomplainingly dealing with grueling travel schedules, and for striving to
collect reliable, high quality data.

Linda Allred, Bryan Elwell, Suzanne Yelton, and Danhui Zhang for reliably and efficiently
grading tests and entering data.

Katie Christiansen Griffiths, for helping to schedule visits to schools and communicate with
teachers.

District superintendents and science coordinators, for agreeing to participate in the study and
giving us access to classrooms, teachers, and students.

Teachers who willingly and enthusiastically participated in the study, for cheerfully allowing us
to observe what happens in middle school science classrooms across the country.

And students, for whom we’re working to improve science education…we hope you learn to
love science as much as we do!


TABLE OF CONTENTS

Executive Summary ......................................................................................................... i


Acknowledgements .........................................................................................................iii
Introduction ..................................................................................................................... 1
Background ................................................................................................................. 1
Meeting Criteria for Evidence-Based Research Designs ............................................. 1
Research Design............................................................................................................. 4
Statistical Power .......................................................................................................... 5
Research Questions........................................................................................................ 6
Description of the Intervention....................................................................................... 10
McDougal Littell Science ........................................................................................... 10
Standards-Based Instruction and Assessment....................................................... 11
Research-Based Practices from Educational and Cognitive Research.................. 11
Scope of Use.......................................................................................................... 15
Cost to Schools and Districts ................................................................................. 15
Materials Provided to Treatment Teachers and Students ...................................... 15
Using McDougal Littell Life Science .......................................................................... 16
McDougal Littell Professional Development ........................................................... 17
Teaching and “Full Implementation”....................................................................... 18
Factors Affecting “Full Implementation” and Student Outcomes ............................ 21
Using Other Curriculum in Comparison Classrooms ................................................. 24
Programs Used in Comparison Group Classrooms ............................................... 24
Teaching and “Full Implementation”....................................................................... 24
Description of Population and Sample .......................................................................... 26
Population.................................................................................................................. 26
Sample ...................................................................................................................... 26
Selecting the Sample ............................................................................................. 26
Randomly Assigning Teachers to Groups .............................................................. 28
Final Sample .......................................................................................................... 28
Target Classroom................................................................................................... 29


Demographics of the Sample................................................................................... 30


Description of Schools ........................................................................................... 30
Description of Teachers ......................................................................................... 30
Description of Classrooms and Students ............................................................... 30
Instrumentation and Data Collection ............................................................................. 31
Pretest/Posttest ......................................................................................................... 31
Student Attitude Survey ............................................................................................. 34
Classroom Observation Instrument ........................................................................... 34
Using Science Materials Checklist (Teacher Self-Report) ........................................... 36
Using Science Materials Observation Checklist ........................................................ 36
Informal Teacher Interviews ...................................................................................... 37
Informal Student Interviews ....................................................................................... 37
Journaling Questions ................................................................................................. 37
Teacher Survey ......................................................................................................... 38
Student Questionnaire ............................................................................................... 38
Treatment Teacher Focus Groups/Interviews............................................................ 38
Treatment Student Focus Groups ............................................................................. 38
Data Analysis ................................................................................................................ 39
Final Sample.............................................................................................................. 39
Hierarchical Linear Model Analyses of McDougal Littell Life Science Treatment
Effects...................................................................................................................... 42
Analysis of Research Questions................................................................................ 44
Differences in Achievement between Groups ........................................................ 44
Differences in Attitudes between Groups ............................................................... 45
Causal Effects of Implementation Quality .............................................................. 48
Differences in Treatment Effects on Achievement Attributable to Classroom
Context—Closing the Gaps ................................................................................ 51
Use and Impact of Notetaking ................................................................................ 53
Use and Impact of Technology............................................................................... 54
Implementation of the Program .............................................................................. 56
Attitudes toward McDougal Littell Life Science....................................................... 58
Effectiveness of Professional Development ........................................................... 68


Conclusions ................................................................................................................ 70
Limitations of the Study ................................................................................................. 71
Recommendations for Future Research........................................................................ 73
References.................................................................................................................... 75
Appendices ................................................................................................................... 78
Appendix A: Sample Agenda of Professional Development Provided for Treatment Teachers
Appendix B: McDougal Littell Sample Script for Seeking Study Participation
Appendix C: Letters Faxed or Emailed to Interested Schools or Districts
Appendix D: Parent Letter of Introduction and Passive Permission
Appendix E: Pretest/Posttest Forms A and B
Appendix F: Student Attitude Survey
Appendix G: Classroom Observation Instrument
Appendix H: Using Science Materials Observation
Appendix I: Using Science Materials Checklist
Appendix J: Teacher Survey
Appendix K: Student Questionnaire
Appendix L: Teacher Focus Groups
Appendix M: Student Focus Groups
Appendix N: Journaling Email Questions
Appendix O: Descriptive Summaries and Statistics
Appendix O.1: Descriptions of Samples and Measures
Appendix O.2: Responses to Student Questionnaire
Appendix O.3: Responses to Teacher Questionnaire
Appendix O.4: Using McDougal Littell Science Materials Checklist
Appendix O.5: Attrition Analysis
Appendix O.6: Journaling Email Responses


INTRODUCTION

The small-scale study described in this report provides research-based evidence for the
effectiveness of the McDougal Littell Life Science curriculum, targeted at the 7th grade level.
The research design of this study meets the research criteria specified by the What Works
Clearinghouse (WWC) in the Study Design and Implementation Assessment Device (Study
DIAD), version 1.0. This report includes a description of McDougal Littell Life Science,
including a description of the program as implemented by teachers, in order to provide sufficient
evidence of the intervention and outcomes to meet WWC standards.

Background

McDougal Littell’s Market Research Team conducted field tests and evaluation studies during
the second semester of 2003 (McDougal Littell Market Research, 2004). Eleven teachers in nine
schools and districts from five states submitted results from implementing chapters from
McDougal Littell Science, including modules from Physical, Earth, and Life Science. The
classrooms represented a mix of urban, suburban, and rural locations as well as a wide range of
socio-economic backgrounds. Pilot testing included pre/post testing of students and results were
positive—students’ posttest scores were considerably higher than pretest scores and teachers’
attitudes towards the materials were favorable.

To provide additional feasibility evidence for the purpose of obtaining larger-scale funding,
McDougal Littell’s Market Research Team contracted with EndVision Research & Evaluation
during the 2004-2005 school year to conduct a small-scale feasibility study of the
effectiveness of McDougal Littell Life Science. The randomized experimental design described
in the original proposal specified a minimum of 32 teachers, with 16 randomly assigned to
implement McDougal Littell Life Science curricula and 16 teachers assigned to a comparison
group using previously implemented curriculum. In order to meet McDougal Littell reporting
timelines, the study extended through the entire course of one school term, or approximately 18
weeks during the Fall of 2004. Mixed methods were used to gather data through student
assessment, classroom observations, surveys, interviews, and focus groups.

Meeting Criteria for Evidence-Based Research Designs

Classrooms, schools, and their surrounding environments are complex structures in which
experimental research conditions are difficult to achieve. As shown in the conceptual framework
in Figure 1, many factors affect students, teachers, classrooms, schools, districts, and the
communities in which they reside—most of which are difficult to control in a research study that
takes place in a “natural” or typical setting. For this reason, many contextual factors that could
impact student achievement outcomes were observed and measured through the course of the
study. Student attrition from the study was monitored and handled as necessary in the data
analysis to ensure high quality, research-based evidence.


Figure 1. Conceptual Framework [diagram relating student, teacher, classroom, school,
district, and community factors to student achievement]

The research design met the criteria specified by the What Works Clearinghouse (WWC) in the
Study Design and Implementation Assessment Device (Study DIAD), version 1.0. These criteria
specified that reporting of study findings, including descriptions of the intended curricula and the
curricula implemented by teachers, was required to provide sufficient evidence to meet WWC
standards. This final report for the study includes such descriptions.

Additionally, guidelines from the Institute of Education Sciences (IES, 2003) that recommended
providing evidence of intervention effectiveness using randomized experimental designs were
met in conducting the study and writing this report. The IES guidelines advocated the following
criteria: (a) the intended and implemented interventions were clearly described, including who
administered the intervention, who received it, and what it cost, (b) the ways in which the
intervention differed from what the comparison group received were measured and reported,
(c) the logic for how the intervention was supposed to affect outcomes was included,
(d) compromises to random assignment were articulated and considered in the data analysis,
(e) an analysis of baseline differences between intervention and comparison groups was
conducted and any pre-existing differences after random assignment were controlled in the data
analysis, (f) valid outcome measures were used, and (g) all participating students were followed and
attrition was monitored, reported, and considered in the data analysis. In addition, (h) effect
sizes and results from statistical tests were reported, (i) the intervention’s effect on all subgroups
of students and outcomes were reported, regardless of direction of effect, (j) the intervention was
implemented in multiple sites with varying demographic locations and characteristics, and
(k) the intervention was carried out and delivered in typical school settings and under typical
conditions. Finally, (l) the study’s intervention and comparison groups were randomly assigned
prospectively (i.e., prior to the intervention), and (m) measures were chosen prospectively.
Overall, the research design described in this final study report dealt with the complexities of
school settings across time while addressing recommendations for providing high quality,
research-based evidence.

This experimental research design constituted a feasibility study to provide evidence for
effectiveness of McDougal Littell Life Science as implemented in the 7th grade on a small scale.
As a small-scale feasibility study, it falls short of IES guidelines for “strong” evidence on two
points: (a) it did not include data collection on long-term outcomes of the intervention, and (b)
although the IES guidelines suggest outcome data should be reported for those in the intervention
group who did not complete the intervention, most students who dropped out of the study had
either been absent on test day (teachers were unwilling to sacrifice another instructional day to
test them separately) or had moved to other schools not participating in the study.

Because this was a small-scale feasibility study, researchers followed students for one school term, so data
analysis and reporting could be completed in time for McDougal Littell to use the findings (a) for
the 2005-2006 school year’s sales, (b) to submit a proposal for a large-scale study to
competitions that we anticipated being announced in spring 2005, (c) to present timely evidence
to the What Works Clearinghouse, and (d) to improve other McDougal Littell curriculum under
development.


RESEARCH DESIGN

A randomized experimental design was used to answer the research questions through both
quantitative and qualitative methods and measures (e.g., mixed methods, see design matrix in
Table 1 on page 8). Teachers were randomly assigned to implement the “Cells and Heredity”
and “Ecology” units from McDougal Littell Life Science or to a comparison group using
previously implemented curriculum but covering the same units. One class period per teacher
was selected for the study, and classroom- and student-level data were collected
throughout the course of the study. McDougal Littell provided honorariums that were distributed
to both treatment and comparison teachers upon study completion.

An accredited Institutional Review Board (IRB) reviewed the study proposal and
instrumentation, approving the study under exemptions 45 CFR 46.101(b)(1) and (b)(2), as the
research was conducted in established or commonly accepted educational settings, involving
normal educational practices, such as the effectiveness of or the comparison among instructional
techniques and curricula. Additional district-level IRB clearance was obtained to the extent
required by some districts for research involving human subjects in public school settings.
Written permission was obtained from district or school administrators for researchers to collect
student, teacher, administrator, school, and district data through focus groups and interviews,
surveys, classroom observations, assessment/testing, and artifacts (e.g., lesson plans, student
products or portfolios, school policies).

We had planned to tell all teachers that they were participating in research to help learn more
about how teachers actually use curriculum materials/textbooks, and to not inform teachers that
researchers were investigating the effectiveness of McDougal Littell Life Science. In this way,
teachers would remain “blind” to the intended purposes of the study. However, district
coordinators who requested that teachers participate in the study told teachers that McDougal
Littell curriculum would be used by teachers assigned to the treatment group. Treatment
teachers did agree not to talk with comparison group teachers about the curricula, specifically
content, pedagogy, classroom activities, assessments, and the effects of the curricula on
themselves and students, and those few comparison group teachers whose classrooms were in the
same building as treatment teachers agreed to not question treatment teachers.

Researchers collected baseline data, including state or district assessment scores for students
assigned to treatment and comparison classrooms, and student, teacher, classroom, and school
demographic characteristics. Additionally, pretests and attitudinal measures were administered
to students. McDougal Littell provided “typical” professional development (as described in a
subsequent section) and materials to treatment teachers, with no costs for curricula
implementation incurred by schools, school districts, or researchers.

The intervention extended through the entire course of one school term, or approximately 18
weeks. All treatment teachers implemented the same curricula and agreed to “fully” implement
McDougal Littell Life Science for one term. Researchers collected data from both treatment and
comparison classrooms throughout the study by conducting classroom observations, focus
groups, interviews, and surveys; reviewing artifacts; and administering pre/post assessments.


Statistical Power

Power analysis for two-level Hierarchical Linear Models (HLM), in which students are nested
within classrooms or schools at level 1 and classrooms or schools are at level 2, is a relatively
recent area of methodological study (Hox, 2002; Raudenbush & Liu, 2000; Raudenbush, 1997).
In an HLM framework, statistical power may be affected by several factors. As with any other
statistical analysis, both the magnitude of the treatment effect and the selected probability of
making a type I error – the alpha level – affect power. In addition, in the two-level hierarchical
model, the number of classrooms or schools represented at level 2 of the analysis, the number of
students per classroom or school represented at level 1 of the analysis, and the variation between
classrooms or schools have an impact on statistical power.
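
For reference, the two-level model underlying these power considerations can be sketched in standard HLM notation (the report describes the model in prose but does not write out the equations):

$$\text{Level 1 (students): } Y_{ij} = \beta_{0j} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2)$$
$$\text{Level 2 (classrooms): } \beta_{0j} = \gamma_{00} + \gamma_{01} T_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})$$

where $i$ indexes students, $j$ indexes classrooms, and $T_j$ indicates treatment assignment. The intraclass correlation discussed below is then $\rho = \tau_{00} / (\tau_{00} + \sigma^2)$, the proportion of outcome variance lying between classrooms.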

As an estimate of the magnitude of the level-2 classroom effect of McDougal Littell Life
Science, an effect size of δ = 0.25 was chosen. This effect size, which expresses the expected difference
between the experimental treatment classroom means and the comparison group classroom
means on the proposed outcome, corresponded to widely used standards that experts use to
understand the magnitude of educational effects. Specifically, Cohen (1988) classified effect
sizes of 0.20 as small, 0.50 as moderate in size, and 0.80 as large, and Slavin (1990) stated that
effect sizes at or above 0.25 should be considered to be educationally meaningful. The standard
chosen, therefore, provided a relatively conservative estimate of the expected effect, but was also
of practical educational importance.

In this power analysis, three estimates of the intraclass correlation were used: ρ = 0.05, ρ = 0.10,
and ρ = 0.15. These estimates represented the proportion of variance in achievement scores that
we might expect to find between classrooms. These estimates were derived from a recent study
of teacher effects on upper elementary students’ achievement outcomes by Rowan, Correnti, and
Miller (2002), who found that 3% to 13% of the variability in students’ test score gains was
associated with differences among classrooms across a national sample.

Finally, the initial power analysis summarized in Figure 2 employed two constants: an alpha
level of p < .10 and an estimated within-classroom sample size of 25 students. Given these
estimates, the power analysis for a two-level HLM shown in Figure 2 plots power, 1 - β, where β
represents the probability of failing to reject a false null hypothesis, by the total number
(including treatment and comparison) of sampled classrooms, J, for ρ = 0.05, 0.10, and 0.15.

With a total sample of 32 classrooms, each composed of 25 students, McDougal Littell effects of
δ = 0.25 would be detected at an alpha level of p < .10 with power of 0.75, assuming an intraclass
correlation, ρ, of 0.05. Power of 0.75 for δ = 0.25 and ρ = .10 would be attained with a total
sample of 50 classrooms. Finally, for ρ = .15, acceptable power of 0.75 would be achieved with
a total sample of 75 classrooms. The results of the power analysis plotted in Figure 2, therefore,
suggested that statistical power would be adequate for detecting classroom-level treatment
effects using a two-level HLM model under a reasonable array of assumptions. Even with a
sample size as small as 32 total classrooms, statistical power should be sufficient to detect the
expected treatment effects for a small-scale feasibility study.
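
The power values quoted above can be reproduced with a short calculation. The sketch below assumes the standard noncentral-t formulation for balanced cluster-randomized designs following Raudenbush & Liu (2000); it is illustrative, not the authors' actual computation.

```python
# A minimal sketch of the two-level HLM power calculation described
# above, for a balanced cluster-randomized design: J classrooms split
# evenly between conditions, n students per classroom.
from scipy import stats

def power_cluster_randomized(J, n, delta, rho, alpha=0.10):
    """Approximate power for a classroom-level treatment effect."""
    # Variance of the estimated standardized treatment effect
    var_effect = 4.0 * (rho + (1.0 - rho) / n) / J
    lam = delta / var_effect ** 0.5          # noncentrality parameter
    df = J - 2                               # level-2 degrees of freedom
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    # Two-tailed power under the noncentral t distribution
    return (1.0 - stats.nct.cdf(t_crit, df, lam)
            + stats.nct.cdf(-t_crit, df, lam))

# Reproduces the report's scenario: 32 classrooms of 25 students,
# delta = 0.25, rho = 0.05, alpha = .10 gives power near 0.75.
print(round(power_cluster_randomized(32, 25, 0.25, 0.05), 2))
```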


Figure 2. Statistical Power Estimates for Classroom-Level Random Assignment

[Line plot of power, 1 − β, against the total number of clusters (22 to 98) for δ = 0.25 at
ρ = 0.05, 0.10, and 0.15, with α = 0.100 and n = 25 students per classroom]

RESEARCH QUESTIONS

The research questions that follow address three broad areas, prioritized to match WWC criteria.
An evaluation design matrix that lists indicators and data sources for these research questions
follows in Table 1.

I. The impact of the intervention on student achievement and attitudes:


A. To what extent do students experiencing McDougal Littell Life Science perform better by
(i) demonstrating a higher level of proficiency on curriculum-based assessments and (ii)
exhibiting more positive attitudes than their peers using other curriculum?
B. To what extent are achievement gaps reduced among traditionally underrepresented
groups (e.g., females, students living in poverty, underrepresented minorities)?
C. To what degree does notetaking promote higher assessment scores and increase student
achievement?
D. To what degree does technology use with the curricula promote higher assessment scores,
enhance student engagement, and increase student achievement?

II. The degree and quality of classroom implementation of the curricula:

A. To what extent do teachers follow the program and teachers’ guide?
B. To what degree do teachers encourage students to use Notetaking Guides?
C. To what extent are science labs implemented?
D. To what extent is technology used to enhance the curricula and instruction?
E. What are the relationships between teacher characteristics and degree of implementation?
F. What contextual factors promote or hinder successful implementation?

A post-hoc research question was added:


G. To what extent do teachers implementing McDougal Littell Life Science demonstrate
increased use of research-based teaching practices and instructional strategies?

III. The effectiveness of teacher support materials and professional development:


A. To what extent does professional development follow research-based conceptual models
and cover the content and pedagogy contained in the curricula?
B. To what extent do support materials assist teachers in using the curricula?
C. To what extent do support materials assist teachers in implementing notetaking?
D. To what extent do the technology-based support materials assist teachers in implementing
the curricula?


Table 1. Design Matrix: Evaluation of McDougal Littell Life Science

IMPACT OF CURRICULA ON STUDENT ACHIEVEMENT

Evaluation Questions 1-2:
1. To what extent do students experiencing McDougal Littell Science perform better by
   demonstrating a higher level of proficiency on assessments and exhibiting more positive
   attitudes than their peers using other curriculum?
2. To what extent are achievement gaps reduced among traditionally underrepresented groups
   (e.g., females, students living in poverty, underrepresented minorities)?
Indicators or Data Elements:
- Student scores on standards-aligned measures are higher than those of peers using other
  curriculum
- Students are: intellectually engaged with important ideas relevant to the concepts being
  taught; encouraged to use higher level thinking and learning skills; engaged in learning
  activities that are aligned with state and national standards; confident in their content
  abilities; and able to see value in the content
Data Sources:
- Student performance on formative, curricula-based, or researcher-administered measures
- Classroom observations
- Teacher interviews and focus groups
- Student focus groups
- Other curriculum-related artifacts (e.g., student assignments, projects, portfolios; teacher
  lesson plans)

Evaluation Questions 3-4:
3. To what degree does notetaking promote higher assessment scores and increase student
   achievement?
4. To what degree does technology use with the curricula promote higher assessment scores,
   enhance student engagement, and increase student achievement?
Indicators or Data Elements:
- Students are engaged in notetaking and technology use with the curricula
- Students believe notetaking and use of technology support their learning and achievement
- Notetaking strategies are: implemented as designed; integrated into daily instructional
  lessons; used for intended purposes; perceived as useful
- Technology curricula supports are: implemented as designed; integrated into lessons; used
  for intended purposes; perceived as useful
Data Sources:
- Student Notetaking Guides, assignments, projects, and related artifacts
- Student performance on curriculum-based or researcher-administered measures
- Classroom observations
- Teacher surveys, interviews, and focus groups
- Student focus groups


CURRICULA IMPLEMENTATION

Evaluation Questions 5-8:
5. To what extent do teachers follow the curricula and teachers’ guide?
6. To what degree do students use notetaking?
7. To what extent are science labs implemented?
8. How is technology used to enhance instruction?
Indicators or Data Elements:
- Frequency of units or lessons skipped or expanded upon
- Deviation from textbook strategies; use of supplemental materials
- Use of Notetaking Guides
- Access/availability of computers
- Frequency/type of resources used
Data Sources:
- Teacher surveys, interviews, focus groups, journaling
- Classroom observations
- Lesson plans and other related artifacts (e.g., student projects, assignments)

Evaluation Questions 9-10:
9. What are the relationships between teacher characteristics and degree/quality of
   implementation?
10. What contextual factors affect implementation?
Indicators or Data Elements:
- Classroom, school, district, and community factors that may or may not be explicitly
  articulated by teachers and students and that affect curricula implementation
Data Sources:
- Classroom observations
- Teacher interviews, journaling questions
- Student focus groups, class notes

Evaluation Question 11 (post hoc):
11. To what extent do teachers implementing McDougal Littell Life Science demonstrate
    increased use of research-based teaching practices and instructional strategies?
Indicators or Data Elements:
- Use of review, vocabulary, notetaking, questioning, and assessment strategies
- Use of the effective teaching cycle: review, new content, guided practice, assessment
Data Sources:
- Classroom observations
- Teacher surveys, focus groups, interviews, journaling
- Student focus groups, class notes

CURRICULA SUPPORT

Evaluation Question 12:
12. To what extent does professional development (PD) follow research-based conceptual
    models and cover the content and pedagogy contained in the curricula?
Indicators or Data Elements:
- Recommended strategies for adult teaching/learning are employed
- Training materials include content/pedagogy objectives and activities
- Participants are actively engaged in the curricula and provided sufficient learning
  opportunities
Data Sources:
- PD lesson plans, training materials
- PD observations
- Teacher interviews, journaling questions

Evaluation Questions 13-15:
13. To what extent do support materials assist teachers in using the curricula?
14. To what extent do support materials assist teachers in implementing notetaking?
15. To what extent do technology-based support materials assist teachers in implementing
    curricula?
Indicators or Data Elements:
Support materials:
- promote strategies and concepts that align with standards;
- support increasing teachers’ pedagogy and content knowledge;
- guide sequencing to align with standards and assessments;
- provide resources for additional content or instructional support.
Data Sources:
- Review of materials
- Teacher surveys, interviews, focus groups, journaling questions
- Classroom observations
- Lesson plans and other related artifacts


DESCRIPTION OF THE INTERVENTION

The description of the intervention includes three parts: (a) a description of McDougal Littell
Science, including associated materials and resources, (b) a description of how teachers
implemented McDougal Littell Life Science to teach the “Cells and Heredity” and “Ecology”
units to students in their classrooms, and (c) a description of how the curriculum and teaching in
comparison classrooms differed from treatment classrooms. Figure 3 shows the proposed
mechanism for the impact of the intervention on student outcomes. In this model, the
intervention has an impact on both teaching practices and student outcomes. Teaching practices
also affect student outcomes.

Figure 3. Impact of the Intervention on Student Outcomes

[Path diagram: the Life Science Program affects Teaching Practices and Student Outcomes;
Teaching Practices in turn affect Student Outcomes]
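
One conventional way to formalize Figure 3 (hypothetical notation; these equations are not given in the report) is as a simple mediation model:

$$M_j = \alpha_0 + \alpha_1 T_j + u_j, \qquad Y_{ij} = \beta_0 + \beta_1 T_j + \beta_2 M_j + e_{ij},$$

where $T_j$ indicates assignment of classroom $j$ to McDougal Littell Life Science, $M_j$ measures teaching practices, and $Y_{ij}$ is the outcome for student $i$ in classroom $j$; $\beta_1$ captures the program's direct effect on student outcomes and $\alpha_1\beta_2$ its indirect effect through teaching practices.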

McDougal Littell Science

McDougal Littell Science was published in 2004 and constitutes the publisher’s first foray into
publishing science curriculum at the middle school level. The series includes earth, life, and
physical science programs, as well as texts to meet integrated science standards, and individual
books for each unit in earth, life, and physical science. The 7th grade curriculum used for this
study was “Life Science.” Because of the half-year length of this study, two units were selected:
“Cells and Heredity” and “Ecology.” McDougal Littell Educational Research staff, EndVision
researchers, and advisors to the project agreed that these units included concepts covered under
most states’ 7th grade standards and would be appropriate for teaching during the first half of the
school year. The other units included in McDougal Littell Life Science are (a) Diversity of
Living Things, (b) Life over Time, and (c) Human Biology.

All treatment and comparison group teachers in the study agreed to cover—during the first half
of the school year—concepts that fall within these titles based on their state standards and the
requirements of the study. Because state standards from a number of the districts involved in the
study included integrated science in the 7th grade, many of the teachers reordered the sequencing
of their instruction for this school year to accommodate our request to cover these topics early.
Additionally, two of the districts did not include ecology objectives in their science standards for
7th grade, but teachers agreed—with district-level permission—to include an ecology unit. They
devoted less time to the concepts than they might have otherwise, so they could still cover
district-mandated objectives by the end of the school year.

Standards-Based Instruction and Assessment

Because the No Child Left Behind Act of 2001 requires all states to establish statewide
accountability systems based on challenging state standards in reading, mathematics, and science
for grades 3-8, the program content, activities, and assessments of McDougal Littell Science
were aligned to National Science Education Standards (NRC, 1996).

Research-Based Practices from Educational and Cognitive Research

A number of research-based practices were incorporated into McDougal Littell Science. First,
nine strategies from Classroom Instruction that Works: Research-Based Strategies for
Increasing Student Achievement (Marzano, Pickering, & Pollock, 2001) were incorporated into
the student and teacher texts, chapter resource books, and other ancillary materials. The
strategies follow.
• Identifying similarities and differences: This strategy includes comparing and classifying,
and suggests representing comparisons in graphic or symbolic form, e.g., contrasting units of
measure and types of classification systems, making concept maps and Venn diagrams.
• Summarizing and notetaking: Students learn a variety of notetaking formats, e.g., outlines
and webbing, and learn when to delete, substitute, or keep information when writing a
summary. Examples include specific features contained in the texts (e.g., Know How to
Take Notes, Help with Notetaking) and the entire student Notetaking Guide, which is an
ancillary resource available at low cost (i.e., about $10 per student).
• Reinforcing effort and providing recognition: To help students make the connection
between effort and achievement, this strategy encourages teachers to provide recognition
to students for attaining specific goals.
• Homework and practice: Research shows that it is important that students understand the
purpose of their assignments. This strategy suggests providing homework assignments that
focus on specific elements of a complex skill to help make the purpose of the assignments
clear to students.
• Nonlinguistic representations: To help students understand content in a new way, this
strategy focuses on creating nonlinguistic representations. These include creating graphic
organizers, making physical models, generating mental pictures, drawing pictures and
pictographs, and engaging in kinesthetic activity.
• Cooperative learning: The five defining elements of cooperative learning—positive
interdependence, face-to-face interaction, individual and group accountability,
interpersonal and group skills, and group processing—are described in this strategy,
and suggestions are given for grouping techniques.


• Setting objectives and providing feedback: This strategy recommends that teachers use
instructional goals to narrow their students’ focus and provide criterion-based feedback. It
also encourages students to personalize their teacher’s goals and provide some of their own
feedback. Examples include Before/Now/Why lists in student lessons and scoring rubrics on
Building Test-Taking Skills.
• Generating and testing hypotheses: The variety of structured tasks included in this
strategy guides students through generating and testing hypotheses and using induction or
deduction. This strategy also advises asking students to clearly explain their hypotheses and
conclusions to deepen their understanding. Examples include Hands-On Activities and
Predict exercises throughout the book.
• Cues, questions, and advance organizers: This strategy includes asking questions or
giving explicit cues before a learning experience to provide students with a preview of what
they are about to experience. It suggests using verbal and graphic advance organizers, or
having students skim information before reading as an advance organizer. Examples
include Before/Why/Now lists at the beginning of lessons, pre-reading lesson elements such
as Word Watch lists and Example heads throughout the book.
Another set of research-based instructional strategies that are incorporated into McDougal Littell
Science are those presented in Effective Teaching Strategies that Accommodate Diverse Learners
(Kameenui & Carnine, 1997). The features are summarized below.
• A focus on big ideas: Kameenui and Carnine describe big ideas as “those concepts,
principles and heuristics that facilitate the most efficient and broadest acquisition of
knowledge. They are the keys that unlock a content area for a broad range of diverse
learners.” Grossen and Burke (1998) wrote, “Big ideas in science do four things. First, they
represent central scientific ideas and organizing principles. Second, they have rich
explanatory and predictive power. Third, they motivate the formulation of significant
questions, and fourth, they are applicable to many situations and contexts common to
everyday experiences.” McDougal Littell Science provides a thorough development of
big ideas, or key concepts.
• Conspicuous strategies: Explicit teaching of problem solving strategies that expert
problem solvers find useful can be helpful to all students provided the strategies can be
applied to a wide range of situations. Examples include Problem Solving Strategies pages;
Building and Practicing Test-Taking Skills; and worked-out examples with step-by-step
annotated reasons.
• Mediated scaffolding: Providing extra support to students when they are first learning new
ideas can give them the skills and the confidence to succeed on their own. Examples include
Getting Ready to Learn; Getting Ready to Practice exercises with Vocabulary and Guided
Problem Solving questions; Help with Homework boxes; Practice and Problem Solving
exercises that move gradually from basic to challenging.
• Strategic integration: Kameenui and Carnine describe strategic integration as “the
combining of essential information in ways that result in new and more complex
knowledge.” Integration may be across or within disciplines. For example, Exploring Math
in Science; real world problem solving in lessons/exercises; What do you think? questions
relating to Math, Social Studies, or Art; topic integration and integration of representations.


• Primed background knowledge: In order to learn new information easily, students need to
be familiar with key background information (prerequisite concepts and skills). McDougal
Littell Science has a carefully sequenced curriculum in which prerequisite knowledge is
presented before students need it. In addition, the textbooks offer several ways to check
understanding of prerequisite skills and provide help if needed. For example, Pre-Course
Test and Practice; Brain Games, Getting Ready to Learn, Review What You Need to Know
questions, Basic Skills questions in Mixed Review Exercises; Help with Review Notes and
Skills Review Handbook.
• Judicious review: Carefully planned and paced review of important ideas can increase
students’ retention of concepts and facility in applying skills. Examples of judicious review
can be found in the Notebook Review, Mixed Review, and Chapter Review; as well as
teaching and learning support materials.
Because research shows that ongoing, embedded assessment provides feedback to help teachers
plan instruction, McDougal Littell Science provides materials for diagnosing how well students
understand the material, for differentiating instruction to reach all students, for assessing student
progress, and for providing remediation. The curriculum also emphasizes important test-taking
skills and problem-solving strategies for students based on the Creating Independence through
Student-owned Strategies, or CRISS (Santa, Havens, & Maycumber, 1996), model for student-
centered teaching. The following strategies and materials are emphasized.
• Ongoing diagnosis: Materials to diagnose student understanding are provided before,
during, and following each chapter and lesson. Tools include pre-course tests, chapter warm-
up games and Getting Ready to Learn exercises, Skill Check exercises, Your Turn Now and
Getting Ready to Practice to help monitor how well students are grasping the vocabulary,
skills and concepts. Additional assessment resources include Homework Check boxes, and
the Test and Practice Generator CD-ROM.
• Differentiating instruction and practice: There are a number of components identified in
the materials that help differentiate instruction and practice to reach all students. For
example, labs and worksheets are provided at three levels: Level A for struggling learners,
Level B for students who perform adequately, and Level C for advanced students.
• Building test-taking skills: It is more important than ever for students to build strong test-
taking skills in order to be successful on annual assessments required by the NCLB Act.
McDougal Littell Science provides instruction and practice with test-taking skills at the end
of every unit in the textbook. Included are (1) multiple-choice questions where students are
encouraged to use cognitive skills to decide whether answer choices are reasonable, (2) short-
response questions where guidance is provided about how to write complete answers and
show work, (3) context-based multiple choice questions that involve interpreting diagrams
and graphs, and (4) extended-response questions where students learn how to write complete
answers to multi-step problems.
• Assessment: The series provides diagnostic, formative, and summative assessment resources
for measuring student progress on an ongoing basis. These include test-practice questions at
the end of every exercise set, quizzes, reviews, traditional and standardized chapter tests, an
end-of-course test, quizzes for every lesson, alternative forms for quizzes and tests, a Test
and Practice Generator CD-ROM to create customized quizzes and tests, and Online Quizzes
and Standardized Test Practice.
• Reteaching and remediation: A variety of resources help students achieve success, such as
Help with Review notes, Notebook Reviews after every few lessons that summarize key
vocabulary and skills and provide practice, Chapter Reviews, Cumulative Practice, Skills
Review Handbook, Extra Examples, Common Error notes, and additional teacher resources.
• Problem solving strategies: Questions on state and national tests are often posed as word
problems. In order for students to demonstrate mastery of science skills, they must be able to
read and interpret word problems and apply appropriate strategies to solve them. McDougal
Littell Science incorporates problem solving throughout the textbook to help students learn to
apply skills in context.
Finally, students need strong skills in reading, writing, and notetaking in science in order to
understand course content, be successful on important state and national assessments, and
develop the ability to become independent learners. Recent brain research and classroom
research in reading and writing support the value of well-known practices of successful
teachers, particularly practices supporting vocabulary development and reading comprehension.
Based on this research, McDougal Littell Science incorporates the following features.
• Vocabulary development: The textbook provides strong support to students in learning,
practicing, and reviewing vocabulary. The Getting Ready to Learn page at the start of each
chapter lists important review words practiced in the Using Vocabulary exercises. At the
beginning of each lesson, the key vocabulary for the lesson appears under the Word Watch
list, and new vocabulary in the lesson is emphasized by boldface type with yellow
highlighting. Other vocabulary building aids include Help Notes, Getting Ready to Practice,
Notebook Review pages, Chapter Reviews, and the complete Glossary that includes
examples and diagrams.
• Reading comprehension: Students are given tips for identifying the main idea,
understanding the vocabulary, knowing what’s important in a lesson, being an active reader,
and reading word problems. Other strategies include (1) establishing a context by connecting
new learning to prior knowledge and starting each lesson with a real-world example, a short
activity, or a visual presentation of a math idea to set the stage for new concepts in the lesson,
(2) facilitating understanding by presenting new concepts in short sentences that use simple
syntax and that are accompanied by appropriate tables, charts, and diagrams, (3) reflecting on
learning (i.e., metacognition) by encouraging students to consider whether an answer is
reasonable or to explain their reasoning, (4) using graphic organizers such as charts, Venn
diagrams, or concept maps to help classify mathematical objects.
• Writing opportunities: Students need frequent opportunities to practice writing skills.
These opportunities occur throughout the textbook in Exercises, Stop and Think questions,
activities, Notebook Review, and Exploring Math in Science sections.
• Effective notetaking: Taking effective notes is an important comprehension, learning, and
review strategy that increases engagement. Yet, teachers throughout the country report that
students enter middle school with few if any notetaking skills. Thus, the authors identified
the goal of helping students develop their notetaking skills as an important objective of the
program. They incorporated many notetaking aids into the program, such as Getting Ready
to Learn pages and Notebook Reviews in the textbooks, and the separate Notetaking Guides.

Scope of Use

McDougal Littell Science reached distribution channels in time for 2004-2005 school year sales.
Recently, Houghton Mifflin, McDougal Littell’s parent company, released a news
announcement, “North Carolina’s Top School Districts Choose State-of-the-Art McDougal
Littell Science for Middle School,” in which they claim, “In total, McDougal Littell Science will
reach about 100,000 North Carolina students next year” (April 25, 2005, available at
http://www.hmco.com/company/newsroom/news/news_release_042505.html). Widespread sales
are anticipated for the 2005-2006 school year.

Cost to Schools and Districts

Costs to schools and districts vary based on the base components and ancillaries purchased. For
this study, the components listed in the following section were provided for each student in the
treatment group, for a total cost per treatment student of under $100. Costs for the basic package
provided to each teacher in the treatment group were approximately $370, while the ancillary
package also given to each teacher included materials that would typically cost approximately
$1400 per teacher. For this study, teachers were also given student textbooks for each student in
their other Life Science classes, at a cost of approximately $57 per student.

Materials Provided to Treatment Teachers and Students

The following materials were provided to the teachers in the treatment group. Although most
materials were given to teachers at professional development sessions, some materials did not
arrive in time for the professional development session or before the start of the school year.
The delays in teachers receiving materials were due to districts agreeing just prior to the start of
the school year to participate in the study. Coordinating distribution of materials, professional
development scheduling, and the schedules of teachers for whom the contract year had already
started was difficult. For example, in one district that agreed to participate in the study just
before the start of the school year, teachers participated in the McDougal Littell professional
development after students had returned to school, and they received the Science Toolkit
approximately one month after the start of the school year.

The following Life Science materials were included in the basic package for treatment teachers:
• Teacher Edition (TE) Life Science Single Volume Edition
• Test Generator CD-ROM Kit (Life, Earth, and Physical Science)
• Lab Generator CD-ROM
The following Life Science ancillary materials were provided to treatment teachers:
• Cells and Heredity Unit Assessment Book
• Cells and Heredity Unit Resource Book
• Cells and Heredity Unit Transparency Book
• Diversity of Living Things Unit Assessment Book
• Diversity of Living Things Unit Resource Book
• Diversity of Living Things Unit Transparency Book
• Ecology Unit Assessment Book
• Ecology Unit Resource Book
• Ecology Unit Transparency Book
• Human Biology Unit Assessment Book
• Human Biology Unit Resource Book
• Human Biology Unit Transparency Book
• Life Over Time Unit Assessment Book
• Life Over Time Unit Resource Book
• Life Over Time Unit Transparency Book
• Problem Solving and Critical Thinking TE Grade 7
• Process and Lab Skills TE Grade 7
• Program Overview
• Content Review CD-ROM (Life, Earth, and Physical Science)
• EasyPlanner CD-ROM (Life, Earth, and Physical Science)
• e-Edition CD-ROM (Life, Earth, and Physical Science)
• English Learners Package
• McDougal Littell Science Issue Flyer-Big Ideas in Science
• McDougal Littell Science Issue Flyer-Differentiated Instruction
• McDougal Littell Science Issue Flyer-Reading Support
• Notetaking Guide for Life Science
• PowerPoint Presentations CD-ROM (Life, Earth, and Physical Science)
• Science Toolkit
• Scientific American Frontiers DVD Kit
• City Science (provided to teachers who requested it)
• Multi-language Glossary (provided to some teachers)

The following Life Science materials were provided for students in the treatment group.
Additionally, a Pupil Edition Life Science Textbook was provided for all students in each of the
treatment teachers’ 7th grade science classes that were not included in the research.
• Pupil Edition Life Science Textbook
• Notetaking Guide for Life Science
• Standardized Test Practice Pupil Edition
• Problem Solving and Critical Thinking Pupil Edition
• Process and Lab Skills Pupil Edition

Using McDougal Littell Life Science

Teachers in the treatment group were asked to “fully implement” McDougal Littell Life Science.
Although full implementation was not operationally defined, we asked the teachers to use as
many of the instructional strategies, resources, and ancillary materials as possible. Because some
of the teachers said during the professional development sessions that they would not participate
in the study if they could not use some of their “tried and tested” activities, we did not require
that they eliminate all extra activities. For example, some had laboratory activities they would
not give up while others had supplementary activities they had always used, such as projects
jointly completed with teachers of other subjects. One of these was a project involving English,
history, and science classrooms, where students created exhibits about famous scientists over a
two week period and jointly participated in a history fair.

All treatment teachers taught other units prior to the units required for this study, including units
on measurement, laboratory procedures, and other introductory topics. All teachers started “Cells and Heredity”
and “Ecology” units during the first half of the school year, as requested for the study, although
some teachers were unable to complete one of the units until after the start of the second term
due to interruptions to school schedules from fires and hurricanes. For some teachers, teaching
these units meant restructuring their planned sequencing for the school year, or obtaining district
permission to not use district sequencing and pacing guides. In one district, the resequencing
meant that teachers in both treatment and comparison groups did not have to administer district-
level criterion-referenced tests (CRTs) to students in the classrooms participating in the study;
this exemption was a strong motivator for teachers in that district to participate.

McDougal Littell Professional Development

McDougal Littell provides complimentary professional development to districts and schools
that purchase more than a minimum dollar value of Life Science materials (C. A. Guziak, personal
communication, January 5, 2004). The training covers use of the materials in the series,
including the Notetaking Guide and other ancillary print materials, and use of computer and
online support resources by teachers and students. Professional development sessions typically
range from 3 hours to 2 days in length, depending on district requirements and availability of
time. Not all teachers in districts that purchase the program attend publisher-provided
professional development.

For this study, we agreed to plan one and a half days for professional development to train
treatment teachers how to use the program. However, not all districts were able to allot this
amount of time given teaching contracts and the amount of time available prior to students
returning to school, so some of the sessions were shorter than the planned time. Shortened
sessions typically did not include as much time for teachers to work with each other to develop a
lesson or to use the technology components. One trainer from McDougal Littell trained all
treatment teachers near their home location on the dates shown in Table 2. Teachers were
provided with a $100 honorarium for attending the in-service, although most did not know prior
to the in-service that they would be given an honorarium.

Because of the late date at which most districts agreed to participate and provided names of
teachers, the in-service dates were scheduled just before or just after students returned to school.
This meant that teachers did not have long to prepare lesson plans that used the materials prior to
teaching. EndVision researchers attended all in-service sessions except one in which training
was provided for a single teacher. A sample agenda for the in-service is included in Appendix
A, although the training varied somewhat in length, content, and sequencing between sites.

Table 2. Dates in 2004 for Professional Development and Student Return to School

District | Professional Development Sessions (Dates and Times) | Students Return to School
School district in a southern state | Thursday, August 5, 8:00-4:30; Friday, August 6, 8:30-noon | Monday, August 9
School district in a southern state | Friday, August 13, 8:00-5:00 | Monday, August 16
School district in a western state | Monday, August 16, 8:00-4:30; Tuesday, August 17, 8:30-noon | Wednesday, August 25
Two school districts in a northeast state | Monday, August 30, 8:00-4:30; Tuesday, August 31, 8:30-noon | Tuesday, September 7 or Wednesday, September 8
School district in a western state | Thursday, September 30, 9:00-3:00 | Monday, August 30

The trainer included a variety of “hands-on” activities during the professional development
sessions. Some of these included the following:

• lab-type activities to demonstrate problem-solving processes or processes for teaching
inquiry-based investigations (some of these were not activities included in the McDougal
Littell Life Science materials),
• planning a sample lesson that included use of ancillary resources,
• exploring ClassZone.com and SciLinks websites,
• editing test questions within the Test Generator software, and
• using the EZ-Planner CD to edit lesson plans.

Additionally, the trainer discussed and demonstrated components of the material such as the “Big
Ideas and Key Concepts,” vocabulary-building strategies (e.g., decoding strategies, notetaking
strategies for learning vocabulary, multi-language glossary), notetaking in outline format using
text headings and subheadings, and multiple forms of assessment of student learning (e.g.,
questions embedded in text, section and chapter review questions, 3-minute warm-ups, chapter
and unit tests, standardized test practice).

Teaching and “Full Implementation”

To learn how to implement McDougal Littell Life Science, the 15 treatment teachers for this
study attended one- or two-day in-service sessions conducted by a McDougal Littell Science
trainer. The trainer introduced an extensive selection of 7th grade life science instructional
resources. This broad array of materials included but was not limited to: (a) standard textbook
editions for teacher and students; (b) supplemental materials such as transparencies, Notetaking
Guides, reading study guides, and assessments; and (c) technology such as DVDs, videos,
PowerPoint presentations, and Internet websites.

During the on-site data collection activities, researchers noted that teachers varied in their level
of implementation of the McDougal Littell Life Science materials (or, in comparison classrooms,
of their primary science program). Thus, for purposes of this study, teachers were categorized in
one of three groups: (a) “full” implementers,
(b) moderate or partial implementers, or (c) low implementers. “Full implementers” were
teachers who regularly employed a wide range of McDougal Littell Life Science resources or
resources from the primary science program, including materials and instructional strategies. In
contrast, “low implementers” were treatment teachers who seldom used or used only limited
McDougal Littell Life Science materials and strategies or comparison teachers who did not use a
single published program consistently. Moderate or partial implementers may have used a single
published program, but they also regularly included teacher-developed or other materials and
activities. Partial implementers may have used McDougal Littell Life Science materials but did
not consistently use the embedded instructional strategies. General descriptions of the two
anchor points for this continuum of implementation, full and low implementers, follow.

Full Implementers. A number of factors were included in the rating of “full implementer.” The
following factors and descriptions help depict the qualities of those teachers.

Environmental Factors. Environmental factors refer to the physical arrangement of
classroom, equipment, and materials. Data collectors recorded the following classroom
characteristics for instructors who fully implemented Life Science resources:
(a) Student seating was arranged to encourage individual learning and cooperative
grouping. All students were able to see instruction.
(b) Teachers displayed science posters, student projects and various learning resources—
produced by the publisher of their primary Life Science program, by the teacher, or
by students—in a purposeful and orderly manner. In treatment classrooms, this
included display of “Big Ideas,” “Key Concepts,” and science vocabulary.

Materials and Instructional Strategies. Full implementers regularly followed the lesson
format identified in the teacher’s edition. Consequently, treatment teachers routinely applied
effective teaching practices embedded in the McDougal Littell Life Science, namely: review of
previous content, presentation of new content tied to prior knowledge, guided practice with
scaffolding to promote learning of increasingly complex concepts, and independent practice.
Within this framework of effective instruction, teachers who fully implemented frequently relied
on the following:

(a) Teacher and student edition textbooks
(b) Clearly stated objectives and set expectations for classroom routines and behavior
(c) Start of class warm-ups and other forms of review
(d) Transparencies for notetaking and Notetaking Guides
(e) Strategies for learning and practicing science vocabulary
(f) Graphic organizers to visually organize concepts and their properties
(g) Connections to prior knowledge and experiences
(h) Questioning strategies to check for student understanding
(i) Frequent feedback and reinforcement of appropriate student responses
(j) Organized sequencing of concepts using PowerPoint presentations
(k) Internet websites and other software and aids to visually depict concepts
(l) Science labs and other hands-on activities to develop concept understanding
(m) Frequent assessment of student learning, including homework assignments

Researchers further confirmed the frequent application of these resources and instructional
strategies through data collected in student focus groups.

Overall Tone of Classroom. Full implementers often established an overall classroom
tone conducive to student learning. Perhaps this tone can best be described as a teacher’s clear
expectation to students that they need to “be about the business of learning.” More specifically,
teachers who fully implemented demonstrated:

(a) Frequent opportunity for student response, practice, and feedback
(b) High rate of student engagement to maximize opportunities for learning
(c) Low frequency of off-task behavior with appropriate intervention to maintain
engagement
(d) High rate of positive reinforcement
(e) Differentiated instruction to meet the needs of all learners
(f) Materials and resources for English Language Learners (ELL)

Low Implementers. Low implementers used materials and instructional strategies poorly, or
included materials and strategies from other publishers or sources.

Environmental Factors. Environmental factors refer to the physical arrangement of
classroom equipment or materials. Data collectors recorded the following typical classroom
characteristics for instructors who were low implementers:

(a) Student seating arranged in rows, emphasizing individual work
(b) Classrooms appeared less welcoming, disorganized, or devoid of displays of student
work
(c) Little evidence of materials included with published resources was seen on walls,
desks, or in bookcases

Materials. Low implementers typically followed a less structured and more varied
routine or presentation than teachers who fully implemented a Life Science program.
Consequently, low implementers routinely presented new content without opportunities for
student responses or had students work independently while failing to regularly assess students’
skills through review of previous content or guided practice. Within this framework of less
effective instructional strategies, teachers who were low implementers more often relied on the
following limited resources:

(a) Teacher and student edition textbooks
(b) Reading study guides
(c) Chapter tests
(d) Teacher-generated activities or worksheets, or non-McDougal Littell materials
and assessments

Researchers further confirmed the infrequent use by treatment teachers of additional McDougal
Littell Life Science resources through data collected in student focus groups.

Overall Tone of Class. Low implementers often established an overall classroom tone
less conducive to student learning. More instructional time was spent managing off-task
behaviors. Classrooms where teachers only partially or minimally implemented the Life Science
curriculum often demonstrated:

(a) Lower rates of positive reinforcement
(b) Infrequent opportunity for student response and feedback
(c) Lower rates of engagement for some or many students
(d) Increased frequency of off-task behavior, including talking during instruction, out-of-
seat disruptions, physical interactions with other students, and other behavior that
disrupted other students and decreased learning opportunities

In summary, instructors who fully implemented McDougal Littell Life Science had class sizes
and available instructional time similar to those of teachers who were low implementers. However, full
implementers more often demonstrated (a) greater attention to environmental factors that
influenced instruction, including access to publisher’s materials, (b) a wider range of Life
Science resources and instructional strategies that were embedded in the program implemented
within the context of more consistent adherence to effective teaching strategies, (c) increased
student engagement and opportunities for student learning, and (d) an overall positive classroom
tone where student effort was recognized and that was more conducive to student learning.

Factors Affecting “Full Implementation” and Student Outcomes

As in any school setting, many contextual factors, including characteristics of schools, teachers,
and students, may affect implementation of any program and the impact of the program on
student outcomes. In this study, contextual factors that may have influenced implementation or
student outcomes included the following non-exhaustive lists.

District or school characteristics. School environments included lock-down facilities and
security guards, access to and support of technology resources, Title I funds and resources,
socioeconomic status, proportions of minority students, number of students learning English,
number of students with disabilities and level of inclusion, school dress code or uniforms,
number of students per classroom, classroom aides, state or district standards, state or district
testing, and even the school’s response to weather and other external events.

Teacher characteristics. Teacher characteristics that could affect study outcomes included
voluntary or district-stipulated participation in the study, years of teaching experience and
academic preparation, preparation for managing instructional time and student behaviors, level
of interaction with students during instruction, expectations for students, attitudes towards
students, use of a positive or punitive reinforcement system, extracurricular demands, and
comfort with observers visiting the classroom.

Student characteristics. Contextual factors related to student characteristics included number
of days attending school (or absent from school), distractibility (potentially due to factors
external to school), classroom seating assignments, time of day, gender, socioeconomic or
minority status, disabilities, ability to read and communicate using English, interest and ability in
content area (e.g., science), and many others.

The following descriptions submitted by teachers, with identifying information removed,
describe the many uncontrollable factors that affect students’ opportunities to learn in school and
classroom settings. The first was submitted by a treatment teacher who participated in this
science study, and the second was sent to us by a treatment teacher participating in a math study
(conducted concurrently with the science study) whose comments reflected those we heard
frequently from teachers participating in the science study.

Study disruptions caused by factors mostly external to school settings.

Well, this year has been quite interesting! First of all, our Middle School was
struck by lightning within the first few days of school. A fire erupted which
engulfed our library and sent soot and smoke throughout our school. At that point,
our school was placed on double sessions with another Middle school.

What does that mean? It simply means that our students and faculty share the
same school building but at different times of the day. Instead of attending school
from 7:20 a.m.- 2:35 p.m.(Teachers attend for 7:00 a.m.-2:50 p.m.), our students
arrive for school at 12:20 p.m. - 4:30 p.m.(Teachers from 9:30 a.m.- 4:50 p.m.)
This shortened schedule has taken away important teaching time.

Also, the students have been home all morning and I find it very difficult to get
them to focus on their schoolwork. By the late afternoon, they are tired and not
exactly working to their potential. The school is still required to serve a bag lunch
during our first period class. The students receive P.E. three times a week. As you
can see, the biggest problem facing our students is the shortened amount of time
for classroom instruction.

The fire presented problems which our school had never faced before. (Example:
changes in bus schedules for bringing students to school at the later time,
preparing lunch for almost 3,000 students including both schools, after school
sporting events and dance lessons, etc).

Once we became accustomed to our new schedule changes, the threat of
Hurricane Ivan the Terrible threw us into a tailspin. We missed eight days of
school which must be made up during Thanksgiving, Christmas, New Year's, and
other scheduled holidays.

So, as you can see, this year has been very interesting! Someone once said that the
most important quality for a teacher to possess is "Flexibility"! I would have to
agree!

Study disruptions caused by factors mostly internal to school settings.

Since [the district agreed to participate in the study after students had returned to
school and] we received the textbooks [for the study] after the beginning of
school, … we had to learn a new series, correlate this to our standards and pacing
guide, change out books and get to work.

This coincided with the three days of assemblies related to fund raising, etc. and
one day of Fall Break. November brought a student holiday for Election Day, and
2 1/2 days vacation for Thanksgiving. December delivered assemblies (thus not
having classes) for Choral Concerts and Band Concerts. We were out from the
21st through January 4. Oh, and did I mention in-school ball games????

Second semester starts. Ahhh, January. January began with a 3-day week
followed by a 5-day week (with the assembly for the Spelling Bee), followed by a
4-day week (Martin Luther King Holiday), followed by a 5-day week (with a
Character Counts Assembly day and a Tag all day field trip.) The Flu Bug is
surfacing....

February brings 2 days out of school the first week for a Flu Epidemic. This is
followed by a 4-day week and then ANOTHER 4-day week, ANOTHER 4-day
week (Inservice day), and, yes, ANOTHER 4-day week (President's
Day/Inservice). Also, thrown in (on a day we were actually in class) was a poll on
drug use/knowledge that took 2 class periods.

March is off to a good beginning: a 5-day school week! However, this is
immediately followed by a 4-day week (staff development day). The 3rd week is a
5-day week, followed by the week of Spring Break. The last week in March is a 4
day week, for the students have 6 school days off...teachers report on Monday the
28th for Staff Development.

The 1st week in April (4-8) is a 5 day week, followed by a 4-day week. The 15th
is another in-school Administrative Day. We are now into the 3rd week in April.
Monday, the 18th, is another Student Holiday (Teacher Inservice Day). Beginning
on Tuesday the 19th through Wednesday the 27th, the calendar is blocked off for
State Testing. I do not know the exact schedule for this. Testing is not all day, and
some days are make-ups...

May. School is out the 20th (Friday), with students returning on Monday to pick
up their report cards. Those 3 weeks we are here are filled with May Days,
assemblies, concerts, team assemblies, school-wide assemblies, AR Reading
reward day, etc., etc., etc...

As you can see, it is hard to get much traction!! We get started, just to be off a day
or two and have to review when we return. I have not included the 4 (FOUR)
pull-out programs that are constantly removing students from the classroom (but
NOT special areas!) to attend a variety of reform or enrichment programs. Add to
this the art and music field trips not previously mentioned, sickness, court dates,
death in families, etc. and it seems like our academic calendar shrinks each year.

Yet programs get more difficult, and the public expects higher test scores each
year. And EVERYONE expects ALL schools to be above average!!! (We really
need to focus on statistics a bit more!)

I hope this sufficiently addresses [the question about whether any disruptions may
have affected the study]...I know there are more things that occurred (the intercom
every few minutes comes to mind), but once you see the layout on the calendar, I
think you begin to get the Big Picture.

Using Other Curriculum in Comparison Classrooms

Comparison group teachers were asked to teach using whatever curriculum they had used
previously—none of the schools involved had adopted a new science program to be implemented
this school year, so all comparison group teachers, with the exception of one first-year teacher,
had taught using the same curriculum at least one year prior to this study.

Programs Used in Comparison Group Classrooms

Teachers in the 15 comparison classrooms reported that they used the following Life Science
programs. None of these teachers reported that they had attended professional development for
using these programs in the past three years.

Publisher | Name of Text | Publication Year | Count
Addison-Wesley | Science Insights-Exploring Living Things | 1994 | 1
Holt, Rinehart, and Winston | Holt Science & Technology | 2001 | 1
Holt, Rinehart, and Winston | Holt Science & Technology | 2002 | 3
Holt, Rinehart, and Winston | Holt Science & Technology | 2003 | 3
Prentice Hall | Science Explorer: Life, Earth, and Physical Science | 2001 | 4
Prentice Hall | Science Explorer: Life, Earth, and Physical Science | 2002 | 1
Not Reported | | | 2

Teaching and “Full Implementation”

Teachers in the comparison group also ranged widely in the degree of implementation of the
materials and resources they used. Overall, they more frequently used materials from a variety
of sources and less often followed the sequencing in their primary texts. However, a few of the
teachers in the comparison group did more closely follow the publisher’s materials and lesson
plans, and incorporated instructional strategies similar to those embedded in McDougal Littell
Life Science.

Contextual factors that affected teachers in the treatment group also affected teachers in the
comparison group—with one exception. Teachers in the comparison group did not have to
change what they would have typically done in their classrooms, other than let the researchers
visit to observe. They were not required to develop new lesson plans, use new materials, or add
new materials to their classroom environment. The amount of stress experienced by the
comparison teachers as a result of participating in this study was substantially less than that
experienced by teachers randomly assigned to the treatment group. Treatment group teachers
were required to use a new Life Science program, which involved new lesson plans and
activities, use of technology resources, and for those who implemented fully, potential changes
in teaching practices to implement the instructional strategies embedded in McDougal Littell Life
Science. For comparison teachers, the year started with “business as usual,” while treatment
teachers whose districts agreed to participate in the study near the start of school scrambled to
learn how to implement a new Life Science program.

DESCRIPTION OF POPULATION AND SAMPLE

Because this study was a feasibility study conducted to provide evidence for product
effectiveness with the hope of obtaining Federal funding for a large-scale study, the sample size
was determined using a power analysis with slightly relaxed parameters. The power analysis is
further described in the analysis section of this report.

Population

The population of science teachers from which the sample could have been selected included
teachers in all schools or districts that (a) had not already purchased the McDougal Littell Life
Science, (b) were not actively discussing purchase of McDougal Littell Life Science with sales
representatives, and (c) had student populations that were reasonably close to national averages
for socio-economic status (SES) and ethnicity percentages.

Sample

The original proposal for this study specified participation by a minimum of 32 teachers, with
one of their 7th grade science periods selected as the target classroom. Because we wanted to
reduce the likelihood of including first-year teachers, we told district science coordinators that we were
searching for 40-50 teachers who were interested in participating. We asked them to return a list
of 4-8 7th grade science teachers from each middle school that had district and/or school
permission to participate in the study. Ideally, we planned to select only one teacher per school,
with no more than 3 teachers selected from any one large school if very large schools or fewer
districts agreed to participate in the study.

Selecting the Sample

Personnel from McDougal Littell’s Educational Research staff made the first contact with
districts, typically contacting the district science coordinator or curriculum director. A sample of
the script followed during these calls is included in Appendix B. When the response to a request
to participate in the study was favorable, letters that briefly explained the study and solicited
permission to participate were emailed or faxed to the district contact (see Appendix C), and the
contact name was submitted to EndVision’s Project Coordinator. The Project Coordinator
continued making contact and discussing the study with district representatives until the district
either (a) declined participation either directly or by not returning the promised signed letter of
participation, or (b) agreed to participate and returned the signed letter of permission that
included names and contact information for teachers agreeing to participate in the study.

EndVision started calling districts on May 11, 2004. Because teachers for three studies were
recruited simultaneously (i.e., the McDougal Littell Life Science, Middle School Math, and Algebra I
programs), districts were asked if they were willing to participate in one or more of the studies.
As of September 3, 2004, when the final district agreed to participate in the science study and
returned names of teachers, EndVision’s Project Coordinator had made over 1400 contacts with
93 districts to seek study participants. McDougal Littell had called many other districts that
expressed no initial interest.

Reasons cited by declining districts that provided an explanation to EndVision staff are
included in Table 3 below. Additionally, a few districts required extensive paperwork to be
filed, school board or district administrator approval, and/or district-level IRB approval.
Although a number of districts that required this more extensive approval route sought approval
early on, few school boards or district administrators approved participation. Given the time
constraints as the school year approached, we sought districts where decisions to participate
could be made more quickly.

Table 3. Reasons Given for Not Participating in the Study

Reason for not participating | Frequency
Conflict with competing science programs, curriculum standards, or other ongoing research | 15
District personnel initially expressed interest but time ran out to continue recruitment activities | 10
District personnel stated that there was insufficient time to prepare for the study | 7
People were simply too busy | 6
The decision maker was no longer available (e.g., summer leave, change in personnel) | 3
District agreed to participate but teachers declined | 1

We had set 32 teachers as the absolute minimum for conducting the study in anticipation of
attrition. By the start of the school year, given varying start dates across districts, six school
districts had agreed to participate and had provided contact information for 37 teachers. One
school district provided 14 names, but we decided that 14 teachers from one district were too
many to include in the study. We chose 10 of those teachers, eliminating the schools whose
student populations ranked highest in socioeconomic status. In all, 34 teachers in six school
districts representing 20 schools agreed to participate in the study. Overall, 12 schools had only
one participating teacher, four schools contributed two teachers, three schools supplied three
teachers, and one school included four participating teachers. The
participating districts included the following, with New York designated as the NORTHEAST
group, districts in Alabama and Tennessee labeled SOUTH, and Utah and California districts
included in the WEST group.

Table 4. Districts and Numbers of Teachers Agreeing to Participate

State | # of Teachers | # of Schools | Region
New York | 3 | 1 | NORTHEAST
New York | 5 | 5 | NORTHEAST
Alabama | 8 | 3 | SOUTH
Tennessee | 6 | 2 | SOUTH
Utah | 10 | 8 | WEST
California | 2 | 1 | WEST
Total: 5 states | 34 teachers | 20 schools | 3 regions

Randomly Assigning Teachers to Groups

Because the start of school varied among districts and districts agreed to participate at different
times, teachers were randomly assigned by district to comparison or treatment groups. Random
assignment with replacement was completed using the following process. Names of teachers
from the same district were written on same-size slips of paper and folded in half. The papers
were then drawn from a hat. The first name drawn was assigned to the treatment group and
replaced in the hat. The second name drawn was assigned to the comparison group and replaced
in the hat, and so on alternating treatment and comparison groups until all but one name had been
assigned. The last name was automatically placed in the next group (i.e., comparison or
treatment). If a name was drawn that had been drawn previously, it was replaced in the hat and
drawing with replacement continued until a new name was drawn.
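
This drawing procedure can be expressed algorithmically. The sketch below is illustrative only,
not the method actually used (the study's draws were done by hand with paper slips); the function
name and the example teacher labels are hypothetical.

import random

def assign_within_district(names, seed=None):
    # Mirror the paper-slip procedure: draw with replacement from the
    # "hat", skip names already assigned, and alternate treatment and
    # comparison; the last unassigned name is placed in the next group.
    rng = random.Random(seed)
    hat = list(names)
    assigned = {}
    groups = ["treatment", "comparison"]
    turn = 0
    while len(assigned) < len(hat) - 1:
        name = rng.choice(hat)       # draw with replacement
        if name in assigned:         # already assigned: return slip, redraw
            continue
        assigned[name] = groups[turn % 2]
        turn += 1
    last = next(n for n in hat if n not in assigned)
    assigned[last] = groups[turn % 2]  # last name placed automatically
    return assigned

# Hypothetical example with five teachers in one district:
print(assign_within_district(["T1", "T2", "T3", "T4", "T5"], seed=1))

One consequence of alternating after each successful draw is that in a district with an odd number
of teachers, the group receiving the first draw gains one extra teacher.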

Final Sample

Although 34 teachers agreed to participate in the study, the final sample used in the analysis
included only 29 of the 34 teachers. The following table lists the sample selection, random
assignment, and participation.

Table 5. Sample Selection, Random Assignment, and Participation

Sample | Number of Treatment Teachers | Number of Comparison Teachers | Comments
Original | 16 | 17 |
Random assignment repeated in one district | 16 (+ 1 team teacher) | 17 | See Team Teaching paragraph below
Two treatment teachers withdrew | 14 (+ 1 team teacher) | 17 | See Withdrew Prior to Study paragraph below
Final sample used in outcome analysis | 14 | 15 | See Final Sample paragraph below

Team Teaching. One teacher who had been randomly assigned to the comparison group team
taught with a teacher at the same school who had been assigned to the treatment group. Because
they refused to give up team teaching and we were concerned about treatment diffusion if they
remained in the groups to which they had been randomly assigned, we requested the name of a
replacement teacher. The names of the replacement teacher and the teacher who had been
randomly assigned to the treatment group were again put into a hat and the first of the two
names—to be assigned to the treatment group—was drawn. The original treatment teacher’s
name was the name drawn, so she was again assigned to the treatment group. Because of the
team teaching situation, both the treatment teacher and her partner were given McDougal Littell
materials and included in the data collection process. The additional teacher was not included in
the final analysis as she had not been randomly assigned to a group.

Withdrew Prior to Study. Two treatment teachers dropped out of the study prior to the start of
school. One teacher who had been assigned to the treatment group decided not to participate
prior to the professional development session. The professional development was conducted in
that district on Thursday and Friday before school started on Monday. The teacher would have
been unable to attend on Friday because she was directing soccer tryouts, and she was
uncomfortable changing to a new curriculum so soon before school started as she had already
laid out lesson plans for the first units of life science. Additionally, her district allowed few
contract days prior to the start of school, and she was unwilling to sacrifice one of those days for
training rather than for preparing her classroom. Another teacher who left the study attended the
in-service just prior to her school’s start date (but during teacher contract time). After the morning session, she left in
tears explaining that her life was “too much of a mess”—she couldn’t face adding the stresses of
lesson development and teaching with new curriculum at that time.

Final Sample. The final sample included 14 treatment teachers and 1 team teacher using the
McDougal Littell materials from whom data was collected. Data from the additional teacher was
excluded from the data analysis, but may be used for later reporting and publications. Seventeen
teachers were randomly assigned to the comparison group. Data including pretests and classroom
observations were collected from these teachers, but 1 comparison teacher would not participate
in a second classroom observation. That teacher and another did not return the final data
package, which included student content and attitude posttests, teacher questionnaire, and student
demographics, resulting in complete data from 15 comparison teachers.

Informed and Uninformed Volunteers. When we contacted teachers to tell them the group to
which they had been randomly assigned, we learned that some of the districts had asked for
volunteers who would agree to participate in the study prior to submitting names. Other districts
provided the names of teachers they expected to participate without asking for volunteers. After
discussing the study with those teachers who had not volunteered, even those who had been
reluctant decided to participate. The “selling points” included teacher honorariums, the materials
provided to treatment teachers at no cost, and the research design—a few of the reluctant
teachers who were assigned to the comparison group might have typically refused but decided to
participate because of the quality of the randomized research design.

Target Classroom

Target classrooms for the study were selected using the following criteria:
• Students were 7th grade students, if the teacher taught multiple grades.
• First and final periods were excluded, as the researchers conducting this study have observed
teacher and student behavior during these periods that differed considerably from behavior
during other periods.
• Classrooms were selected that had no more than a “typical” number of “inclusion” students
or students receiving special services.
• If the school used some form of tracking (i.e., classrooms consisting almost exclusively of
struggling, “average”, or high achieving students), “average” classrooms were selected.

• Class periods were selected that maximized the number of classroom observations that could
be conducted per day.

Demographics of the Sample

Description of Schools

Four of the 20 schools in which the study was conducted included both treatment and
comparison teachers. For this reason, a school could be included in both the treatment and
comparison descriptions below.

Treatment teachers participating in this study taught at 10 different schools. Of the 10 schools,
two had minority enrollments that exceeded 50%, three had more than 50% of their students
eligible for free or reduced lunch, and four were categorized as Title I eligible by the U.S.
Department of Education. The average pupil-teacher ratio was 19.4, and the average total school
enrollment was 957.8.

Comparison teachers participating in the study taught at 12 different schools. Of the 12 schools,
three had minority enrollments that exceeded 50%, four had more than 50% of their students
eligible for free or reduced lunch, and five were categorized as Title I eligible by the U.S.
Department of Education. The average pupil-teacher ratio was 18.8, and the average total school
enrollment was 943.2.

Description of Teachers

Of the 14 treatment teachers who completed the study, all had bachelor’s degrees in either science
or science education, and three had advanced degrees. The average McDougal Littell teacher had
taught for 11.2 years, with a maximum of 35 years and a minimum of 1 year. On average,
treatment teachers had 9.7 years of experience teaching science, with a maximum of 20 years
and a minimum of 1 year.

Of the 15 comparison group teachers who completed the study, 12 had bachelor’s degrees in
either science or science education, and eight had advanced degrees. The average comparison
group teacher had taught for 12.5 years, with a maximum of 26 years and a minimum of 0 years.
On average, comparison group teachers had 10.6 years of experience teaching science, with a
maximum of 26 years and a minimum of 0 years.

Description of Classrooms and Students

The average total class size for treatment classrooms was 29.2 students, with a maximum size of
36 and a minimum size of 23 students. On average, parents of 2 students per classroom refused
to allow their child to participate in the study by returning the letter refusing permission (range,
0-5; see Appendix D). One classroom consisted entirely of male students. Minority composition
of classrooms ranged from 0.0% to 92.3%, with an average of 24.4%. On average, 8.7% of
students in each classroom had been categorized with a learning disability; this statistic ranged
across treatment classrooms from 0.0% to 26.3%. Finally, in the average treatment classroom
3.4% of students did not speak English as a primary language and 1.4% did not use English as
their primary home language, although these figures respectively varied from 0.0% to 30.8% and
from 0.0% to 9.1%.

The average total class size for comparison group classrooms was 27.7 students, with a
maximum size of 33 and a minimum of 23. Minority composition of classrooms ranged from
0.0% to 100.0%, with an average of 20.4%. On average, 6.7% of students in each classroom had
been categorized with a learning disability; this statistic ranged across comparison classrooms
from 0.0% to 31.0%. Finally, in the average comparison group classroom, 1.7% of students did
not speak English as a primary language and 6.8% did not use English as their primary home
language; respectively, these figures varied from 0.0% to 17.6% and from 0.0% to 46.2%.

INSTRUMENTATION AND DATA COLLECTION

The instruments used in the study are listed in Table 6. The following sections provide detailed
descriptions of the instruments, including reliability data where appropriate.

Pretest/Posttest

The pretest/posttest (see Appendix E) was comprised of items from two sources: (a) the Science
Benchmark Test from the University of Oregon’s Institute for the Development of Educational
Achievement (IDEA) and (b) items from the National Assessment of Educational Progress
(NAEP) 1996 and 2000 8th grade science tests. The Science Benchmark Test was developed as a
standardized test covering life, earth, and physical sciences, with students given 12 minutes to
complete the 60 vocabulary matching items and 15 minutes to complete 60 multiple choice
items. Researchers at the University of Oregon compared the predictive ability of the Science
Benchmark Test to the Florida Comprehensive Assessment Test (FCAT) and found the
following with a 10th grade sample of 110 students:

Both [vocabulary matching and multiple choice] components were significantly
related to FCAT pass/no pass group membership and contributed uniquely to the
correct identification of students in each category. … The resultant discrimination
between the FCAT groups indicates that scores on the benchmark Life Science
subtest correctly captured 98.4% of those students who did not “pass” (i.e.,
performed poorly) on the FCAT science outcome and 95.7% of those students
who did “pass” (i.e., performed well) on the FCAT science outcome. A total of 3
students were incorrectly placed in the FCAT categories on the basis of their
benchmark Life Science scores. (J. Caros and J. Derschon, Institute for Academic
Access, University of Oregon, personal communication, August 12, 2004)

Because the two components of the Science Benchmark Test covered vocabulary and concepts
beyond life science and required substantially less time to administer than one middle school
science period, we opted to use only the life science items and to add items, selected from the
NAEP assessments, that required students to provide short answer responses. Twenty vocabulary
matching items relating to life science and 35 multiple choice
items that included life science and measurement questions were selected from the Science
Benchmark Test. The Section A multiple choice items are the same in both forms of the test
created for this study. The same twenty vocabulary matching items in section B are presented in
a different order in forms A and B. Section C includes different items in forms A and B, selected
from the 1996 and 2000 NAEP assessments.

Table 6. Instruments and Frequency of Data Collection

Instrument | Data Collected | Timing and Frequency of Data Collection
Pretest/Posttest | Student knowledge of science | Prior to intervention; after both units were taught and near end of first half of school year
Student Attitude Survey | Student report of attitudes towards science | Prior to intervention; after both units were taught and near end of first half of school year
Classroom Observation Instrument | Researcher observation of (a) teacher behavior and teaching practices, (b) use of questioning, (c) student behavior, (d) student and teacher engagement, and (e) classroom environment | During each of the two site visits, such that each teacher was observed a total of three times (once on one site visit, two different days on the other)
Using Science Materials Observation | Researcher report of materials observed | During observations conducted during each of the two site visits
Using Science Materials Checklist | Teacher report of materials used | Following each of the two site visits
Informal Teacher Interviews and Comments | Teacher attitudes, opinions, and practices | During each of the two site visits; solicited and unsolicited email messages throughout study
Informal Student Interviews | Student attitudes, opinions | During each of the two site visits
Journaling Questions | Teacher report of teacher behavior, classroom environment, opinions | Every 10-14 days after the start of the study
Teacher Questionnaire | Teacher report of student demographics and assessment scores; teacher report of teacher demographics, opinions, practices, and educational experience | After posttest was administered
Student Questionnaire | Student report of science experiences | After both units were taught and near end of first half of school year
Teacher Focus Group Questions | Treatment teacher report of use of materials, opinions | During second site visit
Student Focus Group Questions | Treatment student report of attitudes towards materials | During second site visit

Prior to finalizing the test items and format, researchers administered the test to 4 students
entering 7th grade and 4 students entering 8th grade. These students attended schools in northern
Utah, and according to their parents, they were “average” students. The 8 students completed the
test in 40 minutes or less, and 8th grade students scored, on average, 32 points out of 100 higher
than 7th grade students. Because class periods for middle school rarely exceed 50 minutes, researchers
felt the length of this test and the concepts tested were appropriate for this study.

Classrooms participating in the study were randomly assigned to take either form A or form B of
the pretest, with the alternate form used as the mid-year posttest. Originally, we planned to
administer a posttest at the end of the year, using the same form administered for the pretest.
However, some classrooms did not take the mid-year posttest until February—given hurricanes
and school fires that resulted in missed school days and shortened periods, which slowed the
pacing of instruction—and teachers did not want to lose another period at the end of the school
year, particularly when so many end-of-year days were allocated for standardized and state-
mandated testing. Without the final end-of-year posttest—which was the same as the pretest—
and because the NAEP items in section C were different in forms A and B, there was no way to
compare section C scores. Accordingly, section C scores were not used in the data analysis.

The percent correct on each of sections A and B, as well as the total percent correct for sections
A plus B were used in the data analysis. Cronbach’s Alpha, a statistic indicating the internal
reliability of a test instrument, provides an estimate of the extent to which the instrument can be
expected to produce consistent scores over time. Values higher than 0.80 are usually considered
acceptable. Cronbach’s Alpha for forms A and B are shown in the following table. Item
difficulty is the proportion of examinees correctly answering a particular item. Point-biserial
correlation is the correlation between examinee responses to a particular item and their total
score on the test. It ranges from -1.00 to +1.00, with moderate to high positive values indicating
that the item is effective at discriminating between examinees of high and low ability.

Table 7. Measurement Properties of Pretest/Posttest

Assessment | Administration | Form | Number of Items | Cronbach’s Alpha | Mean Item Difficulty (Mean Item Variance) | Mean Item Point-Biserial Correlation
Science Content | Pretest | A | 55 | .862 | .42 (.20) | .34
Science Content | Pretest | B | 55 | .873 | .46 (.21) | .35
Science Content | Posttest | A | 55 | .923 | .55 (.22) | .44
Science Content | Posttest | B | 55 | .915 | .56 (.21) | .42
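
The statistics in Table 7 are standard item-analysis quantities that can be reproduced from a
scored item-response matrix. The sketch below is illustrative only, not the analysis code used in
the study; the function name and the randomly generated example data are hypothetical.

import numpy as np

def item_analysis(X):
    # X: (n_examinees, n_items) array of responses scored 0 (wrong) or 1 (right)
    n, k = X.shape
    total = X.sum(axis=1)                 # each examinee's total score
    item_var = X.var(axis=0, ddof=1)      # per-item variance
    # Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / total-score variance)
    alpha = (k / (k - 1)) * (1 - item_var.sum() / total.var(ddof=1))
    # Item difficulty: proportion of examinees answering each item correctly
    difficulty = X.mean(axis=0)
    # Point-biserial: correlation between each item and the total score
    pbis = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(k)])
    return alpha, difficulty.mean(), item_var.mean(), pbis.mean()

# Hypothetical example: 200 examinees, 55 items, random responses
rng = np.random.default_rng(0)
X = (rng.random((200, 55)) < 0.5).astype(int)
print(item_analysis(X))
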

© ΣndVision Research & Evaluation, Logan UT


34

Student Attitude Survey

The Student Attitude Survey was adapted from the NSF-funded Mathematics Across the
Curriculum (MATC) project at Dartmouth College (Korey, 2000, see Appendix F). It has been
widely used in many NSF-funded and other studies, and takes less time to administer than other
commonly used content attitude surveys (e.g., 15 minutes compared to 30 minutes or more).
Because this survey was designed for math attitudes and targeted towards college students, the
survey was adapted for 7th grade science students by (a) changing “math” to “science,”
(b) changing the language slightly to make the items easier to understand by less fluent readers,
and (c) modifying selected questions to reflect science activities. For example, “Expressing
scientific concepts in mathematical equations just makes them more confusing” became “Using
science concepts in math equations just makes them more confusing” and “I rarely encounter
situations that are mathematical in nature outside school” became “I rarely see situations outside
of school that can be explained using science.”

The attitude score was created by assigning numbers to each response category (5=Strongly
Agree, 4=Somewhat Agree, 3=Neutral, 2=Somewhat Disagree, 1=Strongly Disagree). Many
items were reverse-scaled prior to the analysis so that a high score consistently reflected a
positive attitude towards science, and vice-versa. The total score for each student was then
computed by summing the student’s responses to all items on the scale, creating a possible high
score of 175. Cronbach’s Alpha, shown in the following table, was slightly lower than the
commonly accepted minimum value of 0.80.

Table 8. Measurement Properties of Student Attitude Survey

Assessment | Administration | Number of Items | Cronbach’s Alpha
Science Attitude | Pretest | 35 | .767
Science Attitude | Posttest | 35 | .734
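
The scoring rule described above is simple enough to state in code. The sketch below assumes the
stated 1-5 response coding; the function name and the reverse-scaled item indices in the example
are hypothetical, as the actual set of reversed items is not listed here.

def attitude_score(responses, reverse_items):
    # responses: 35 integers coded 1 (Strongly Disagree) .. 5 (Strongly Agree)
    # reverse_items: 0-based indices of reverse-scaled items (hypothetical here)
    score = 0
    for i, r in enumerate(responses):
        score += (6 - r) if i in reverse_items else r  # 5<->1, 4<->2, 3 unchanged
    return score  # maximum possible score: 35 * 5 = 175

# Hypothetical example: "Somewhat Agree" (4) on every item, items 2 and 7 reversed
print(attitude_score([4] * 35, {2, 7}))  # 33*4 + 2*(6-4) = 136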

Classroom Observation Instrument

Although researchers had intended to use existing instruments to the extent possible, no
classroom observation instruments were located that were deemed appropriate for this study.
Because the McDougal Littell Life Science was not strongly based on an inquiry-based approach
to teaching and learning, a number of the existing instruments were eliminated upon review.
Instruments that were reviewed include the following (complete references were not available for
some instruments sent to us by other researchers upon our request):
• Behavioral Protocol (Received from Dr. Hyrum Henderson, retired, Utah State University
Department of Special Education, Logan, UT. No additional reference available.)
• Classroom Observation and Analytic Protocol and Teacher Interview Protocol (Horizon
Research, 2000)
• Collaboratives for Excellence in Teacher Education (CETP) Classroom Observation Protocol
(Lawrenz, Hoffman, & Appeldoorn, 2002)
• Effective Teaching Practices (Hofmeister & Lubke, 1990)
• GeoKids Classroom Observation Protocol (Frontier 21 Educational Solutions, 2004)
• Reformed Teaching Observation Protocol (RTOP; Sawada & Piburn, 2000)
• Teacher Performance Measure (TPM; Stenhoff, et al., 2004)
• The Praxis Series: Professional Assessments for Beginning Teachers (ETS, 2004)
• TIMSS Videotape Classroom Study Coding Procedures (Stigler, et al., 1999)
Given the dearth of available instruments that (a) reflected the goals of this study, (b) measured
the research-based instructional strategies used in McDougal Littell Life Science, and
(c) included reports of a high level of reliability, researchers decided to develop an observation
instrument that better captured teacher and student behaviors in the classroom (see Appendix G).
Research on effective teaching practices in general, as well as teaching for science understanding
specifically, support measuring the following teaching practices and teacher and student
behaviors, which were recorded or measured during the classroom observation.
• The types of activities (review, new content, guided practice, independent practice, and/or
lab), the time allowed for each activity, and the length of the transition between activities;
• The kinds of student responses elicited by teachers, and the subsequent feedback sequence.
In this category for science teaching, the research suggests that higher levels of questioning
(e.g., generate a hypothesis, synthesize, etc.) and checking for student understanding promote
concept knowledge and problem-solving abilities;
• The frequency and type of off-task student behaviors, particularly those that disrupt other
students, and the teacher’s response (or lack thereof) to those behaviors.
• Student grouping and overall behavior, the concepts taught and a description of the
activities/strategies used, and a list of materials;
• A summary of activities used to review, introduce new content, provide guided practice with
feedback and checking for understanding, and assign independent practice;
• Additional factors such as use of “wait time” when eliciting student responses, circulation
throughout the room, calling randomly on students to check understanding, use of
differentiated instruction or cooperative activities, evidence of planning, evidence of routines
that students followed, and other factors that have research-based evidence supporting their
impact on student achievement and attitudes.
• An overall rating of levels of (a) use of effective teaching practices and (b) opportunities to
learn.
Observers were trained by observing and coding behaviors with the form in middle school
classrooms not involved in the science study. Those who did not quickly reach reliability
observed study classrooms alongside another observer, with differences discussed and consensus
reached after each observation.


Observers continued to work in pairs until they achieved 80% or higher agreement on all scaled
items, with any items on which they did not agree exactly separated by no more than 1 point on
the scale. Frequency-based items coded on the first page of the observation sheet were required
to show similar patterns of behavior, with at least 80% of the coding matched. Once an acceptable
level of reliability was reached, observers conducted inter-rater reliability checks during 1 out of
every 4 observations. In all cases, the 80% or higher level of agreement was reached.
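
A minimal sketch of this agreement rule, using hypothetical paired ratings from two observers (the actual scaled items appear on the observation instrument in Appendix G):

    import numpy as np

    def meets_agreement_rule(rater_a, rater_b):
        """At least 80% exact agreement on scaled items, with any
        disagreements separated by no more than 1 scale point."""
        a, b = np.asarray(rater_a), np.asarray(rater_b)
        exact = np.mean(a == b)                   # proportion of exact matches
        within_one = np.all(np.abs(a - b) <= 1)   # disagreements within 1 point
        return exact >= 0.80 and within_one

    # Hypothetical ratings of the same lesson on ten scaled items
    obs_a = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
    obs_b = [3, 4, 2, 5, 5, 3, 1, 4, 5, 2]
    print(meets_agreement_rule(obs_a, obs_b))  # True: 90% exact, rest within 1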

Using Science Materials Checklist (Teacher Self-Report)

At each of the two site visits, teachers were asked to complete the Using Science Materials
Checklist (see Appendix I). If they were unable to complete the checklist before we left the
school, we provided a self-addressed, stamped envelope so they could return the checklist. We
asked teachers to complete the checklist twice during the study so we could determine if they
used additional resources and ancillary materials as time passed and they became more
comfortable with the materials.
An analysis of the differences between the checklists revealed very few differences. Two of the
treatment group teachers did use technology more frequently by the end of the study, but the
frequencies only changed by one category (of five) for a few items.
Responses on a number of items from these checklists were extracted to calculate a variable for
degree of implementation: categories for frequency of use were assigned a value from 1 to 5 for
low to high use, and the sum of the item responses was divided by the number of items (which
varied between treatment and comparison groups); a computational sketch follows the list below.
The items, listed below, were selected because teachers who fully implemented the program
would need to use these materials more frequently. Technology use was not included because
quite a few of the teachers did not have sufficient access to technology to use it in their teaching,
and prior research has demonstrated mixed support for effective use of technology in teaching to
improve student achievement. Rather, effective teaching practices interact with use of technology
in teaching, as reported in aptitude-treatment interaction studies (e.g., see Cronbach & Snow,
1977; Snow, 1989; Snow, Frederico, & Montague, 1980).
• Teacher Edition
• Pupil Edition
• Notetaking Guide
• Vocabulary Practice
• Daily Vocabulary Scaffolding (treatment only)
• 3-Minute Warm-ups (treatment) or Start of Class Warm-ups (comparison)
• Section Quizzes
• Items relating to Big ideas and Key concepts (treatment only)
• Items relating to differentiated instruction and assessment
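
A minimal sketch of this computation follows; the mapping of the five frequency categories to codes 1 through 5 and the example responses are illustrative assumptions, not values taken from the study data.

    # Assumed coding of the five frequency-of-use categories, low to high.
    CATEGORIES = ["do not have", "never", "sometimes", "2-3 times/week", "daily"]
    CODE = {c: i + 1 for i, c in enumerate(CATEGORIES)}

    def implementation_score(responses):
        """Mean coded frequency across a teacher's checklist items.
        Dividing by the number of items keeps scores comparable even
        though the item sets differed between treatment and comparison."""
        codes = [CODE[r] for r in responses.values()]
        return sum(codes) / len(codes)

    # Hypothetical treatment-teacher responses
    teacher = {"Teacher Edition": "daily", "Pupil Edition": "daily",
               "Notetaking Guide": "2-3 times/week",
               "Vocabulary Practice": "sometimes", "Section Quizzes": "sometimes"}
    print(implementation_score(teacher))  # 4.0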

Using Science Materials Observation Checklist

During each classroom observation, researchers completed the Using Science Materials
Checklist for classroom observations (see Appendix H). All materials observed being used
during the target classroom period were recorded. This data was compared with data from the


Using Science Materials Checklist completed by teachers, to verify the teachers’ self-report
responses. For example, if a teacher indicated on the checklist that they used the Teacher’s
Edition and Pupil Edition textbooks to teach every lesson, we expected observers to record that
they saw those materials being used. A comparison of observer and teacher checklists revealed
no inconsistencies.

Informal Teacher Interviews

Because teachers escorted students to their next class in a number of the schools, little time was
available to talk individually to most teachers unless scheduled. Whenever time permitted, we
questioned teachers informally, following up on their previous email responses or on what
researchers observed during site visits. Most teachers were friendly
and open and initiated discussions whenever there was extra time. Information they provided
during these conversations was used to verify level of implementation of McDougal Littell Life
Science or of the programs used by comparison group teachers.

Informal Student Interviews

Because students had only five minutes between class periods, and because in a number of the
schools, students were escorted by teachers as they moved as a group to their next class, little
time was available to talk to students individually. Because of this, no formal questions were
developed for unscheduled conversations with students during site visits. Rather, when possible and
where appropriate, students who expressed an interest in visiting with us were asked questions
based on researchers’ observations during site visits. These informal interviews occurred before
and after school when students approached the researchers, between classes as we accompanied
students to their next classroom, or during the lunch period.

Journaling Questions

Although we had planned to send journaling questions to teachers twice weekly, getting teachers
to respond within a few days proved difficult, and a number of teachers required many prompts
and reminders. Because responses suggested that this activity could cause resentment and
frustration, and because we wanted teachers to answer thoughtfully and thoroughly, we opted to
send journaling questions once every week or two. The first journaling question was sent out
approximately one month after the start of the school year, as teachers covered other topics
(e.g., measurement) prior to teaching the two units required by the study.

Questions were developed based on prior responses to questions, unsolicited comments teachers
sent by email, and classroom observations and other data. Journaling questions are included in
Appendix N, with differences between treatment and comparison group questions indicated.
Some teachers consistently responded with short phrases while others wrote volumes, which
paralleled the researchers’ experiences on other research projects where journaling and open-
ended questions were used.


Teacher Questionnaire

The teacher questionnaire (see Appendix J) administered near the end of the first half of the
school year included the following:

• student demographics
• student scores on life science unit or chapter tests for “Cells and Heredity” and “Ecology”
• student scores for reading, math, and science on standardized tests
• teacher demographics
• teacher opinions and practices
• teacher educational experience, including college courses and teacher licensure

As noted in the survey, many of the questions were taken from instruments used in Horizon
Research’s Looking Inside the Classroom: A Study of K–12 Mathematics and Science Education
in the United States (Weiss, et al., 2003). Weiss’s study included 364 teachers from a nationally
representative sample of schools that were selected for the 2000 National Survey of Science
and Mathematics Education (Weiss, et al., 2001).

Student Questionnaire

The student questionnaire was designed to gather students’ opinions about the materials used in
their science classes and students’ use of technology to learn science concepts. The
questionnaires are included in Appendix K.

Treatment Teacher Focus Groups/Interviews

Focus groups or interviews were held with 9 of the 14 treatment teachers, selected because (a)
they were able to meet during site visits and (b) they were teachers who more fully implemented
the program. Although we had hoped to conduct these meetings as focus groups, some teachers
were the only treatment teacher at a site, so they were interviewed individually. Questions used
in the focus groups or interviews are included in Appendix L.

Treatment Student Focus Groups

One student focus group was conducted in each of the six school districts. If more than one
treatment teacher taught in the same school, those teachers were asked to select 3 students each
for a focus group, representing high achieving, average, and struggling students. If only one
teacher was in a school, we requested that six students be allowed to participate in the focus
group, again representing the three levels. Focus groups were conducted with students taught by
most of the teachers who participated in focus groups or interviews. In most cases, we provided
students who participated in focus groups with pizza and pop during the meeting. Focus group
questions are in Appendix M.


DATA ANALYSIS

Qualitative data was analyzed following analysis techniques outlined by Miles and Huberman
(1994) and Yin (1994). In general, the sequence of analyzing qualitative data involved
(a) affixing topical codes to data drawn from interviews, focus groups, and observations,
(b) identifying relationships between variables, patterns, themes, and distinct differences
between subgroups, (c) tabulating frequencies of themes or patterns, (d) matching patterns and
processes, commonalities and differences, (e) identifying generalizations that covered the
consistencies discerned in the data, (f) confirming generalizations with follow-up interviews or
phone calls, and (g) building explanations that reflected different theories. Lastly, all qualitative
data was triangulated with documentation, observations, and student achievement data.

SPSS was used to provide descriptive statistics and to examine the data for anomalies and
outliers. Any outliers or anomalies were investigated and reported. Equivalence of treatment
and comparison groups was analyzed based on both teacher and student demographic
characteristics and pretest scores. Dissimilar factors were used as covariates in the quantitative
analysis. Both SPSS and HLM software were used to complete the statistical analysis.

Final Sample

The results shown in Table 9 provide direct comparisons of the baseline characteristics of the
treatment classrooms and comparison classrooms. As the results indicate, the class size,
teachers’ years of experience, years of experience teaching science, and the percentages of
minorities and free lunch participants were statistically equivalent across treatment and
comparison classrooms. Likewise, t-tests for the baseline achievement and composite attitudinal
outcomes for the treatment and comparison classrooms showed no differences. Therefore, the
treatment and comparison samples were sufficiently well-matched at baseline on key teacher and
classroom characteristics, demographics, and on the pretest measures. In these respects, the
sample selection process and randomization procedure appear to have produced a baseline
sample of classrooms with good internal validity: there were no large, statistically
significant treatment/comparison differences.

In the previous description of the sample, we concluded that the analysis of the baseline data
showed no important differences between treatment and comparison classrooms. In discussing
the results of our analyses of the science achievement and attitudinal outcomes, we begin by
assessing whether there was differential data and sample attrition between treatment and
comparison classrooms, or systematic attrition from the analytical sample that may have changed
its characteristics relative to those for the baseline sample.


Table 9. Comparison of Baseline Characteristics for McDougal Littell Life Science Treatment
Classrooms and Control Classrooms

Variable                          Condition   N     M      SD    95% CI for Difference   t(27)
                                                                  Lower       Upper
                                                                  bound       bound
Class size                        Control    15   27.67   3.27    -4.57        1.47      -1.05
                                  Treatment  14   29.21   4.59
Years of experience               Control    15   11.20   9.28    -7.99        5.39      -0.40
                                  Treatment  14   12.50   8.20
Years teaching science            Control    15    9.73   8.55    -6.35        4.53      -0.34
                                  Treatment  14   10.64   5.20
% Minority                        Control    15   34.40  27.82   -19.99       21.46       0.07
                                  Treatment  14   33.67  26.47
% Free lunch                      Control    15    8.31  16.49    -9.12       10.38       0.90
                                  Treatment  14    7.68  14.35
Baseline science attitudinal
  composite                       Control    15    0.12   1.11    -1.00        0.66      -0.42
                                  Treatment  14   -0.05   1.06
Baseline science achievement      Control    15    0.11   1.39    -1.16        0.75      -0.44
                                  Treatment  14   -0.09   1.08
Baseline science achievement
  (Parts A and B only)            Control    15    0.12   1.02    -0.90        0.54      -0.51
                                  Treatment  14   -0.05   0.86

The final analytical sample for the science achievement outcome was composed of 366 students
in 14 treatment classrooms using McDougal Littell Life Science and 382 students in 15
comparison classrooms. The mean pretest and posttest science achievement scores, along with
standard deviations for each outcome, for treatment and comparison students are presented in
Table 10. Listwise deletion of student cases with missing pretest and/or posttest achievement
data did not cause differential attrition rates by program condition, χ2 (1, N = 967) = 2.33, p =
.13, leaving 75% of the baseline sample of 486 treatment students and 79% of the 481 baseline
comparisons for the analyses. Likewise, for our analyses of attitudinal data, deletion of student
cases with a missing final composite attitudinal measure did not cause differential attrition rates
by program condition, χ2 (1, N = 967) = 0.69, p = .41, leaving 71% of the baseline sample of
486 treatment students and 73% of the 481 baseline comparisons for the analyses. The means
and standard deviations for the attitudinal measures are also presented in Table 10.
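
The achievement-attrition test can be approximately reproduced from the counts reported above; small discrepancies arise from the rounded retention percentages. A sketch:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Retained vs. dropped counts by condition: 75% of 486 treatment and
    # 79% of 481 comparison students were retained for the achievement analyses.
    retained_t, retained_c = round(0.75 * 486), round(0.79 * 481)
    table = np.array([[retained_t, 486 - retained_t],
                      [retained_c, 481 - retained_c]])
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(f"chi2({dof}) = {chi2:.2f}, p = {p:.2f}")  # approx. chi2(1) = 2.30, p = .13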


To further investigate the internal validity of the study, we compared the pretest scores of the
treatment students who were dropped from the analyses to the pretest scores of the comparison
students who were dropped from the analyses. No statistically significant difference was found
between the treatment and the comparison students, t(174), p = .29 (two-tailed), suggesting that
the initial academic ability of the treatment and comparison students dropped from our analyses
was similar. The same result applied to the attitudinal measure: the treatment and comparison
students dropped from the analyses of attitudinal data were statistically equivalent, t(188),
p = .29 (two-tailed).

Table 10. Student-level Means and Standard Deviations for the Outcome Measures

                                Treatment            Control
                               M        SD        M        SD
Science Achievement*
  Pretest                     0.03     1.80     -0.03     1.74
  Posttest                   -0.14     1.76      0.13     1.91

Science Attitude Composite*
  Pretest                     0.07     2.99     -0.07     3.13
  Posttest                    0.02     3.04     -0.02     3.02

Ability Attitude
  Pretest                    15.34     3.28     15.61     3.82
  Posttest                   16.19     2.95     16.15     3.05

Personal Growth Attitude
  Pretest                    18.11     4.60     17.90     4.67
  Posttest                   18.85     4.92     18.76     4.75

Utility Attitude
  Pretest                    16.07     3.03     15.63     3.00
  Posttest                   16.47     3.02     16.43     3.28

Interest Attitude
  Pretest                    10.33     3.50     10.22     3.28
  Posttest                   11.07     2.08     11.08     2.01

Note: * Standardized composite measures.


To address the issue of external validity, we compared students who were retained in the
analysis to students who were not retained due to missing data. Students who were retained had
higher pretest science achievement scores than those who were not, t(830), p < .001 (two-tailed).
With regard to the analyses of attitudinal data, the students retained for the analyses and the
students dropped due to missing data had statistically equivalent baseline science attitudinal
composite scores, t(789), p = .14 (two-tailed). Thus, concerning the analyses of achievement
data, low-achieving students were underrepresented, which compromises the external validity of
the study to some extent.

While conceding this limitation, it is also the case that this data attrition claimed a total of only
23% of the baseline sample. Further, there is no conflict in this experiment between random
assignment and the missing-data mechanism: among the complete-data observations, those
assigned to treatment have covariate distributions similar to those assigned to comparison.
As noted by Rubin (1976) and Little and Rubin (1987), the missing data process is ignorable if,
conditional on treatment and fully-observed covariates, the data are missing at random (MAR).

Hierarchical Linear Model Analyses of McDougal Littell Life Science Treatment Effects

Cluster randomized trials (CRTs) in education randomize at the level of the school or classroom
and collect data at the level of the student. In many respects, they are the optimal design for
school-based and classroom-based interventions. They address practical problems, including the
potential difficulties of randomizing individual teachers within schools or students within
classrooms to alternate treatments, and they are often well aligned with the theory of how
educational interventions work best: as coordinated, systemic initiatives delivered by
organizational-level elements acting in concert. Though greater attention has been paid to these
designs in education in recent years (Boruch, et al., 2004; Raudenbush, 1997), methodological
work related to the proper specification of impact estimates from CRTs is an evolving field.

Estimation of treatment effects at the level of the cluster that was randomized is the most
appropriate method (Donner & Klar, 2000; Murray, 1998; Raudenbush, 1997). When the
number of clusters is small, though, this strategy will not be efficient and will lack necessary
statistical power. If clustering is simply ignored and the analysis is done at the level of the
individual student, this will create the illusion that statistical power has been substantially
increased. However, these standard tests of statistical significance, which assume that the
outcome for an individual is completely unrelated (or independent) to that for any other student,
are inappropriate for CRTs. This is the case because in CRTs two students randomized together
within any one classroom or school are more likely to respond in a similar manner than two
students randomized from different clusters. If one computed the standard errors for a CRT as if
individuals had been randomized, the outcome would understate the true standard errors
substantially, thereby lending a false sense of confidence in the impact estimates. As Cornfield
(1978, p. 101) noted: “Randomization by group accompanied by an analysis appropriate to
randomization by individual is an exercise in self-deception.”

A relatively recently proposed analytical strategy for the analysis of CRTs is the use of a
hierarchical linear model (Raudenbush, 1997). In this formulation, one may simultaneously
account for both student- and classroom-level sources of variability in the outcome by specifying
a two-level hierarchical model that estimates the classroom-level effect of random assignment. Our


fully specified level-1, or within-cluster, model nested students within classrooms with a single
covariate, the achievement or attitudinal pretest measure. The linear model for this level of the
analysis is written as

Yij = β0j + β1j(PRETEST)ij + rij,

which represents the post-intervention outcome for student i in classroom j regressed on the fall
pretest. The term rij is the level-1 residual variance that remained unexplained after accounting
for the pretest scores of the students.

In this model, each student’s pre-intervention score was centered around its within-classroom
group mean and the effect of the pretest was modeled as a source of random variation at level 2
of the model. In this way, at level 2 we estimated classroom-level treatment effects of
McDougal Littell Life Science on two outcomes: the mean posttest achievement outcome in
classroom j and the pretest-posttest slope in classroom j.

This level 2 model, thus, allowed us to estimate both the overall and compensatory effects of
assignment to McDougal Littell Life Science. That is, the model answered the questions: did
assignment to McDougal Littell Life Science affect the overall classroom-level posttest
achievement and attitudinal means and did program assignment attenuate the classroom-specific
relationships between the pretest and posttest measures? As suggested by the work of Bloom,
Bos, and Lee (1999) and Raudenbush (1997), our models included a classroom-level covariate,
the mean baseline pre-intervention science achievement or attitudinal score, to help reduce the
unexplained variance in the outcome and to improve the power and precision of our treatment
effect estimates. The fully specified level 2 model was written as

β0j = γ00 + γ01(MEAN_PRE)j + γ02(TREATMENT)j + u0j,

β1j = γ10 + γ11(TREATMENT)j + u1j,

where the mean posttest intercept for classroom j, β0j, is regressed on the classroom-level mean
pretest score, the treatment indicator, plus a residual, u0j. The pretest-posttest slope, β1j, is
predicted by the treatment indicator, plus the residual, u1j.1
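
The study estimated these models in HLM software. As an illustrative approximation only, the same two-level specification can be sketched with the MixedLM routine in the statsmodels package; the input file and column names (posttest, pretest, treatment, classroom) are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("students.csv")  # hypothetical student-level file

    # Group-mean-center the pretest and carry the classroom mean to level 2,
    # mirroring the centering described above.
    df["mean_pre"] = df.groupby("classroom")["pretest"].transform("mean")
    df["pretest_c"] = df["pretest"] - df["mean_pre"]

    # Combined model: the level-2 equations enter as main effects plus a
    # treatment-by-pretest interaction; the pretest slope varies randomly
    # across classrooms.
    model = smf.mixedlm(
        "posttest ~ pretest_c + mean_pre + treatment + pretest_c:treatment",
        data=df, groups="classroom", re_formula="~pretest_c",
    )
    print(model.fit(reml=True).summary())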

1
The statistical precision of the design can be expressed in terms of a minimum detectable effect, or the smallest
treatment effect that can be detected with confidence. As Bloom (2005) noted, this parameter, which is a multiple of
the impact estimator’s standard error, depends on: whether a one- or two-tailed test of statistical significance is used;
the α level of statistical significance to which the result of the significance test will be compared; the desired
statistical power, 1 - β; and the number of degrees of freedom of the test, which equals the number of clusters, J,
minus 2 (assuming a two-group experimental design and no covariates).
The minimum detectable effect for our design is calculated for a two-tailed t-test, α level of p < .05, power, 1 - β,
equal to 0.80, and degrees of freedom equal to J = 29 classrooms minus 3 (a two-group experimental design with the
classroom mean pretest covariate). Referring to Tables 2 and 3 for the McDougal Littell Life Science impact
estimators’ standard errors, which range from .23 to .46, and employing Bloom’s (2005) minimum detectable effect
multiplier, we calculated minimum detectable effects of approximately δ = 0.67 to δ = 1.33. That is, our design had
adequate power to detect school-level treatment-control differences of at least 0.67 to 1.33 standard deviations.
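
The footnoted figures can be reproduced directly, assuming Bloom's multiplier is computed as the sum of the two relevant t quantiles:

    from scipy.stats import t

    # Bloom's (2005) multiplier for a two-tailed test at alpha = .05 with
    # power = .80 and df = 29 - 3 = 26: t(1 - alpha/2) + t(power).
    df_ = 26
    multiplier = t.ppf(1 - 0.05 / 2, df_) + t.ppf(0.80, df_)  # about 2.91

    for se in (0.23, 0.46):  # range of impact-estimator standard errors
        print(f"SE = {se:.2f} -> MDE = {multiplier * se:.2f}")
    # SE = 0.23 -> MDE = 0.67
    # SE = 0.46 -> MDE = 1.34 (1.33 in the text, a rounding difference)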


We specified this multilevel model for six outcomes: (1) science achievement, (2) perceived
science ability and confidence (Ability), (3) interest and enjoyment in science (Interest), (4) the
belief that science contributes to personal growth (Personal Growth), (5) the belief that science
contributes to career success (Utility), and (6) the total science attitudinal composite
of the Ability, Interest, Personal Growth, and Utility measures.
intention-to-treat (ITT) estimates of the program effect. That is, regardless of the actual quality
of implementation of McDougal Littell Life Science, these analyses compared all classrooms
assigned to the experimental condition—those who were intended to receive the treatment—to
all those assigned to the comparison condition. Thus, with variability in implementation across
classrooms, the ITT effect represents only the impact of assignment to the treatment. This ITT
effect also provides a reasonable estimate of the overall effects on science achievement and
attitudes that can be expected in the field when teachers implement the program with the typical
variability in quality.

Analysis of Research Questions

Differences in Achievement between Groups

To what extent do students in the classrooms using McDougal Littell Life Science perform better
by demonstrating a higher level of proficiency on curriculum-based assessments?

The outcome variable in this analysis was the science posttest score (in percent-correct form) on
sections A and B of the science content test. Independent variables at the student level included
student pretest scores, a variable indicating whether or not the student was female, and a variable
indicating whether or not the student was classified by the teacher as a member of a minority
group. Independent variables at the classroom level included the treatment group (McDougal
Littell or comparison), a composite variable representing each teacher’s level of expertise, and a
composite variable representing the level of school poverty. The effective teaching practices
variable was created by combining standardized variables reflecting the classroom observation
rating of use of effective teaching practices, a composite score from items taken from the
curriculum implementation checklist completed by the teacher, and years of teaching experience.
The school poverty variable was created from data on each school’s demographics. Both
classroom level variables were standardized so as to have means of 0.

Table 11 displays pretest and posttest group means and standard deviations for science content
tests administered to students. McDougal Littell Life Science appeared to be associated with a
positive impact on student achievement beyond the impact associated with the comparison
condition. From pretest to posttest, students in treatment classrooms experienced an average
improvement of 12.9 points, or a pre-post effect size of .81 standard deviations. This compares
to an average gain of 9.3 points for students in the comparison classrooms, or a pre-post effect
size of .57 standard deviations. The difference between groups in pre-post gains was 3.6 points,
or an unconditional between-groups standardized mean difference effect size of .27 standard
deviations. These numbers represent the effect sizes of the treatment without statistical
adjustment for the effects of clustering and covariates (as will be discussed in the next
paragraph). As such they are undoubtedly subject to some degree of statistical bias. We include
them here not as a precise estimator of impact but for the purpose of completeness. Effect sizes
obtained from the HLM analysis are, of course, less subject to statistical bias.


Table 11. Pretest and Posttest Means and Standard Deviations for Science Test Scores
                                        Means                    Standard Deviations
                                 Treatment   Comparison        Treatment   Comparison
Pretest                            43.60       44.16             15.89       16.46
Posttest                           56.53       53.50             20.91       19.14
Gain                               12.93        9.34             17.23       13.92
Difference in gains                 3.59
Effect sizes for gains              0.81        0.57
Effect size for group
  difference in gains               0.27

Note: Means reported in this table are unadjusted for variation in other factors.
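
The unadjusted effect sizes in Table 11 can be reproduced from the reported means; the sketch below assumes each group's gain was standardized by its own pretest standard deviation, which is consistent with the reported values of .81 and .57.

    # Pre-post gains and effect sizes from the Table 11 values
    t_gain = 56.53 - 43.60   # 12.93
    c_gain = 53.50 - 44.16   # 9.34
    print(round(t_gain / 15.89, 2))   # 0.81, treatment gain / pretest SD
    print(round(c_gain / 16.46, 2))   # 0.57, comparison gain / pretest SD
    print(round(t_gain - c_gain, 2))  # 3.59, difference in gains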

The first multilevel model, shown in the left columns of Table 12, assessed student and
classroom-level effects on the science achievement posttest. Most important, the treatment
assignment indicator, which was modeled as a predictor of classroom mean achievement and the
within-classroom pretest achievement slope, shows no statistically significant classroom-level
effects of assignment to the program. The effects were in the hypothesized direction: classrooms
assigned to use McDougal Littell Life Science had higher mean posttest outcomes, and the
program helped close the gaps between higher and lower achievers. The magnitude of the effect,
δ = .15, was of some practical significance, but it was not large enough to achieve statistical
significance given the current design, which included a total of only 29 classrooms.2

Differences in Attitudes between Groups

To what extent do students in classrooms using McDougal Littell Life Science perform better by
exhibiting more positive attitudes than their peers using other curriculum?

Science Attitudes Composite. The next multilevel model, shown in the right columns of Table
12, provides the student and classroom-level results for the science attitudes composite measure.
In this case, the treatment assignment indicator modeled as a predictor of the classroom mean
attitude outcome was not statistically significant. The treatment effect on the within-classroom
pretest slope, however, was marginally significant (p < .10).

This outcome suggests that McDougal Littell Life Science equalized attitudinal outcomes among
students who entered the classrooms at baseline with better and worse attitudes towards science.
The coefficient for the intercept for the pretest slope outcome suggested that in a typical
classroom a one standard deviation increase in the baseline science attitudes score was associated
with an increase of nearly one third of one standard deviation, or 0.31, on the posttest attitudinal
measure. In other words, in classrooms in which students started out with better attitudes toward
science they also tended to end up at posttest with better attitudes and students with worse
attitudes at pretest also tended to end up with worse attitudes at posttest. Assignment to the
group using McDougal Littell Life Science, though, cut this relationship in half. In this way, McDougal

2
The effect size was calculated by dividing the McDougal Littell treatment coefficient for science achievement of
0.29, shown in Table 12, by the student-level posttest standard deviation for the control group. In this way, the
effect size represents the magnitude of the program impact on the posttest measure for the average student.
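Using the control-group posttest standard deviation of 1.91 reported in Table 10, this works out to δ = 0.29 / 1.91 ≈ 0.15.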


Littell Life Science served to minimize those inequalities within classrooms that were related to
students’ baseline attitudinal outcomes.

Table 12. Summary of Multilevel Models Predicting Science Achievement and Science
Attitudes Composite

                             Science Achievement        Science Attitudes Composite
Fixed Effect                 Effect    SE      t         Effect    SE      t
Classroom mean outcome
  Intercept                   0.06    0.13    0.49       -0.04    0.18   -0.21
  Treatment                   0.29    0.27    1.08        0.12    0.37    0.32
  Mean classroom pretest      0.79    0.10    7.58**      0.25    0.18    1.42
Pretest slope
  Intercept                   0.57    0.04   16.36**      0.31    0.04    7.87**
  Treatment                  -0.05    0.07   -0.71       -0.15    0.08   -1.99+

Random Effect                Estimate   χ2       df      Estimate   χ2      df
Classroom mean outcome        0.48    203.99**   26       0.65    70.02**   26
Pretest slope                 0.01     34.56*    27       0.00    25.03     27
Within-classroom variation    1.48                        7.77

Note: + p < .10; * p < .05; ** p < .001.

Ability, Personal Growth, Utility, and Interest Attitudinal Outcomes. The multilevel results
for the first of the four individual attitudinal outcomes are summarized in the left columns of
Table 13. First, the treatment assignment indicator modeled as a predictor of the classroom mean
attitude and the within-classroom pretest slope revealed no statistically reliable
classroom-level effects of assignment to the program on either outcome. This was the consistent
result across all attitudinal outcomes. In all cases, the treatment effect was in the hypothesized
direction: classrooms assigned to the McDougal Littell Life Science program had higher mean
posttest outcomes, but the magnitudes of these effects were too small or their standard errors
were too large for the treatment coefficients to achieve statistical significance. The magnitude of
the treatment effect was largest for the personal growth outcome with an effect size equivalent to
δ = 0.11.


Table 13. Summary of Multilevel Models Predicting Science Attitudinal Outcomes

                            Ability               Personal Growth        Utility               Interest
Fixed Effect                Effect   SE     t     Effect   SE     t      Effect   SE     t     Effect   SE     t
Classroom mean attitude
  Intercept                16.11**  0.15 105.06  18.74**  0.23  80.07   16.47**  0.18  92.66  11.05**  0.11 101.28
  Treatment                 0.02    0.27   0.07   0.52    0.46   1.12    0.13    0.39   0.33   0.01    0.23   0.02
  Mean classroom pretest   -0.04    0.19  -0.21   0.51*   0.17   3.02    0.32    0.22   1.45   0.03    0.09   0.30
Pretest slope
  Intercept                 0.03    0.04   0.70   0.46**  0.05   9.31    0.21**  0.04   5.03   0.02    0.02   1.07
  Treatment                -0.13    0.07  -1.74   0.15    0.10   1.54   -0.06    0.08   0.76  -0.01    0.04  -0.32

Random Effect              Estimate  χ2    df    Estimate  χ2    df     Estimate  χ2    df    Estimate  χ2    df
Classroom mean attitude     0.36*   47.67  26     0.95**  56.45  26      0.54**  57.25  26     0.20*   52.89  26
Pretest slope               0.01    29.95  27     0.03*   46.21  27      0.00    33.12  27     0.00    26.18  27
Within-classroom variation  8.66                 16.97                   9.25                  3.84

Note: * p < .05; ** p < .001.


Causal Effects of Implementation Quality

The measure of implementation quality quantified observed variation across program and
comparison classrooms in terms of the teachers’ uses of research-based instructional practices
and curricular materials that were consistent with the general approach outlined by McDougal
Littell Life Science. The composite variable included a classroom observation rating of use of
effective teaching practices, a measure of the level of implementation of the science curriculum
as reported by teachers (which includes comparison group teachers), and the number of years of
teaching experience. The classroom observation measure of use of effective teaching practices
includes factors such as use of questioning and level of interaction with students to check
understanding, level of feedback and positive behavior management techniques, level of
engagement of students, use of effective teaching practices such as review, introduction of new
content, guided practice, and independent practice, and other factors that affect opportunities to
learn. As demonstrated by the boxplots displayed in Figure 4, which shows the distribution of
implementation quality by treatment group (treatment = 1, comparison = 0), there was some
variability across teachers in implementation quality. The dark line going through the box
represents the median implementation quality score for each group, the box itself defines the
interquartile range for each group, and the whiskers extending out from the box represent the
range of outlying scores that fall outside of the interquartile range.

In addition to the variability in implementation scores shown in Figure 4, it is apparent that the
median value, 0.42, for the treatment group is higher than the median value, -0.32, for the
comparison group. Indeed, a t-test comparing the treatment teacher group mean (M = 0.58,
SD = 0.76) to the comparison teacher group mean on the implementation measure revealed a
statistically significant teacher-level impact of assignment to the treatment condition, in which
teachers implemented McDougal Littell Life Science, t(27) = -3.57, p < .001.
That is, teachers who were randomized into the treatment condition implemented research-based
instructional practices and curricular materials with greater quality and consistency than
comparison teachers.

Additional analyses specifically addressed program implementation and how it may influence
achievement and attitudinal outcomes. These analyses utilized an instrumental variables
approach for estimating the treatment effect for the treated (Angrist, Imbens, & Rubin, 1996).
In this formulation, assignment to the treatment group, where teachers implemented McDougal
Littell Life Science, was modeled as an instrument for actual implementation quality. The
second-stage, or outcome, equation was represented by models that included the composite
measure of implementation quality. Random assignment to the program was correlated with
implementation quality, but it was not correlated with the error term in the outcome equation
because it was determined randomly. Under reasonable assumptions, the instrumental variables
model yields a consistent estimate of the effect of “treatment on the treated.”
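
As an illustrative approximation of this two-stage approach, the sketch below uses the third-party linearmodels package; the input file and column names are hypothetical, and the actual models also included the pretest covariates described in the text.

    import pandas as pd
    from linearmodels.iv import IV2SLS

    df = pd.read_csv("students.csv")  # hypothetical student-level file

    # First stage: implementation quality instrumented by random assignment.
    # Second stage: posttest regressed on fitted implementation quality,
    # with the pretest as an exogenous covariate.
    model = IV2SLS.from_formula(
        "posttest ~ 1 + pretest + [implementation ~ treatment]", data=df
    )
    # Huber-White standard errors clustered on the 29 classrooms.
    result = model.fit(cov_type="clustered", clusters=df["classroom"])
    print(result.summary)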

Table 14 summarizes the outcomes from a series of two-stage least squares regression models
estimated with Huber-White robust standard errors, which corrected for the clustering associated
with sampling and randomizing students from within the 29 classroom clusters. In these models,
assignment to treatment was modeled as an instrument for actual quality of implementation, with
quality of implementation measured by a composite variable incorporating measures of both


teachers’ use of research-based instructional practices and their implementation of curricular
materials that were consistent with the strategies found in McDougal Littell Life Science. We
employed the same pretest covariates as used in the previous multi-level models for the
intention-to-treat effects. For science achievement, the results revealed a small positive
coefficient of 0.29 for implementation quality. This effect associated with a one-unit increase in
implementation quality was equivalent to an effect size of δ = 0.15. However, neither this
coefficient nor any of the other coefficients for implementation quality attained conventional
levels of statistical significance. Therefore, although assignment to McDougal Littell Life
Science had a positive and statistically significant effect on teachers’ uses of research-based
instructional practices and curricular materials, this effect did not immediately translate into
large and statistically significant effects on the student outcomes.

[Figure 4 shows side-by-side boxplots of the composite implementation score (vertical axis,
approximately -2.0 to 2.0) by treatment group (horizontal axis: 0 = comparison, 1 = treatment).]

Figure 4. Boxplot of Implementation Scores for Treatment and Control Teachers


Table 14. Summary of Two-Stage Least Squares Regression Estimates of the Causal Effects of Implementation Quality on Science
Achievement and Attitudinal Outcomes

                    Science       Science      Ability     Personal     Utility     Interest
                    Achievement   Attitudes                Growth
                                  Composite

Constant             -0.04         -0.04       16.69***     7.87*       10.94*      10.55***
                     (0.15)        (0.29)      (3.48)      (3.60)       (4.33)      (1.24)

Pretest               0.80***       0.32       -0.03        0.60**       0.35        0.05
                     (0.10)        (0.23)      (0.23)      (0.20)       (0.27)      (0.12)

Implementation        0.29          0.04       -0.01        0.11         0.10        0.04
quality              (0.19)        (0.19)      (0.19)      (0.39)       (0.32)      (0.17)

R2                    0.18          0.01        0.00        0.03         0.01        0.00

Note. *** p < .001; ** p < .01; * p < .05; robust SEs in parentheses.


Differences in Treatment Effects on Achievement Attributable to Classroom Context:
Closing the Gaps

To what extent are achievement gaps reduced among traditionally underrepresented groups
(e.g., females, students living in poverty, underrepresented minorities)?

The final modeling of treatment effects involved an exploratory analysis of potential differences
across classrooms in treatment effects. In the prior models, the most important predictor of the
outcomes was the context of the classroom, as measured by the students’ baseline academic
performance and attitudes. That is, classrooms with higher baseline outcomes also tended to
achieve higher posttest outcomes. Using a regression trees approach implemented by
Generalized Unbiased Interaction Detection and Estimation (GUIDE), we constructed a
piecewise-linear estimate of the regression function by recursively partitioning the data
and sample space (Loh, 2002).
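
GUIDE is a standalone program, and its piecewise-linear fitting and unbiased split selection are not replicated by standard CART implementations. Purely to illustrate the recursive-partitioning idea, the sketch below fits a CART-style tree with scikit-learn (piecewise-constant fits, different split selection); column names are hypothetical.

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor, export_text

    df = pd.read_csv("students.csv")  # hypothetical student-level file
    X = df[["pretest", "class_mean_pretest", "treatment"]]

    # Shallow tree so the splits remain interpretable, as in Figure 5
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=40)
    tree.fit(X, df["posttest"])
    print(export_text(tree, feature_names=list(X.columns)))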

The piecewise-linear model fitted here using the regression tree approach, as implemented by
GUIDE, had two continuous predictors, student scores and classroom mean scores on the
science achievement pretest, plus the categorical treatment predictor. These analyses, like the
earlier analyses using multi-level models, suggested that the
classroom mean score on science achievement was the most important predictor of the posttest.
In addition, as summarized in Figure 5, the model suggested that classroom mean achievement
was an important contextual factor for understanding differences in the treatment effect.

Assignment to McDougal Littell Life Science had a strong positive impact on students who came
from classes with the lowest average scores on the science achievement pretest. Specifically,
among the 5 classrooms with classroom mean achievement pretests of less than or equal to -1.07,
there was a statistically significant treatment effect, t(119) = 9.50, p < .001, favoring McDougal
Littell Life Science participants. This model and the results are summarized in node 2 (labeled
N2) of Figure 5. This group consisted of 75 treatment students from 3 classrooms and 47
comparison students from 2 classrooms.

Therefore, along with the evidence from the multi-level model of treatment effects for the
science attitudes composite, this model provided further evidence suggesting a compensatory
effect of McDougal Littell Life Science. In the multi-level analysis with science attitudes as the
outcome, the program narrowed the attitudinal gaps within classrooms between those who
started the intervention at baseline with higher and lower pretest scores. The regression trees
analysis provided an indication that the neediest classrooms, as indicated by the lowest
classroom mean pretest scores, benefited the most from McDougal Littell Life Science relative to
similar initially low-scoring comparison classrooms.


Figure 5. McDougal Littell Randomization Study: Regression Trees Result Using GUIDE
(GUIDE stands for "Generalized Unbiased Interaction Detection and Estimation.")

[Regression tree diagram. A case moves to the left child node if and only if the split condition is
satisfied. The root split is on classroom mean pretest (classmzpre) <= -1.070; the left child, node
N2, contains 75 treatment and 47 comparison students from five classrooms and is fit by
Y = (-1.70 + 7.29*pre-test + 5.08*Treatment - 8.59*class_m pretest)*I(class_m pretest <= -1.07),
with a highly significant treatment effect (t = 9.50, upper-tail p ≈ 0). The remaining cases split
further on classmzpre at -0.565, 0.550, 0.285, and 1.030, producing nodes with posttest means of
-0.671, -0.465, -0.640, 0.256, 1.326, and 2.465.]

Use and Impact of Notetaking

To what degree do teachers encourage students to use Notetaking Guides?


To what degree does notetaking promote higher assessment scores and increase student
achievement?

Eighty-five percent of the teachers in the treatment group used the Notetaking Guides in teaching
lessons at least 2-3 times per week, and the other 15% used the Notetaking Guides sometimes.
Because of this generally high level of use, we were unable to determine the impact of notetaking
separately from the rest of the program. Additionally, we had requested beginning-of-study and
end-of-study homework and notetaking samples from target classrooms. However, most
teachers did not assign homework and few teachers required students to take notes separately
from Notetaking Guides. When teachers provided us with samples of students’ notetaking, many
treatment teachers copied pages from the Notetaking Guides, as students did not take notes
beyond those. Because of this, we could not determine if students’ notetaking skills improved
over the course of the study.

One teacher refused to use the Notetaking Guides because they were provided only for students
in the target classroom; all other teachers encouraged students in their classrooms to use the
Notetaking Guides regularly. In addition, one teacher in the comparison group had a similar
reading study guide that she used with the students in her life science classes. Teachers used the
Notetaking Guides in a variety of ways. Some teachers asked students to complete the
Notetaking Guides independently either in class or at home using the textbooks as a reference.
Some teachers had students copy information from the notetaking transparencies directly into
Notetaking Guides with some or little discussion or demonstration of concepts. Others used the
notetaking transparencies and other resources to introduce new concepts while constantly asking
questions to check student understanding, with the students completing the Notetaking Guides as
the concepts were covered.

However, almost all teachers using McDougal Littell Life Science commented on their use of the
Notetaking Guides and the positive impact of the notetaking on students’ understanding of the
science concepts. Many expressed regret at only having the Notetaking Guides for their target
classrooms and claimed the notetaking was one of the strongest parts of the program in helping
students learn, as expressed in the following comments:

• “The Notetaking Guide [was the most effective part of the program]. … This allowed
students to take a multitude of notes in a myriad of different ways. It also allowed me as a
teacher to ask multi-leveled questions that went beyond memorization.”
• “The text and notetaking skills workbook form the ground work for everything else we
do in class. All of our projects and labs revolve around the notes we take serving as
previous knowledge and giving us the background information to make observations and
predictions.”
• “I like the notetaking and voc. strategies that the students learn because I feel like we
kind of fail our students in the area of study skills and these really seem to help.”

• “The students also liked the Notetaking handbook. It really helped them to see the
different ways notes can be taken.”
• “I had a student write me a thank you note thanking me for teaching students how to read
and take notes on what they read.”

Use and Impact of Technology

To what extent is technology used to enhance the curricula and instruction?


To what degree does technology use promote higher assessment scores, enhance student
engagement, and increase student achievement?

Most teachers in the study used technology to plan lessons and teach, so it was not possible with
this study to determine a differential impact on students from use of technology. Most teachers
using McDougal Littell Life Science used some of the technology resources in both their lesson
planning and teaching, as shown in Table 15. The PowerPoint presentations, EasyPlanner
software, and ClassZone.com website were used most often, with over one third of the teachers
using these three resources several times per week. The e-Edition of the program was used the
least frequently, although almost 70% of the teachers did not use the PowerPoint Presentations to
teach. In response to journaling emails, many teachers said that ClassZone.com and the SciLinks
websites were two of their favorite resources and that they helped hold students’ interest, while
the TestGenerator evoked the greatest frustration. The following comments and tables provide
additional information concerning technology use by treatment teachers.

• “My greatest success in using the [McDougal Littell Life Science] program was learning
how to incorporate technology into my teaching. ClassZone is absolutely FABULOUS!
My students liked every activity we did using the materials supplied by the web site.
They especially loved the one on building a prairie.”
• “My students thoroughly enjoy class when I use ClassZone.com.”
• “I really enjoyed having the students use ClassZone.com for ecology and cells. It helped
reinforce the concepts that I was presenting. It also helped those students who are visual
learners.”
• “The ClassZone website was one of my favorite parts of the McDougal Littell program.”
• “The lab and test generators were difficult to operate. Just getting to the point where you
could actually edit the questions was difficult.”
• “I didn't like the test-making program. It was cumbersome to use and had too many
bugs.”

Table 15. Use of Technology and Lab Activities by Treatment Teachers

                                  Use in lesson planning                     Use in class
                           Do not         Some-   2-3             Do not         Some-   2-3
                           have    Never  times   times/  Daily   have    Never  times   times/  Daily
                                                  week                                   week
Scientific American
  Frontiers                 0%      39%    62%      0%     0%      0%      31%    69%      0%     0%
Video Guide                 0%      39%    62%      0%     0%      0%      39%    62%      0%     0%
EasyPlanner Plus CD         0%      23%    46%     31%     0%      0%      46%    39%     15%     0%
Content Review CD           0%      23%    62%     15%     0%      0%      54%    31%     15%     0%
PowerPoint
  Presentations CD          0%      46%    23%     31%     0%      0%      69%     8%     23%     0%
e-Edition CD                0%      70%    30%      0%     0%      0%     100%     0%      0%     0%
ClassZone.com               0%       8%    54%     39%     0%      0%      15%    54%     31%     0%
SciLinks                    0%      23%    54%     23%     0%      0%      23%    54%     23%     0%

                                                        Never   Once   2-3     3-5     More than
                                                                       Times   Times   5 Times
Approximately how many times have you used the
Test Generator CD-ROM to create a test?                  15%    15%    15%     54%       0%

                                                        Do not   Never   Sometimes   Every Test
                                                        have
If you used it, how often did you edit the
test questions?                                           0%      8%       42%          50%

                                                        Never   Once per   Every 2   Once per   2/3 Times
                                                                 month     Weeks     Week       per Week
How often do you have students do a lab activity?        0%       8%        8%        69%        15%

                                                        Do not   Never   Sometimes   Every Lab
                                                        have
How often do you use labs from the CD-ROM?                0%      15%      77%          8%
How often do you edit the labs?                           0%      39%      46%         15%
How often have you used the Process/Lab
Skills Manual?                                            0%      39%      62%          0%

Implementation of the Program

In considering the degree and quality of classroom implementation of the curricula, to what
extent do teachers follow the program and teachers’ guide?
• To what extent do support materials assist teachers in using the curricula?
• To what extent do support materials assist teachers in implementing notetaking?
• To what extent do the technology-based support materials assist teachers in
implementing the curricula?
What are the relationships between teacher characteristics and degree of implementation?
What contextual factors promote or hinder successful implementation?

All teachers in the treatment group used the program, although the levels of implementation
varied. Over 2/3 of the teachers used the text in their daily lesson planning, with over 90% using
it for lesson planning several times per week. Almost 40% of the students used the textbook
daily, while almost 80% used the texts and the Notetaking Guides at least 2-3 times per week.
Approximately 40% of the teachers reported regularly (e.g., 2-3 times per week or daily) using
the 3-minute warm-ups, the notetaking transparencies, the “Big Ideas” flow charts, graphic
organizers, and the chapter teaching visuals. Over 36% of the teachers practiced vocabulary
regularly with their students and over 23% used the vocabulary scaffolding regularly. Many of
the teachers used the vocabulary strategies (e.g., word pyramids, word wheels, four square
diagrams) to teach word meanings, and we saw substantial evidence of use of the vocabulary
strategies by students on the classroom walls, in their notetaking, and in discussions with
students. Every treatment teacher mentioned the vocabulary strategies as the most useful or their
favorite part of the program, and most reported that the notetaking strategies were one of the
most useful parts of the program for students.

While most of the treatment teachers reported using most of the materials, and they reported
frequent use of the vocabulary and notetaking strategies, other instructional strategies seemed to
be used less often. For example, in our classroom observations, we most often saw use of the
Notetaking Guides, vocabulary strategies, graphic or advance organizers, big ideas and key
concepts, but we observed the following less often: 3-minute warm-ups, questions embedded in
textbook sections or other questioning strategies, skill checking activities, setting objectives,
homework, cooperative learning, or differentiated instruction. While some teachers used
3-minute warm-ups or other strategies to check for understanding, presented new content using
graphic organizers, asked frequent questions to check for understanding, and provided guided
practice in applying a concept (e.g., cell division), other teachers asked students to independently
complete the Notetaking Guides or end-of-section questions using the textbook—and this
activity took up the entire class period.

Overall, we would rate 12 of 14 treatment teachers fairly high on use of the materials (e.g.,
Notetaking Guides, technology, transparencies), but lower on the intentional use of some of the
more valuable instructional strategies embedded within the program (e.g., chapter/section
headers for outlining, questions embedded in text or at ends of sections to check for
understanding, other curriculum-embedded assessments). Those teachers who used the materials
most effectively also typically used more of the instructional strategies effectively. However,

teacher characteristics were not strongly related to their level of use of the materials or their use
of instructional strategies.

Several contextual factors hindered implementation of McDougal Littell Life Science. First,
teachers did not seem to understand the importance of the research-based instructional strategies
upon which the program was based. They did not typically use the beginning-of-class reviews,
questions embedded in the text to help check for student understanding, or frequent assessments
to check student progress. Some teachers infrequently used strategies to review and practice
vocabulary, differentiated instruction, or materials for English Language Learners. They would
benefit from not only knowing what materials are included in McDougal Littell Science, but
from learning about the instructional strategies upon which the program was developed, and the
reasoning behind those instructional strategies. The in-service focused almost exclusively on use
of the materials; yet, the instructional strategies and the research upon which the program was
developed form the greatest strengths of the program.

Second, some districts agreed to participate in the study just prior to the start of the school
year. Because of this, teachers had little time to prepare for using the program, which has many
components and materials, and they expressed frustration at not having more time and training to
learn how to use them.

Next, some of the teachers had limited access to technology. One district did not allow teachers
to install software on either the classroom or the lab computers, so they could not use the
computer-based resources unless they brought a laptop from home or strictly used the websites.
Other teachers had infrequently used technology prior to their involvement in this study, and
although they increased their technology use, they did not become proficient or completely
comfortable.

Also, most of the teachers refused to give up their “tried and true” favorite activities during the
course of the study, and in particular, labs or hands-on activities. As one teacher commented,
“keep in mind that 2 years ago we didn’t have textbooks. We just got together as science
teachers and created our own materials. So in spite of McDougal Littell’s materials, I still use a
lot of my own stuff that has worked so well.” Teachers reported using the McDougal Littell labs
25-90% of the time, with most using McDougal Littell labs about half of the time.

Another barrier to fully implementing the program was that some schools were in states with
integrated standards in the 7th grade, or with ecology not included in the 7th grade standards.
Teachers in states with integrated standards rearranged their school-year lesson plan schedules to
cover “Cells and Heredity” and “Ecology” in the first half of the school year, but they expressed
numerous concerns about being able to sequence and cover material for the required standards in
7th grade. Although materials used for recruiting districts explained the units that needed to be
covered for the study, some district contacts did not report that Ecology was not included in the
state standards in 7th grade. As a result, teachers did agree to teach the unit but shortened the
amount of time allocated to it and were not enthusiastic about losing a week of instruction they
felt their students needed to do well on the state-mandated tests.

While the fire and hurricanes in one district, which resulted in shared classrooms and shortened
class periods, slowed the pacing of instruction, these disruptions did not seem to interfere
substantially with the implementation of the program. However, teachers did have to adjust the
activities they used, as
it was difficult to complete a lab in a period just over 30 minutes long, and completing a lesson
that involved review, introduction of new content, and application of concepts was a challenge.

Finally, some teachers had such large numbers of students, or large numbers of students with
disabilities or low English proficiency, that they limited use of some activities in their
classrooms. For example, labs, cooperative activities, or activities which were not conducive to
students maintaining good behavior were rarely used. In these classrooms, more bookwork,
fewer hands-on activities, and less teacher-student interaction were observed.

Despite the challenges to implementing McDougal Littell Life Science, teachers worked hard to
use the program and participate in the study. They were enthusiastic about the program and
expressed a great deal of satisfaction with the results they experienced with students. The
following comments are representative of those made by a number of teachers and indicate the
level of satisfaction some teachers experienced:

• “I thoroughly enjoy teaching this book. The teacher resources are great!”
• “Because I have not taught science for many years, McDougal Littell Science Curriculum
has been a godsend for me. Everything is so well organized and the wealth of resources is
outstanding. I like everything about the program.”
• “Using the McDougal Littell Curriculum has changed my method of teaching greatly.
Having taught for 35 years, I guess I was "old school." Being able to incorporate
technology into my teaching has opened up all kinds of possibilities for me. The lesson
plans and ideas in the teacher's edition have also helped me make my classroom more
learner friendly. My students seem to respond better when I utilize the suggestions and
activities suggested in the resources provided by McDougal Littell. On a scale of 1 to 5,
I would give the curriculum a rating of 5 as to how it has increased my knowledge of
science content. The materials and resources are my lifeline.”

Attitudes toward McDougal Littell Life Science

Overall, most treatment teachers’ attitudes towards McDougal Littell Life Science were
favorable. When asked to rate the program on a scale of 1 (low) to 10 (high), answers ranged
from 5 to 10, with most teachers responding 7, 8, or 9. While some teachers will continue to use
the program after the study ends, others will return to other programs because (a) not all teachers
in the district are using it, and the district did not purchase the program, so teachers will return to
the curriculum used by others in the district, (b) the state science standards include topics
unrelated to life science, so teachers will return to an integrated science text, (c) teachers prefer
to pull their own materials together using a variety of resources rather than relying on a single
program, and (d) teachers felt the textbook, reading study guides, and assessments were too
difficult for their 7th grade students.

On the other hand, the teachers who will continue to use McDougal Littell Life Science felt the
program was well-organized and appropriately covered content specified by the state standards,
included valuable resources that helped support their teaching and student learning, and
improved their teaching. In fact, one teacher felt the content was so well organized and
presented that the teacher also uses the same materials for 8th grade classes, supplementing
them as needed to meet criteria specified by the 8th grade standards.

What Teachers Liked Most about McDougal Littell Life Science. When teachers were asked
what parts of McDougal Littell Life Science they like most or thought were most useful, the most
frequently named components were (a) the notetaking and vocabulary strategies, (b) the overall
organization of the text, including the Big Ideas and Key Concepts, (c) the three levels for
differentiated instruction, and (d) the additional resources that “made teaching easier.” A few
teachers noted the number of instructional strategies built into the program, such as the blue
questions scattered throughout the text that they used to check students’ comprehension of the
material. Some teachers said they appreciated the integrated materials, such as the “overlap of
the textbook, the CDs, the toolkit, and the reference manual along with the different levels that
were available. Together, it allowed for greater flexibility when teaching to the different learning
modalities and abilities of my students.” Other comments include the following:

• “The vocabulary and notetaking strategies were my favorite. I had a student comment to
me about a question that he was having trouble with on a recent quiz, and then he said,
but I remembered writing a “Frame Game” for this word and so I got the right answer.
Why was it my favorite part? Simple, it’s a strategy that works to help students learn.”

• “Initially, [the students] didn’t like [the vocabulary strategies] because it required more
work. But then they realized the value of them and it’s no longer a problem.”

• “The Notetaking Guide. As mentioned above. This allowed students to take a multitude
of notes in a myriad of different ways. It also allowed me as a teacher to ask multi-
leveled questions that went beyond memorization.”

• “My favorite part of the program would be the Teacher’s Resource book with the various
levels to meet the needs of all the SPE students I teach.”

• “My favorite part of the program was the chapter investigations. My students really
enjoyed the experiments. I especially liked the lab generator that allowed me to
customize the investigations.”

• “Suggested outlines for daily lesson plans. Saves me time & stress.”

The technology components were also frequently mentioned among those teachers for whom
technology was readily accessible. Teachers used the technology to support teaching and to help
students make visual connections. One teacher said the CDs “made [the program] very easy
to transport, and the organization of the material was easy to navigate,” and another said:

• “I really enjoyed having the students use ClassZone.com for ecology and cells. It helped
reinforce the concepts that I was presenting. It also helped those students who are visual
learners. We enjoyed the labs, too.”

What Teachers Thought Most Effective in Teaching Students. Teachers thought (a) the
notetaking and vocabulary strategies, (b) the organization of the content, including the Big Ideas
and Key Concepts, (c) the reading study guides, and (d) the labs were the most effective in
teaching students. The following comments explain why teachers thought these were important.

• “I think that the logical layout of the material made teaching very effective. Kids seem to
flourish when they know where they're going, and how they are going to get there.”

• “The organization of the material [helped students learn]. It gave students a mental grid
to be able to hold onto things.”

• “I believe that the "Big Idea" and "Key Concepts" were important tools for teaching that I
had never used before. I used to teach students how to skim for facts and how to outline,
but I never made a chapter poster emphasizing what all of them should have picked up
out of the chapter. I still have the last three chapters' facts on my walls so the students can
see/read what they have learned from the past several months.”

• “The Big Ideas and Key Concepts were helpful because they helped students to focus
their learning. It kept the important information in their mind. If the information was not
related to the big ideas or key concepts, I told them to skip it.”

• “The vocabulary and notetaking strategies was the most effective. Not only did it make
the students think about the vocabulary and notes in a different way, but in my class, after
the students had finished working on the vocabulary and notes, I would ask them to
present and explain their work to the class by displaying their notes via a FlexCam and
large screen TV. This method gave the students’ self-esteem a boost, as well as pride and
ownership in their work.”

• “I feel the Study Guides are very effective in developing their ability to read in the
content area. I spend a lot of time teaching them how to identify main ideas in
paragraphs. Also, the labs are great for teaching difficult concepts. “Seeing is
Believing!” It was exciting for me to see some of my Spec. Ed. students getting so
excited about removing DNA from split peas, wheat germ, and strawberries.”

• “I used the study guide for all of the chapters. But the study guide was both good and
bad. It was good in that it made students “dig” and find a connection. It was bad in that
a lot of students are used to being spoon fed and this material doesn’t let students get by
with answering the questions without understanding the information. Unfortunately, it’s
a reflection on our fast paced society today. Parents tell students to hurry and get their
homework done so they can get to their soccer practice or music lessons or what have
you. When students complained that they couldn’t get it done in time, parents attempted
to help but they too didn’t want to spend the time or effort digging for understanding to
answer the questions.”

• “The most effective part of the program was the use of the visuals and the questions in
the teacher's edition that helped the students interpret the information in the visuals. My
students also responded well to the organization of the material in each section.”

What Teachers Said Students Liked Best. Teachers said students most liked (a) the labs and
classroom investigations, (b) the technology, particularly ClassZone.com, and (c) the strategies
that helped students learn, such as the vocabulary strategies. For example, teachers had the
following to say about the components they thought students liked most:

• “The students like the investigations and the labs. They were easy to understand for the
most part and something that could be done either in class or at home. In fact, several
were assigned as homework or bonus assignments to be done at home.”

• “Demonstrations, lab activities, ClassZone.com. They like being engaged in hands-on
learning.”

• “I believe that my students like the format of the book, the colorful visuals, and the
reading tips.”

• “They liked the colorful photos and the labs. It was current information and very visually
interesting. Some students claim that they are learning more in this book because it is so
up-to-date.”

• “Many liked and used the ClassZone.com web site often.”

• “My students thoroughly loved all the labs available. When I didn’t have lab equipment
to perform a specific lab, I used ClassZone.com. They thoroughly enjoyed reviewing
science concepts before a test at this site.”

• “They enjoyed the PowerPoint presentations and using the online feature of the program.
Interactivity is the way to go with the MTV generation.”

• “The students also liked the Notetaking handbook. It really helped them to see the
different ways notes can be taken.”

• “Learning the notetaking and vocabulary techniques, because they help them organize a
large amount of information.”

What Students Liked Best About McDougal Littell Life Science. Surprisingly, students in
every focus group mentioned how much they liked the organization of the materials. Some
students said they liked how the textbook “states the purpose. It’s easier to read because
headings let me know what I’m supposed to pay attention to, and it’s well organized so we know
where info fits.” The students then proceeded to show the researchers what they meant—and
identified Big Ideas, Key Concepts, chapter and section headings, and other organizational
structures that helped them learn. The students also named the pictures and visuals, the labs, and
the use of technology—particularly the simulations and practice tests at ClassZone.com.

Teachers who liked and effectively used the materials had students who liked the materials and
who understood how best to use the materials and organizational structures for their learning.
Approximately ½ of the students had tried ClassZone at home (90% had computer access), and
about 10% said they used ClassZone at home “regularly.” However, students in all focus groups agreed that
answers to questions were too hard to find in the text, and they expressed frustration at the
“mismatch” between worksheets and the text.

Teachers’ Least Liked Components. Teachers cited many things over the course of the study
that caused them frustration or that they did not like. In addition to expressing frustration over
the late date at which they had received materials and the difficulties a late start caused in
preparing, some teachers were daunted by the amount of materials and resources that the
program included.

Additionally, there were specific components of McDougal Littell Life Science that teachers
disliked. Many listed problems with the technology, and particularly the lab and test generators.
Others were frustrated at all the assessment components, but particularly the test generator. A
few reported using only “A” level tests as the other levels were too difficult for their students.
Many of the teachers said their students struggled with the reading level of the text and the
amount of material each chapter included. Some teachers felt the reading study guides were
poorly written and did not match the textbook sufficiently, while other teachers expressed
dissatisfaction with the mismatch in editing between student and teacher editions. Page
numbering warranted enough frustration to be brought up by half of the treatment teachers—one
recommendation was to use page numbers like Ecology-129 rather than D129. In other words,
put the unit title in front of the page number. And while some teachers liked the PowerPoint
presentations and used them regularly, others did not use them at all—they felt the content of the
presentations was too specific (e.g., not an overview of a section) or too general (e.g., not enough
specific details). More specific comments from teachers follow.

• “Having only a couple days to try to get familiar with the material before beginning
instruction.”

• “Too much stuff and not getting to it all and being able to use it. It was however a ‘good’
frustration.”

• “My greatest frustration was not having enough time to do everything suggested in the
book. The program has so many excellent activities, demonstrations, etc. I need to work
on time management and better advanced planning for next year.”

• “Trying to maintain the pacing set forward for each unit. Some of the material was way
too in-depth. I wanted to cut some of the content, but then I couldn't use the testing and
support material provided.”

• “Textbook was intimidating—too much text compared to previous book and too few
pictures, fonts too small.”

• “Mistakes in the teacher/student edition, discrepancies between the student edition and
the teacher edition, made me feel like I couldn’t trust the teacher edition.”

• “There were times I felt the text was written in an awkward way that made the text hard
to understand. Often, too, I found some of the test questions were worded in an unclear
manner, making it more difficult to understand just what answer they were looking for.”

• “One of the biggest helps for a teacher is to provide shortcuts for the teacher. The
prepared PowerPoint presentations were a good idea (shortcut), but the actual information
contained in the presentations were not as detailed as I would like. So, I would go in and
customize the presentation (that’s good), but that just added more work for the teacher
(that’s bad). The same goes for the Chapter Tests. Format the tests so they can be used
with the “Scantron” bubble answer sheets. That’s a big shortcut for the teacher.”

• “The PowerPoint Presentations were not at all what I thought they would be. I assumed
they would be more wiz-bang to them and really allow for either a great introduction,
summation or ongoing review of the lesson.”

• “The lab and test generators were difficult to operate. Just getting to the point where you
could actually edit the questions was difficult. It was cumbersome to use and had too
many bugs.”

• “My least favorite part of the program was the assessment component. I tried to use some
parts of the section quizzes and chapter tests, but I usually ended up with my own
teacher-made quiz or test. My students had great difficulty with the tests.”

• “Once again, I don’t want to sound like a broken record, but the Assessments (Chapter
Tests in particular) are big problems for me. I find myself developing Section Quizzes,
and Chapter Tests from the Previewing Content notes at the beginning of the chapter.”

• “Reading study guides and tests were not user-friendly.”

• “The idea of having three different labs, three different study guides, three different tests,
and Spanish tests/study guides for the diverse reading levels were wonderful ideas, but
not terribly practical for large classes (I have a few classes over 32 now and will have a
class or two of 36 students in a few weeks when the schedules change at the semester
break). You cannot have students help to grade papers in class when there are such
differences, and taking them all home to grade (160 students x 3 page test or 2 page lab
or front/back of a study guide) requires many, many hours at home or very late evenings
after school just so I could get the tests back to the students the next day. Because I am
not a Spanish teacher or fluent in Spanish, it is difficult to ask the Spanish teachers here
to grade my Spanish science papers for me when they already have a six classes worth of
papers to grade. I could see using this idea with much smaller classes …”

• “Some diagrams have labels that cannot be found in the text or in the glossary (e.g.,
vesicles)”

• “There are several programs/folders for images that are inaccessible. I am not sure
whether it is my operating system - it is a MacOS 9.2.”

When teachers were asked what ONE change they would like to recommend to the publisher,
few teachers listed just one change, and their comments echoed the items they had listed as
dislikes.

• “I would ask them to either scale back their content to be more inline with the material
covered at both 7th and 8th grade, or they could do a better job with the interface used on
their CD materials. Using some of the CD material was far too frustrating to allow for
efficient use of prep time. I also found that my students have had difficulty finishing the
prepared tests in a 40-minute period.”

• “I’m sure you could guess from my previous answers that I would change the Chapter
Tests in the Assessment book.”

• “Bold print & yellow all important vocabulary – not just a selected few. Make the work
sheets match the book info. Also, don’t combine telophase and cytokinesis in one picture
and one process (p.A83). P.A20-24 Bold print & yellow highlight the following:
cytoplasm, endoplasmic reticulum, golgi app., vacuoles, lysosomes, (nuclear membrane
& nucleolus) not when mentioned.”

• “Please have the teacher's text match the student text for reading. I have sent previous e-
mails concerning the missing sentences, the different paragraphs, etc. between the two
texts. It is important when we are reading aloud and discussing what we read, and the
information is not the same.”

• “EDIT the teacher edition.”

• “Make sure that the teacher edition and ancillaries have been proofread and edited to agree
and correspond with the student edition. Overall, I think it’s a good product.”

• “I think the Reading Study Guides could use some improvement—better alignment with
texts in terms of content and concepts, better edited to match text, focusing on important
information that can be easily retrieved from text—not just those obscure words buried in
paragraphs.”

• “Your computer programs. Make them more interactive for the students. Also, if they
could be networked so that I could load the programs on individual lap tops for the
students to do the chapter reviews. And fix the images portion so that they are better
accessed.”

Greatest Successes Experienced by Teachers. At various times throughout the study, we
asked teachers about the successes they were experiencing. The following comments illuminate
the importance of some of the successes they described in improving student learning.

• “I only had one student make a failing grade during our use of the M-L materials,
including every assignment that was graded during our use of the M-L materials in the
entire semester. My four science classes had a class average of 86 or higher—something
that has never happened.”

• “The increased literacy was the greatest success. Because graphic organizers were used
to help define vocabulary or concepts, the students needed to spend more time digesting
the material and trying to make a self-to-text or text-to-world connection. Our school has
reading classes for seventh graders which also teach the same way that the science study
guides try to teach....to make a personal connection so the material is easier to
comprehend.”

• “Students continuing to use the notetaking strategies even after they were not required to
do so.”

• “Introducing the lessons using the formats setup in the text. It was quick and easy and
best of all the kids got it. They knew what we were doing. Also, using the textbook was
not a chore. Even during class time using the text with the kids proved to be very
effective especially for "thorough" lessons where we are trying to get a deeper
understanding of what has been introduced but are not quite ready for review.”

• “Being able to use some of the differentiated instruction material to meet the needs of
individual students instead of using homogenized, cookie cutter programs for all
students.”

• “At our Middle School, our Special Ed students are all assigned to one team for the year.
This year, I have forty SPE students in my classes. It is so rewarding to see the
excitement in their eyes when they perform lab activities and understand the concept
being taught. Recently, some of them wanted to take specimens of DNA to other classes
to explain how the DNA was extracted from strawberries and show the example of DNA
under a magnifying glass.”

• “Seeing students take interest in concepts we covered...asking questions, inquiring,
wanting to know more!”

• “My greatest success in using the program was learning how to incorporate technology
into my teaching. ClassZone is absolutely FABULOUS! My students liked every activity
we did using the materials supplied by the web site. They especially loved the one on
building a prairie.”

Increasing Teachers’ Science Content Knowledge and Changing Teaching Practices. We
asked teachers to choose, on a scale of 1 (not at all) to 5 (a great deal), how much their science
knowledge had increased as a result of the curriculum they had used during this school year, and
we asked them to tell how they had changed their teaching this school year. For the teachers
who responded, the table below lists the extent to which they felt the programs increased their
science content knowledge.

Table 16. Changes in Teachers’ Content Knowledge

Content knowledge changed…     Treatment       Comparison
A great deal                   1 teacher       —
Quite a lot                    2 teachers      —
Somewhat                       7 teachers      8 teachers
Not much                       2 teachers      2 teachers
Not at all                     1 teacher       2 teachers

Comparison teachers listed four main areas of change in their teaching: (a) use of Marzano’s
instructional strategies, (b) more hands-on or inquiry activities, (c) increased use of technology,
and (d) changes to their structure or pacing to accommodate diverse learners.

Treatment teachers listed a greater variety of more specific changes to their teaching, almost all
of which reflect materials and strategies included in McDougal Littell Life Science.

• “Using the McDougal Littell Curriculum has changed my method of teaching greatly.
Having taught for 35 years, I guess I was "old school." Being able to incorporate
technology into my teaching has opened up all kinds of possibilities for me. The lesson
plans and ideas in the teacher's edition have also helped me make my classroom more
learner friendly. My students seem to respond better when I utilize the suggestions and
activities suggested in the resources provided by McDougal Littell.”

• “I am such a better prepared teacher!! I have tools to reach such a wide variety of
students now. I am not creative--but I can take an idea and run. This has allowed me to
run wild!!”

• “First, I refer back to the "Big Idea" and the "Key Concepts" more often by using a poster
for each section of the chapter. I also keep the previous chapters "Big Idea" and "Key
Concepts" on my board. The previous chapters are taken off the poster and are just in the
corner of my white board. Since my school is emphasizing reading/literacy this year, the
idea of constantly referring to the main idea/key points helps the student make a
connection (text to self) which in turn increases their understanding/comprehension (WE
HOPE!). I told the reading teachers what I was doing, and they liked the publisher's way
of breaking down the material into smaller, more manageable key points. This is the first
time that I have used a poster to emphasize what students should be getting out of the
chapter and also the first time that I keep pointing out and referring back to what the main
ideas and key points are for each section/chapter. This visual should help students who
listen and pay attention, but who cannot read at grade level.”

• “I'm more conscious of bringing concepts back to Big Ideas and Unifying Principles to
help students frame things in their minds.”

• “I don’t know that the McDougal Littell Curriculum has ‘changed’ how I teach. I have
always been a hands-on teacher. I certainly think that this program has enhanced my
teaching. McDougal Littell has expanded my horizons to incorporate the latest
technology into my lesson plans. Never before have I had access to a website and
PowerPoint presentations. These added resources have made my teaching more
interesting. The visuals make teaching difficult science concepts more understandable to
all my students (especially Special Ed students).”

• “I have put much more responsibility on the students. In the past, I would give extensive
notes and discuss them with the students. Now I require much more reading, individual
note taking, and have increased my use of graphic organizers.”

• “I am letting the children pull more from their reading (notes) rather than just giving
them the notes myself.”

• “Before McDougal Littell, I didn’t do much with notetaking and vocabulary. Now, after
using McDougal Littell, I have learned how to use all these different notetaking and
vocabulary strategies.”

• “As a result of using the McDougal Littell Curriculum, I am more focused and better able
to infuse projects and labs.”

• “It has helped me focus my attention into a routine. I give vocabulary on Mondays, using
the strategies from the book, and reading assignments on Tuesdays again using the
strategies from the book. I give a reading or vocabulary worksheet on Wednesdays and
an activity from the book on Thursday and a quiz or test on Friday. It has worked great.”

• “I use graphic organizers more.”

• “I use more media and visual aids.”

• “I now am using more overhead transparencies and more internet references and web
sites, because they are readily available and useful.”

• “Some of the other teacher resources (editable lessons and labs) have provided shortcuts
and made teacher preparation easier.”

• “I'm implementing technology more frequently.”



Effectiveness of Professional Development

To what extent does professional development follow research-based conceptual models and
cover the content and pedagogy contained in the curricula?

As mentioned previously, some districts agreed to participate in the study just prior to the start of
school. Because of this, the in-service (a) was short in length, precluding sufficient time for
hands-on use of and practice with the materials, (b) occurred after school started, leaving little
time for teachers to plan and prepare with a new program that included substantial materials
prior to facing students, and (c) focused mainly on use of the materials without sufficient
introduction to the research, theories, and instructional strategies that form the foundation of the
program. Most teachers wanted additional training, particularly in the use of the technology
components. Nevertheless, all 13 of the treatment teachers who responded agreed that the
in-service helped them to implement McDougal Littell Life Science “very successfully” or
“successfully.”

Overall, these sessions covered a substantial amount of materials and concepts in a relatively
short time. The trainer was enthusiastic and approachable, and he moved at a reasonable pace
while pushing to complete activities listed on the agenda. He answered questions with sufficient
detail and monitored teachers’ confusion to provide additional help when necessary. His rapport
with the teachers was excellent, and teachers remained engaged and on-task throughout the
professional development time.

Technology use was a problem at each session attended by researchers. Software was not
installed in advance, and obtaining network passwords, installing software, and helping teachers
who were not technology proficient took time. The trainer was not fully able to solve technology
problems, so time allotted for technology was not always productive and some technology
components could not be demonstrated or used during the professional development time. For
example, in one district, the level of security in the computer lab prevented software from being
installed on computer hard drives, so teachers were only able to access web-based resources.
Because this same level of security was maintained on computers in their classrooms, some of
these teachers who did not use computers at home or who did not have their own laptop to use in
classrooms were unable to install software or regularly use technology resources.

Teachers were positive and enthusiastic about the materials, the level of organization, and the
number and variety of ancillary materials and resources. Teachers enjoyed the hands-on,
inquiry-based investigations and were excited about trying them out with their students. Many
positive comments about the types and variety of resources—including transparencies,
PowerPoint presentations, online simulations of concepts, differentiated instruction, support for
English language learners, the Notetaking Guide, and vocabulary supports—showed a high level
of excitement to use the resources and enthusiasm for participating in the study. Teachers
unanimously said they needed additional training and time to fully implement the program and
expressed concern that they needed to learn how to use so many resources so quickly. Most said
they would have preferred the professional development sessions to be held prior to the summer
break so they had additional time to prepare.

The professional development sessions focused mainly on the use and integration of materials
within the program. Although “Big Ideas” were used to focus some discussion, there was little
time allocated for practicing or applying instructional strategies other than a couple of short
inquiry-based or problem-solving activities. Teachers said that the most important pieces of the
in-service were (a) the notetaking and vocabulary strategies, (b) integrating the pieces to create a
lesson, and (c) the use of technology. Given the length of the in-service—as little as 6 hours at
one site—teachers received very little training on the research-based conceptual models, content,
and pedagogy contained in the program. Additionally, little time was devoted to practice or
application of those models, content, and pedagogy during the in-service. As a result, the in-
service was not sufficient to fully train the teachers on how to best use the program, integrate the
materials, and increase their use of research-based instructional strategies—as reflected in their
later comments. Additionally, while it was important that the teachers knew how to integrate the
materials and use the technology to implement the program, some of the most important
strategies designed into the program—such as the use of review, on-going assessment,
differentiated instruction, reinforcing effort and providing feedback, priming background
knowledge, and others—as well as the theories and research underlying some of these strategies,
received scant mention.

CONCLUSIONS

As a small-scale feasibility study, this study demonstrated that a randomized experimental design
could be successfully conducted in typical school settings. While the relatively small number of
participating classrooms limits statistical power, it was sufficient to show differences that
consistently favor the treatment group and that suggest McDougal Littell Life Science was more
effective than life science curriculum used in comparison classrooms.

An analysis was conducted using Hierarchical Linear Modeling to identify the unique impact of
McDougal Littell Life Science while accounting for both pre-existing student and classroom
differences and any effect of classroom-clustering. The analysis supports the conclusion that
McDougal Littell Life Science was more effective in increasing achievement and improving
attitudes than life science curricula implemented in comparison classrooms. Additionally, the
classrooms in which students scored the lowest on the pretest—those classrooms that could be
described as most needing effective intervention—showed a stronger positive impact on student
achievement and attitudes when McDougal Littell Life Science was implemented than in
comparison classrooms. Further, teachers in the treatment group exhibited a higher level of
use of research-based teaching strategies, and students in classrooms where teachers used more
research-based teaching strategies experienced more opportunities to learn.
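
The report describes the hierarchical linear model only in words. As a point of reference, a
minimal two-level specification consistent with that description (the notation is assumed here,
not taken from the study) can be sketched in LaTeX as:

    \text{Level 1 (students):} \quad Y_{ij} = \beta_{0j} + \beta_{1j}\,\text{pretest}_{ij} + e_{ij}
    \text{Level 2 (classrooms):} \quad \beta_{0j} = \gamma_{00} + \gamma_{01} T_j + \gamma_{02} \bar{P}_j + \gamma_{03}\left(T_j \times \bar{P}_j\right) + u_{0j}

where Y_{ij} is the outcome for student i in classroom j, T_j indicates assignment to McDougal
Littell Life Science, \bar{P}_j is the classroom mean pretest score, and e_{ij} and u_{0j} capture
student- and classroom-level variability, respectively. Under this parameterization, \gamma_{01}
is the classroom-level treatment effect, and a negative \gamma_{03} would correspond to the
stronger impact observed in the lowest-pretest classrooms.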

Overall, teachers viewed McDougal Littell Life Science favorably. Some will continue to use the
program even if districts do not purchase it (including a teacher who also used the materials with
8th grade science classes!), and some of the districts that participated in the study purchased
McDougal Littell Science for the 2005-2006 school year. While teachers made many
suggestions for improving the program, the positive comments and attitudes towards the program
were quite strong. Teachers reported that the most effective and valuable components were the
notetaking and vocabulary strategies; the organization of the materials, including “Big Ideas,”
“Key Concepts,” and chapter/section headings; the resources supporting differentiated
instruction; and the technology components that supported instruction.

Finally, when teachers were asked about changes in their science content knowledge and
teaching practices based on their primary Life Science program, the evidence supporting
McDougal Littell Life Science is striking. When compared to comparison teachers, the
changes treatment teachers reported in their content knowledge were considerable, and
differences in teaching practices between the groups—both reported and observed—were
substantial. If the successes in implementing McDougal Littell Life Science and changes in
teaching practices can be replicated in a lengthier, large-scale study, greater differences in
student outcomes might be expected—particularly if teachers received more timely and effective
professional development than was provided in this study, given that many districts agreed to
participate just prior to the start of the school year.

Because of the many positive findings of this study, we recommend that funding for a larger-
scale study be sought. Additionally, this study identifies issues that would need to be considered
in a subsequent study, and as such, lays the groundwork for a larger-scale, lengthier study to be
conducted. In particular, this report identifies limitations to the current study, as well as
contextual factors that should be better measured, should a large-scale study be conducted.

LIMITATIONS OF THE STUDY

As a small-scale feasibility study, this study included 31 7th grade classrooms whose teachers
were randomly assigned to either the treatment group, which implemented McDougal Littell Life
Science, or to a comparison group that used the science curriculum that they had previously used.
After half a school year of implementation and data collection, 29 classrooms were represented
in the final analysis of student outcomes. While this relatively small number of classrooms
limits statistical power, it was sufficient to show differences that point to the potential
effectiveness of McDougal Littell Life Science.

Because most of the comparison group teachers had used their curriculum prior to this school
year, they already knew the materials and had developed lesson plans based on it and any
supplementary activities they incorporated. The treatment group, on the other hand, was
required to implement a program they had never used previously, and they had little time to
familiarize themselves with it. These practice and novelty effects, and the additional
stress experienced by the treatment group in having to quickly learn and implement a new
program, were not measured.

Ideally, classroom observations would have been conducted prior to the beginning of the study.
Because use of effective teaching practices has been shown by research to have an impact on
student achievement—a finding that was supported by results from this study—having a better
measure of the degree to which McDougal Littell Life Science changed teaching practices would
be an important factor in a longer term study. Additionally, changes in teacher content
knowledge should be measured.

The lack of randomized studies of published curriculum is often rationalized by the difficulty of
implementing a randomized experimental design in school settings where so many factors cannot
be controlled. Many factors that cannot be easily controlled did have an impact on this study:

• the hurricanes and fire in one school district that caused lost school days and a shortened
daily schedule because two schools (both of which included classrooms that participated
in this study) were forced to share a building for several months,

• the delay in obtaining final data from classrooms because teachers were unable to
complete the two required units (i.e., “Cells and Heredity” and “Ecology”) by mid-year—
due to (a) other teaching requirements, such as units of measure and laboratory skills,
(b) struggling students who slowed the instructional pace, and (c) difficulties in
accommodating both the schedule required by this study and district-mandated calendars,

• teachers who were assigned by district science coordinators rather than asked to
participate in the research, causing some reluctance to participate in the research, and as
treatment teachers, to “fully implement” the curriculum,

• teachers whose lives were so busy and filled with outside influences that they found it
difficult to return data on timely schedules, and

• teachers who were uncomfortable being observed in a classroom setting—they explained
that few people (including school-based administrators) intruded on their classrooms.

The impact of many of these factors was not measured, although some (e.g., resistance to
participation, less instructional time) would likely impact student learning and attitudes. While
the randomized design of the study should have helped control for factors that would have
equally affected both treatment and comparison groups, some factors (e.g., resistance to
participation) might have impacted the groups differentially.

RECOMMENDATIONS FOR FUTURE RESEARCH

Given the positive effects found in this small-scale feasibility study that included random
assignment to groups, we would recommend that a larger-scale study be conducted. In
particular, a large-scale study that includes schools as the unit of analysis, where schools are
randomly assigned to groups and all teachers in the school use the same program, should be
conducted over a longer time period, such as two full school years.
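
To make the power considerations behind this recommendation concrete, the sketch below
computes the minimum detectable effect size (MDES) for a two-arm cluster-randomized design,
using the standard approximation found in the cluster-trial literature cited in this report (e.g.,
Raudenbush, 1997; Bloom, 2005). The intraclass correlation of .15 and the class size of 25 are
illustrative assumptions, not values estimated in this study.

    import math
    from scipy import stats

    def mdes_cluster_rct(n_clusters, cluster_size, icc,
                         prop_treated=0.5, alpha=0.05, power=0.80):
        """Minimum detectable effect size (in student-level SD units) for a
        two-arm cluster-randomized trial, using the standard approximation
        M * sqrt(rho/(PQ*J) + (1-rho)/(PQ*J*n)), where M is the sum of the
        two t-multipliers for the significance level and the desired power."""
        df = n_clusters - 2  # one mean estimated per arm
        multiplier = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
        pq = prop_treated * (1 - prop_treated)
        variance = (icc / (pq * n_clusters)
                    + (1 - icc) / (pq * n_clusters * cluster_size))
        return multiplier * math.sqrt(variance)

    # With 29 classrooms of ~25 students and an assumed ICC of .15, the
    # smallest reliably detectable effect is roughly 0.46 SD, far larger than
    # the classroom-level effect observed here. Quadrupling the number of
    # randomized units roughly halves the MDES, which is why randomizing many
    # schools matters more than adding students within classrooms.
    print(round(mdes_cluster_rct(n_clusters=29, cluster_size=25, icc=0.15), 2))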

This study included schools and students with varying levels of socioeconomic status, minority
status, and facility with English. Better access to district- and school-level data, and better
measures of student characteristics, would help to more fully determine the impacts of the
program, and to disaggregate data to determine differential effects across groups—particularly as
schools in this country make efforts to “close the gaps.”

It was difficult to recruit schools and teachers for participation in a randomized experimental
study. One reason that some districts agreed to participate in the science study was the
upcoming requirement for nationwide science testing. Districts were working to select the
best science programs for use in their district and were willing to try a new program for which
little research data was available. In the math studies conducted simultaneously with this study,
schools and teachers were even more difficult to recruit, given the current focus on using math
test results to determine Adequate Yearly Progress. Yet schools will be required to assess science
knowledge and skills within the next few years. For these reasons, we would recommend that
district, school, and teacher incentives be included in research budgets. Without an immediate
reward for participating that is more relevant to schools and teachers than having access to the
final study results, studies such as this are unlikely to continue to be conducted successfully.

With Adequate Yearly Progress mandated by No Child Left Behind and high-stakes testing,
districts, schools, and teachers have even less incentive to participate in studies that involve
programs about which they know little. Additionally, many districts refused because of the
number of studies in which they were invited to participate, and their difficulty in allocating time
to review the study proposals to decide which studies might be high quality and valuable to
them. If the “gold standard” of randomized designs continues to be prioritized so highly,
publishers and agencies that fund studies must be willing to provide the incentives necessary to
convince districts, schools, and teachers that their participation is valued and valuable. While
research-based evidence may provide some of this incentive, concrete benefits to schools and
teachers in terms of monetary and material incentives will go far in attracting participants.

The use of effective teaching practices should be measured prior to the beginning of a study, so
the impact of the program on teaching practices can be examined. Classroom observations,
conducted several times per year throughout the study, would be crucial to determining changes
in use of research-based teaching practices, the impact of teaching practices on student
achievement, and changes in the level of implementation of the program. The classroom
observation instrument developed for this study worked well, but the training required to reliably
collect data and the additional data that could be collected to more fully measure implementation
of instructional strategies should be considered in designing a large-scale study and developing
instrumentation.

Finally, because teachers in both groups implemented the program they were using to various
degrees, better ways to measure level of implementation for different programs should be
determined prior to the study. While differential levels of implementation between teachers and
groups were found using qualitative measures and teachers’ self-report of implementation of
materials, we believe more complete and accurate implementation data could be captured with
better measures. In particular, implementation needs to include sensitive measures of both
materials and instructional strategies embedded within programs.

REFERENCES

Angrist, J.D., Imbens, G.W., & Rubin, D.B. (1996). Identification of causal effects using
instrumental variables. Journal of the American Statistical Association, 91, 444-472.

Bloom, H.S. (Ed.) (2005). Learning more from social experiments: Evolving analytic
approaches. New York: Russell Sage Foundation.

Bloom, H.S., Bos, J.M., & Lee, S.W. (1999). Using cluster random assignment to measure
program impacts: Statistical implications for the evaluation of education programs.
Evaluation Review, 23, 445-469.

Boruch, R., May, H., Turner, H., Lavenberg, J., Petrosino, A., de Moya, D., Grimshaw, J., &
Foley, E. (2004). Estimating the effects of interventions that are deployed in many places:
Place-randomized trials. American Behavioral Scientist, 47, 608-633.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Erlbaum.

Cornfield, J. (1978). Randomization by group: A formal analysis. American Journal of
Epidemiology, 108, 100-102.

Cronbach, L. & Snow, R. (1977). Aptitudes and instructional methods: A handbook for research
on interactions. New York: Irvington.

Donner, A., & Klar, N. (2000). Design and analysis of group randomization trials in health
research. London: Arnold.

Educational Testing Service. (2004). The Praxis Series: Professional assessments for beginning
teachers. Princeton, NJ: Author.

Frontier 21 Educational Solutions. (2004). GeoKids classroom observation protocol.
Philadelphia, PA: Author.

Grossen, B., & Burke, M.D. (1998). Instructional design that accommodates special learning
needs in science. Information Technology and Disabilities, 5(1-2), article 3. [Retrieved on
July 29, 2005, from http://www.rit.edu/~easi/itd/itdv5n12/article3.htm]

Hofmeister, A., & Lubke, M. (1990). Research into practice: Implementing effective teaching
strategies. Boston: Allyn and Bacon.

Horizon Research, Inc. (2000). Inside the classroom observation and analytic protocol. Chapel
Hill, NC: Author.

Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence
Erlbaum Associates.

Institute of Education Sciences. (2003). Identifying and implementing educational practices
supported by rigorous evidence: A user friendly guide. Washington, DC: U.S. Department of
Education. [Retrieved on December 12, 2004, from
http://www.excelgov.org/usermedia/images/uploads/PDFs/User-Friendly_Guide_12.2.03.pdf]

Kameenui, E.J., & Carnine, D.W. (1997). Effective teaching strategies that accommodate
diverse learners. Menlo Park, CA: Prentice Hall.

Korey, J. (2000). Dartmouth College mathematics across the curriculum evaluation summary:
Mathematics and humanities courses. Hanover, NH: Dartmouth College.

Lawrenz, F., Hoffman, D., & Appeldoorn, K. (2002). Classroom observation handbook.
Minneapolis, MN: University of Minnesota, College of Education and Human Development.
[Retrieved on July 5, 2005, from
http://education.umn.edu/CAREI/cetp/Handbooks/COPHandbook.pdf]

Little, R.J.A., & Rubin, D.B. (1987). Statistical analysis with missing data. New York: John
Wiley.

Loh, W.L. (2002). Regression trees with unbiased variable selection and interaction detection.
Statistica Sinica, 12, 361-386.

Marzano, R.J., Pickering, D.J., & Pollock, J.E. (2001). Classroom instruction that works:
Research-based strategies for increasing student achievement. Alexandria, VA: Association
for Supervision and Curriculum Development.

McDougal Littell Market Research. (2004). McDougal Littell Science: Success in the
classroom, evaluation studies. Evanston, IL: Author.

Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook
(2nd ed.). Thousand Oaks, CA: Sage Publications.

Murray, D.M. (1998). Design and analysis of group-randomized trials. New York: Oxford
University Press.

National Research Council. (1996). National science education standards. Washington, DC:
National Academies Press.

Raudenbush, S.W., & Liu, X. (2000). Statistical power and optimal design for multisite
randomized trials. Psychological Methods, 5(2), 199-213.

Raudenbush, S.W. (1997). Statistical analysis and optimal design for cluster randomized trials.
Psychological Methods, 2, 173-185.

Rowan, B., Correnti, R., & Miller, R. (2002). What large-scale survey research tells us about
teacher effects on student achievement: Insights from the prospects study of elementary
schools. Teachers College Record, 104(8), 1525-1567.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Santa, C.M., Havens, L.T., & Maycumber, E.M. (1996). Project CRISS: CReating Independence
through Student-owned Strategies (2nd ed.). Dubuque, IA: Kendall/Hunt Publishing
Company.

Sawada, D., & Piburn, M. (2000). Reformed teaching observation protocol (RTOP). (ACEPT
Technical Report No. IN00-1). Tempe, AZ: Arizona Collaborative for Excellence in the
Preparation of Teachers.

Slavin, R.E. (1990). Mastery learning re-reconsidered. Review of Educational Research, 60(2),
300-302.

Snow, R. (1989). Aptitude-treatment interaction as a framework for research on individual
differences in learning. In P. Ackerman, R.J. Sternberg, & R. Glaser (Eds.), Learning and
individual differences. New York: W.H. Freeman.

Snow, R., Federico, P., & Montague, W. (1980). Aptitude, learning, and instruction, Volumes 1
& 2. Hillsdale, NJ: Erlbaum.

Stenhoff, D.M., Davey, B.J., Slocum, T.A., Lignugaris/Kraft, B., & Salzberg, C.L. (2004). The
teacher performance measure (TPM) (Tech. Rep. No. 1). Logan, UT: Utah State University,
Department of Special Education and Rehabilitation.

Stigler, J.W., Gonzales, P., Kawanaka, T., Knoll, S., & Serrano, A. (1999). The TIMSS
videotape classroom study: Methods and findings from an exploratory research project on
eighth-grade mathematics instruction in Germany, Japan, and the United States.
Washington, DC: National Center for Education Statistics. [Retrieved on December 12,
2004, from http://nces.ed.gov/pubs99/1999074.pdf]

Weiss, I.R., Pasley, J.D., Smith, P.S., Banilower, E.R., & Heck, D.J. (2003). Looking inside the
classroom: A study of K–12 mathematics and science education in the United States. Chapel
Hill, NC: Horizon Research, Inc.

Weiss, I.R., Banilower, E.R., McMahon, K.C., & Smith, P.S. (2001). Report of the 2000
national survey of science and mathematics education. Chapel Hill, NC: Horizon Research,
Inc.

Yin, R. (1994). Case study research: Design and methods. Thousand Oaks, CA: Sage
Publications.

APPENDICES

Appendix A: Sample Agenda of Professional Development Provided for Treatment Teachers
Appendix B: McDougal Littell Sample Script for Seeking Study Participation
Appendix C: Letters Faxed or Emailed to Interested Schools or Districts
Appendix D: Parent Letter of Introduction and Passive Permission
Appendix E: Pretest/Posttest Forms A and B
Appendix F: Student Attitude Survey
Appendix G: Classroom Observation Instrument
Appendix H: Using Science Materials Observation
Appendix I: Using Science Materials Checklist
Appendix J: Teacher Questionnaire
Appendix K: Student Questionnaire
Appendix L: Teacher Focus Groups
Appendix M: Student Focus Groups
Appendix N: Journaling Email Questions
Appendix O: Descriptive Summaries and Statistics
Appendix O.1: Descriptions of Samples and Measures
Appendix O.2: Responses to Student Questionnaire
Appendix O.3: Responses to Teacher Questionnaire
Appendix O.4: Using McDougal Littell Science Materials Checklist
Appendix O.5: Attrition Analysis
Appendix O.6: Journaling Email Responses
