
JOURNAL OF RESEARCH IN SCIENCE TEACHING VOL. 50, NO. 5, PP. 561–596 (2013)

Research Article

Learning to Argue and Arguing to Learn: Argument-Driven Inquiry as a Way to Help Undergraduate Chemistry Students Learn How to Construct Arguments and Engage in Argumentation During a Laboratory Course

Joi Phelps Walker¹ and Victor Sampson²

¹Division of Science and Math, Tallahassee Community College, Tallahassee, Florida
²School of Teacher Education & FSU-Teach, Florida State University, Tallahassee, Florida

Received 13 September 2011; Accepted 25 January 2013

Abstract: This study examines whether students enrolled in a general chemistry I laboratory course
developed the ability to participate in scientific argumentation over the course of a semester. The laboratory
activities that the students participated in during the course were designed using the Argument-Driven
Inquiry (ADI) instructional model. This model gives a more central place to argumentation and the role of
argument in the social construction of scientific knowledge. The development of the students’ ability to
construct a scientific argument and to participate in scientific argumentation was tracked over time using
three different data sources. These data sources included a performance task, which was administered at the
beginning, middle, and end of the course, video recording of the students participating in episodes of
argumentation, and the lab reports the students wrote as part of each lab activity. As time was the independent
variable in this study, a repeated measure ANOVA was used to evaluate changes in the ways students
performed on each task over the course of the semester. The results of the analysis indicate that there was
significant growth in the quality of the students’ written arguments and nature of their oral argumentation.
There also was a significant correlation between written and oral arguments. These results suggest that
the use of an integrated instructional model that places emphasis on argument and argumentation can
have a positive impact on the quality of the arguments students include in their investigation reports,
the argumentation they engage in during lab activities, and their overall performance on tasks that
require them to develop and support a valid conclusion with genuine evidence. © 2013 Wiley Periodicals,
Inc. J Res Sci Teach 50: 561–596, 2013
Keywords: argumentation; argument; chemistry; undergraduate; laboratory; inquiry

Science educators, with the goal of increased science literacy, have developed numerous new
curricula and instructional approaches over the last decade in order to give students a more
authentic science experience in the classroom (e.g., Cooper, 1994; Farrell, Moog, &
Spencer, 1999; Gosser & Roth, 1998; Lewis & Lewis, 2005; Wallace, Hand, & Yang, 2005).
Consistent with this effort are activities and instructional models that are geared towards
introducing scientific argumentation and the construction of scientific arguments into the
classroom. Argumentation has been identified as a key curricular feature that can support students
in developing literacy in the context of science (Krajcik & Sutherland, 2010). However, in order to

Additional Supporting Information may be found in the online version of this article.
Correspondence to: J. Walker; E-mail: walkerj@tcc.fl.edu
DOI 10.1002/tea.21082
Published online 25 March 2013 in Wiley Online Library (wileyonlinelibrary.com).

© 2013 Wiley Periodicals, Inc.



become literate in science, students should be given an opportunity to learn how to construct
scientific arguments and participate in scientific argumentation in a manner that reflects the norms
and epistemological commitments of the scientific community (Hodson, 2008). Scientific
argumentation is not a focus of most undergraduate science courses, and it has not been prominent
in the discourse of college science educators. Recently, a new instructional model called
Argument-Driven Inquiry (ADI) has been introduced as a way to transform the nature of
laboratory courses at the undergraduate level (Walker, Sampson, & Zimmerman, 2011). This
instructional model is designed to give a more central place to argumentation and the role of
argument in laboratory-based instruction.
While many of the instructional models used in undergraduate education provide for
collaborative discourse during an investigation (Poock, Burke, Greenbowe, & Hand, 2007) or
while attempting to solve a problem (Gosser & Roth, 1998), there is little research directly linking
the level of student engagement in the process of argumentation with the construction of individual
arguments. Furthermore, the research that has been conducted on the benefits of collaborative
discourse at the post-secondary level is often limited to pre/post assessments (Walker, Sampson,
Grooms, Zimmerman, & Anderson, 2012), exam questions (Rudd, Greenbowe, Hand, &
Legg, 2007) or traditional assessments (Lewis & Lewis, 2008) that do not capture changes in the
ways students construct scientific arguments or participate in an episode of oral argumentation
over time. Given these gaps in the literature, our goal was not to evaluate the efficacy of the ADI
instructional model in comparison to other methods of instruction; rather, it was to examine how
undergraduate students’ participation in collaborative scientific argumentation changes over time
when they are given the opportunity to engage in a series of laboratory activities that privilege
argument and argumentation. We also explored how the quality of individual arguments
constructed by these same students changed over the course of a semester and attempted to link the
level of oral argumentation discourse to the individual product.
Theoretical Framework
This study is grounded in constructivist theories of learning (Bransford, Brown, &
Cocking, 1999; Tobin, 1993; von Glasersfeld, 1989). This perspective on learning is founded on
the basic assumption that knowledge cannot be transmitted directly from one person to another,
but is actively built or “constructed” by the learner (Driver, Asoko, Leach, Mortimer, &
Scott, 1994). This construction of knowledge can occur individually (personal constructivism) or
as a social process (social constructivism). Personal constructivism describes the process by
which individuals develop their own meaning or sense of the natural world through individual
processes, a theory first described by Piaget (1964; 2003). Piaget described learning as a process of
assimilation and accommodation of new information based on what an individual already knows.
Thoughts concerning the social construction of knowledge emerged from the work of Vygotsky
(1978), who described how the interactions that occur amongst groups of learners influence the
nature of the knowledge that is constructed by an individual. In science these interactions are
embedded within a culture that is shaped by tools and symbols which are not found in nature, but
are constructs of the scientific community (Hodson, 2008). It is therefore important to focus on
how individuals learn to use the tools, symbols, and language of the scientific community (Driver
et al., 1994) and how the tools, symbols, and language of the scientific community shape what an
individual learns.
Teaching a laboratory course from a social constructivist perspective, however, requires a
reconsideration of the nature of laboratory activities (Driver, Newton, & Osborne, 2000). In light
of social constructivist views of learning, rather than being a place where students verify concepts
taught in the lecture, the laboratory should provide a setting where students can interact directly

with the material world, using the tools, data collection techniques, models, and theories of
science. The laboratory, it can be argued, is the ideal setting for accessing the aspects of science
that may be missing in the lecture. For example, students should conduct investigations in
conversation and collaboration with peers as well as more experienced adults. Through these types
of interactions, learners encounter and explore ideas in a subject, the uses for these ideas, and ways
that ideas are validated; that is, students learn what constitutes legitimate knowledge in a field
(Colburn, 2004; Linn & Burbules, 1993). Ford (2008) calls this a “grasp of practice” and argues
that it is fundamental to learning science. The specific disciplines that underlie the undergraduate
level courses, such as Biology and Chemistry, have established bodies of knowledge, a unique
language, and rules for gathering evidence and evaluating results. As students engage in
conversations with others, they can draw on their expertise; explain, extend, and reflect on their
own ideas; and gain exposure to the habits of mind exhibited by disciplinary experts (Sunal,
Wright, & Day, 2004).
Historically, undergraduate science laboratories have served little purpose other than to allow
students to demonstrate for themselves the concepts presented in a lecture and to perhaps provide
some training in laboratory technique (Cooper & Kerns, 2006). Most laboratory curricula are
therefore designed to ensure that students get the “right” results and draw the “appropriate”
conclusions from these results. Although these verification or “cookbook” laboratories can be
highly efficient for introducing students to many different topics over the course of a semester,
virtually no meaningful learning takes place when students participate in them (Domin, 1999;
Hofstein & Lunetta, 2004). In response to this finding, the Society of College Science Teachers
(SCST) has recommended that laboratory courses be redesigned so the activities are more inquiry-
based, grounded in current research on how people learn, and promote more critical thinking,
problem solving, and collaborative work on meaningful tasks (Halyard, 1993). A primary
recommendation for science reform is for more inquiry-based instruction in science courses
(National Research Council, 1996, 1999, 2000, 2005; National Science Foundation, 1996).
Inquiry-based instruction is a term used to refer to the teaching and learning strategies that
enable scientific concepts and science process skills to be mastered through investigations
(NRC, 1996). In order to describe the nature of inquiry-based instruction that can take place within
the laboratory setting, Schwab suggested three possible approaches that have been characterized
as Levels 1–3 inquiry. In Level 1 inquiry, laboratory manuals or textbook materials could be used
to pose questions and describe methods to investigate the questions, thus allowing students to
discover relationships they do not already know. In Level 2 inquiry, instructional materials
are used to pose questions, but the methods and answers are left open for students to determine on
their own. The most open approach, Level 3 inquiry, has students confront phenomena without
textbook- or laboratory-based questions. Students are allowed to ask questions, gather data, and
propose scientific explanations based on their own investigations (NRC, 2000). There are a variety
of inquiry-based instructional models that are used at the undergraduate level and most of these
models can be classified using Schwab’s system as Level 2 inquiry. Practitioners, however, tend to
refer to these types of methods as guided-inquiry (Farrell et al., 1999; Lewis & Lewis, 2005; Rudd
et al., 2007).
To bring about reforms in science teaching requires changes to be made to the roles students
must assume, the nature of content, tasks and assessments, and the social organization of the
classroom. The ADI instructional model that is the focus of this study is one such approach
(Walker, Sampson, & Zimmerman, 2011). This instructional model gives a central place to
argumentation and the role of argument in the social construction of scientific knowledge as well
as the interpretation of empirical data during laboratory-based instruction. In the sections that
follow, we will describe the ADI instructional model in more detail. However, before we can

describe the model we must first clarify the meaning of the terms argumentation and argument and
elaborate on the current research in this area.
Argumentation refers to a form of “logical discourse whose goal is to tease out the relationship
between ideas and evidence” (Duschl, Schweingruber, & Shouse, 2007). Scientific argumentation
requires individuals to analyze and evaluate data and then rationalize its use as evidence for a
claim. A scientific argument, on the other hand, consists of a claim supported by evidence and a
rationale (as represented in Figure 1). The evidence component of an argument refers to data that
has been collected and then used to support the validity or the acceptability of the claim. This data
can take a number of forms ranging from traditional measurements (e.g., pH, mass, temperature)
to observations (e.g., color, descriptions of an event). However, in order for this information to be
considered evidence it needs to be analyzed and interpreted in order to show (a) a trend over time,
(b) a difference between groups or objects, or (c) a relationship. The rationale component of a
scientific argument explains not only why the evidence supports the claim but also establishes the
validity or the relevance of the evidence.
The major obstacle associated with integrating argument and argumentation into the teaching
and learning of science is not that students cannot support ideas and challenge claims or viewpoints.
In fact, a number of researchers have described how adept students are at constructing arguments
and engaging in argumentation in everyday contexts (e.g., Baker, 1999; Resnick, Salmon, Zeitz,
Wathen, & Holowchak, 1993). Instead, it is within the context of science that students’
argumentation skills fall short (Driver et al., 2000). A number of studies, for example, have shown
that students will manipulate, distort or trivialize data rather than reject a preconceived conclusion
(Berland & Reiser, 2009; McNeill & Krajcik, 2007; Sandoval & Millwood, 2005). Other
researchers, who have examined the ways students craft a written argument, have found that

Figure 1. The argument framework used in this study.



students are unable to discern what data are relevant or what counts as evidence (Kelly &
Bazerman, 2003; Kelly & Takao, 2002). Instead, students tend to provide details about their
methods or observations rather than using appropriate evidence to support their claim
(Zeidler, 1997). Finally, even when students have grasped the concept of evidence they will submit
the evidence as all inclusive, neglecting to rationalize the use of this evidence in supporting
their claim (e.g., Klahr, Dunbar, & Fay, 1990). The available literature, in summary, suggests
that students’ lack of argumentation skills in science reflects their misunderstanding of
“what counts” in science rather than being indicative of low cognitive abilities or poor
argumentation skills.
Overview of the ADI Instructional Model
Inquiry and argumentation are complementary goals that make laboratory experiences more
scientifically authentic and educative for students (Jimenez-Aleixandre, 2008; Osborne, 2010).
The ADI instructional model is designed to give a more central place to argumentation and the role
of argument in the social construction of scientific knowledge while promoting inquiry. Figure 2
outlines the seven steps of ADI that are designed to integrate the learning of scientific concepts
with inquiry, argumentation and writing in such a way that little explicit instruction is necessary;
rather, students gain proficiency through engagement in the laboratory investigations, moving
from investigation design, to analysis and argument development, to argumentation, and to the final
written argument.
The first step of the ADI instructional model is the identification of the task through discussion
of the research question. This step is designed to provide students with a challenging problem to
solve and to capture the students’ attention and interest. According to Hofstein and Lunetta (2004)
“students often perceive that the principal purpose for a laboratory investigation is either following
the instructions or getting the right answer.” Thus, it is critical to present the laboratory
investigation as an opportunity to discover something or solve some problem. To accomplish this
goal a good research question is critical in order to provide the foundation for scaffolding student
argumentation and creating a need for evidence. The second step of the ADI instructional model,
the generation of data, is therefore situated by the research question. In this step of the model,
students work in a collaborative group (three or four students) in order to develop and implement a
method (e.g., an experiment or analysis) to answer the research question provided in Step 1. This
provides students with an opportunity to learn how to design and conduct informative
investigations and to learn to deal with the ambiguities of empirical work. The nature of these
investigations is best described as a “guided inquiry” because each group of students must decide
the ways to gather and analyze the data they will need to justify an answer to the research question
(R.L. Bell, Smetana, & Binns, 2005). Inquiry can also provide an opportunity for students to
interact with competing claims or, in the case of ADI, an opportunity for
argumentation (Abrams, Southerland, & Evans, 2008).
The third step, production of a tentative argument, calls for students to craft an argument that
consists of an explanation supported with evidence, and a rationale for the choice of evidence in a
medium, such as a large whiteboard, that can be shared with others. This third step of the model is
designed to emphasize the importance of argument in science. In other words, students need
to understand that scientists must be able to support an explanation, conclusion, or other claim
with appropriate evidence and a rationale because scientific knowledge is not dogmatic
(Hodson, 2008). During the fourth step, an argumentation session, the small groups have an
opportunity to share their arguments and to critique the arguments of other groups. In other words,
argumentation sessions are designed to give students an opportunity to learn to critique the
products (i.e., conclusions, explanations or arguments), processes (i.e., methods), and context

Figure 2. The seven steps of ADI.

(i.e., theoretical or empirical foundations) of an inquiry. These two steps help students develop a
basic understanding of the elements that count as a high-quality argument in science and ways to
determine if available evidence is valid, relevant, sufficient, and convincing enough to support a
conclusion (or claim). More importantly, these steps are designed to make students’ ideas,
evidence, and rationale visible to each other, which, in turn, enables students to evaluate
alternatives and eliminate conjectures or conclusions that are inaccurate or do not fit with the
available data. The ability to evaluate claims in the context of science is important because
research suggests that students often rely on criteria such as plausibility or external authority in
order to determine which ideas to accept or reject as they attempt to negotiate meaning with others
(Kuhn & Reiser, 2005; Sampson & Clark, 2009).
The fifth step of the ADI instructional model is the creation of a written investigation report
by individual students. This report gives the students an opportunity to share the goal of their

investigation, the method they used, and their overall argument. To help students learn to write in
science, ADI requires students to produce a report that is organized around three fundamental
questions: What were you trying to do, and why? What did you do, and why? What is your
argument? Students are encouraged to organize the data they gathered and analyzed during the investigation into
tables or graphs that they embed into the report and reference in the body of the text in order to
help students learn to communicate information in multiple modes. The sixth step of the ADI
instructional model is a double-blind peer review, which supports the appropriation of evaluation criteria
by students as well as engagement in the assessment practices embedded in the model. Once
students complete their investigation reports they submit four typed copies identified only by a
code number assigned by the classroom teacher. The teacher then randomly distributes three or
four sets of reports (i.e., the reports written by three or four different students) to each laboratory
group along with a peer review handout for each set of reports. The peer review handout includes
specific criteria to be used to evaluate the quality of an investigation report and space to provide
feedback to the author. The laboratory groups review each report as a team and then decide if it can
be accepted as is or if it needs to be revised based on the criteria included on the peer review sheet.
Groups are also required to provide explicit feedback to the author about what needs to be done in
order to improve the quality of the report and the writing as part of the review. The seventh, and
final, step of the ADI instructional model is the revision of the investigation report based on the
results of the peer-review. The reports that are accepted by the reviewers may be submitted to the
instructor at the end of Step 6; however, all students (even when their first draft is “accepted as is”)
have the option to revise their reports based on what they have read and the comments on their
initial draft.
The ADI instructional model, in summary, is designed to function as an integrated
instructional unit (NRC, 2005) and to encourage students to engage in a sequence of activities
(inquiry, argumentation, writing, and peer review) that are intended to help students understand
important concepts and practices in science. The focus of five of the seven steps on argumentation
and generation of a written argument provided the basis for this research into the impact of such an
instructional model on development of student proficiency in these areas.
Previous Research on ADI
The ADI instructional model was piloted in select sections of General Chemistry I
laboratories during the 2008–2009 academic year at a community college in the southeastern
United States. During these three semesters the remaining laboratory sections were taught using
a traditional approach in order to compare the achievement of the students who were enrolled
in the ADI sections to the students in the traditional laboratory sections (Walker et al., 2012).
The results of this study indicated that the students in the ADI sections had an equivalent
conceptual understanding of key concepts (as measured with a conceptual inventory) as
students in the traditional sections; however, the students in the ADI sections were better able to
provide evidence to support their conclusions (in both familiar and unfamiliar contexts).
In addition, the female students in the ADI sections had significantly better attitudes towards
science at the end of the semester when compared to the female students in the traditional sections
of the course.
In a separate, but related study, the degree to which the ADI instructional model influenced
students' scientific writing was also examined (Sampson & Walker, 2012). The authors scored
the reports written by the students with a rubric developed by the researchers. The results of this
study, although exploratory, suggest that both strong and weak writers can make substantial gains
in their science writing skills over the course of a semester and students benefit a great deal from
the peer-review process.

Research Questions
The results of these two studies, when taken together, provided the justification needed to
implement the ADI instructional model into all of the laboratory sections. Having this
instructional model in place at a community college allowed us to focus our research on the
development of oral argumentation skills in science and the quality of students’ written arguments.
The following research questions guided this study:

(1) How does the quality of undergraduate students' performance on an argument-based
task change over the course of a semester when they engage in a series of
argument-focused lab activities?
(2) How does the quality of the oral scientific argumentation that takes place between
undergraduate students change over the course of a semester when they engage in a series
of argument-focused lab activities?
(3) How does the quality of the scientific arguments undergraduate students write in their
reports change over the course of a semester when they engage in a series of
argument-focused lab activities?
(4) How does the quality of the collaborative scientific argumentation that occurs
between the undergraduate students during the investigation relate to the quality of
the written arguments produced by individual undergraduate students at the end of
the investigation?

Methods
Context
The General Chemistry I Laboratory course was intended to provide students with the
opportunity to conduct investigations into concepts identified by the faculty as important for
understanding chemistry. These concepts include physical properties, molecular structures,
solutions, chemical reactions, and energy of reactions. The topics were not unique to the
community college curriculum; rather it was the instructional method that distinguished these
laboratory activities. The ADI laboratory handouts provided a brief introduction to the topic, a
statement of the problem to be investigated, the guiding question for the investigation, the
materials available for use, safety precautions and getting started suggestions (Walker, Sampson,
& Zimmerman, 2011). The students were encouraged throughout the semester to keep a notebook
in order to record the process they use to answer the guiding questions and to create data tables that
were meaningful for their investigation.
The sequence of ADI investigations progressed from simple manipulation of glassware and
objects (Density) to more advanced chemistry investigations (Chemical Reactions). The specific
order of the ADI investigations was synchronized with the order of concept presentation in the
lecture course. The first laboratory activity, “Using Laboratory Equipment and Glassware,” was
not an ADI laboratory, but was used as an orientation to the laboratory and to introduce the
students to some of the components of the ADI instructional model. The next five activities,
Density, Hydrate, Solutions, Mole Ratio, and Chemical Reactions, were ADI investigations that
utilized all seven steps of the instructional model. The final two laboratory sessions did not follow
all seven steps of the ADI instructional model.
Participants
Students enrolled in two sections of a General Chemistry I Laboratory course at a community
college in the southeastern United States participated in this study. The community college is an

open admission, comprehensive 2-year institution that serves nearly 40,000 students annually. All
the General Chemistry I laboratory sections were taught using the ADI instructional model. This
study was conducted during the summer semester, which at the community college lasts 10 weeks
rather than the 15 weeks of a typical semester. This resulted in an accelerated pacing for the
laboratory course such that students were in laboratory for 4 hours some days, essentially
completing 2 weeks’ worth of material relative to the longer semester pacing.
An adjunct instructor with two semesters of experience using the ADI instructional model in
the chemistry laboratory taught the laboratory sections used in this study. The two laboratory
sections met at 2:30 pm on different days. There were originally 23 students enrolled in each
section. Over the course of the semester seven students withdrew, five from Section A and two from
Section B. Table 1 presents demographic data for the Summer 2010 semester. The research
participants were representative of the population in terms of gender, race, age, and grade point
average (GPA).
The laboratory had six octagonal tables that accommodated up to four students. The students
were divided into groups by sorting a spreadsheet first by gender, then by GPA and assigning a
table number (1–6). This resulted in six groups made up of both male and female students, a range
of GPAs and of mixed race.
The community college setting presents a research context that is rich in diversity as shown in
Table 1. This student population, however, is not residential and many students work full or part-
time jobs or have families and other obligations. These factors made data collection a challenge,
due to attrition, absences, missed assignments, or assignments that represent a minimal effort.
Despite these contextual limitations, the ADI instructional model was implemented with overall
fidelity in the two sections and the data collected can be considered representative of student work
in an undergraduate chemistry laboratory.

Data Sources
We used a single group repeated measures research design to examine how the participants’
written arguments and ability to participate in an episode of scientific argumentation changed over
time and to explore the conditions that may have influenced these changes. We relied on three
different data sources (a performance task, video recordings of oral argumentation and written
arguments) to characterize the influence of various instructional experiences on the ways students
crafted scientific arguments and participated in scientific argumentation. The intent of the
repeated measures design was to evaluate more than just absolute change from beginning to end;
our goal, instead, was to capture the process of change that occurred for the collaborative groups as
well as the individual students.

Table 1
Enrollment demographics for general chemistry I laboratory for Summer 2010

                Enrollment   Female (%)   Minority (%)   Age Range   Age Mean   GPA Range    GPA Mean
All sections    119          49           47             17–51       22.0       1.54–4.00    2.93
Participants     46          56           44             19–35       21.5       1.56–4.00    3.04

Note: Minority means Black, Hispanic, Asian or Multiracial.


During the first meeting of the laboratory sections the students were given a performance-
based assessment to complete individually. This assessment established a baseline for individual
argument skill as well as knowledge of limiting reactants. For each of the five ADI investigations
the groups were observed until they began constructing their argument and designing their poster,
at which point video recording, using a separate video camera for each table, began and continued
through the argumentation session and follow-up discussion. The laboratory reports were copied
and evaluated using a rubric. Variations of the performance task were administered near the
middle and at the end of the semester to capture individual growth over the semester. Table 2
outlines the timetable for data collection. The primary author was present for all but one laboratory
meeting as a participant observer and kept a journal of observations. In the sections that follow, we
will describe each data source in greater detail.
Performance-Based Assessment. A performance task was previously designed to evaluate
student understanding of limiting reactants in a simple chemical reaction (Walker, Sampson,
Grooms, & Zimmerman, 2011). Performance-based assessments such as this allow students to
demonstrate their inquiry and reasoning skills as well as their understanding of the content (Doran,
Chan, & Tamir, 1998). The task entails mixing constant volumes of 1 M acetic acid with different
amounts of sodium bicarbonate added from a balloon covering the mouth of a flask. The balloon
inflates to a specific volume based on the amount of sodium bicarbonate it contains. Adding
universal indicator to the flasks provides an additional variable to consider as the solutions that are
initially acidic either retain their acidity or become basic as indicated by a color change from red to
green. This performance task was used in previous studies to document students’ ability to use
evidence and to justify their choice of evidence with an appropriate rationale (Walker et al., 2012).
In this study, each student analyzed a pair of flasks. The students emptied the sodium bicarbonate
into the flasks, swirled the flasks, and observed the balloon inflation and color change. The
students did not discuss the demonstration but made independent observations and conclusions.
The answer sheet for the performance task provided general information about the reaction and
asked the students to identify the limiting reactant in each flask, to provide evidence for their

Table 2
Data collection timetable

Week 1. Activities: Introduction to ADI, How to Use Laboratory Equipment. Data collected: Demographics, Performance Task #1.
Week 2. Activities: Density Investigation (Steps 1–4). Data collected: Video Recording (Steps 3 and 4).
Week 3. Activities: Density Investigation (Step 6), Hydrate Investigation (Steps 1–4). Data collected: Video Recording, Investigation Report #1.
Week 4. Activities: Hydrate Investigation (Step 6), Dilution Exercise. Data collected: Observations.
Week 5. Activities: Dye Solutions Investigation (Steps 1–4). Data collected: Video Recording (Steps 3 and 4), Investigation Report #2.
Week 6. Activities: Dye Solutions Investigation (Step 6), Mole Ratio Investigation (Steps 1–4). Data collected: Performance Task #2, Video Recording (Steps 3 and 4), Investigation Report #3.
Week 7. Activities: Mole Ratio Investigation (Step 6), Chemical Reactions Investigation (Steps 1–4). Data collected: Video Recording, Investigation Report #4.
Week 8. Activities: Hot Pack/Cold Pack Investigation. Data collected: Observations.
Week 9. Activities: Molecular Shapes. Data collected: Investigation Report #5, Performance Task #3.


conclusion and to justify the use of this evidence with appropriate rationale. Although this same
performance task was used three times, the amount of sodium bicarbonate was varied so that the
students evaluated a slightly different scenario with each assessment. The three variations for the
performance task are described in Table 3.

Episodes of Oral Argumentation. Two episodes of argumentation were video recorded during
each ADI investigation. The first episode occurred as the core group finalized their claim with
supporting evidence in the construction of their poster. This argument generation required
students to collaborate and come to consensus regarding the use of their evidence in supporting a
claim. This was followed by an argumentation session that involved splitting of the core group into
the traveling members and one presenting member. The traveling members typically visited three
tables resulting in formation of a transitory group with the presenter at that table. This new group
of students engaged in the second type of argumentation that was more persuasive and evaluative
in nature, before the travelers moved to the next table. Following the argumentation session the
core group reconvened and discussed their results in light of their discussion with the other groups.
These discussions were also recorded and were merged with the initial argument generation
process. The argumentation session with the transitory groups was viewed as a single group of
participants engaged in the presentation and evaluation of an argument.

Written Arguments. While the group argument generation and evaluation provided
opportunity for evaluation of evidence and revision of claim, the individual laboratory reports
requires the student to commit to a claim and then support it with genuine evidence and appropriate
rationale. In addition, evaluating the written arguments of students provided an assessment of
individual skills (although the influence of collaboration outside the laboratory and the peer
review process prevents the written argument from being considered entirely individual). The
final version of the lab report written by each student was therefore collected and photocopied
for analysis.

Table 3
Description of performance task variations

Assessment 1. Flask 1 HC2H3O2:NaHCO3 = 2:1; Flask 2 = 1:2. Flask 1 had excess acid, so the indicator remained red. Flask 2 had excess sodium bicarbonate that was visible as unreacted solid, and the indicator changed to green. The balloons were different sizes, with the balloon for Flask 1 being smaller.
Assessment 2. Flask 1 HC2H3O2:NaHCO3 = 3:2; Flask 2 = 2:3. Flask 1 had excess acid, so the indicator remained red. Flask 2 had excess sodium bicarbonate that was visible as unreacted solid, and the indicator changed to green. The balloons were different sizes, with the balloon for Flask 1 being smaller.
Assessment 3. Flask 1 HC2H3O2:NaHCO3 = 1:1; Flask 2 = 2:3. Flask 1 had equal molar amounts of acetic acid and sodium bicarbonate. The indicator turned orange. Flask 2 had excess sodium bicarbonate that was visible as unreacted solid, and the indicator changed to green. The balloons were close to the same size.
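
To make the logic of the task variations concrete, the short sketch below works through the stoichiometry that underlies Table 3. It is our illustration rather than part of the study materials: the reaction of acetic acid with sodium bicarbonate (HC2H3O2 + NaHCO3 -> NaC2H3O2 + H2O + CO2) has a 1:1 mole ratio, so the reactant present in fewer moles limits the CO2 produced (and thus the balloon size), while any leftover acid or sodium bicarbonate determines the indicator color.

```python
# A minimal sketch (our illustration, not part of the study materials) of the
# stoichiometric reasoning behind the task: HC2H3O2 + NaHCO3 -> NaC2H3O2 + H2O + CO2
# is a 1:1 reaction, so the reactant present in fewer moles is limiting.

def describe_flask(mol_acid: float, mol_bicarb: float) -> str:
    co2 = min(mol_acid, mol_bicarb)  # moles of CO2 produced (drives balloon size)
    if mol_acid > mol_bicarb:
        return f"NaHCO3 limiting; excess acid remains (indicator red); {co2} mol CO2"
    if mol_bicarb > mol_acid:
        return f"HC2H3O2 limiting; unreacted solid remains (indicator green); {co2} mol CO2"
    return f"neither limiting; stoichiometric mix (indicator orange); {co2} mol CO2"

# Relative amounts taken from Table 3, treated here as moles for illustration.
for label, (acid, bicarb) in {"Assessment 1, Flask 1": (2, 1),
                              "Assessment 1, Flask 2": (1, 2),
                              "Assessment 3, Flask 1": (1, 1)}.items():
    print(f"{label}: {describe_flask(acid, bicarb)}")
```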


Data Analysis
During the semester, students were sometimes absent or failed to turn in assignments, making the actual
numbers collected for each assessment vary to some extent. In addition, over the course of the
semester Section A lost five students, which resulted in two of the groups being reduced to two
members. A third group in Section A had intermittent attendance by two members, so three groups
from this section were not scored for argument generation or evaluation. Section B retained
enough students for all six groups to be evaluated for the entire semester.
The performance task and laboratory reports were scored with an established rubric and the
video recordings were scored using the ASAC observation protocol. An independent researcher
was used to validate the scoring for each data source and the inter-rater reliability of the ASAC and
the other two rubrics was assessed using Cohen’s Kappa. Cohen’s Kappa is a statistical measure of
inter-rater agreement and is generally thought to be a more robust measure than simple percent
agreement calculation because it takes into account the agreement occurring by chance. Cohen’s
Kappa values range from 0 to 1 and a value of >0.6 indicates good to excellent agreement between
two raters (Landis & Koch, 1977).
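
For readers unfamiliar with the statistic, the brief sketch below shows how Cohen's Kappa is computed for two raters who independently scored the same set of items. It is our illustration, not part of the study: the rater scores are hypothetical and the use of the scikit-learn library in Python is our own choice.

```python
# A minimal sketch (our illustration, not the authors' code) of computing
# Cohen's Kappa for two raters who scored the same items; the scores below
# are hypothetical rubric points.
from sklearn.metrics import cohen_kappa_score

rater_1 = [0, 1, 1, 2, 3, 3, 0, 2, 1, 3]
rater_2 = [0, 1, 2, 2, 3, 3, 0, 2, 1, 2]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's Kappa = {kappa:.2f}")  # values above 0.6 are typically read as good agreement
```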

The Performance Task


The scoring rubric for the performance task awards a single point for correctly identifying the
limiting reactant in each balloon, three points for evidence (color, balloon size, and solid residue)
and three points for providing a valid rationale for each piece of evidence. See the Supporting
Information section for the scoring rubric and Walker, Sampson, Grooms, et al. (2011) for
information about how the rubric was developed and validated. The total score possible on this
task was seven points for each flask or 14 points per assessment. The performance task was given
to the students three times during the semester. The first author scored all of the answer sheets. A
second scorer, a science educator who did not have a stake in this study, scored 30% of the answer
sheets. The inter-rater reliability of the scores, as measured by Cohen's Kappa, was 0.69. An
example of a student’s answer for the yellow balloon that had the 2:1 ratio of acetic acid to sodium
bicarbonate is provided in Figure 3 to illustrate the scoring. The student correctly identified
sodium bicarbonate as the limiting reactant and provided all three pieces of evidence; however,
their rationale failed to fully explain why this evidence supported their claim.

Figure 3. Answer for one flask on the mid-term performance task.


Assessment of Scientific Argumentation in the Classroom (ASAC)


In order to assess argumentation, most researchers video or audio record students as they
engage in argumentation, then transcribe the discourse, and finally code or score the transcription
using a framework such as Toulmin’s Argument Pattern (Erduran, Simon, & Osborne, 2004). This
process can be difficult for researchers because argumentation is often nonlinear in nature and the
various aspects of a verbal argument (e.g., data, warrants, backings) are often difficult to identify.
Frameworks, such as Toulmin’s Argument Pattern, also tend to ignore social interactions during
an episode of argumentation. In consideration of these issues, the video recorded argument
generation and evaluation sessions were scored using an observation protocol called the
Assessment of Scientific Argumentation in the Classroom (ASAC).1 This instrument was
designed to capture argumentation events in a more holistic fashion allowing for a more
comprehensive assessment of the quality of an argumentation event. The development and
validation of the ASAC instrument is described in Sampson, Enderle, and Walker (2011).
The ASAC is divided into three sections: conceptual and cognitive, epistemic, and social.
These sections are based on the three integrated domains that Duschl (2008) describes as essential
for generating and evaluating arguments in educational contexts. The Conceptual and Cognitive
Aspects of Argumentation section consists of seven items, which enable the researcher to evaluate
how well the participants consider and evaluate alternative claims or reasons and their ability to
provide reasons in support of ideas. Scores on this aspect of argumentation range from 0 to 21. The
Epistemic Aspects of Argumentation section contains seven items, which focus on the ways the
participants support and challenge claims (e.g., privileging evidence and scientific theories, laws
or models during the discussion). Scores on this aspect of argumentation range from 0 to 21. The
Social Aspects of Argumentation section contains six items, which are designed to assess the ways
participants interact with each other during an episode of argumentation. Scores on this aspect of
argumentation range from 0 to 18.
The video recordings for the five argument generation and argument evaluation sessions were
evaluated separately using the ASAC observation protocol. The researcher took notes while
watching each video recorded event. Immediately after watching the video each item on the
ASAC was scored for that event using a 4-point scale based on how often a particular aspect of
argumentation was observed (0 = not observed to 3 = observed often). The ASAC has two items
for assessment of inappropriate aspects of argumentation, so for these the scale is reversed
(3 = not observed to 0 = observed often). The first author scored both events for all nine groups.
A fellow researcher who participated in development of the ASAC but who did not have a stake in
this study scored both events for one group, representing 11% of the data. The inter-rater reliability
of the scores, as measured by Cohen’s Kappa, was 0.60.
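
The following sketch illustrates, with hypothetical item names and ratings of our own invention, one way an event-level score of this kind can be tallied, with the two items that target inappropriate aspects of argumentation reverse-scored before summing.

```python
# A minimal sketch with hypothetical item names and ratings; it simply
# illustrates one way to tally an event score when two items are reverse-scored.

def event_total(ratings, reversed_items):
    """Sum 0-3 ratings, reversing the items that target inappropriate aspects."""
    return sum(3 - value if item in reversed_items else value
               for item, value in ratings.items())

ratings = {"considers alternative claims": 2,
           "supports ideas with evidence": 3,
           "appeals to authority": 1,       # inappropriate aspect, reverse-scored
           "distorts or ignores data": 0}   # inappropriate aspect, reverse-scored
print(event_total(ratings, {"appeals to authority",
                            "distorts or ignores data"}))  # 2 + 3 + 2 + 3 = 10
```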

Lab Reports
The laboratory reports consist of three sections. Section 1 describes the problem and the
context, Section 2 describes the method used, and Section 3 presents the argument. Only Section 3 of
the laboratory report was deemed relevant given the objectives of this study. Section 3 required the
student to provide and support an answer to the research question with appropriate evidence and a
rationale (i.e., the argument or the product of the inquiry). A scoring rubric,2 which was developed
and validated by Sampson and Walker (2012), was used to score the five aspects of the student’s
written argument. These aspects focused on the author’s ability to:

(1) Provide a well-articulated, adequate, and accurate claim that answers the research
question.


(2) Use genuine evidence to support the claim and to present the evidence in an appropriate
manner.
(3) Provide enough valid and reliable evidence to support the claim.
(4) Provide a rationale that is sufficient and appropriate.
(5) Compare his or her findings with other groups in their laboratory section.

The rubric provides a basis for scoring components of the argument on a scale of 0
(not observed) to 3 (indicating all criteria were met), making a total score of 0–15 possible. A
professor of chemistry at the community college who teaches the general chemistry laboratories
collaborated in scoring five reports from each ADI laboratory (14% of the reports). A comparison
of the scores on each item yielded a Cohen’s Kappa value of 0.83. An example of an argument
section from one of the student’s laboratory report is provided in Figure 4. The capitalized words
in brackets precede the relevant rubric criteria.
The student in this example lost points on his presentation of data, because he did not provide
the actual mass values for anhydrous salt, hydrate, and water. All of his values were total masses
including the test tube. The student lost a single point for his comparison with the scientific
community because he did not provide details to support his assertion that the values were “close.”
Also, the inference that agreement is indicative of correctness is flawed.
Results and Discussion
This section presents a statistical analysis of each data source (e.g., performance task, written
argument, and ASAC) and our interpretation of each analysis. In addition, we use examples of
student work and selected transcriptions from oral argumentation to illustrate trends we identified

Figure 4. Example of written argument scoring.



using quantitative techniques. A repeated measures ANOVA was conducted for each data set. The
pre-set alpha level of 0.05 was used to determine statistical significance except for the follow-up
contrasts. The significance of the follow-up contrasts was determined using the Bonferroni
approach in order to control for Type I error. The graphical presentation of the mean scores is
shown with bars representing 1 SD (or 68% of the scores) in order to illustrate the variation in
student performance. Effect sizes are reported as partial eta squared (η²) for the repeated measures
ANOVA and as Cohen's d for the pairwise comparisons (Cortina & Nouri, 2000).3
How does the quality of undergraduate students’ performance on an argument-based
task change over the course of a semester when they engage in a series of argument-focused
lab activities?
We conducted a repeated measures ANOVA with the factor being time of the assessment and
the dependent variable being the total score in order to determine if students’ scores on the
performance task improved over time. The means and standard deviations for the performance
task scores are presented in Table 4. The results of the ANOVA indicated a significant time effect,
Wilks’ L ¼ 0.67, F(2, 29) ¼ 7.06, p < 0.01, multivariate h2 ¼ 0.33. The results of the follow-
up contrasts using a series of paired-samples t-tests indicated that the mean score for the mid-term,
t(33) ¼ 3.59, p < 0.01 and post-test assessment, t(32) ¼ 3.38, p < 0.01, were significantly
different than the pre-test. The Cohen’s d values for the two contrasts were, 0.62 and 0.60,
respectively. There was not a significant difference between the mid-term and post-test
assessments. Using the Bonferroni approach to control for Type I error across the three contrasts, a
p-value of <0.02 (p ¼ 0.05/3 ¼ 0.02) was required for significance. The means and standard
deviations are presented graphically in Figure 5.
Given the significant result for the overall performance task score, we decided to conduct
a repeated measures ANOVA for each component of the argument. The factor for this analysis
was time of the assessment and the dependent variable was the answer, evidence, or rationale
score. The results of the ANOVA indicated a significant time effect, Wilks' Λ = 0.45,
F(2, 29) = 17.69, p < 0.01, multivariate η² = 0.55, for the answer component and a
significant time effect, Wilks' Λ = 0.71, F(2, 29) = 5.87, p < 0.01, multivariate η² = 0.28, for
the rationale component of the argument. The ANOVA for the evidence component was not
significant.
The results of the follow-up contrasts using a series of paired-samples t-tests are presented in
Table 5. The means and standard deviations for each component are presented graphically in
Figure 6. Using the Bonferroni approach to control for Type I error across the three contrasts for
each component, a p value of <0.02 was required for significance. The pre-test mean score for the
answer component was significantly different from the mid-term mean score, t(32) = 6.35,
p < 0.001. The Cohen's d was large with a value of 1.1. The mean rationale score was significantly
different for the pre-test versus post-test comparison, t(32) = 3.17, p < 0.001, as well as for the

Table 4
Means and standard deviations for performance task scores

Performance Task Mean Score (SD)

Event       N    Answer        Evidence      Rationale     Total
Pre-test    36   0.81 (0.75)   1.92 (1.32)   0.92 (0.94)   3.64 (1.92)
Mid-term    35   1.66 (0.64)   2.26 (1.12)   1.11 (1.05)   5.03 (2.02)
Post-test   34   1.44 (0.75)   2.38 (1.37)   1.68 (1.27)   5.50 (2.90)


Figure 5. Performance task scores.

mid-term versus the post-test comparison, t(32) = 2.52, p = 0.02. The Cohen's d values were
moderate, with values of approximately 0.5 for both rationale comparisons.
In all three iterations of the performance task one of the balloons contained excess sodium
bicarbonate. This repeated scenario provided a consistent point for comparison of student
understanding of the content and their skill at constructing an argument. This progression is
illustrated here by the responses of a single student whose total scores on the three performance

Table 5
Results of paired samples t-tests for performance task component scores

Component   Contrast                 Mean Difference   t      p        d
Answer      Pre-test vs. Mid-term    0.85              6.35   <0.001   1.1
Answer      Pre-test vs. Post-test   0.61              3.12   0.04     0.54
Answer      Mid-term vs. Post-test   0.28              1.72   0.10     NA
Rationale   Pre-test vs. Mid-term    0.21              1.05   0.30     NA
Rationale   Pre-test vs. Post-test   0.70              3.17   <0.001   0.55
Rationale   Mid-term vs. Post-test   0.53              2.52   0.02     0.45

Note: Significant at p < 0.02.


Figure 6. Scores on each component of the performance task.

tasks reflected the mean for each assessment. This student provided the following answers the first
time she completed the performance task:

Is the Limiting Reactant HC2H3O2 or NaHCO3 or Neither? Neither


The evidence is. . . The concentrate of HC2H3O2 changed to a yellow color. There may have
been minimal contents of acid in the NaHCO3, enough to chemically change the color to
yellow, but not enough to turn green
The rationale is. . . Because no acid concentrate would show by no color change in
HC2H3O2, and the contents in the vile did change color, there had to have been some other
chemical—if not acid—that reacted with the HC2H3O2

On this task the student earned a total of one point of the seven possible points. First, she
incorrectly identified neither reactant as limiting (0/1 point). As evidence, she only lists the yellow
color (1/3 points). Finally, she proposes a third mystery chemical that prevented the reaction from
turning completely green for her rationale instead of attempting to justify her claim with the
evidence (0/3 points). Her answers clearly represent a sincere attempt to complete the
performance task but also illustrate her limited understanding of the concept and the components
of an argument. The answer is incorrect, the evidence is only an observation and the rationale is
illogical.
At the midpoint in the semester, she provided the following answers on the performance task:

Is the Limiting Reactant HC2H3O2 or NaHCO3 or Neither? HC2H3O2


The evidence is. . . The acid turned green.
The rationale is. . . The base reacted with the acid, and because there was an excess amount
of base, the solution turned green.


She correctly identified acetic acid as the limiting reactant (1/1 point), but she again simply
used an observed color change as evidence (1/3 points). Although she only included a single
observation to support her claim, she no longer uses an unsubstantiated inference as evidence,
which is an improvement over her first attempt. Her rationale is sound but minimal (1/3 points)
giving her a total score of three out of seven points possible. This would then be a threshold where
the student has the content knowledge, but not the depth of understanding or the skill to construct a
robust argument.
For the third performance task, which was the most challenging of the three, she provided the
following answers:

Is the Limiting Reactant HC2H3O2 or NaHCO3 or Neither? HC2H3O2


The evidence is. . . There was a much greater amount of NaHCO3 in this reaction. Because
the formula provided is balanced and is an optimum mole ratio, having more NaHCO3 made
HC2H3O2
The rationale is. . . Green indicates that no acid is present because the entirety of HC2H3O2
was allowed to completely react. There was more NaHCO3 in the balloon than in the orange
balloon. There was still some NaHCO3 that was not dissolved.

On the final performance task this student constructed a robust argument in support of a
correct answer (1/1 point). She only identifies excess sodium bicarbonate specifically as evidence
(1/3 points); however, she uses all of the possible evidence in the rationale (3/3 points). The evidence
score on the performance task was assessed using only the information written directly under
evidence; on occasion, as in this case, additional evidence was used in the rationale that was not
counted in the evidence score. Her total score for this flask was five out of the seven points possible.
Anecdotally, both the researcher and the second scorer noted the improvement in the student
rationales for the third performance task.
There are several reasons that the improvement between the second and third assessment
might not be significant. First, and foremost, the third assessment was the most challenging of the
three. The difference between the two flasks was less obvious: the balloons inflated to roughly the
same size, and one flask did not remain red but turned orange. The instructions told the students
that red indicated acid was present and green indicated that no acid was present. The students had
to infer that orange indicated a stoichiometric mix of the reactants, such that neither was limiting.
Their difficulty with this is reflected in the slight drop in the average answer score for the final
assessment. The rationale score, in contrast, showed significant improvement from the midterm
assessment to the post-test. This result is similar to the findings of other research that has shown
that students have difficulty providing the backing, or what we refer to as rationale, for why they
chose to use certain pieces of data as evidence in their written arguments (P. Bell & Linn, 2000;
McNeill, Lizotte, Krajcik, & Marx, 2006). McNeill and Krajcik (2007), for example, reported that
the middle school students in their study learned how to support their claim with evidence quickly,
but that they less frequently articulated the scientific principles that allowed them to make that
connection. The numerous studies on scientific argumentation suggest that this ability does not
come naturally to most individuals but rather is acquired through practice (Osborne, Erduran, &
Simon, 2004). Therefore, science educators debate whether students need to be explicitly taught
how to construct an argument complete with claim, evidence, and rationale or whether
instructional methods that emphasize argument can be used to foster the development of these
skills over time (Jimenez-Aleixandre, 2008). The results of this study suggest that the latter case
may be sufficient for developing these skills.


Student knowledge and understanding of chemical reactions could provide an additional or
complementary explanation for the delayed development of the rationale in student arguments on
the performance task. Metz (2000) points out that the ability to construct an accurate scientific
explanation represents a complex interaction of scientific reasoning capacities and domain-
specific knowledge. The second performance-based assessment was given after the concept of
limiting reactant had been covered in the lecture course, but prior to the ADI investigation on the
topic. It seems reasonable to suggest that the ADI investigation augmented the students' domain-specific knowledge to such a degree that, when coupled with the continuous experience with argumentation provided in the laboratory, the students provided a better rationale for the claim on the final performance task.
How does the quality of the oral scientific argumentation that takes place between
undergraduate students change over the course of a semester when they engage in a series of
argument-focused lab activities?
We conducted a repeated measures ANOVA for each of the two argumentation events that take place during the ADI instructional model in order to determine if the quality of the argumentation that takes place between the students during the five investigations improved over time. The factor in each analysis was time and the dependent variable was the ASAC score. The means and standard deviations for the ASAC scores are presented in Table 6. The results of the ANOVA indicated a significant time effect for the argument generation events, Wilks's Λ = 0.13, F(4, 5) = 8.38, p = 0.02, multivariate η² = 0.87, as well as for the argument evaluation events, Wilks's Λ = 0.10, F(4, 5) = 10.79, p = 0.02, multivariate η² = 0.90. As illustrated in Figure 7, the ASAC scores for both argument generation and argument evaluation follow a similar trend: higher for the first investigation than the second, increasing for the third and fourth investigations, with a slight drop for the final investigation.
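
For readers who wish to see the mechanics of such an analysis, the following is a minimal sketch of a repeated measures ANOVA in Python using the statsmodels library. The group labels and scores are illustrative placeholders rather than the study data, and the sketch reports the univariate F test; the Wilks's Lambda values above correspond to the multivariate test that packages such as SPSS report for repeated measures designs.

# Repeated measures ANOVA on group-level ASAC totals across the five
# investigations. The groups and scores are illustrative placeholders.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

investigations = ["density", "hydrates", "solutions", "mole_ratio", "reactions"]
data = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "investigation": investigations * 3,
    "asac_total": [26, 20, 25, 29, 22,
                   24, 18, 23, 28, 21,
                   27, 19, 26, 30, 23],
})

# Within-subjects factor: investigation (time); the "subject" here is the lab group.
result = AnovaRM(data, depvar="asac_total", subject="group",
                 within=["investigation"]).fit()
print(result)   # univariate F test for the effect of time
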
The first investigation on density included an unfamiliar piece of equipment, the spill can. This resulted in a significant amount of discussion about how to use the spill can and why the resulting volume tended to be considerably different from the value obtained with calipers. In addition, some groups received an object with a density < 1.0 g/mL. This presented the groups with a particular challenge when trying to find volume by water displacement. The unusual nature of the spill can and the floating object led to an unexpected degree of discussion about method that incorporated more elements of argumentation than the other investigations. Students struggled with how to conduct the investigation and repeatedly questioned the validity of the data they collected. The spill can method for water displacement did not provide reproducible results either within a group or between groups. The floating object was handled in one of two ways: either the floating was ignored and explained away as a "source of error," or the object was pushed under the water with a paper clip, which the students considered unreliable but necessary.
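
As a rough illustration of the arithmetic the groups were comparing, the sketch below computes density from a single mass and from two independent estimates of volume; all of the numbers, and the rectangular object measured with calipers, are hypothetical rather than values from the study.

# Density of an object from its mass and two independent volume estimates.
# All values are hypothetical; they are not data from the study.
mass_g = 22.4                      # mass from the balance

# Volume by water displacement with the spill can (overflow collected).
volume_displacement_mL = 9.1

# Volume from caliper measurements of a rectangular solid (1 cm^3 = 1 mL).
length_cm, width_cm, height_cm = 2.0, 2.1, 2.0
volume_caliper_mL = length_cm * width_cm * height_cm

print(f"density (displacement): {mass_g / volume_displacement_mL:.2f} g/mL")
print(f"density (calipers):     {mass_g / volume_caliper_mL:.2f} g/mL")
# An object with a density below 1.0 g/mL floats, so the displaced volume is
# underestimated unless the object is held completely below the water line.
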
The drop in ASAC scores for the final investigation, chemical reactions, was probably due to a
weak guiding question, “Can you prepare 0.5 g of product?” The intention was for students to try
different combinations of reactants in order to achieve higher yields. The time required to dry the
product and obtain an initial yield precluded repeating the experiment for most groups, so that they
just reported their yield for stoichiometric amounts of the reactants. In addition, these yields tended to be 80–90%, a reasonably good result that did not necessitate more of an answer to the question than "yes," and consequently did not engender meaningful argumentation.
In addition, Figure 7 reveals a shift in the relative ASAC scores during each argumentation
event. In the first two investigations, the argument generation scores are higher than the argument
evaluation scores with mean differences of 3.11 and 1.66 for the Density and Hydrate
Investigations, respectively. For the third investigation, Solutions, the means only differ by 0.20.

Table 6
Means and standard deviations for ASAC scores; cells report Mean (SD)

Argument Generation
Activity      Cognitive      Epistemic      Social         Total
Density       7.13 (1.73)    11.63 (2.39)   7.25 (2.66)    25.78 (5.07)
Hydrates      6.56 (2.60)     8.89 (1.45)   4.44 (2.07)    19.44 (4.88)
Solutions     7.63 (3.07)    11.38 (2.56)   5.38 (3.81)    24.82 (7.05)
Mole ratio    9.29 (2.81)    12.57 (4.43)   6.86 (3.34)    28.36 (8.24)
Reactions     6.78 (1.58)    10.25 (1.16)   1.78 (3.11)    21.76 (2.91)

Argument Evaluation
Activity      Cognitive      Epistemic      Social         Total          Mean Difference
Density       6.78 (2.39)    10.78 (2.73)   5.11 (1.96)    22.67 (6.34)   3.11
Hydrates      5.33 (1.41)     7.22 (3.19)   5.22 (1.64)    17.78 (5.12)   1.66
Solutions     7.38 (2.72)    10.75 (4.10)   6.50 (1.93)    24.62 (7.86)   0.20
Mole ratio    8.88 (2.36)    12.63 (2.50)   9.25 (3.11)    30.76 (6.78)   2.40
Reactions     7.38 (1.77)    11.00 (1.85)   6.88 (1.96)    25.26 (4.66)   3.50

Note. Mean Difference is the absolute difference between the Argument Generation and Argument Evaluation total mean scores for each activity.
Figure 7. ASAC scores for both argumentation events.

Figure 8. Written argument scores.



In the final two investigations the means reversed, with the argument evaluation event scoring higher than the argument generation event, with mean differences of 2.40 and 3.50 for the Mole Ratio and Reactions Investigations, respectively. The argument generation scores likely suffered from
group familiarity. In watching the video recordings, it was clear that group members took on pre-
set roles for each investigation. These roles included whiteboard designer, calculator, chemical
gatherer, and director. In a typical ADI laboratory section, students are shuffled for each new
investigation to avoid this situation. The argument evaluation session required students to evaluate
claims, evidence, and underlying assumptions, which are tasks that students have less experience
navigating (Kuhn & Reiser, 2005).
We decided not to perform a repeated measures ANOVA on the three section scores for the
ASAC given the small sample size at the group level; however, some insight can be gained by
visual inspection of the means and standard deviations for the cognitive, epistemic, and social
components of the ASAC scores provided in Table 6. The cognitive and epistemic sections each
had seven items to score, making direct comparison possible. The epistemic score was higher for all five investigations in both argumentation events. This can be explained most directly by the relatively few instances of consideration of alternative claims or claim changes, for which the cognitive section has three items. During the argument generation stage of the investigation, the students would discuss their data, the acquisition of the data, and the calculations involving the data, but generally, once they came to a conclusion, they did not consider alternatives. The
follow-up discussion after the argument evaluation events, where claim assessment would be most
likely, was consistently minimal. Students were eager to leave the laboratory and often sent
someone back to their table to make claim changes or edit data during the argumentation session.
This particular issue may have contributed to the low occurrence of “claim change” and
consequently the lower cognitive scores.
Some insight into this situation was gleaned from the video-recorded discussions during the argument evaluation and from the discussions the following week regarding the laboratory report. During the argument evaluation the students did not know how to address differences in results and data; they operated with the idea that one answer was right and one answer was wrong and, as a result, the students did not attempt to
identify a potential source of error in one or both claims. To illustrate this trend, consider the
following discussions that took place during the hydrate investigation, between two groups
attempting to identify the same unknown. These groups visited each other’s table at the beginning of
the argument evaluation session to compare claims. The first discussion occurred at table 2B with
travelers from table 6B (P is the presenter for table 2B and T1–T3 are the travelers from table 6B).

P: Ok, so we got completely different things from yours (she had gone and looked at
their whiteboard earlier). We got zinc sulfate.
T1: Huh?
P: Obviously we were both supposed to identify the unknown hydrate C, we said it
was zinc sulfate because our hydrous amount was 0.508 and our anhydrous
amount was 0.304 and after calculating we have 59.65% of salt left, subtracting
that by one hundred and we 40.35% for trial A. For trial B we got 45.56%. Those
are close enough that we took an average, uh—we could have done more trials, but
we had to start over because we initially put all in the bottom instead of spreading
it.
T1: Yeah, fantastic, yeah. (nods head as if understanding the problem)
P: Our average was 42.955 and our percent error was 2.086, uh, I don’t know if
you did anything different in your calculations that would change the percent.
Did you take into consideration that it was supposed to spread throughout the
entire (uses hand to show spreading up the side of the test tube).

T1: Yeah, yeah, yeah, we got that, did that.


P: Uhm, I don’t know what it could be then. The mass would not have affected it that
much. Did y’all start with 0.508 grams?
T1: Yeah, yeah we did that. Let’s see, what did we do (refers to notes).
T2: We did point five.
T3: Point five.
P: How many trials did you do?
T2: About two, three? (looks to T1 for confirmation)
P: Y’all did three?
T1: We did two separate trials.
P: Yeah, we did everything the same; I don’t know why they’re different. I am trying
to think of what would offset it.
T1: You used two different tubes, right?
P: Yeah, and we labeled them.
T1: This is really confusing. I just don’t understand how we got different percents.
(T1 and T3 have a conversation off camera that sounds like they are discussing the
calculations.)
P: It doesn’t really matter as long as you write down what you did. What was your
percent error?
T2: It was point something.
T1: Yeah, point seven percent.
P: Well, yours is definitely more accurate than ours.

This discussion illustrates the focus on correctness and the inability to fully investigate sources of error. The only real source of error in this experiment derived from a failure to heat the sample to a constant mass. The zinc sulfate pentahydrate was 42% water; the reason for table 6B's low value was that they did not remove all of the water. The participants never discussed the likelihood that the sample was not fully dehydrated. The method of heating was discussed, which was a legitimate concern; however, the discussion regarding the amount of material heated was irrelevant, as the percent water was a constant.
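
The percent-water arithmetic at issue in this exchange can be made explicit with a short sketch. The masses are those quoted by the presenter from table 2B; the point the students missed is that, because percent water is a ratio, the amount of sample heated cannot explain the discrepancy, whereas incomplete dehydration can.

# Percent water of a hydrate from the sample mass before and after heating.
# The masses are the values the presenter from table 2B quoted above.
hydrous_mass_g = 0.508     # sample mass before heating
anhydrous_mass_g = 0.304   # sample mass after heating (ideally to constant mass)

water_lost_g = hydrous_mass_g - anhydrous_mass_g
percent_water = 100 * water_lost_g / hydrous_mass_g
print(f"percent water: {percent_water:.1f}%")
# Roughly 40%; the group reported 40.35%, the small difference presumably
# reflecting rounding in their intermediate values. Because the result is a
# ratio, starting with 0.5 g rather than 0.508 g cannot explain a discrepancy
# between groups, but stopping the heating before all of the water is driven
# off lowers the apparent percent water, which is the likely source of error.
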
The following discussion took place at table 6B. This discussion demonstrated the reliance on
the authority of the instructor to resolve the discrepancy in claim (P’ is the presenter for table 6B
and T1’–T3’ are the travelers from table 2B).

P’: Our goal was to identify the unknown hydrate. Our claim was that it was copper
two sulfide, I mean sulfate actually. The experiment that we did was, we had test
tube 1 and test tube 2, what we did was we weighed the test tube by itself then the
test tube with the hydrate. Then we subtracted the test tube by itself that gave us
the known. After we used the Bunsen burner to remove all of the H2O we
subtracted the test tube from the amount we got from that to get the experiment.
Then we plugged it all into the formula and got the percent. The results are very
precise (refers to numbers on board), but not that accurate.
T1’: We actually got the zinc (sulfate), I don’t know if it is right, but we got the zinc.
T2’: Did he (the instructor) say y’all’s is right?
P’: No, not yet.
T2’: I guess we will find out.
P’: Do y’all have your numbers here? (Takes paper from T2.) Yeah, I think you guys
probably did get zinc.
T2’: Yeah, we had one that was like 45% and one that was. . .
T3’: Y’all got copper sulfate, who else got copper sulfate?


T1’: They did and they did (pointing around the room). The other groups that switched
had copper, no one else has zinc.
T3’: But one of them could have got it wrong.
T1’: I think he (the instructor) will say at the end who was right. (Instructor comes by at
this point.) Are we supposed to have the same hydrate, because we have different
answers?
I: Yes, so somebody is wrong.
T1’: Does that mean we will fail our lab report?

This discussion was focused entirely on deducing the correct answer by eliminating hydrates identified by other groups. There was no discussion of the data, the methodology, or even the calculations. The instructor unfortunately only exacerbated the situation by declaring that "somebody is wrong." A more inquiry-oriented response would have guided the students to consider why the water percentage might be low, or even suggested that they examine the data generated by each group.
The following conversation took place after the argument evaluation. The excerpt picks up with the presenter from table 6B offering a forced conjecture as to why his group's hydrate was not blue like those of the other groups that had a copper salt.

P’: There were three people that had this and they all had a blue one, but we don’t know if they dyed it or not, so we have to find out if we are right. (Instructor comes over and talks for some time about how to do the calculations correctly.)
P: (The presenter from table 2B comes over to talk.) I think it was the way you burned
it, did you only heat the bottom?
P’: No, we heated up the side to get the water out. So, it still had some water in it?
P: Yeah, that is what I think.
T1: What should we do?
P’: We have to just explain that man.

Several trends are evident in this series of conversations. The problem with table 6B’s data
was that it still had water present. The presenter from table 2B suggested this possibility at the end
of the lab, but the idea did not resonate with the group. The focus was on right versus wrong rather
than evaluating the evidence. The students’ reliance on authority is clear in this situation as well as
the tendency to use inappropriate reasoning to justify a claim, both of which are recognized
barriers to productive scientific argumentation (Kuhn & Reiser, 2005; Sampson & Clark, 2008).
For example, the lack of blue color was actually a clear indication that the group did not have a copper salt; rather than using this to adjust their claim, they decided that maybe there was dye in some of the samples. Finally, after all of the input from the other group and the instructor, the presenter
for table 6B was caught on tape the following week saying, “We had a little error last week, but I
just wrote what we did and that it was wrong.” The student’s laboratory report was an exact copy of
his table’s whiteboard with the conclusion that his answer was wrong because the other group was
right with a claim of zinc sulfate. He had included none of the corrections that came out of the
argument evaluation or the discussion with the instructor, but stuck to his original claim.
This example was used, even though it was early in the semester, because of the conversation
that was recorded on the tape at the end of laboratory the following week. In fact, this situation
occurred repeatedly over the course of the semester. This is an interesting observation given that
the ADI instructional model is designed to create a scientific community in both the collaborative
groups and through the poster sessions; yet, the written arguments consistently disregarded or
minimized the evidence presented by other groups. Field observations found that the students


operated under an assumption that using another table's data to support a claim, or to support a claim change, violated an unspoken code of ethics (i.e., that it was cheating). The students struggled to accept that they would not be told whether they were "right" and that they instead had to make a strong argument that took into account evidence from multiple sources, that is, evidence generated by their peers. The students' focus during the episodes of argumentation on whether a claim was right or wrong, rather than on the quality of the evidence or how consistent a claim was with the available evidence, was indicative of a failure to understand an overarching premise of argumentation and thereby its role in science.
Two weeks later during the argumentation session for the solutions laboratory activity, there
was a similar situation where the tables that had the same unknown had different molar
concentrations for the two dyes in their solution. Rather than just accepting that someone was right
or wrong, this time the students tenaciously compared their data and their calculations until they
found an error. The group of students with the error corrected their concentration values on their
whiteboard and used the corrected values in their laboratory reports as well. This was one of the few instances of claim change following the argument evaluation session. It occurred during the third ADI investigation, suggesting that the students were accepting the role of argumentation and that at least these two tables were participating in a classroom culture comparable to the culture of science.
Over time the arguments improved in both use of reasoning and willingness to change claims.
A particularly illustrative example occurred during the argument evaluation for the mole ratio
investigation. This is an investigation where everyone could come to the same conclusion, and most groups did agree that the mole ratio was 1:1. In this example, the presenter for table 5A adamantly moves the discussion from claim to reasoning.

T1: (looks at board) Yeah, we got the same thing.


P: Yeah, ok, but let’s see if we had the same reasons. We said 1:1. (referring to graph)
We thought that since it was increasing by so much for the first three and then it
started leveling off after so that adding moles after you got the 1:1 ratio, didn’t
make much more of the CO2. For the 1:1, we got 380 mL, if we did it twice we
would get 760 mL. Now, this one uses twice as much and you only get 420.
T1: That makes sense.
T2: A lot of people are saying, the 1:2.
P: I think once the people that got 1:1 start explaining, they will switch over because
it just makes sense.

The statistical evidence for growth, which these examples illustrate, supports the conclusion that argument-focused instructional models, like ADI, can impact the argumentation that takes place between undergraduate students over the course of a semester. Taken together, these results also suggest that the notion of changing a claim in light of evidence obtained by others was new to the students, as was the concept of moving beyond right and wrong to a reasoned explanation.
How does the quality of the scientific arguments undergraduate students write in their reports change over the course of a semester when they engage in a series of argument-focused lab activities?
We conducted a repeated measures ANOVA to determine whether the argument sections the students wrote in their laboratory reports improved over the five laboratory activities. The factor for this analysis was time and the dependent variable was the argument score. The means and standard deviations for the written argument scores are presented in Table 7.

Table 7
Means and standard deviations for written arguments

Investigation N M (SD)
Density 35 7.74 (3.58)
Hydrate 34 6.85 (3.29)
Solutions 32 7.91 (4.15)
Mole ratio 33 9.12 (4.41)
Reactions 29 9.52 (4.03)

The results of the ANOVA indicated a significant time effect, Wilks's Λ = 0.46, F(4, 31) = 9.05, p < 0.01, multivariate η² = 0.54. The means for each investigation ± 1 SD are presented in Figure 8. The
results of this analysis indicate the quality of the arguments included in the investigation reports
improved over time.
The written argument scores follow a pattern similar to the ASAC scores: higher for the first investigation than the second, increasing for the third and fourth investigations, and leveling off for the final investigation. As with the ASAC scores, the arguments for the first investigation focused on methodology rather than the claim, which resulted in the elevated score on the first report. The mean written argument score for the final investigation did not drop like the ASAC score but instead increased slightly.
Overall, the participants in this study appeared to understand the importance of presenting their data in tables or graphs, and most scored consistently high on this aspect of the rubric. Most students also included a claim or conclusion in their argument. Many of these students, however, did not provide a sound rationale for their claim. To illustrate the improvement in the students' arguments over time, consider the following series of arguments written by the same student. This student was chosen because his scores closely followed the class trend. He consistently scored the maximum three points on the data presentation component of the argument, so only the text of his argument is given. The capitalized words in brackets are placed just before the text that addresses the relevant rubric criteria. Table 8 provides the student's scores for each report.

Hydrate Example
For most students, the second laboratory report did not differ much from the first with regard
to the elements of argument included or missing. There was a claim and a data table, but very little
backing and little or no comparison with their matched group for the unknown. The second
investigation asked the students to identify an unknown hydrate compound as one of four possible
compounds by using the percent water. To illustrate this trend, consider the following example.

Table 8
Written argument scores for a single student

Investigation Claim Data Evidence Rationale Community Total


Density 2 3 2 1 0 8
Hydrate 1 3 0 0 0 4
Solutions 3 3 1 0 2 8
Mole ratio 3 3 3 3 2 14
Reactions 3 3 3 2 3 14


[RATIONALE] Our hypothesis was that if the percentages of water in our sample and one of
the provided hydrates matched, then they were the same substance. [CLAIM] At the
conclusion of the experiment, we determined that our unknown hydrate was BaCl2·2H2O.

This excerpt is remarkably succinct and while there were examples of student arguments that
were more detailed, this argument was typical of the sample. The first sentence sets up the basis for
a rationale, but there is no follow-through. The percentage of water that would be consistent with the claim of barium chloride dihydrate was not included in the argument, nor was the experimental
value provided as evidence. In addition, this student, like most of the other participants in the
study, did not attempt to compare or contrast their findings with the findings of the other groups in
the second laboratory report.

Solutions Example
The third laboratory activity was a more challenging investigation for the students in that it required them to integrate two techniques that were unfamiliar to them: spectroscopy and chromatography. The guiding question was, "How could I prepare more of this dye?"
In order to answer this question, students needed to identify the components of their solution
using paper chromatography and determine the concentration of these components using
spectroscopy. Most students struggled to transform the data from both sources into the evidence
they needed to support their answer to the research question. To illustrate this trend, consider the
argument below.

[CLAIM] Ultimately, by ascertaining the identity and concentrations of each dye, we were
able to identify the unknown purple dye as a mixture of two colors. [EVIDENCE] We
determined that the unknown purple dye contained 5.65E-6 M blue dye B1 and 9.60E-6 M red dye R3. [RATIONALE] By comparing our calculations, the chromatogram
illustrated that our solute distances of the blue B1 and red R3 dyes lined up with the unknown
purple dye. In order to make more of the unknown purple dye and solve the lab problem, one
could take the molarities of each dye and make a solution. To make 1 liter of solution, one
could take 5.65E-6 moles of blue dye B1 and 9.60E-6 moles of red dye R3 and add enough
water to make a 1 L solution. If more dye is required, the same ratios should be used with
increasing amount of solution. [COMMUNITY] In complying with good scientific practice,
we compared our results with those of the other group that had the same unknown. They
advised us that they performed the same process and also received the same results. Being
that they also determined the unknown purple dye to contain blue dye B1 and red dye R3 our
results are supported and we are more confident in our findings.

This excerpt was typical of most student arguments for the solutions investigation. As in this case, most students focused on one method, spectroscopy or chromatography, but not both. The identification of B1 and R3 with chromatography was discussed, but the concentration values were dropped into the argument with no discussion or basis. This student does go into detail on actually preparing the solution; in contrast, other students simply stated that a solution of the given component concentrations could be prepared. This argument still lacks backing and the evidence is incomplete. However, for the first time the student compared his results with those of another group, and in a relatively robust manner. Typically, students made a general statement as to the consistency of results and then inferred the accuracy of their claim. This excerpt uses the group comparison in a manner that demonstrates that the student is not looking for correctness but for support for a claim. Overall, this excerpt illustrates the transition from the traditional answer on a laboratory report, as presented in the hydrate example, to an actual argument.
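
The article does not detail how the dye concentrations were extracted from the absorbance data, but an approach consistent with the description above is a Beer-Lambert calibration. The sketch below assumes that approach; the standards, absorbances, and wavelength are hypothetical.

# Estimating a dye concentration from absorbance with a Beer-Lambert
# calibration line (A = k * c, where k bundles the molar absorptivity and
# path length). The standards and absorbances are hypothetical.
import numpy as np

standard_conc_M = np.array([2e-6, 4e-6, 6e-6, 8e-6])   # calibration standards
standard_abs = np.array([0.21, 0.43, 0.62, 0.85])       # absorbance at the dye's peak

# Least-squares slope of a calibration line forced through the origin.
slope = np.sum(standard_abs * standard_conc_M) / np.sum(standard_conc_M ** 2)

unknown_abs = 0.55            # absorbance of the mixture at the same wavelength
unknown_conc_M = unknown_abs / slope
print(f"estimated dye concentration: {unknown_conc_M:.2e} M")
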

Mole Ratio Example


The mole ratio laboratory investigation was simpler in terms of technique and data
acquisition; however, the results were more ambiguous. The guiding question asked, "What is the optimum mole ratio for a chemical reaction?" The students had to compare the volume of carbon
dioxide collected using varying molar amounts of sodium bicarbonate and acetic acid. The answer
was a 1:1 ratio, but the amount of carbon dioxide gas actually increased slightly when sodium
bicarbonate was in excess. The students had to come to grips with the idea that “more” was not
necessarily the “optimum” ratio. To illustrate the nature of the argument written by the students as
part of this investigation, consider the following example.

[CLAIM] In comparing our calculations, we determined the optimum mole ratio to be 1:1
because the smallest number of moles was used to receive the largest amount water
displacement from the production of carbon dioxide. [EVIDENCE] For the 1:1 ratio, only
2.10 g of sodium bicarbonate were used to produce 462 mL of carbon dioxide. The 3:1 and
2:1 ratios containing smaller masses of sodium bicarbonate produced much less water
displacement from the carbon dioxide. Although the 1:2 and 1:3 mole ratios containing
larger masses of sodium bicarbonate produce large amounts of water displacement from the
carbon dioxide, they were still smaller than the 1:1 ratio and much less relative to their given
masses. After the 1:1 ratio, the water displacement from the carbon dioxide products started
to decrease. [RATIONALE] The larger masses of sodium bicarbonate did not give the best
ratio because there was an excess amount of the substance leftover in the flask and all of the
acetic acid had reacted. This illustrated that the acetic acid was the limiting reactant in this
chemical reaction because it restricted the amount of product formed and was completely
consumed prior to the other reactant. [COMMUNITY] In complying with good scientific
practice, we compared our results with the other groups. All but one group determined the
optimum mole to mole ratio would also be 1:1. This overwhelming corresponding data
within the class confirms that our methods and results are sufficient and acceptable.

This excerpt, for the first time, contains all the elements of an argument. The student’s claim is
supported by evidence and a sound rationale. The rationale not only justifies the claim but the
evidence used for the claim as well. His comparison with the community was used to support
the claim and the methods as being “sufficient and acceptable.” Overall, this report was
representative of the peak observed in the linear trend for the written arguments.
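
The stoichiometry behind the 1:1 answer can also be summarized in a short sketch. The 1:1 reaction of sodium bicarbonate with acetic acid and the molar masses are standard chemistry; the reactant masses, apart from the 2.10 g of sodium bicarbonate mentioned in the student's argument, are hypothetical.

# Which reactant limits CO2 production for a given mixture?
# NaHCO3 + HC2H3O2 -> NaC2H3O2 + H2O + CO2 consumes the reactants 1:1,
# so the smaller mole amount fixes how much CO2 can form.
MOLAR_MASS_NAHCO3 = 84.01    # g/mol
MOLAR_MASS_HC2H3O2 = 60.05   # g/mol

def limiting_reactant(mass_nahco3_g, mass_acid_g):
    """Return (limiting reactant, moles of CO2 possible) for a 1:1 reaction."""
    n_bicarb = mass_nahco3_g / MOLAR_MASS_NAHCO3
    n_acid = mass_acid_g / MOLAR_MASS_HC2H3O2
    if n_bicarb < n_acid:
        return "NaHCO3", n_bicarb
    if n_acid < n_bicarb:
        return "HC2H3O2", n_acid
    return "neither", n_bicarb

# Roughly stoichiometric mixture: 2.10 g NaHCO3 and a hypothetical 1.50 g of
# acetic acid are both about 0.025 mol, so neither reactant is in large excess.
print(limiting_reactant(2.10, 1.50))
# Doubling the acid (hypothetical 3.00 g) makes NaHCO3 limiting, so the CO2
# yield stays at about 0.025 mol; this is why "more" is not "optimum."
print(limiting_reactant(2.10, 3.00))
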
This series of written arguments illustrates how the reports improved over time. This example
demonstrates how the arguments at the beginning of the semester were short and only provided an
answer to the question and some of the data that was collected during the investigation. This
student, like the others in the class, crafted an argument that did not provide genuine evidence,
provide a rationale for the evidence, or discuss the findings of other groups. The third report simply
answered the question repeatedly without providing evidence or rationale for the concentration
values. The fourth report is remarkable in comparison with the third in the degree to which the student evaluates the evidence and provides a rationale for the claim. This seemingly abrupt adoption of the elements of a scientific argument was noted in the oral argumentation as well.
There are two possible explanations for this observation; one is the nature of the task and the
second is the instructional model. The mole ratio investigation is elegant in its simplicity and is a
sharp contrast to the more complex solutions investigation that required integration of
spectroscopy and chromatography. On the other hand, the fourth report was submitted in the seventh week of the semester, following 28 hours of ADI-based instruction that demanded students engage
in argumentation in multiple contexts. If the simplicity of the investigation was the determining
factor, then the hydrate investigation would have warranted a better argument. The more likely

explanation is that immersion of students in a classroom environment that provides instruction in and promotes the principles of argumentation leads to student growth in written argument. The student scores on the written arguments level off with the fourth and fifth reports. This is most likely due to the weak guiding question in the fifth ADI investigation (Can you make 0.5 g of product?), which generated little reason for argument. There could also be a plateau or ceiling in student written argument in this context. This issue will have to be resolved with further research following modification of the guiding question for the fifth investigation. The examples from the written reports, however, demonstrate how the process of growth in written arguments can be viewed as a progression in which students move from a simple answer combined with a data table to a robust answer supported with evidence and a sound rationale.
How does the quality of the collaborative scientific argumentation that occurs between
the undergraduate students during the investigation relate to the quality of the written
arguments produced by individual undergraduate students at the end of the investigation?
There was a significant positive correlation between the individual Written Argument scores and the group Argument Generation scores (r = 0.32, p = 0.02). Given the similar
patterns for the written argument scores and the ASAC scores, the significant correlation between
the student written arguments and oral argumentation was predictable. This finding is consistent
with the general thesis that participation in oral argumentation can transfer to an individual’s
written argument. Other researchers attempting to connect oral and written discourse, however,
have found that written arguments are typically not as robust or complex as the oral argumentation
(Berland & McNeill, 2010). Berland and McNeill (2010) suggest that the presence of an audience
during an episode of argumentation forces students to develop rich, convincing arguments in
response to the questions and critiques of their peers. Therefore, when a student writes an
argument for his or her teacher, the written argument may become little more than a demonstration
of content understanding for the teacher. Berland and Reiser (2009), as a result, suggest that
different instructional support may be required to help students develop better skills in these two
contexts. Our results, however, indicate that students can make significant growth in both oral
argumentation and written argument when they participate in a series of investigations that place a
great deal of emphasis on the generation and evaluation of both written and oral arguments.
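
For completeness, the correlation reported above is the familiar Pearson coefficient between each student's written argument score and the ASAC argument generation score of that student's group. The sketch below shows the calculation with placeholder scores rather than the study data.

# Pearson correlation between individual written argument scores and the ASAC
# argument generation score of each student's group (placeholder values only).
from scipy.stats import pearsonr

written_argument_scores = [8, 4, 8, 14, 14, 7, 9, 11, 6, 10]
group_generation_scores = [26, 19, 25, 28, 22, 24, 27, 29, 20, 23]

r, p = pearsonr(written_argument_scores, group_generation_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
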

Limitations
In recognition of the limitations of a small N value, the results of this study should be
interpreted cautiously. That being said, sample size would be a greater concern in a direct
comparison study. In a repeated measures study, in contrast, including a smaller number of
participants is appropriate because multiple measures are taken over time. So while the N value
for each data source is approximately 30, the analyses required scoring 150 laboratory reports,
90 argumentation events, and 90 performance-based assessments. The participants in this study
were a diverse group that is representative of a general undergraduate population in terms of
gender, race, and academic ability; yet this sample also presented challenges for research in terms
of student accountability. Inconsistencies in attendance and assignment submission ultimately
reduced the data available and limited the scope of the analyses. Given larger N values, a
microanalysis of the data might have provided more insight regarding how different subgroups
(gender, race, or GPA) benefit from this type of instruction. Analysis of the components of the
ASAC and the written argument would have been informative, but were precluded by the sample
size. Future research might be able to address these issues. Finally, the same instructor taught both
laboratory sections. While the use of the single instructor minimized variation between the two
sections, it also reduced the generalizability of the results.

While this study did not have a comparison group, previous research (Walker et al., 2012) did
directly compare the ADI instructional method with more traditional laboratory instruction. The
results of that study showed a significant difference between the comparison group and ADI group
in the ability to use evidence and a rationale for the evidence to support a claim. While a comparison
group would have been valuable for assessing directly the degree to which the ADI instructional
model contributed to the observed growth, it was not critical in this study because we were
interested in looking at growth over time. In addition, undergraduate laboratories are typically not
designed with argumentation opportunities like those in ADI so it would have been difficult to
identify legitimate comparison scenarios. Students learning the expectations of the model and
simply adapting to the requirements could also help explain the observed growth. However, if
students only needed to learn the expectations of the course in order to do well, then there should
have been greater improvements in student performance at the beginning of the semester and less
at the end. Despite becoming comfortable with this type of instruction and gaining important
content knowledge over time, students still struggled to justify a claim with sound evidence and an
adequate rationale for the evidence. It is therefore important to examine how students learn how to
craft a high quality argument and participate productively in an episode of scientific argumentation over extended periods of time.
Implications
It was the intent of this research to add to the growing body of information that demonstrates
the importance and value of instructional models that engage students in argument and critique,
not only for understanding of science content but for developing science literacy as well
(Cavagnetto, 2010; Sampson & Clark, 2008, 2009). Although there has been much research on
argumentation in science education over the last decade, little has been situated at the post-
secondary level. There is primary school research (Reznitskaya et al., 2001), a large body of middle school research (Berland & McNeill, 2010; Osborne et al., 2004), and some high school research (Jimenez-Aleixandre, Bugallo Rodríguez, & Duschl, 2000; Sampson & Clark, 2009). In addition,
the research that is available is often limited to pre-post assessment designs that measure absolute
change, but do not examine the process of change. This is due, to some extent, to the limited time
frame involved, with most interventions lasting a few days or weeks. While these studies have
found significant differences between a control and treatment group on the outcome of interest or
documented improvement in student performance as measured by pre-post intervention assessments (Nussbaum, Sinatra, & Poliquin, 2008; Osborne et al., 2004; Walker et al., 2012), there is
less information about how students’ argumentation skills develop over time. This study, with its
focus on student growth over a ten-week summer term and its use of multiple measures, provided a
number of insights into the ways undergraduate students craft scientific arguments and participate
in scientific argumentation and the conditions that can influence what they do and what they learn.
In addition, we were able to tease out some specific consequences of the inquiry investigations
conducted in this study both in terms of student engagement and individual products.
An overarching outcome of this study is a response to the on-going debate regarding the best
way to provide instruction in argumentation. Some science educators, such as Simon, Erduran,
and Osborne (2006), have designed sets of activities that can help people learn what counts as an
argument in science and how to participate in scientific argumentation. These activities, however,
may or may not be connected to the content of a course. Other science educators, such as Sandoval
and Reiser (2004), have attempted to make argumentation a more central focus during inquiry-
based instruction so students can learn how to craft an argument and participate in argumentation
over time while they learn content and develop their inquiry skills. The significant growth in all
three measures used in this research suggests that an integrated approach to laboratory instruction,

such as ADI, can foster the development of content knowledge and argumentation skills at the
same time. From this position, argument and argumentation are viewed as learning tools and there
is no separation between these processes and the learning of content. When this type of approach is
used, “students become active producers of justified knowledge claims and efficient critics of
others’ claims” (Jimenez-Aleixandre, 2008) and students can develop new criteria for supporting
and evaluating claims in the context of science.
While there is research on student engagement in collaborative oral argumentation (Ryu &
Sandoval, 2008; Sampson & Clark, 2011) and the quality of students’ written artifacts following
an intervention (McNeill & Krajcik, 2007; Sampson & Clark, 2009; Sampson & Walker, 2012),
less attention has been paid to the connection between the process and the product. This study
demonstrates that there is a relationship between the quality of the argumentation that takes place
within a group and the quality of the argument that the individual students write on their own. This
significant correlation, coupled with the significant growth in individual scores on the performance-based assessment, suggests that when students participate in high quality collaborative argumentation they also write better arguments in the context of science. The potential
influence of the process of collaborative argumentation on the individual product as indicated by
this research suggests that instructional models, such as ADI, that provide opportunities for
students to engage in oral argumentation can be beneficial for helping students learn to construct
better arguments.
The graphical representations of the ASAC scores and the written argument scores reveal parallel fluctuations in student performance. The oral argumentation and written argument scores in the first laboratory investigation were much higher than expected, and in the last investigation, the argumentation that took place between the students was minimal, resulting in a drop in scores. Berland and Hammer (2012) would interpret these results in terms of framing, which they describe as specific attention to students' understanding of the purpose of the investigation. They describe framing as a dynamic, ongoing process influenced by students' interpretation of the activity and by instructor signals, either explicit or implicit. Interpretation of our data in terms of framing
provides an explanation for the unexpected fluctuations in the linear growth. The first laboratory
investigation was framed such that the students focused on the influence of the tools they used to
collect data rather than the content of their claims. The nature of the investigation and students’
familiarity with a specific laboratory technique, as a result, influenced the type of argumentation
observed during the investigation and resulted in method-focused argumentation rather than
claim-focused argumentation. In the final investigation, students did not recognize the
opportunities for argumentation intended by the investigation because the framing for the
investigation was insufficient. Students, as a result, appeared to be just “doing the lesson” rather
than “doing science” (Jimenez-Aleixandre et al., 2000) because all the groups had the same
answer to the guiding question, which resulted in little discussion about the evidence that was used
to support it.
Our findings, therefore, suggest that the materials supplied to students during an investigation
and how the investigation is framed can influence the focus of the argumentation that takes place
between students inside the classroom. If the goal is to encourage students to generate data that
they can use to develop a claim, support it, and evaluate the claims of others, then the tools
presented for data collection and the techniques they use to collect the data should be familiar
ones. This type of approach will result in discussions that are focused on the content of the claim
and how well the claim fits with the available evidence. On the other hand, if the goal is to encourage students to question data collection methods, then it might be better to provide students with unfamiliar tools and techniques to use during an investigation. This type of approach
will result in discussions that are more focused on the collection and analysis of data. Although

neither approach is inherently better than the other in terms of promoting and supporting student
engagement in argumentation, science educators must be aware of how the context of the task can
influence the focus of an episode of argumentation.
Ford (2010) advocates that students be made aware of the kind of disciplinary critique that is
typically brought to bear on knowledge claims in science, and that they be able to actively
participate in such critique. However, as has been noted by others (Berland & Reiser, 2009), students can be uncomfortable evaluating and critiquing the claims of their peers, and when they do, they rarely use criteria that are valued in science. Berland and Reiser (2009), for example, suggest
that students often find the critique and evaluation of scientific arguments difficult at first because:

The act of persuading carries with it the social expectations that individuals have an
audience for their arguments, that this audience will interact with the presenters to determine
whether they are persuaded of one another’s ideas, and that the audience may not have had
the same experiences and beliefs as the authors of the argument (p. 49).

The results of this study support this conjecture. Our findings indicate that at the beginning of
the semester, the nature of the argumentation that took place between the undergraduates as they
attempted to develop their initial argument was of higher quality than the nature of the
argumentation that took place when the students were evaluating the arguments of the other
groups. By the third investigation of the semester, however, the quality of the argumentation that
took place between the students in these two contexts was equivalent and by the end of the
semester the students were actually engaging in higher quality argumentation when evaluating the
arguments of their peers (see Figure 7).
These types of interactions between students tend to be rare in traditional classrooms, and students, as a result, never learn how to be comfortable with critique and evaluation. In order to effect change in the social norms of the classroom, students need time to learn how to critique and evaluate claims using criteria valued in science, and teachers must encourage them to use these criteria when needed. One lesson, in other words, will not be enough to help students become more comfortable with this type of interaction; instructors will need to make the evaluation and critique of explanations and arguments a cornerstone of the learning environment. The results of this study suggest that it can indeed take multiple experiences of participation in critique and evaluation before students embrace the practice. Our findings also suggest that the use of cultural tools,
such as using a whiteboard to share arguments during the argumentation session and the use of
specific criteria to evaluate claims, may help promote and support growth in student ability to
evaluate what counts as quality evidence in support of a claim. As the students became more familiar with "what counts in science," their ability to participate in scientific argumentation seemed to improve.
While there is much discussion regarding the best way to foster the development and understanding of scientific argument, the results of this study suggest that students need to engage in argumentation in order to develop a better understanding of the content (i.e., arguing to learn), but they also need to learn what counts as an argument and argumentation in science (i.e., learning to argue) as part of the process. The ADI instructional model is well aligned with this more integrated perspective. These findings, therefore, demonstrate that science instructors can accomplish both tasks (i.e., arguing to learn by learning to argue) at the same time.

Notes
1. See the Supporting Information section for the condensed ASAC.
2. See the Supporting Information section for the scoring rubric.
3. The partial eta squared (η²) statistic ranges in value from 0 to 1. A 0 indicates no relationship between the repeated measures factor and the dependent variable, whereas a 1 indicates the strongest possible relationship. Cohen's d values of 0.2 are characterized as small, 0.5 as moderate, and 0.8 as large.

References
Abrams, E., Southerland, S. A., & Evans, C. (2008). Inquiry in the classroom: Realities and
opportunities. In E. Abrams, S. A. Southerland, & P. Silva (Eds.), An introduction to inquiry (pp. i–xiii).
Greenwich, CT: Information Age Publishing.
Baker, M. J. (1999). Argumentation and constructive interactions. In P. Coirer & J. Andriessen (Eds.),
Foundations of argumentative text processing (pp. 179–202). Amsterdam: University of Amsterdam Press.
Bell, P., & Linn, M. C. (2000). Scientific arguments as learning artifacts: Designing for learning from the
web with KIE. International Journal of Science Education, 22(8), 797–818.
Bell, R. L., Smetana, L., & Binns, I. (2005). Simplifying inquiry instruction: Assessing the inquiry level
of classroom activities. The Science Teacher, 72(7), 30–33.
Berland, L. K., & Hammer, D. (2012). Framing for scientific argumentation. Journal of Research in
Science Teaching, 49(1), 68–94.
Berland, L. K., & McNeill, K. L. (2010). A learning progression for scientific argumentation:
Understanding student work and designing supportive instructional contexts. Science Education, 94(5),
765–793.
Berland, L. K., & Reiser, B. (2009). Making sense of argumentation and explanation. Science
Education, 93(1), 26–55.
Bransford, J., Brown, A., & Cocking, R. (1999). How people learn: Brain, mind, experience and school.
Washington, DC: National Academy of Science Press.
Cavagnetto, A. R. (2010). Argument to Foster Scientific Literacy: A Review of Argument Interventions
in K-12 Science Contexts. Review of Educational Research, 80(3), 336–371.
Colburn, A. (2004). Focusing labs on the nature of science. The Science Teacher, 71(9), 32–35.
Cooper, M. (1994). Cooperative chemistry laboratories. Journal of Chemical Education, 71(4),
307–311.
Cooper, M., & Kerns, T. (2006). Changing the laboratory: Effects of a laboratory course on students
attitudes and perceptions. Journal of Chemical Education, 83, 1356–1361.
Cortina, J. M., & Nouri, H. (2000). Effect size for ANOVA designs. Thousand Oaks, CA: SAGE
Publications, Inc.
Domin, D. (1999). A review of laboratory instruction styles. Journal of Chemical Education, 76(4),
543–547.
Doran, R., Chan, F., & Tamir, P. (1998). Science educators guide to assessment. Arlington, VA: National
Science Teachers Association.
Driver, R., Asoko, H., Leach, J., Mortimer, E., & Scott, P. (1994). Constructing scientific knowledge in
the classroom. Educational Researcher, 23(7), 5–12.
Driver, R., Newton, P., & Osborne, J. (2000). Establishing the norms of scientific argumentation in
classrooms. Science Education, 84(3), 287–313.
Duschl, R. (2008). Science education in three-part harmony: Balancing conceptual, epistemic, and
social learning goals. Review of Research in Education, 32, 268–291.
Duschl, R., Schweingruber, H., & Shouse, A. (Eds.). (2007). Taking science to school: Learning and
teaching science in grades K-8. Washington, DC: National Academies Press.
Erduran, S., Simon, S., & Osborne, J. (2004). Tapping into argumentation: Developments in the
application of Toulmin’s argument pattern for studying science discourse. Science Education, 88, 915–933.
Farrell, J., Moog, R., & Spencer, J. (1999). A guided inquiry general chemistry course. Journal of
Chemical Education, 76(4), 570–574.
Ford, M. J. (2008). Disciplinary authority and accountability in scientific practice and learning. Science
Education, 92(3), 404–423.


Ford, M. J. (2010). Critique in academic disciplines and active learning of academic content. Cambridge
Journal of Education, 40(3), 265–280.
Gosser, D., & Roth, V. (1998). The workshop chemistry project: Peer-led team learning. Journal of
Chemical Education, 75(2), 185–187.
Halyard, R. A. (1993). Introductory science courses: The SCST position statement. Journal of Research
in Science Teaching, 23(1), 29–31.
Hodson, D. (2008). Towards scientific literacy: A teachers’ guide to history, philosophy and sociology of science. Rotterdam, The Netherlands: Sense Publishers.
Hofstein, A., & Lunetta, V. (2004). The laboratory in science education: Foundations for the twenty-first
century. Science Education, 88(1), 28–54.
Jimenez-Aleixandre, M. P. (2008). Designing argumentation learning environments. In S. Erduran & M.
P. Jimenez-Aleixandre (Eds.), Argumentation in science education: Perspectives from classroom-based
research (pp. 91–115). Dordrecht: Springer Academic Publishers.
Jimenez-Aleixandre, M. P., Bugallo Rodrı́guez, A., & Duschl, R. A. (2000). ‘Doing the lesson’ or ‘doing
science’: Argument in high school genetics. Science Education, 84(6), 757–792.
Kelly, G. J., & Bazerman, C. (2003). How students argue scientific claims: A rhetorical-semantic
analysis. Applied Linguistics, 24(1), 28–55.
Kelly, G. J., & Takao, A. (2002). Epistemic levels in argument: An analysis of university oceanography
students’ use of evidence in writing. Science Education, 86(3), 314–342.
Klahr, D., Dunbar, K., & Fay, A. L. (1990). Designing good experiments to test bad hypotheses.
In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation
(pp. 355–401). San Mateo, CA: Morgan Kaufman.
Krajcik, J. S., & Sutherland, L. M. (2010). Supporting students in developing literacy in science.
Science, 328, 456–459.
Kuhn, L., & Reiser, B., (2005). Students constructing and defending evidence-based scientific
explanations. Paper presented at the annual meeting of the National Association for Research in Science
Teaching, Dallas, TX.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics,
33, 159–174.
Lewis, S., & Lewis, J. (2005). Departing from lectures: An evaluation of a Peer-Led Guided Inquiry
alternative. Journal of Chemical Education, 82, 135.
Lewis, S., & Lewis, J. (2008). Seeking effectiveness and equity in a large college chemistry course: An HLM investigation of peer-led guided inquiry. Journal of Research in Science Teaching, 45(7), 794–811.
Linn, M. C., & Burbules, N. C. (1993). Construction of knowledge and group learning. In K. Tobin,
(Ed.), The practice of constructivism in science education (pp. 91–119). Hillsdale, NJ: Lawrence Erlbaum
Associates, Publishers.
McNeill, K. L., & Krajcik, J., (2007). Middle school students’ use of appropriate and inappropriate
evidence in writing scientific explanations. In M. Lovett, & P. Shah (Eds.), Thinking with data: The
proceedings of 33rd Carnegie Symposium on Cognition (pp. 233–266). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
McNeill, K. L., Lizotte, D. J., Krajcik, J., & Marx, R. W. (2006). Supporting students’ construction of
scientific explanations by fading scaffolds in instructional materials. The Journal of the Learning Sciences,
15(2), 153–191.
Metz, K. E. (2000). Young children’s inquiry in biology: Building the knowledge bases to empower
independent inquiry. In J. Minstrell & E. H. Van Zee (Eds.), Inquiry into inquiry learning and teaching in
science (pp. 371–404). Washington, DC: American Association for the Advancement of Science.
National Research Council. (1996). National science education standards. Washington, DC: National
Academy Press.
National Research Council. (1999). Transforming undergraduate education in science, mathematics,
engineering, and technology. Washington, DC: National Academy Press.


National Research Council. (2000). Inquiry and the national science education standards. Washington,
DC: National Academy Press.
National Research Council. (2005). America’s Lab Report: Investigations in High School Science.
Washington, DC: National Academy Press.
National Science Foundation, (1996). Shaping the future: Strategies for revitalizing undergraduate
education. Paper presented at the National Working Conference, Washington, DC.
Nussbaum, M., Sinatra, G., & Poliquin, A. (2008). Role of the epistemic beliefs and scientific
argumentation in science learning. International Journal of Science Education, 30(15), 1977–1999.
Osborne, J. (2010). Arguing to learn in science: The role of collaborative, critical discourse. Science,
328(5997), 463–466.
Osborne, J., Erduran, S., & Simon, S. (2004). Enhancing the quality of argumentation in science
classrooms. Journal of Research in Science Teaching, 41(10), 994–1020.
Piaget, J. (1964; 2003). Development in learning. Journal of Research in Science Teaching, 40(S),
S8–S18.
Poock, J., Burke, K., Greenbowe, T., & Hand, B. (2007). Using the science writing heuristic in the general chemistry laboratory to improve students’ academic performance. Journal of Chemical Education, 84(8), 1371–1378.
Resnick, L. B., Salmon, M., Zeitz, C. M., Wathen, S. H., & Holowchak, M. (1993). Reasoning in
conversation. Cognition and Instruction, 11(3&4), 347–364.
Reznitskaya, A., Anderson, R. C., McNurlen, B., Nguyen-Jahiel, K., Archodidou, A., & Kim, S.-Y.
(2001). Influence of oral discussion on written argument. Discourse Processes, 32(2&3), 155–175.
Rudd, J., Greenbowe, T., Hand, B., & Legg, M. (2007). Using the Science Writing Heuristic to improve
students’ understanding of general equilibrium. Journal of Chemical Education, 84(12), 2007–2011.
Ryu, S., & Sandoval, W. A. (2008). Interpersonal influences on collaborative argumentation during
scientific inquiry. Paper presented at the annual meeting of the American Educational Research Association,
New York, NY.
Sampson, V., & Clark, D. (2008). Differences in the ways more and less successful groups engage in
argumentation: A case study. Paper presented at the annual international conference of the National
Association for Research in Science Teaching (NARST), Baltimore, MD.
Sampson, V., & Clark, D. (2009). The effect of collaboration on the outcomes of argumentation. Science
Education, 93(3), 448–484.
Sampson, V., & Clark, D. (2011). A comparison of the collaborative scientific argumentation practices
of two high and two low performing groups. Research in Science Education, 41(1), 63–97.
Sampson, V., Enderle, P. J., & Walker, J. P. (2011). The development and validation of the Assessment of
Scientific Argumentation in the Classroom (ASAC) observation protocol: A tool for evaluating how students
participate in scientific argumentation. In M. S. Khine (Ed.), Perspectives on scientific argumentation:
Theory, practice and research (pp. 235–264). New York: Springer.
Sampson, V., & Walker, J. (2012). Learning to write in the undergraduate chemistry laboratory:
The impact of Argument-Driven Inquiry. International Journal of Science Education, 34(10), 1443–1485.
Sandoval, W., & Millwood, K. (2005). The quality of students’ use of evidence in written scientific
explanations. Cognition and Instruction, 23(1), 23–55.
Sandoval, W. A., & Reiser, B. J. (2004). Explanation-driven inquiry: Integrating conceptual and
epistemic scaffolds for scientific inquiry. Science Education, 88(3), 345–372.
Simon, S., Erduran, S., & Osborne, J. (2006). Learning to teach argumentation: Research and
development in the science classroom. International Journal of Science Education, 28(2&3), 235–260.
Sunal, D. W., Wright, E. L., & Day, J. B. (Eds.). (2004). Reform in undergraduate science teaching for
the 21st century. Greenwich, CT: Information Age Publishing.
Tobin, K. (Ed.). (1993). The practice of constructivism in science education. Hillsdale, NJ: Lawrence
Erlbaum Associates, Publishers.
von Glasersfeld, E. (1989). Cognition, construction of knowledge, and teaching. Synthese, 80(1),
121–140.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge,
MA: Harvard University Press.
Walker, J., Sampson, V., Grooms, J., & Zimmerman, C. (2011). A performance-based assessment for
limiting reactants. Journal of Chemical Education, 88(8), 1243–1246.
Walker, J., Sampson, V., Grooms, J., Zimmerman, C., & Anderson, B. (2012). Argument-driven inquiry
in undergraduate chemistry labs: The impact on students’ conceptual understanding, argument skills, and
attitudes toward science. Journal of College Science Teaching, 41(4), 74–81.
Walker, J., Sampson, V., & Zimmerman, C. (2011). Argument-driven inquiry: An introduction to a new
instructional model for use in undergraduate chemistry labs. Journal of Chemical Education, 88(10), 1048–
1056.
Wallace, C., Hand, B., & Yang, E.-M. (2005). The science writing heuristic: Using writing as a
tool for learning in the laboratory. In W. Saul (Ed.), Crossing borders in literacy and science instruction
(pp. 375–398). Arlington, VA: NSTA Press.
Zeidler, D. L. (1997). The central role of fallacious thinking in science education. Science Education,
81(4), 483–496.
