PHYSICAL REVIEW SPECIAL TOPICS - PHYSICS EDUCATION RESEARCH 2, 020103 (2006)

Scientific abilities and their assessment

Eugenia Etkina,* Alan Van Heuvelen, Suzanne White-Brahmia, David T. Brookes, Michael Gentile, Sahana Murthy,
David Rosengrant,* and Aaron Warren
Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854-8019, USA
(Received 20 December 2005; published 1 August 2006)
The paper introduces a set of formative assessment tasks and rubrics that were developed for use in introductory physics instruction to help students acquire and self-assess various scientific process abilities. We describe the rubrics, the tasks, and the student outcomes in courses where the tasks and rubrics were used.

DOI: 10.1103/PhysRevSTPER.2.020103 PACS number(s): 01.40.Fk, 01.50.Qb

I. INTRODUCTION

Several hundred thousand science and engineering students take introductory physics courses each year. What are the goals of these courses? In most courses the goals are to help students acquire a conceptual and quantitative understanding of major physics principles and the ability to use this understanding for problem solving. In addition to these goals, we argue that our introductory courses (and, in fact, all courses in physics) should help students develop other abilities that will be useful in their future work. According to many studies our students after leaving academia will be asked to solve complex problems, design experiments, and work with other people.1-4 Several documents guiding the K-16 program design and evaluation incorporate the development of these abilities as primary goals. National and international science tests at the 4-12th level have items assessing how students achieve these goals (reference to NAEP, TIMSS and the UK tests). Another example here is the new accreditation requirements for engineering colleges.5 As opposed to the old checklist of courses taken, engineering colleges must now show that their students have acquired various abilities that are important in the practice of science and engineering. The National Science Standards6 suggest that students need to learn to (i) identify questions and concepts that guide scientific investigation; (ii) design and conduct scientific investigations; (iii) use technology and mathematics to improve investigations and communications; (iv) formulate and revise scientific explanations and models using logic and evidence; (v) recognize and analyze alternative explanations and models; and (vi) communicate and defend a scientific argument.

Today even the most reformed introductory physics curricula do not focus explicitly on developing these abilities and, more importantly, on assessing them. The physics education research (PER) community uses summative assessment instruments that tell us whether students mastered the concepts of Newton's laws, thermodynamics, electricity and magnetism, and so on. Physics by Inquiry, Workshop Physics, Interactive Lecture Demonstrations, and the Washington tutorials7-12 use a formative assessment of student learning in the process of learning, but their focus is also mostly on conceptual understanding. Some reformed curricula such as SCALE-UP (Ref. 13) have recognized some of these goals and implemented strategies to achieve them. However, in the PER community, there are no instruments that assess whether students can design and conduct investigations, communicate, and defend a scientific argument, etc. This does not mean that such instruments do not exist. More than 60 years ago, Kruglak and colleagues developed performance tests to evaluate student achievement in introductory college physics courses at the University of Minnesota. These tests involved the use of laboratory equipment to measure different aspects of experimentation such as control of variables, selection of an appropriate method to answer an experimental question, and analysis and interpretation of experimental data.14-17 Later, items similar to Kruglak's questions but adapted to the paper-and-pencil environment appeared in national (NAEP) and international (TIMSS-R) science tests for middle school and high school students.18,19 These tests show the importance of achieving science process goals. Although a valuable resource, the performance tests and paper-and-pencil items on the written tests are summative in nature; they assess the results of learning and do not help students in the process of learning.

In this paper we describe a set of tasks and formative assessment instruments that can be used to help achieve "science-process" goals, as formulated by the National Research Council, the National Science Foundation, ABET, and others. We also describe the results of using these tasks and instruments in introductory physics courses whose curriculum was designed to specifically achieve these goals in addition to the traditional goals of a physics course.

II. DEFINING SCIENTIFIC ABILITIES

We use the term "scientific abilities" to describe some of the most important procedures, processes, and methods that scientists use when constructing knowledge and when solving experimental problems. We use the term scientific abilities instead of science-process skills to underscore that these are not automatic skills, but are instead processes that students need to use reflectively and critically.20 The list of scientific abilities that our physics education research group developed includes (A) the ability to represent physical processes in multiple ways; (B) the ability to devise and test a qualitative explanation or quantitative relationship; (C) the ability to modify a qualitative explanation or quantitative relationship; (D) the ability to design an experimental investigation; (E) the ability to collect and analyze data; (F) the ability to evaluate experimental predictions and outcomes,

1554-9178/2006/2(2)/020103(15) 020103-1 ©2006 The American Physical Society


ETKINA et al. PHYS. REV. ST PHYS. EDUC. RES. 2, 020103 (2006)

conceptual claims, problem solutions, and models; and (G) the ability to communicate.

This list is based on the analysis of the history of the practice of physics,21-23 the taxonomy of cognitive skills,24,25 recommendations of science educators,26 and an analysis of science-process test items.19

To help students develop these abilities, one needs to engage students in appropriate activities, and to find ways to assess students' performance on these tasks and to provide timely feedback. Activities that incorporate feedback to the students are called formative assessment activities. As defined by Black and Wiliam, formative assessment activities are "all those activities undertaken by teachers, and by their students in assessing themselves, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged."27 These authors reviewed 580 articles and found that learning gains produced by effective use of formative assessment are larger than those found for any other educational intervention (effect sizes of 0.4-0.7). Black and Wiliam also found that self-assessment during formative assessment is more powerful than instructor-provided feedback; meaning the individual, small-group, and large-group feedback system enhances learning more than instructor-guided feedback. Sadler28 suggested three guiding principles, stated in the form of questions, that students and instructors need to address in order to make formative assessment successful: (1) Where are you trying to go? (Identify and communicate the learning and performance goals.) (2) Where are you now? (Assess, or help the student to self-assess, current levels of understanding.) (3) How can you get there? (Help the student with strategies and skills to reach the goal.)

As noted above, students need to understand the target concept or ability that they are expected to acquire and the criteria for good work relative to that concept or ability. They need to assess their own efforts in light of the criteria. Finally, they need to share responsibility for taking action in light of the feedback. The quality of the feedback, rather than its existence or absence, is a central point. The feedback should be descriptive and criterion-based as opposed to numerical scoring or letter grades without clear criteria. With all the constraints of modern teaching, including large-enrollment classes and untrained teaching assistants, how can one make formative assessment and self-assessment possible?

One way to implement formative assessment and self-assessment is to use assessment rubrics. An assessment rubric is one of the ways to help students see the learning and performance goals, self-assess their work, and modify it to achieve the goals (the three guiding principles as defined by Sadler above). The rubrics contain descriptions of different levels of performance, including the target level. A student or a group of students can use the rubric to help self-assess their own work. An instructor can use the rubric to evaluate students' responses and to provide feedback.

III. FINE-TUNING SCIENTIFIC ABILITIES AND DEVISING RUBRICS TO ASSESS THEM

After making the list of scientific abilities that we wanted our students to develop, we started devising assessment rubrics to guide their work. Rubrics are descriptive scoring schemes that are developed by teachers or other evaluators to guide students' efforts.29 This activity led to a fine-tuning of the abilities, that is, to breaking each ability into smaller subabilities that could be assessed. For example, for the ability to collect and analyze data we identified the following subabilities: (i) the ability to identify sources of experimental uncertainty, (ii) the ability to evaluate how experimental uncertainties might affect the data, (iii) the ability to minimize experimental uncertainty, (iv) the ability to record and represent data in a meaningful way, and (v) the ability to analyze data appropriately.

Each item in the rubrics that we developed corresponded to one of the subabilities. We agreed on a scale of 0-3 in the scoring rubrics to describe student work (0, missing; 1, inadequate; 2, needs some improvement; and 3, adequate) and devised descriptions of student work that could merit a particular score. For example, for the subability "to record and represent data in a meaningful way" a score of 0 means that the data are either missing or incomprehensible, a score of 1 means that some important data are missing, a score of 2 means that all important data are present but recorded in a way that requires some effort to comprehend, and a score of 3 means that all important data are present, organized, and recorded clearly.

Simultaneously, while refining the list of abilities, we started devising activities that students could perform in recitations and laboratories. Defining subabilities and developing rubrics to assess them informed the writing of these activities. After we developed the rubrics, we started using them to score samples of student work. Each person in our nine-person group (eight coauthors and one member who left before the project was completed) assigned a score to a given sample using a particular rubric; we then assembled all the scores in a table and discussed the items in the rubrics where the discrepancy was large (see Table I for an example).

Based on these discussions we revised the wording of the rubrics and tested them by scoring another sample of student work. This process was iterated until we achieved 80% or higher agreement among our scores.

In the sections below we list the scientific abilities and corresponding subabilities that we identified, provide examples of scoring rubrics that we devised, and discuss where in the instructional process we use the rubrics. For each scientific ability, we provide examples of the tasks written for the students. Often the tasks target several abilities. In subsequent sections we will report on how we used the rubrics to study students' acquisition of some of the suggested abilities.

A. Ability to represent physical processes in multiple ways

While constructing and using knowledge, scientists often represent the knowledge in different ways, check for consistency of the representations, and use one representation to help construct another.30,31 For example, in the 1950s Feynman diagrams helped quantum electrodynamics move forward somewhat more rapidly by providing a more visual and


TABLE I. Scoring student work using the rubrics. For each student write-up, nine people assigned scores based on the descriptors of a particular subability in a rubric. The rubric descriptors with large discrepancies (more than 1 score point) were discussed and revised. In this case the largest discrepancy was in the assessment scores for the subability to represent data in a meaningful way. Each column gives the score assigned for the demonstrated subability.

Group member | Identify sources of uncertainty | Evaluate how uncertainties might affect the data | Record and represent data meaningfully | Analyze data appropriately
Alan         | 2 | 1 | 3 | 2
Sahana       | 2 | 1 | 2 | 2
David        | 2 | 2 | 2 | 2
Michael      | 3 | 2 | 1 | 3
Dave         | 2 | 2 | 3 | 2
Jen          | 2 | 2 | 2 | 2
Aaron        | 2 | 2 | 2 | 2
Suzanne      | 2 | 2 | 3 | 2
Marina       | 2 | 2 | 1 | 2
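The interrater agreement that the scoring procedure converged on can be checked mechanically. Below is a minimal sketch (our illustration, not the authors' analysis code) that takes the scores in Table I and computes, for each subability, the fraction of rater pairs that assigned identical scores:

```python
from itertools import combinations

# Scores copied from Table I: nine raters, four subabilities
# (identify sources of uncertainty, evaluate their effect,
# record and represent data, analyze data).
scores = {
    "Alan":    [2, 1, 3, 2],
    "Sahana":  [2, 1, 2, 2],
    "David":   [2, 2, 2, 2],
    "Michael": [3, 2, 1, 3],
    "Dave":    [2, 2, 3, 2],
    "Jen":     [2, 2, 2, 2],
    "Aaron":   [2, 2, 2, 2],
    "Suzanne": [2, 2, 3, 2],
    "Marina":  [2, 2, 1, 2],
}

def pairwise_agreement(values):
    """Fraction of rater pairs that assigned identical scores."""
    pairs = list(combinations(values, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for i, name in enumerate(["identify", "evaluate", "record", "analyze"]):
    column = [row[i] for row in scores.values()]
    print(f"{name}: {pairwise_agreement(column):.2f}")
```

On these numbers the "record and represent data" column has by far the lowest pairwise agreement, which matches the discrepancy singled out in the caption of Table I.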

understandable representation of a scattering process. Rules were also developed for converting these diagrams into complicated scattering cross section equations. Such qualitative representations, particularly diagrammatic or in some cases graphical representations, help physicists reason qualitatively about physical processes and see patterns in data without engaging in difficult mathematical calculations.

In our introductory physics courses students are often given a verbal description of a physical process and a problem to solve relative to that process. They can start their analysis by constructing a sketch to represent the process and include in the sketch the known information provided in the problem statement. They then construct more physical representations that are still relatively easy to understand—for example, motion diagrams, free-body diagrams, qualitative work-energy and impulse-momentum bar charts, ray diagrams, and so forth. Finally, they use these physical representations to help construct a mathematical representation of the process.

What subabilities help make this multiple representation strategy productive for reasoning and problem solving? (i) The ability to correctly extract information from a representation; (ii) the ability to construct a new representation from another type of representation; (iii) the ability to evaluate the consistency of different representations and modify them when necessary.

In addition to such subabilities that students need to master while using multiple representations, there are specific subabilities needed for each type of representation. For example, to use free-body diagrams (FBDs) productively for problem solving, students must learn to (i) choose a system of interest before drawing the diagram; (ii) use force arrows to represent the interactions of the external world with the system object or objects; (iii) label the force arrows with two subscripts (for example, the force exerted by the Earth on the object is labeled as F_(E on O)); (iv) try to make the relative lengths of force arrows consistent with the problem situation (the net force should be in the same direction as the system object's acceleration); and (v) include labeled axes on the diagram.

Such diagrams, if drawn correctly, can be used to help write Newton's second law in component form, that is, to represent the situation mathematically. Based on these considerations we constructed a rubric to help students assess themselves while drawing FBDs (see Table II).

How can a student use this rubric for self-assessment? After she draws a free-body diagram she can use the rubric to ask herself whether she labeled all the forces with two subscripts to indicate the interacting objects, or whether for some of the forces she cannot identify such a pair of objects. She can also check whether the number of forces on the diagram is equal to the number of objects that interact with the object of in-

TABLE II. A scoring rubric to assess a free-body diagram.

Scientific ability: Is able to construct a free-body diagram.
0 (Missing): No free-body diagram is constructed.
1 (Inadequate): FBD is constructed but contains major errors such as incorrect force vectors (wrong direction or length), an extra incorrect force vector, or a missing vector.
2 (Needs improvement): FBD contains no errors in vectors but lacks a key feature: labels of forces with two subscripts, vectors drawn from a single point, or axes.
3 (Adequate): The diagram contains no errors and each force is labeled so that it is clear what each force represents.
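A rubric row like Table II also maps naturally onto a small data structure, so that a numeric score always travels with its criterion-based descriptor, in line with the point above that feedback should be descriptive rather than a bare grade. The sketch below is purely illustrative; the names `FBD_RUBRIC` and `RubricScore` are our own, not part of the paper's materials, and the descriptors are condensed from Table II:

```python
from dataclasses import dataclass

# Descriptors condensed from Table II; names are our own illustration.
FBD_RUBRIC = {
    0: "Missing: no free-body diagram is constructed.",
    1: "Inadequate: the FBD contains major errors such as incorrect, "
       "extra, or missing force vectors.",
    2: "Needs improvement: vectors are correct but a key feature is "
       "lacking (two-subscript labels, a single origin, or axes).",
    3: "Adequate: no errors, and each force is labeled so that it is "
       "clear what it represents.",
}

@dataclass
class RubricScore:
    subability: str
    score: int

    def feedback(self) -> str:
        # Return the criterion-based descriptor, not a bare number,
        # in the spirit of descriptive formative feedback.
        return f"{self.subability} ({self.score}): {FBD_RUBRIC[self.score]}"

print(RubricScore("construct a free-body diagram", 2).feedback())
```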


terest. Then she can check for less significant things such as coordinate axes. Obviously using the rubric for self-assessment requires the knowledge of physics; thus this process cannot proceed completely without an interaction with an instructor. The same is true for using other rubrics for self-assessment: they help students to ask relevant questions about their work but they do not provide answers. Thus a rubric guides a student in the process of self-assessment but does not provide feedback by itself.

We also made a list of several types of multiple representation activities (a task may consist of some combination of these activities). For example: (i) provide students with one representation and have them create another; (ii) provide students with two or more representations and have them check for consistency between them; (iii) provide students with one representation and have them choose from a multiple-choice list a consistent different type of representation (for example, provide a mathematical description of a process and have students select from a list a consistent word description of the process); (iv) have students use a representation while solving a problem.

An example of a task that helps students develop the subabilities to construct new representations from other representations and to evaluate the consistency of different representations is given in Fig. 1. In addition, this example helps students learn to draw free-body diagrams.

FIG. 1. An example of a multiple representation task (cells 3-5 are not completed).

B. Ability to devise and test a qualitative explanation or quantitative relationship

One of the purposes of science is to explain observed phenomena.32,33 Hypotheses that scientists generate to explain phenomena need to be testable—this means that they can be used to make predictions about the outcomes of new experiments.34 If the outcome matches the prediction, it does not mean that the hypothesis under test is always correct; it only means that the hypothesis was not ruled out by the testing experiment. Thus, it is more productive to try to design an experiment whose actual outcome may not match the prediction based on the hypothesis under test. However, the outcome of the testing experiment depends not only on the correctness of the hypothesis but also on other auxiliary hypotheses used to make a prediction. These are usually simplifying assumptions about objects, interactions, systems, or processes involved in the phenomenon.35 Based on these considerations we identified the following subabilities that we want our students to develop: (i) ability to make a reasonable prediction based on the proposed hypothesis; (ii) ability to identify assumptions used in making the prediction; (iii) ability to determine specifically the way the assumptions might affect the prediction; (iv) ability to revise the hypothesis based on new evidence.

For each of the subabilities, we developed an item in the rubrics to guide students. A sample set of rubrics is shown in Appendix A. In item 7 of this set, one can see an example of the descriptors used to assess a student's ability to make a prediction based on a relationship or explanation. In addition to the assessment goal, this item helps students see the importance of using the explanation to make a prediction and asks them to pay attention to the assumptions. Although it does not tell them whether the assumptions they described are correct, it reminds the students to describe the assumptions.

To engage students in testing hypotheses, we provide them with alternative hypotheses that they need to test. This is usually done when students are constructing a new understanding and have to reconcile new ideas with their prior knowledge. We emphasize that they need to try to design an experiment to rule out the hypothesis, not to support it. For example, (i) design an experiment to test the following proposed hypothesis: an object always moves in the direction of the unbalanced force exerted on it by other objects; (ii) design an experiment to test the following proposed hypothesis: in an electric circuit the current is used up by different elements.

The primary setting where students can strengthen their ability to develop and test hypotheses is the laboratory. In place of a laboratory write-up that tells students what to do, we structure the write-up around the rubric abilities that we think it should develop. The lab write-ups provide guidance as shown below.

1. Testing experiment

Your friend says that as current flows through a circuit, it is used up by the elements of the circuit. Design an experiment to test your friend's idea.

Equipment. Voltage source, resistors, light bulbs of different ratings, ammeter, voltmeter.

Write the following in your lab report.
(a) State the hypothesis you will test.
(b) Devise a procedure to test the hypothesis. Write an outline of your design.
(c) Draw a circuit diagram. Use different resistors or light bulbs as your circuit elements.
(d) Make a prediction of the outcome of your experiment, based on the hypothesis that you are testing. State your assumptions.
(e) Perform the experiment. Record your data in an appropriate format.
(f) What is the outcome of the experiment? Did the outcome match the prediction?
(g) Based on your prediction and the outcome of the experiment, what is your judgment about the hypothesis you were testing?
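The logic of steps (d)-(g) can be made concrete with the "current is used up" hypothesis. If the hypothesis were true, an ammeter placed after the bulb should read measurably less than one placed before it; readings equal within the instrument uncertainty rule the hypothesis out. The readings below are hypothetical, chosen only to illustrate the comparison:

```python
# Hypothetical ammeter readings in amperes; these numbers are ours,
# not data from the paper.
i_before_bulb = 0.52   # ammeter between the source and the bulb
i_after_bulb = 0.51    # ammeter on the other side of the bulb
uncertainty = 0.02     # assumed instrument uncertainty

# Prediction from the hypothesis "current is used up by circuit elements":
# the downstream reading should be smaller by more than the uncertainty.
prediction_holds = (i_before_bulb - i_after_bulb) > uncertainty

if prediction_holds:
    print("Outcome matches the prediction; the hypothesis is not ruled out.")
else:
    print("Outcome contradicts the prediction; the hypothesis is ruled out.")
```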


TABLE III. A scoring rubric to assess a student's revision of an explanation.

Scientific ability: Is able to revise the explanation of a prediction based on the results of an experiment.
0 (Missing): No attempt is made to explain the outcome of the experiment or to revise the previous explanation or assumptions. The difference between the prediction and the outcome of the experiment is not addressed.
1 (Inadequate): An attempt is made to explain the outcome and revise the previous explanation or assumptions, but it is (a) mostly incomplete and/or (b) based on incorrect reasoning.
2 (Needs improvement): The revision of the previous explanation or assumptions is partially complete and correct, yet still lacking in some relevant details.
3 (Adequate): The revision of the explanation or assumptions is done completely and correctly.

(h) How can you modify the hypothesis to account for the outcomes of the experiment?

In Appendix A we provide examples of student work in a laboratory where they design experiments to test proposed hypotheses, and rubrics that help students self-assess their work. The rubric scores assigned by instructors can be used for research purposes. (Examples are given in Sec. IV.)

C. Ability to modify a qualitative explanation or quantitative relationship

Another important ability that scientists use in their work is the ability to account for anomalous or unexpected data. Often when a scientist performs an experiment, she obtains some information that seems to contradict her expectations. After performing the experiment she needs to modify the explanation or revisit the simplifying assumptions. We devised "surprising data tasks" that engage students in similar activities. They are used at the stage of learning when students have constructed some scientific understanding (explanation) of relevant phenomena and are ready to refine it. Students are asked to predict what will happen as a result of a particular new experiment. Students need to write the prediction and a justification of the prediction. After making their prediction and writing a justification, students observe the experiment directly. Most likely the outcome of the experiment will not match their prediction—they will have anomalous data. Then the students have to revise their prediction by revising the explanation on which their prediction was based, or the simplifying assumptions that they used to make it.

There are several reasons why surprising data tasks might be helpful in forming students' scientific abilities:36 students learn to analyze data and revise explanations; students receive almost instant feedback about their prediction when they observe the experiment; and students learn to differentiate between a description and an explanation,37 which is often confusing for them.38 These tasks also help develop an evaluation ability, as students have an opportunity to reconsider their reasoning after observing an experiment.

One can use them during instruction to build more sophisticated models of phenomena and, at the end of learning a particular topic, to assess students' understanding of assumptions that they often make unconsciously. An example of a surprising data task that engages students in revising a model of constant electric resistance is provided below. It can be used in a laboratory. Students perform the activity after they learn how to build simple circuits, understand what current and voltage are and how to measure them, and know that the voltage-versus-current ratio for commercial resistors is constant.

1. Testing experiment

Voltage-versus-current ratio for a light bulb.

Equipment. Constant voltage source, a light bulb, a commercial resistor, ammeter, voltmeter, connecting wires, and a switch.

Connect a light bulb in series with an ammeter (to read the current through the bulb). Connect a voltmeter across the bulb (to read the voltage across the bulb). Do not close the switch yet.

Write the following in your lab report.
(1) Draw a circuit diagram.
(2) Close the switch, record the current through the bulb and the voltage across it. Determine the voltage-to-current ratio for the bulb.
(3) Next, connect the bulb in series to a resistor. Draw a circuit diagram. Do not close the switch yet. Predict the voltage-to-current ratio for the bulb after you close the switch. Explain your prediction.
(4) Close the switch and record the values of voltage and current. Determine the ratio. Did it match the prediction?
(5) How can you explain the discrepancy between the predicted value and the experimental value?
(6) Devise an experiment you could perform to test your explanation (you do not have to actually perform the experiment).

In addition to the rubrics in Appendix A, the rubric in Table III helps students self-assess their ability to revise an explanation based on the results of an experiment. More tasks of this type can be found at http://paer.rutgers.edu/pt3.

D. Ability to design an experimental investigation

To devise and test relationships and explanations students need to develop experimental abilities. For pedagogical purposes we have classified experimental investigations that students perform in introductory courses into three broad categories:39 observational experiments, testing experiments, and application experiments.

When conducting an observational experiment, a student focuses on investigating a physical phenomenon without

TABLE IV. Subabilities involved in designing three different types of experimental investigation.

Designing and conducting an observational experiment:
- Identifying the phenomenon to be investigated
- Designing a reliable experiment that investigates the phenomenon
- Deciding what is to be measured, and identifying independent and dependent variables
- Using available equipment to make measurements
- Describing what is observed, both in words and by means of a picture of the experimental setup
- Describing a pattern or devising an explanation
- Identifying shortcomings in an experimental design and suggesting specific improvements

Designing and conducting a testing experiment:
- Identifying the relationship or explanation to be tested
- Designing a reliable experiment that tests the prediction using the available equipment
- Deciding what is to be measured, and identifying independent and dependent variables
- Using available equipment to make measurements
- Deciding whether to confirm the prediction based on the results of the experiment
- Making a reasonable judgment about the relationship or explanation
- Identifying shortcomings in an experimental design and suggesting specific improvements

Designing and conducting an application experiment:
- Identifying the problem to be solved
- Designing an experiment that solves the problem
- Deciding what is to be measured
- Using available equipment to make measurements
- Making a judgment about the results of the experiment
- Evaluating the results by means of an independent method
- Identifying shortcomings in an experimental design and suggesting specific improvements

having expectations of its outcomes. When conducting a testing experiment, a student has an expectation of its outcome based on concepts constructed from prior experiences. In an application experiment, a student uses established concepts or relationships to address practical problems. In the process of scientific research the same experiment can fall into more than one of these categories.

What abilities do students need when designing these investigations? We have identified the following steps that students need to take to design, execute, and make sense out of a particular experimental investigation. We assigned a subability for each step and wrote corresponding descriptors in the rubrics. The results of these discussions are presented in Table IV.

For each of the identified subabilities, we devised a rubric item that describes different levels of proficiency. For example, for the subability "using available equipment to make measurements," level 0 proficiency is described as "at least one of the chosen measurements cannot be made with the listed equipment;" level 1 is described as "all of the chosen measurements can be made but there are no details given of how it is done;" level 2 is described as "all chosen measurements can be made but the details of how it is done are vague or incomplete;" and level 3 as "all measurements can be made and all details of how it is done are provided." The rubrics describe the levels of proficiency for each of the subabilities identified in Table IV.

Students use these rubrics in the laboratories. Ideally we want them to refer continuously to the rubrics while designing and performing the experiment. The rubrics guide them as to what experimental aspects they should specifically pay attention to. After they perform the experiment, they write a lab report (in the lab). During the process of writing, they use the descriptors in the rubrics to improve their report.

E. Ability to collect and analyze data

The abilities to collect and analyze data are independent of the type of experiment being performed and hence have been placed in a separate category. We identified subabilities that students need for successful data collection and analysis and devised rubrics for each subability. (The simplified list below is appropriate for students—scientists do this at a much more sophisticated level.) (i) Ability to identify sources of experimental uncertainty. (ii) Ability to evaluate how experimental uncertainties might affect data. (iii) Ability to minimize experimental uncertainty. (iv) Ability to record and represent data in a meaningful way. (v) Ability to analyze data appropriately. The rubric for each subability has descriptors indicating what needs to be done for a satisfac-

TABLE V. A scoring rubric to assess a student’s presentation of the data.

Subability 0 共Missing兲 1 共Inadequate兲 2 共Needs improvement兲 3 共Adequate兲

Is able to record and Data are either Some important All important data are All important data are
represent data in a missing or data are missing or present but require some present, organized, and
meaningful way incomprehensible incomprehensible effort to comprehend recorded clearly
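The data-collection subabilities above can be made concrete with a small sketch. The scenario and all numbers below are invented for illustration (they are not from the study): repeated spring-scale readings are recorded in an organized table (Table V, level 3) and summarized as a mean with a simple uncertainty estimate.

```python
import statistics

# Hypothetical repeated spring-scale readings, in newtons, for the
# same pull; the data are illustrative, not taken from the study.
readings = [4.1, 4.3, 4.0, 4.2, 4.4]

mean = statistics.mean(readings)
# Standard deviation of the mean as a simple uncertainty estimate.
u = statistics.stdev(readings) / len(readings) ** 0.5

# Record the data in a clearly organized way (Table V, level 3):
print("trial  force (N)")
for i, r in enumerate(readings, start=1):
    print(f"{i:>5}  {r:.1f}")
print(f"mean = {mean:.2f} N, uncertainty = {u:.2f} N")
```

A record like this addresses subabilities (i), (iv), and (v): the spread of the trials exposes the experimental uncertainty, and the summary line reports it alongside the mean.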

020103-6
SCIENTIFIC ABILITIES AND THEIR ASSESSMENT PHYS. REV. ST PHYS. EDUC. RES. 2, 020103 共2006兲

tory achievement. An example of a rubric for the subability to record and represent data in a meaningful way is shown in Table V.

Students develop these subabilities in labs. As we discussed above, the lab write-ups we provide guide them through the process by focusing their attention on the subabilities outlined in the rubrics. Below, we show an example of an application experiment for use in labs. This lab activity helps students develop the abilities to design an application experiment to solve a practical problem, and to collect and analyze data.

1. Application experiment: Coefficient of friction between a shoe and a floor tile

A floor tile company needs to decide whether their new floor tiles meet minimum safety standards. They don’t want people to slip on their tiles and sue the company! Design two independent experiments to determine the maximum coefficient of static friction between your shoe and the sample of floor tile provided. Equipment: spring scale, ruler, protractor, floor tile, tape, string, clips. Include in your report the following for each independent experiment.
(a) Draw a sketch of your experimental design.
(b) Write a brief outline of the procedure you will use.
(c) Decide what assumptions about the objects, interactions, and processes you need to make to solve the problem. How might these assumptions affect the result? Make sure you only consider relevant assumptions.
(d) Draw a free-body diagram for the shoe for the situation. (Recall your assumptions.) Include an appropriate set of coordinate axes. Use the free-body diagram to devise the mathematical procedure to solve the problem.
(e) What are possible sources of experimental uncertainty? Which instrument gives you the highest uncertainty? How would it affect the data? How could you minimize it?
(f) Perform the experiment and record your observations in an appropriate format. Make sure you take steps to minimize uncertainties. What is the outcome of the experiment?
(g) When finished with both experiments, compare the two values you obtained for the coefficient of static friction. Decide, using assumptions and uncertainties, if these values are different or not. If they are different, what are possible reasons?
(h) List shortcomings, if any, in the experiments. Suggest specific improvements.

F. Ability to evaluate experimental predictions and outcomes, conceptual claims, problem solutions, and models

We define an evaluation as making judgments about information based on specific standards and criteria.40 More specifically, a given particular is judged by determining whether it satisfies a criterion well enough to pass a certain standard. Scientists constantly use evaluations to assess their own work and the work of others when conducting their own research, serving as referees for peer-reviewed journals, or serving on grant-review committees.

The evaluation is a crucial ability for our students. During a physics course, students are expected to identify, correct, and learn from their mistakes with the help of an instructor. This aid may come in many forms, such as when an instructor provides problem solutions to a class, or tutoring to an individual student. However, in each case the student relies upon an instructor (or sometimes a textbook) in order to determine whether, and how, their work is mistaken. Since the students are not given any other means with which to evaluate their work, the students come to see an evaluation by external authorities as the only way for them to identify and learn from their mistakes.

There are several sets of criteria and strategies that are commonly used by practicing physicists. Each of these strategies relies upon hypothetico-deductive reasoning,34 whereby the information is used to create a hypothesis which is then tested. The logical sequence for this testing can be characterized as: If (general hypothesis) and (auxiliary assumptions) then (expected result) and/but (compare actual result to expected result), therefore (conclusion). For example, when a student derives an equation and needs to evaluate it with dimensional analysis, the logical sequence is as follows:
If the equation is physically self-consistent,
And I correctly remember the units for each quantity in the equation,
Then I expect the units for each term in the equation to be identical,
And/But the units for each term are/are not identical,
Therefore the equation is/is not physically self-consistent.

The types of subabilities that students need to develop to be successful in evaluations are numerous. Some of them are (i) ability to conduct a unit analysis to test the self-consistency of an equation; (ii) ability to analyze a relevant limiting and/or special case for a given model, equation, or claim; (iii) ability to identify the assumptions a model, equation, or claim relies upon; (iv) ability to make a judgment about the validity of assumptions; (v) ability to use a unit analysis to correct an equation which is not self-consistent; (vi) ability to use a special-case analysis to correct a model, equation, or claim; (vii) ability to judge whether an experimental result fails to match a prediction; (viii) ability to evaluate the results of an experiment by means of an independent method.

Evaluation subabilities are integral components of multiple-representation abilities, design abilities, etc. Examples of evaluation subability rubrics are given in Table VI.

To help students learn evaluation strategies, we have developed two categories of tasks. One category consists of supervisory evaluation tasks, wherein students act like a supervisor by evaluating (and, if necessary, correcting) someone else’s work (usually the work of an imaginary friend). The other category consists of integrated evaluation tasks, which ask the students to evaluate, and if necessary to correct, their own work. For both categories of task, the evaluated work may be a problem solution, experiment design, experiment report, conceptual claim, or a proposed model. Supervisory evaluation tasks are meant to help the students learn the goals, criteria, and method of use for each evaluation strategy, while integrated evaluation tasks encourage


TABLE VI. Scoring rubrics to assess a student’s evaluation abilities.

Scientific ability: Is able to evaluate the results by means of an independent method.
0 (Missing): No attempt is made to evaluate the result by means of an independent method.
1 (Inadequate): A second independent method is used to evaluate the results. However, there is little or no discussion about the differences in the results of both methods.
2 (Needs improvement): A second independent method is used to evaluate the results. Some discussion about the differences in the results is present but there is little or no discussion of the possible reasons for the difference.
3 (Adequate): A second independent method is used to evaluate the results. The discrepancy between the results in the two methods and possible reasons are discussed. A percentage difference is calculated.

Scientific ability: Is able to analyze a relevant special case for a given model, equation, or claim.
0 (Missing): No meaningful attempt is made to analyze a relevant special case.
1 (Inadequate): An attempt is made to analyze a special case, but the identified special case is not relevant, OR major steps are missing from the analysis (e.g., no conclusion is made).
2 (Needs improvement): An attempt is made to analyze a relevant special case, but the student’s analysis is flawed, OR the student’s judgment is inconsistent with their analysis.
3 (Adequate): A relevant special case is correctly analyzed and a proper judgment is made.
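The first rubric row can be illustrated with a hypothetical pair of results, such as two independent measurements of a coefficient of static friction from the shoe-and-tile experiment; the values and the combined uncertainty below are invented for illustration.

```python
# Hypothetical results for the coefficient of static friction from two
# independent methods (values invented for illustration).
mu_scale = 0.52    # from pulling the shoe with a spring scale
mu_incline = 0.56  # from tilting the tile until the shoe slips

# Percentage difference, as called for by the rubric's level 3.
avg = (mu_scale + mu_incline) / 2
percent_diff = abs(mu_scale - mu_incline) / avg * 100
print(f"percentage difference = {percent_diff:.1f}%")

# A simple consistency check against the combined experimental
# uncertainty (also invented) before discussing possible reasons.
u_combined = 0.05
consistent = abs(mu_scale - mu_incline) <= u_combined
print("results agree within uncertainty:", consistent)
```

A level-3 report would follow this calculation with a discussion of the likely reasons for the remaining discrepancy, such as the assumptions each method relies on.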

students to incorporate evaluation into their learning behavior. During a semester we tend to use mostly supervisory tasks for the first few weeks so that the students can get acquainted with each strategy, and then transition to integrated tasks so that they gain experience at using the strategies to evaluate and correct their own work.

Below are two example tasks. Each of these tasks features the same physical scenario and question. The first task exemplifies the format of a supervisory task, and is structured to help the students work through each step in the evaluation. The second task is in the format of an integrated task. In general, any of our tasks may be framed either as supervisory or integrated tasks.

Supervisory task: You have been given a problem, which says: “As a certain type of green bean ripens, it builds up gas inside until the bean pod explodes from the pressure and shoots its seeds outwards. Let us assume one particular seed starts off at ground level, and is shot out at an angle of 30° above the ground at a speed of 12.0 m/s. What is the maximum height the seed reaches above the ground?” Your friend Scooter comes up with the following solution.
First, he solves for v0y (the y component of the initial velocity),

v0y = v0 cos(θ) = (12.0 m/s)cos(30°) = 10.4 m/s.

Then he uses this to solve for the maximum height:

vy² = v0y² + 2a(Δy),
(0 m/s)² = (10.4 m/s)² + 2(−9.8 m/s²)(Δy),
Δy = 5.5 m.

Do a special-case analysis of Scooter’s solution:
(a) Choose a special-case for a situation where you know what the answer should be, conceptually.
(b) For this special-case, state what you think the answer should be, and explain your reasoning.
(c) Next, recalculate your friend’s answer for your special case.
(d) Compare your conceptual expectation with the result from part (c).
(e) Make a conclusion about the validity of your friend’s solution based on this comparison.

Integrated task. As a certain type of green bean ripens, it builds up gas inside until the bean pod explodes from the pressure and shoots its seeds outwards. Let us assume one particular seed starts off at ground level, and is shot out at an angle of 30° above the ground at a speed of 12.0 m/s.
(a) What is the maximum height the seed reaches above the ground?
(b) Do a special-case analysis of your work in part (a).
(c) If your work does not pass the special-case analysis, describe how you should change your solution in order to pass the analysis. If your work did pass the special-case analysis, describe how you benefited from doing the analysis anyway.

Evaluation tasks are usually employed as part of recitations and homework. During recitations, students have a chance to try using each strategy while getting real-time feedback from their peers and instructor. Moreover, recitations provide a setting where the students can learn how to associate topic-specific knowledge with the general evaluation strategies. The homework then provides an opportunity for students to gain further practice at using each strategy.

G. Ability to communicate

An important ability in the work of scientists is their oral and written communication, an ability that can be fostered in a physics course. For example, the quality of a lab report can be judged for its completeness and clarity. A communication


TABLE VII. A scoring rubric to assess a student’s communication ability.

Scientific ability: Is able to communicate the details of an experimental procedure clearly and completely.
0 (Missing): Diagrams are missing and/or experimental procedure is missing or extremely vague.
1 (Inadequate): Diagrams are present but unclear and/or the experimental procedure is present but important details are missing.
2 (Needs improvement): Diagrams and/or experimental procedure are present but with minor omissions or vague details.
3 (Adequate): Diagrams and/or experimental procedure are clear and complete.
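The special-case strategy from the supervisory task in Sec. III F can be sketched in code. The choice of θ = 90° (launching straight up) as the special case is ours: conceptually the seed should then rise v0²/(2g), the maximum possible height, while Scooter’s cos(θ) formula predicts zero height, exposing his error.

```python
import math

G = 9.8  # m/s^2

def scooter_max_height(v0, angle_deg):
    # Scooter's (flawed) solution: uses cos for the y component.
    v0y = v0 * math.cos(math.radians(angle_deg))
    return v0y ** 2 / (2 * G)

# Special case: launch straight up (90 degrees). Conceptually the
# seed should rise v0^2 / (2g), the maximum possible height.
v0 = 12.0
expected = v0 ** 2 / (2 * G)               # about 7.35 m
recalculated = scooter_max_height(v0, 90)  # about 0 m

# Compare the conceptual expectation with the recalculated answer
# and judge the validity of the solution (steps (d) and (e)).
if abs(recalculated - expected) > 0.01:
    print("Scooter's solution fails the special-case analysis")
```

Note that at the original 30° the flawed formula still reproduces Scooter’s 5.5 m, which is why a well-chosen special case, not the original numbers, is needed to expose the mistake.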

ability rubric can help students know what is expected in communications in the scientific world. An example of a communication ability rubric is provided below in Table VII.

IV. DO STUDENTS DEVELOP SCIENTIFIC ABILITIES?

In this section we will describe briefly four research projects whose goals were to investigate whether students who learn physics in courses which focus on the development of specific abilities actually acquire them and use them while solving problems, designing experiments, evaluating their work, etc. We also investigated students’ attitudes towards the development of some of these abilities.

A. The study of multiple-representation abilities

This two-year descriptive study was conducted in a large-enrollment (about 500 students) algebra-based physics course for science majors. One of the course goals was to help the students use multiple representations for the analysis of phenomena and for problem solving. In lectures the instructor discussed with the students how to represent the same process in multiple ways, how to use one type of representation to help construct another, and how to apply the representations for problem solving. Students worked individually and in small groups on activities that required them to represent problem situations in different ways without actually solving for a particular quantity. In recitations students worked on similar activities; they were also part of homework assignments and appeared as multiple-choice problems on the exams. When students solved traditional problems, they were encouraged to use a problem-solving strategy in which physical representations (such as free-body diagrams, energy bar charts, ray diagrams, etc.) were used to construct mathematical equations. Homework solutions for traditional problems used a multiple-representation strategy.

For the study we used a free-body diagram as an example of a physical representation. Students learned to construct free-body diagrams according to the advice provided in the rubric described earlier. Students learned to convert a diagram into Newton’s second law in component form. Occasionally, they were given Newton’s second law in a component form as applied to an unknown situation and asked to construct a consistent free-body diagram and to describe in words a consistent physical situation.

The goals of the study were to (a) see if students who were explicitly taught how to draw FBDs and how to use them to solve problems actually used them to help solve multiple-choice problems when no credit was given for drawing the diagrams, and (b) see if the use of a FBD correlated with success in solving Newton’s second law problems. We collected data from several multiple-choice exams (the quantitative study) and interviewed some students (the qualitative study). In the first year, we chose five problems from four exams; in the second year, seven problems from four exams. The problems were chosen if they involved forces and were difficult enough to merit the construction of a diagram. In our first year, we followed 125 randomly chosen students, and 120 in the second year.

We examined FBDs that students drew on the exam sheets near the problem statement. First we counted the number of students who drew diagrams on the problem sheets. The exams were multiple choice and students were only given credit for a correct answer. Thus, if a diagram appeared on the problem sheet next to some kind of mathematical solution, we considered that a student drew it to help solve the problem. We found that on average about 58% of our students drew a FBD for each of the chosen exam problems even when they knew they would not receive any credit for doing so. For traditionally taught students, this number is around 20%.41

To see if the FBD construction correlated with success in problem solving, we scored student diagrams using the rubric. Our coding scheme followed the rubric descriptors (missing, inadequate, needs improvement, and adequate) shown earlier. We found that for over 12 problems on 4 exams, 85% of students who had a correct free-body diagram (“adequate,” according to the rubric—Table II) had a correct answer; 71% of those who had an incomplete diagram (“needs improvement”) had a correct answer; 38% of those who had an incorrect diagram (inadequate) had a correct answer; and 49% of those who did not draw a diagram (missing) had a correct answer. The average percentage of students who solved these problems correctly was 60%. A two-sample t test for independent samples indicated that those who had an adequate diagram significantly outperformed the class average, and those who had inadequate diagrams were significantly below the class average. There were some small variations between the categories per individual question. However, there was never a case when the students who did not draw a diagram were more successful than those who drew a correct one.

We followed up with interviews (six students) during which students had to solve several problems using a think-aloud protocol. All students started solving a problem by constructing a sketch of the situation described in the problem statement, and five out of six drew a free-body diagram after the sketch. The two most successful students then went


We found that the students’ abilities to design an experiment, to devise a mathematical procedure to solve an experimental problem, and to communicate the details of the procedure improved.43 The changes in the above abilities were statistically significant. However, the changes in students’ ability to evaluate experimental uncertainties were not significant. We later found that another ability on which students do not improve is the evaluation of the effects of theoretical assumptions.44 It could be that guidelines for the students in the lab, asking them to evaluate the effects of assumptions and uncertainties, are not sufficient for them to actually do it. Students probably need additional exercises helping them master these abilities.45,46 We are currently working on the development and pilot testing of these exercises.
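The significance comparison described here can be sketched as follows. The rubric scores below are invented for illustration, and Welch’s t statistic is used as a stand-in for the study’s two-sample test.

```python
import statistics

def welch_t(a, b):
    # Welch's two-sample t statistic for independent samples.
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (mb - ma) / (va / len(a) + vb / len(b)) ** 0.5

# Hypothetical rubric scores (0-3) for one ability, scored from lab
# reports in week 3 and week 10; the data are invented, not the study's.
week3 = [1, 1, 2, 0, 1, 2, 1, 1]
week10 = [2, 3, 2, 3, 2, 3, 3, 2]

t = welch_t(week3, week10)
print(f"t = {t:.2f}")  # a large positive t suggests improvement
```

In practice the t statistic would be compared against a critical value for the appropriate degrees of freedom before claiming a significant change.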
FIG. 2. Comparison of abilities during the third and the tenth week of the semester.
back and forth between the sketch and their free-body diagram to understand the problem better. They used the diagrams to help create a Newton’s second law equation in a component form, which they then used to solve the problem. These students determined the direction of the acceleration and constructed a diagram that was consistent in terms of force magnitudes with the acceleration direction. After completing their numerical solution, they went back to the free-body diagrams and used them to evaluate the final result. None of the students who were unsuccessful in problem solving during the interview used the free-body diagram to construct the mathematical representation, and none used it to evaluate the result. More details of this study can be found in Ref. 42.

B. The study of experimentation abilities

The study was conducted in a large-enrollment introductory laboratory course (500 students) for science majors (premed, prevet, biology, exercise science, and environmental science). Although the laboratory course was separate from a lecture and/or recitations course, most of the students were enrolled in both. The development of scientific abilities was considered an important goal in both courses. In the lab course there were 20 lab sections, each with about 25 students, and 9 teaching assistants. Students performed one three-hour lab per week for 10 weeks. In each lab, at least one experiment was a design task (similar to the example in Sec. III E). The experiments had guidelines that focused on different scientific abilities that we had identified.

To measure the development of students’ acquisition of scientific abilities, we scored their lab reports each week based on the scientific abilities rubrics. We focused on the ability to design a reliable experiment to solve a problem, to choose a productive mathematical procedure, to communicate details of the experiment, and to evaluate the effects of experimental uncertainty. Our sample consisted of 35 randomly chosen students who were distributed among 4 lab sections. In Fig. 2, we show a histogram of the scores that students’ reports received on four abilities during the third and the tenth week of the semester.

C. The study of transfer of scientific abilities

The goal of this study was to investigate whether students can transfer some scientific abilities to a new context. We use the term transfer here to mean an ability to apply something that one learns in one context in a different context or with a different content.

This study was conducted in a 190-student algebra-based physics course with two 55-minute lectures, one 80-minute recitation, and one three-hour lab. Each lab usually contained two design tasks in which students had to either test a proposed hypothesis (similar to the Testing experiment in Sec. III B) or experimentally solve a practical problem (similar to the Application experiment in Sec. III E). The lab handouts provided to the students resemble the example shown after the next paragraph.

In the first lab of the course, students were to design an experiment to test a proposed hypothesis (the full experiment write-up and sample student responses are given in Appendix A and Tables X and XI). They were encouraged to design an experiment to reject the hypothesis, as a supportive outcome meant only that the hypothesis was not ruled out—it did not prove the hypothesis. Students had guidelines provided in the lab handout and rubrics to help them design an experiment. They were to use hypothetico-deductive reasoning to test the hypothesis. The logic of hypothetico-deductive reasoning that we communicated to the students was as follows:
If the hypothesis being tested is correct,
And I perform the following testing experiment,
Then I predict the outcome of the experiment based on the hypothesis.
And/But the outcome did/did not match the prediction.
Therefore the hypothesis is not/is disproved.

A question on the final exam was similar to the above experiment: “Describe an experiment that you could design to test the proposed hypothesis that an object always moves in the direction of the unbalanced force exerted on it by other objects.” The final exam was three months after the above experiment was performed in the lab. However, on the exam, students had no guidelines like parts (a)–(i) in Appendix A, and no rubrics. Also, students worked individually on the exam. According to a scheme by Barnett and Ceci,47 the transfer we examined can be classified as near in terms of the


TABLE VIII. Students’ responses on the final exam (N = 181).

Students’ performance on final exam: transfer rate
Drew pictures: 94%
Drew physical representations (motion diagrams, free-body diagrams): 45%
Designed experiment to try to reject the proposed rule: 58%
Designed experiment that supports rule: 41%
Prediction was based on rule to be tested AND effect of assumptions: 16%
Prediction was based on rule to be tested but not on effects of assumptions: 64%
Prediction was not based on rule but on previous knowledge of Newton’s laws: 32%
Considered assumptions: 47%

knowledge domain, but far in terms of physical context (exam hall instead of a physics lab), functional context (writing an answer to a question versus designing and performing an experiment), social context (individual versus group), and modality (exam versus a lab).

The individual student’s exam question was scored using the rubrics. Table VIII indicates the percentage of students whose exam answer received a score of 3 on the relevant items in the rubrics. We call this percentage the rate of transfer. The rate of transfer means that a certain percentage of the students are successful in a certain ability. Typically the rate of transfer is measured by the percentage of subjects who can solve a problem similar to a problem that they were taught to solve. These results are encouraging, as this rate of transfer for some abilities was much higher than the typical 20% transfer rate reported in other studies (this is transfer of a skill without a hint).48

We believe that it is unlikely that students remembered the details of the laboratory work on this question performed three months earlier. However, they had been using hypothetico-deductive reasoning during the semester, and this practice could have contributed to the positive result.

Was this problem too easy and hence a false indicator of transfer? To address this issue, we compared students’ performance on this exam question and their overall performance on the exam. The average score on this special question was 60% (standard deviation of 21%) as opposed to the average score of 69% (standard deviation 15.5%) on the exam as a whole—it seems that the special question was not too easy. Thus we think that students did indeed transfer some experimentation-related scientific abilities that they acquired in the labs using the rubrics.

D. The study of evaluation abilities

A comparison-group study was conducted in two large-enrollment courses for science majors. Both courses are year-long courses, but the experimental course has generally lower-achieving students and is taught on a different university campus. During the year when the experiment was conducted, the two courses were run in parallel, with nearly identical lectures, recitations, homework, and labs. The only significant difference was that in the experimental course several evaluation tasks were included in recitation assignments and homework, while the comparison course had no evaluation tasks in their coursework.

The study had two goals. One goal was to find out whether the use of our evaluation tasks did indeed help students to acquire evaluation abilities. The other goal was to find whether an improvement in students’ evaluation abilities benefited their understanding of the subject matter.

We included an evaluation task on each of the six exams throughout the year for both courses. We used the rubrics to assign scores of 0–3 (0, missing; 1, inadequate; 2, needs improvement; 3, adequate) to student work related to each strategy. We found that in the experimental group students mastered the unit analysis strategy very quickly, scoring an average of 2.5 with a standard deviation of 1.05 on the first exam, and maintaining that average throughout the year. In contrast, the much more complicated strategy of special-case analysis took most of a semester to be learned. On the first exam, the average score was 1.17 with a standard deviation of 1.04. By the end of the first semester, though, the student average reached 2.30, with a standard deviation of 0.62, and remained roughly there for the entire second semester.

To address our second goal, we analyzed exam data from the two courses. The exams for the experimental and control groups shared several of the same multiple-choice problems. Of these shared problems, some were on topics on which only the experimental group students had evaluation tasks (such as Newton’s second law), whereas the comparison group had additional standard problems on those topics instead of the evaluation tasks in their homework and recitations. The shared multiple-choice exam problems on such topics will be called E problems. (Though called E problems, these problems did not have an evaluation component; they were traditional physics problems. The letter E relates to the fact that students in the experimental course had received evaluation tasks related to the content of those problems instead of traditional problems.) The remainder of the shared exam problems were on topics on which no one had evaluation tasks (such as momentum). We will call these NE problems.

By comparing the relative performance of each class on E and NE problems, we could test whether the use of our evaluation tasks benefited the students’ problem-solving performance. What we found was that the experimental group students significantly outperformed control students on E problems, while the control group students significantly outperformed the experimental group students on NE problems.

We did, in fact, expect the control group students to do better on NE problems since these students have stronger math and science backgrounds. In light of this population difference, our result that the experimental group students did better on E problems is especially remarkable. It indicates that the use of evaluation tasks significantly benefited students’ problem solving for those topics. In particular, we found that a mastery of special-case analysis was the primary cause behind this boost in relative performance. This is reasonable since a special-case analysis is one effective way for


students to test and refine their conceptual understanding, and to coherently organize different models into a hierarchical system.

V. SUMMARY

This paper described the development of tasks and assessment rubrics that help students acquire some of the abilities useful in science and engineering. Development of these abilities is considered important in science courses at K–16 levels.14–19 There are summative assessment questions developed for national and international evaluations of student learning of some scientific abilities, but we are unaware of any systematic efforts to build a coherent library of formative assessment tasks and rubrics to help students develop these abilities.

We have developed an approach to learning introductory college physics in which the acquisition of various scientific abilities is one of the main goals. We list below the general process abilities that we have chosen to include in the learning system, with the goal of helping students develop these abilities while learning physics. (i) Learn to represent physical processes in multiple ways. (ii) Learn to devise and test a qualitative explanation or quantitative relationship. (iii) Learn to modify a qualitative explanation or quantitative relationship. (iv) Learn to design an experimental investigation. (v) Learn to record, represent, and analyze data. (vi) Learn to evaluate experimental predictions and outcomes, conceptual claims, problem solutions, and models. (vii) Learn to communicate. Each ability includes subabilities that are needed for proficiency in that ability.

In order to help students acquire these abilities, we have developed a large number of activities used formatively during instruction. In addition, we have developed rubrics,

study concerning students’ experimental abilities, we found that their abilities to design an experiment, to devise a mathematical procedure to solve an experimental problem, and to communicate the details of the procedure improved significantly. However, their ability to evaluate experimental uncertainties did not change significantly. In the third study, we found that students could transfer some abilities learned in one context to new contexts. In the fourth study, we found that there was a significant increase in student problem-solving performance in conceptual areas in which they had done limiting-case evaluation activities. It seems that this type of evaluation enhances student understanding and problem solving. Finally, we have evaluated students’ performance in solving traditional problems in this learning system that emphasizes the development of science process abilities. These students do significantly better on multiple-choice problems on final tests than their peers taught via traditional methods.49

In summary, it seems that it is possible to help students in introductory physics courses start acquiring some of the science process abilities that are needed for work in the 21st century workplace. The learning system also enhances student performance in terms of traditional measures. The developed rubrics can be used for research purposes to collect data about student progress and for comparison of student learning in different courses.

ACKNOWLEDGMENTS

The authors wish to thank Michael Lawrence, Marina Milner-Bolotin, Hector Lopez, and Juliana Timofeeva for their help in the development of rubrics and formative assessment tasks. We wish to thank Richard Fiorillo and Gabriel Alba for their help in implementing the laboratories. We
which indicate what is needed for proficiency relative to the also thank the National Science Foundation 共Grant No.
different subabilities. The rubrics can also be used by in- DUE-0241078兲 for providing us with funding for the project.
structors to provide formative feedback to the students or for
summative evaluation of the students and the learning sys- APPENDIX A: EXAMPLES OF STUDENT WORK AND ITS
tem. The rubrics can also be used by students for self- ASSESSMENT WITH THE RUBRICS
evaluation as they perform the activities. Students need to be
able to revise their work after they evaluated it using the Design an experiment to test the following proposed hy-
rubrics. Here the instructor’s help is essential as the rubrics pothesis: An object always moves in the direction of the
themselves do not provide content-related feedback. The ru- unbalanced force exerted on it by other objects.
brics also can be used for research purposes to monitor stu- 共a兲 State what hypothesis you will test in your experiment.
dents progress and to compare students from different 共b兲 Brainstorm the task. Make a list of possible experi-
courses. They can add to our library of assessment instru- ments. Decide what experiments are best.
ments that allow the PER community to evaluate learning. 共c兲 Draw a labeled sketch of your chosen experiment.
So far most of the PER-developed instruments assess con- 共d兲 Write a brief description of your procedure.
ceptual understanding and graphing skills. 共e兲 Construct a free-body diagram for the object.
Our summative use of the rubrics to study if students in 共f兲 List assumptions you make. How could they affect the
introductory physics courses 共primarily the introductory prediction?
physics course for biology majors兲 are getting better in using 共g兲 Make a prediction about the outcome of the experi-
the abilities is positive for the most part. In the multiple- ment based on the hypothesis you are testing.
representation study we found that a relatively large number 共h兲 Perform the experiment. Record the outcome.
of the students 共about 60% compared to 20% in traditionally 共i兲 Make a judgment about the hypothesis based on your
taught courses兲 used free-body diagrams in their problem prediction and the experimental outcome.
solving. Correct use of the diagrams was significantly corre- We also discuss student responses and provide rubric
lated with success in the problem solution. In the second scores for the subabilities shown in Table IX–XI.


TABLE IX. Subability rubrics for designing a testing experiment and making a prediction of the outcome of the experiment. Each subability is scored 0 (Missing), 1 (Inadequate), 2 (Needs some improvement), or 3 (Adequate).

Ability to design and conduct a testing experiment

1. Is able to identify the relationship or explanation to be tested.
   0: No mention is made of a relationship or explanation.
   1: An attempt is made to identify the relationship or explanation to be tested, but it is described in a confusing manner.
   2: The relationship or explanation to be tested is described, but there are minor omissions or vague details.
   3: The relationship or explanation is clearly stated.

2. Is able to design a reliable experiment that tests the relationship or explanation.
   0: The experiment does not test the relationship or explanation.
   1: The experiment tests the relationship or explanation, but due to the nature of the design it is likely the data will lead to an incorrect judgment.
   2: The experiment tests the relationship or explanation, but due to the nature of the design there is a moderate chance the data will lead to an inconclusive judgment.
   3: The experiment tests the relationship or explanation and has a high likelihood of producing data that will lead to a conclusive judgment.

3. Is able to use available equipment to make measurements.
   0: At least one of the chosen measurements cannot be made with the available equipment.
   1: All of the chosen measurements can be made, but no details are given about how this is done.
   2: All chosen measurements can be made, but the details about how this is done are vague or incomplete.
   3: All chosen measurements can be made, and all details about how this is done are clearly provided.

4. Is able to decide whether or not to confirm the prediction based on the results of the experiment.
   0: No decision is made to confirm or disconfirm the prediction.
   1: A decision is made, but it is not strongly based on the results of the experiment.
   2: A decision is made based on the results of the experiment, but the reasoning is flawed.
   3: A correct decision is made and is based on the results of the experiment.

5. Is able to make a reasonable judgment about the relationship or explanation.
   0: No judgment is made about the relationship or explanation, or the judgment is not based on the results.
   1: A judgment is made, but it is based only on the degree of agreement between the results and the prediction.
   2: A judgment is made based on the reliability of the experiment and the degree of agreement between the results and the prediction, but the reasoning is flawed.
   3: A reasonable judgment is made based on the reliability of the experiment and the degree of agreement between the results and the prediction.

Ability to construct, modify, and apply relationships or explanations

7. Is able to make a reasonable prediction based on a relationship or explanation.
   0: No attempt to make a prediction is made. The experiment is not treated as a testing experiment.
   1: A prediction is made, but it does not follow from the relationship or explanation being tested, or it ignores or contradicts some of the assumptions inherent in the relationship or explanation.
   2: A prediction is made that follows from the relationship or explanation, but it does not incorporate the assumptions.
   3: A prediction is made that follows from the relationship or explanation and incorporates the assumptions.

8. Is able to identify the assumptions made in making the prediction.
   0: No attempt is made to identify any assumptions.
   1: An attempt is made to identify assumptions, but most are missing, described vaguely, or incorrect.
   2: Most assumptions are correctly identified.
   3: All assumptions are correctly identified.
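To make the structure of these rubrics concrete, the subability-by-score-level layout of Table IX can be encoded as a small lookup table. The following Python sketch is our own illustration, not part of the published instrument; the item numbers and score labels follow the paper, but the `describe` helper and the abbreviated wording are ours:

```python
# A minimal sketch encoding the structure of Table IX: each rubric item
# number maps to a shortened subability description, and each score level
# (0-3) maps to its label. The helper function is illustrative only.

SCORE_LABELS = {
    0: "Missing",
    1: "Inadequate",
    2: "Needs some improvement",
    3: "Adequate",
}

RUBRIC_ITEMS = {
    1: "identify the relationship or explanation to be tested",
    2: "design a reliable experiment that tests the relationship",
    3: "use available equipment to make measurements",
    4: "decide whether or not to confirm the prediction",
    5: "make a reasonable judgment about the relationship",
    7: "make a reasonable prediction based on a relationship",
    8: "identify the assumptions made in making the prediction",
}

def describe(item, score):
    """Render one scored rubric item as a readable line."""
    if item not in RUBRIC_ITEMS or score not in SCORE_LABELS:
        raise ValueError("unknown rubric item or score level")
    return f"Item {item} ({RUBRIC_ITEMS[item]}): {score} - {SCORE_LABELS[score]}"

# Student B's scores from Table XI, recorded against the Table IX items.
student_b = {1: 3, 2: 1, 4: 3, 5: 1, 7: 2, 8: 0}
for item in sorted(student_b):
    print(describe(item, student_b[item]))
```

A structure like this makes it straightforward to record and compare scores across students or courses, which is how the rubrics are used for the research purposes described above.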


TABLE X. Student work: Student A. Each entry gives a transcript excerpt from the student's lab report, followed by the researcher's comments based on the scientific abilities rubrics and, where applicable, the rubric item and score.

Transcript: "In my experiment, I wish to test the hypothesis that an object always moves in the direction of the net force exerted on it by other objects."
Comment: States the hypothesis to be tested experimentally. (Rubric item 1, score 3.)

Transcript: "I will get the bowling ball moving to the left in a straight line. I will then hit it towards the right with a mallet gently."
Comment: Description of the experiment. The proposed experiment tests the hypothesis and has a high likelihood of producing data that will lead to a conclusive judgment. (Rubric item 2, score 3.)

Transcript: [Student draws figures and free-body diagrams here.]
Comment: Multiple representations.

Transcript: "I assume there is no friction between the ball and the floor. Then the only force on the ball in the x direction is the force of the mallet on the ball to the right. Net force is to the right."
Comment: Discusses an assumption in the procedure. However, the main assumption, that the floor is not tilted, is not addressed; the friction assumption is irrelevant. (Rubric item 8, score 1.)

Transcript: "Prediction: According to the hypothesis the ball should immediately start moving to the right when the mallet hits it, no matter how hard we hit it."
Comment: Prediction is based on the hypothesis. (Rubric item 7, score 3.)

Transcript: "Outcome: When I tapped the ball gently the ball first slowed down, and only later moved to the right. When I tapped hard, the ball still slowed down a little but then moved to the right. It did not instantly reverse directions as the hypothesis predicted."
Comment: Note that the student has designed and performed a series of experiments where the outcome is different from the prediction, i.e., she tries to reject the rule. (Rubric item 4, score 3.)

Transcript: "Based on my prediction and the experimental outcome, the hypothesis is not supported."
Comment: (Rubric item 5, score 3.)
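The physics behind Student A's outcome, the ball slowing down before reversing direction, follows from constant-acceleration kinematics. A short numerical sketch (the velocity and acceleration values here are assumed for illustration, not measurements from the student's report):

```python
# Illustrative kinematics for Student A's experiment: the ball initially
# rolls left (v0 < 0) while the net force from the mallet tap, and hence
# the acceleration, points right (a > 0). Because v(t) = v0 + a*t changes
# sign gradually, the ball moves opposite to the net force for a while,
# which contradicts the hypothesis under test. The numbers are assumed
# values chosen for illustration.

v0 = -2.0  # initial velocity, m/s (negative = leftward)
a = 0.5    # acceleration, m/s^2 (positive = rightward)

def velocity(t):
    """Velocity at time t (seconds) under constant acceleration."""
    return v0 + a * t

turnaround = -v0 / a  # moment the ball is momentarily at rest

for t in (0.0, 2.0, turnaround, 6.0):
    v = velocity(t)
    direction = "leftward" if v < 0 else "rightward" if v > 0 else "at rest"
    print(f"t = {t:.1f} s: v = {v:+.1f} m/s ({direction}); net force rightward")
```

With these assumed values the ball keeps moving leftward until t = 4.0 s even though the net force points rightward the whole time, which is exactly the behavior Student A used to reject the hypothesis.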

TABLE XI. Student work: Student B. Each entry gives a transcript excerpt from the student's lab report, followed by the researcher's comments based on the scientific abilities rubrics and, where applicable, the rubric item and score.

Transcript: "Hypothesis we will test: An object always moves in the direction of the net force exerted on it."
Comment: States the hypothesis to be tested experimentally. (Rubric item 1, score 3.)

Transcript: "Possible experiments: 1. Hit a bowling ball with a mallet. 2. Push a dynamics cart along a dynamics track. 3. Push (gently) your lab partner forward. The first choice would be the best because it would most clearly exhibit the hypothesis we are trying to prove."
Comment: Brainstorming different experiments. Note that the student is trying to support the hypothesis.

Transcript: "Chosen experiment: We will hit the bowling ball (which is sitting at rest) in a forward direction, using the mallet."
Comment: This experiment will confirm the rule instead of trying to reject it. Due to the nature of the design, it is likely the data will lead to an incorrect judgment. (Rubric item 2, score 1.)

Transcript: "Prediction: The ball will move in the forward direction, the same direction as the net force exerted on it by the mallet, and in no other direction besides that one."
Comment: Prediction is based on the rule under test, but assumptions are not considered. (Rubric item 7, score 2.) Assumptions are not described. (Rubric item 8, score 0.)

Transcript: "Outcome: The ball moved forward in the exact direction of the net force on it (force of the mallet). Yes, the prediction was confirmed."
Comment: Note that the student has designed and performed an experiment where the outcome is supported by the rule. (Rubric item 4, score 3.)

Transcript: "Based on our prediction and successful outcome, we can say that the hypothesis is supported by experimental evidence."
Comment: The judgment is based only on the degree of agreement between the results and the prediction. (Rubric item 5, score 1.)


*Also at The Rutgers Graduate School of Education, 10 Seminary Place, New Jersey 08901, USA.

1. R. Czujko, in The Changing Role of Physics Departments in Modern Universities, edited by E. F. Redish and J. S. Rigden, AIP Conf. Proc. No. 399 (AIP, New York, 1997), pp. 213–224.
2. What Work Requires of Schools: A SCANS Report for America 2000 (U.S. Department of Labor, Secretary's Commission on Achieving Necessary Skills, Washington, DC, 1991).
3. J. D. Bransford, A. L. Brown, and R. R. Cocking, How People Learn: Brain, Mind, Experience, and School (National Academy Press, Washington, DC, 1999).
4. A. Carnevale, L. Gainer, and A. Meltzer, Workplace Basics: The Essential Skills Employers Want (Jossey-Bass, San Francisco, 1990).
5. See the ABET home page at URL http://www.abet.org/forms.shtml
6. National Research Council, National Science Education Standards (National Academy Press, Washington, DC, 1996).
7. L. C. McDermott and the Physics Education Group at the University of Washington, Physics by Inquiry (Wiley, New York, 1996).
8. E. F. Redish, Teaching Physics with the Physics Suite (Wiley, New York, 2003).
9. D. R. Sokoloff and R. K. Thornton, Interactive Lecture Demonstrations (Wiley, New York, 2004).
10. D. R. Sokoloff, R. K. Thornton, and P. W. Laws, RealTime Physics (Wiley, New York, 2004).
11. P. W. Laws, Workshop Physics Activity Guide, 2nd ed. (Wiley, New York, 2004).
12. L. C. McDermott and P. S. Shaffer, Tutorials in Introductory Physics (Prentice Hall, New York, 1998).
13. See the SCALE-UP project webpage at URL http://www.ncsu.edu/per/scaleup.html
14. H. Kruglak, Am. J. Phys. 20, 136 (1952).
15. H. Kruglak, Am. J. Phys. 22, 442 (1954).
16. H. Kruglak, Am. J. Phys. 22, 452 (1954).
17. H. Kruglak, Am. J. Phys. 23, 82 (1955).
18. National Assessment Governing Board, Science Framework for the 2005 National Assessment of Educational Progress (U.S. Department of Education, Washington, DC, 2004), URL http://www.nagb.org/pubs/s_framwork_05/761907-ScienceFramework.pdf
19. P. Gonzales, C. Calsyn, L. Jocelyn, K. Mak, D. Kastberg, S. Arafeh, T. Williams, and W. Tsen, Statistical Analysis Report No. NCES 2001-028, 2001 (unpublished).
20. G. Salomon and D. N. Perkins, Educ. Psychol. 24, 113 (1989).
21. G. Holton and S. Brush, Physics, The Human Adventure (Rutgers University Press, New Brunswick, NJ, 2001).
22. A. E. Lawson, Am. Biol. Teach. 62, 482 (2000).
23. A. E. Lawson, Int. J. Sci. Educ. 25, 1387 (2003).
24. B. S. Bloom, Taxonomy of Educational Objectives: The Classification of Educational Goals, Handbook I: Cognitive Domain (David McKay Co., New York, 1956).
25. P. Dressel and L. Mayer, General Education: Exploration in Evaluation. Final Report of the Cooperative Study of Evaluation in General Education (Council on Education, Washington, DC, 1954).
26. C. D. Schunn and J. Anderson, in Designing for Science: Implications from Everyday, Classroom and Professional Settings, edited by K. Crowley, C. D. Schunn, and T. Okada (Erlbaum, Mahwah, NJ, 2001), pp. 351–392.
27. P. Black and D. Wiliam, Inside the Black Box: Raising Standards through Classroom Assessment (nferNelson, London, 1998).
28. D. R. Sadler, Instr. Sci. 18, 119 (1989).
29. S. M. Brookhart, The Art and Science of Classroom Assessment: The Missing Part of Pedagogy, ASHE-ERIC Higher Education Report Vol. 27, No. 1 (The George Washington University, Graduate School of Education and Human Development, Washington, DC, 1999).
30. R. D. Tweney, in Designing for Science: Implications from Everyday, Classroom and Professional Settings, edited by K. Crowley, C. D. Schunn, and T. Okada (Erlbaum, Mahwah, NJ, 2001), pp. 141–173.
31. D. Hestenes, Am. J. Phys. 55, 440 (1987).
32. J. A. Kourany, Scientific Knowledge: Basic Issues in the Philosophy of Science (Wadsworth, Belmont, CA, 1987).
33. E. Nagel, The Structure of Science (Harcourt, Brace & World, New York, 1961).
34. A. E. Lawson, Sci. Educ. 9, 577 (2000).
35. E. Etkina, A. A. Warren, and M. Gentile, Phys. Teach. 44, 34 (2006).
36. C. A. Chinn and W. F. Brewer, Rev. Educ. Res. 63, 1 (1993).
37. M. Lawrence, Ph.D. thesis, Rutgers, The State University of New Jersey, 1997.
38. R. H. Horwood, Sci. Educ. 72, 41 (1988).
39. E. Etkina, A. Van Heuvelen, D. T. Brookes, and D. Mills, Phys. Teach. 40, 351 (2002).
40. L. W. Anderson and D. R. Krathwohl, A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives (Longman, New York, 2001).
41. A. Van Heuvelen, Am. J. Phys. 59, 891 (1991).
42. D. Rosengrant, A. Van Heuvelen, and E. Etkina, in Proceedings of the 2005 Physics Education Research Conference, edited by P. Heron, J. Marx, and L. McCullough, AIP Conf. Proc. No. 818 (AIP, New York, 2006), pp. 49–52.
43. S. Murthy and E. Etkina, in Proceedings of the 2004 Physics Education Research Conference, AIP Conf. Proc. No. 720 (AIP, New York, 2005), pp. 133–137.
44. S. Murthy and E. Etkina, Using rubrics as self-assessment instruments for students in introductory laboratories, AAPT National Meeting, Albuquerque, NM, January 2005.
45. A. Buffler, S. Allie, F. Lubben, and B. Campbell, Int. J. Sci. Educ. 23, 1137 (2001).
46. R. Lippmann, Ph.D. thesis, University of Maryland, 2003.
47. S. M. Barnett and S. J. Ceci, Psychol. Bull. 128, 612 (2002).
48. M. L. Gick and K. J. Holyoak, Cognit. Psychol. 12, 306 (1980).
49. Study reported in A. Van Heuvelen, Implementing a science process approach in large introductory physics courses, talk given at Rutgers University, November 2004.
