
Sarah Wroblewski (Hallinen)
Michigan State University MAFLT Program
FLT 808 Assessment in Foreign Language Teaching
Aaron Ohlrogge
Summer 2016

Assessment Design Project


Background Information

I chose to develop this assessment because it will be very useful in my new district. When I met with some of my colleagues, they expressed a wish for a midterm assessment based more heavily on authentic materials (I discuss this further in the design portion of this report). While I could not develop an entire midterm on my own due to constraints on time and resources, I created a unit test with similar parameters and a similar purpose that could be adapted into the overall midterm exam if necessary. The test is for 8th grade Spanish 1, the second level of Spanish offered in the district but the first non-exploratory course. Students are around 13 to 15 years old and attend the middle schools. Most students in Spanish 1 are at a Novice Low to Novice Mid level, as measured on the ACTFL proficiency scale, meaning they often rely on memorized phrases and isolated words (ACTFL, 2012). The exam is intended for the end of the first semester of study, at the close of Unit 2. It would serve as a summative assessment for the unit and would be worth slightly more than other assessments during the unit. The results would be used only within the classroom context.

As a unit test, students would be the major stakeholders in the exam results, which would be calculated into their overall grade. However, with more than six units in the course, this single exam would make up only a small portion of that grade. The exam would be criterion-referenced; that is, students would be scored against set criteria rather than against the performance of others (Hughes, 2003). If a student performed extremely poorly on this exam it could affect their grade negatively, but the outcome of the course would depend on many more factors. Parents could also be considered stakeholders, as they usually wish to see their children graduate, and the first level of a language is a requirement for graduation in Michigan. In terms of instructional consequences, this exam focuses instruction on vocabulary acquisition, reading strategies using authentic texts, listening using authentic excerpts, and writing practice (although somewhat formulaic). As a result, backwash would likely be positive and largely focused on interpreting authentic materials. I would like to point out that I am not pleased about using multiple choice for the majority of the exam; nevertheless, my district requested it, and I am following their guidelines. I discuss the advantages and disadvantages of multiple choice later.

Overall Design

As discussed above, the teachers in my district were looking for an exam based on authentic materials. Today many language educators push the use of authentic materials, especially in a communicative-based curriculum. Some studies have found authentic materials to be motivating for learners, while others suggest they can cause difficulty due to specialized vocabulary (Gilmore, 2007). In truth, Gilmore (2007) argues that there is little empirical evidence either way. He also points out that “authentic” has many definitions, ranging from the type of task to materials made by native speakers, for native speakers (Gilmore, 2007). In this case, the district was looking for the latter on the interpretive portion of the test (listening, reading, and vocabulary). Despite the lack of empirical evidence supporting authentic resources, I decided it would be worthwhile to use them to develop an assessment, as they place a strong focus on communicative competence when used correctly (Gilmore, 2007). The largest portion of the time spent developing this test went into searching for appropriate authentic materials. My district had already provided some sources, but I found additional ones through extended Internet searches. Links to the original sources (when available) can be found in the exam document.


There are two portions to this exam: the interpretive portion and the presentational portion. The first assesses vocabulary, reading, and listening, and consists of 25 multiple choice items that would be scored via scantron. This format has both advantages and disadvantages. Scoring of multiple choice is much quicker and more reliable, which is very useful when a test is used by a large group of teachers in different buildings, as ours would be (Hughes, 2003). However, multiple choice questions test only recognition, scores can be affected by guessing, and poorly written items can undermine the test (Hughes, 2003). In creating this portion of the exam I relied heavily on the brown bag lecture on writing effective multiple choice questions (Ohlrogge, 2014). For example, I tried to keep prompts short and to use simple language that students are familiar with, and I did my best to balance the responses and make the distractors plausible (Ohlrogge, 2014). I will now describe each section of the interpretive portion in turn.
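To put the guessing concern in perspective (this exam would not apply such a correction; the formula is offered only as illustration), the classical correction for guessing adjusts a raw score as

$$S = R - \frac{W}{k - 1},$$

where $R$ is the number of correct answers, $W$ the number of incorrect answers, and $k$ the number of options per item. On a 25-item section with four-option items, a student who knows 13 answers and guesses blindly on the remaining 12 would be expected to gain about 3 points by chance ($12 \times \tfrac{1}{4}$); the expected 9 wrong guesses subtract $9/3 = 3$ points, removing roughly that inflation. With three-option items the chance inflation per guessed item rises to $\tfrac{1}{3}$, which is worth keeping in mind when interpreting raw scores.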

Vocabulary: The first two sections of the exam (A and B) are focused on vocabulary and, in a secondary sense, reading. The vocabulary being assessed comes from the first three units of the textbook ¡Avancemos!, with a focus on Unidad 2, which covers school supplies, class schedules, and school activities (McDougal Littell, 2010). Because of the multiple choice format, only recognition can be assessed. Both of these sections could be considered reading tasks as well, since each includes an authentic piece of “text” (a map and a chart); however, vocabulary knowledge is the main construct being measured. Although it could be argued that mixing some reading skill into a vocabulary assessment could compromise validity (Hughes, 2003), I chose this style because it is more authentic and better reflects tasks students would do in real life. For Part A, students must know how to tell time, recognize numbers and class titles, and generally be able to read a schedule in order to complete the tasks. They would also need basic knowledge of question construction in Spanish. Part B contains a school map, and students interpret where various labeled rooms are located using their knowledge of prepositions of location (e.g., to the left of, next to). While basic map-reading knowledge is necessary, the focus here is on the prepositions in a more realistic context. In this section I chose to use three-option rather than four-option multiple choice questions, simply because I had trouble creating additional plausible distractors and the number of options makes little difference in results (Ohlrogge, 2016). Overall, I attempted to use authentic materials in the vocabulary assessment, which could result in a murkier construct but aligns with the curricular objective of students being able to interpret communications such as maps and schedules.

Reading: The next section of the exam focuses on the skill of reading. Some of the main objectives in Spanish 1 are for students to be able to skim texts, scan texts, use context clues to determine unknown words, and identify general ideas: a mixture of expeditious and careful operations (Hughes, 2003). The text I chose is an Internet forum post in which children discuss their opinions of school. The excerpts are very short, are on a familiar topic, and are similar to what students may encounter in their daily lives online. The text does contain many unknown words, but I decided not to simplify it, based on findings that simplifying a text does not make a significant difference in overall comprehension (Young, 1999). Questions are once again multiple choice with four responses and are written in English to avoid comprehension issues interfering with the measurement (Hughes, 2003). I put all of the questions in the same order as the text and included questions on all portions of the text, as recommended by Ohlrogge (2016). I also tried to include a variety of question types: some involve inference (question 13), some target local details (questions 11, 12, and 14), and one involves an inference based on cultural knowledge, namely that Barcelona is in Spain (Hughes, 2003). One major issue with this section is that accurately measuring reading ability would likely require more items and more reading excerpts in order to capture students’ true performance on the construct (Hughes, 2003). While the first two sections do involve some reading skill, in future iterations of this exam, or if it were used as a midterm, I would add more. In this case, time constraints limit what is possible for students in one sitting for a unit test (see the administration section).

Listening: Parts D and E of the exam focus on the assessment of listening, with the objective of students being able to identify key details as well as the overall message of the “text.” Both excerpts are authentic in that native speakers created them, although the second was likely made for language learners; both are monologues. The pacing of the first excerpt, about a school, is fairly quick for Novice-level learners, which is why I included the accompanying video. Vandergrift (2007) discusses how visuals can aid comprehension: “Visuals can provide context and non-linguistic input to activate top-down processing… L2 listeners who view and listen simultaneously appear to use more… strategies to compensate for inadequate linguistic knowledge than those who only listen” (p. 200). It is important to note that while some researchers have found that video does not distract learners (Wagner, 2007), others have found the opposite (Coniam, 2001); I believe the comprehension advantages outweigh the potential for distraction. The video can also be played at a slower speed, and instructors could decide whether this (as well as repetition of the video) is appropriate for their student population. The second excerpt is a teen describing her school schedule. For this text I chose to keep the questions in Spanish so that students focus on listening for key words. The questions for both excerpts follow the order of the listening, are spaced far enough apart that missing one answer does not cause students to miss the next, and include only three responses to aid students (Ohlrogge, 2016). One drawback, shared with the reading section, is the limited number of items, which would reduce validity and reliability (Hughes, 2003).

For all sections of the interpretive portion, students would receive their scores as a total out of 25, although each staff member may choose to go over the questions and answers with their students to provide feedback. In my classroom I always prefer to go over questions with my students to increase self-reflection and metacognition (Anderson, 2012).

The second portion of the exam is the presentational portion, assessing students’ presentational writing skills; in this case, their ability to write a personal letter. This is only one task and not necessarily a representative sample of writing ability; however, it would be only one of many summative and formative assessments during the course. The prompt I included is very specific and offers little choice, as recommended by Hughes (2003), to restrict candidates and make scoring more reliable. I also chose to use an analytic rubric because a “heterogeneous… less well-trained group” (Hughes, 2003) will be scoring the writing samples. The rubric is based on the Jacobs et al. (1981) weighted scoring profile and is out of 100 points total (Hughes, 2003). East (2009) constructed a highly reliable analytic rubric specific to foreign language assessment (not ESL), so I also drew on his rubric in constructing my own. The categories rated include (in order of importance): content, coherence (organization), vocabulary, grammar (language use), and mechanics. For more details, see the rubric instrument.
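To make the weighted analytic scoring concrete, below is a minimal sketch in Python. The category weights here are hypothetical placeholders chosen only to respect the stated ordering of importance; the actual weights are defined in the rubric instrument. The two-rater averaging reflects the multiple-scoring recommendation described in the next paragraph.

```python
# Minimal sketch of weighted analytic scoring for the writing task.
# The weights below are HYPOTHETICAL placeholders ordered by importance
# (content > coherence > vocabulary > grammar > mechanics); the real
# values live in the rubric instrument.
WEIGHTS = {
    "content": 30,
    "coherence": 25,
    "vocabulary": 20,
    "grammar": 15,
    "mechanics": 10,
}  # sums to 100, matching the 100-point profile

def weighted_score(ratings: dict[str, float]) -> float:
    """Combine per-category ratings (each on a 0.0-1.0 scale) into a
    single score out of 100 using the category weights."""
    return sum(WEIGHTS[cat] * ratings[cat] for cat in WEIGHTS)

def averaged_score(rater_ratings: list[dict[str, float]]) -> float:
    """Average the weighted scores from two or more raters, per the
    multiple-scoring recommendation for this portion."""
    return sum(weighted_score(r) for r in rater_ratings) / len(rater_ratings)

# Example: two raters score the same letter.
rater_a = {"content": 0.8, "coherence": 0.7, "vocabulary": 0.9,
           "grammar": 0.6, "mechanics": 1.0}
rater_b = {"content": 0.7, "coherence": 0.8, "vocabulary": 0.8,
           "grammar": 0.7, "mechanics": 0.9}

print(averaged_score([rater_a, rater_b]))  # 77.5 out of 100
```

Teachers who prefer to weight the writing portion at 50 points, as noted below, could simply halve the averaged result.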

Students will respond to the prompt by writing their letter on the sheet provided, using paper and pencil and without a dictionary. Multiple scoring of this assessment would be ideal, with at least two Spanish teachers rating each sample. Feedback for this portion would follow Hughes’ (2003) recommendations to give both writing-specific and non-writing-specific feedback, and teachers would need training on how best to rate samples and provide that feedback. It is likely that some teachers would weight this writing assessment at 50% overall (out of 50 points rather than 100), depending on their course grading scale.

Administration

This test would likely need to be given in one 50-minute class period, which is why there were limitations on how many items could be included. The writing portion could take longer, and repeating the listening could also increase administration time, so some instructors might choose to use two class periods. Each classroom teacher would administer the assessment in his or her own class in a paper-based format. Required materials include test booklets, scantrons, cover sheets, pencils, a video/audio projector for playing the listening excerpts, paper for the writing portion, and a rubric for each student. All of these resources would be easily accessible. Some teachers may choose to have students listen to the excerpts on their own; in that case each student would need a laptop with the video and audio downloaded, as well as headphones.

Conclusion

While it is difficult to ascertain the reliability and validity of this assessment instrument without piloting and analyzing it, by following the recommendations laid out in our assessment course and the input of other researchers, I believe this assessment provides at least a basis for future revisions.


References

American Council on the Teaching of Foreign Languages. (2012). ACTFL Proficiency Guidelines 2012. Retrieved from https://www.actfl.org/publications/guidelines-and-manuals/actfl-proficiency-guidelines-2012

Anderson, N. J. (2012). Student Involvement in Assessment: Healthy Self-Assessment and Effective Peer Assessment. In Coombe, C., Davidson, P., Sullivan, B., & Stoynoff, S. (Eds.), The Cambridge Guide to Second Language Assessment (pp. 187-197). New York: Cambridge University Press.

Coniam, D. (2001). The use of audio or video comprehension as an assessment instrument in the certification of English language teachers: A case study. System, 29, 1-14. doi: 10.1016/S0346-251X(00)00057-9

East, M. (2009). Evaluating the Reliability of a Detailed Analytic Scoring Rubric for Foreign Language Writing. Assessing Writing, 14, 88-115. doi: 10.1016/j.asw.2009.04.001

Gilmore, A. (2007). Authentic Materials and Authenticity in Foreign Language Learning. Language Teaching, 40, 97-118. doi: 10.1017/S0261444807004144

Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge, UK: Cambridge University Press.

McDougal Littell. (2010). Classzone: ¡Avancemos! Level 1. Houghton Mifflin Harcourt. Retrieved from http://classzone.com/cz/books/avancemos_1/book_home.htm?state=KS

Ohlrogge, A. (2014). CeLTA Language Learner Training. Multiple Choice Items: The Art and the Science. Lecture videos retrieved from http://learninglanguages.celta.msu.edu/writing-multiple-choice-items/

Ohlrogge, A. (2016). Module 4 Part 1_Reading and Listening & Module 4 Part 2_Grammar and Vocabulary lecture slides (PowerPoint document). Retrieved from https://d2l.msu.edu/d2l/le/content/423528/Home?itemIdentifier=D2L.LE.Content.ContentObject.ModuleCO-3650854

Vandergrift, L. (2007). Recent Developments in Second and Foreign Language Listening Comprehension Research. Language Teaching, 40, 191-210. doi: 10.1017/S0261444807004338

Wagner, E. (2007). Are they watching? Test-taker viewing behaviour during an L2 video listening test. Language Learning & Technology, 11(1), 67-86. Retrieved from http://llt.msu.edu/vol11num1/wagner/

Young, D. J. (1999). Linguistic Simplification of SL Reading Material: Effective Instructional Practice? The Modern Language Journal, 83, 350-366. doi: 10.1111/0026-7902.00027
