
Sarah Wroblewski (Hallinen)
Michigan State University MAFLT Program
FLT 808 Assessment in Foreign Language Teaching
Aaron Ohlrogge
Summer 2016

Assessment Design Project


Background Information

I chose to develop this assessment because it will be very useful in my new district. When I met with some of my colleagues, they expressed a wish for a midterm assessment based more heavily on authentic materials (I discuss this further in the design portion of this report). While I could not develop an entire midterm on my own due to constraints on time and resources, I created a unit test with similar parameters and a similar purpose that could be adapted into the overall midterm exam if necessary. The test is for 8th grade Spanish 1, the second level of Spanish offered in the district but the first non-exploratory course. Students are around 13 to 15 years old and attend the middle schools. Most students in Spanish 1 are at a Novice Low to Novice Mid level, as measured on the ACTFL proficiency scale, meaning they often rely on memorized phrases and isolated words (ACTFL, 2012). The exam is intended for the end of the first semester of study, at the close of Unit 2. It would serve as a summative assessment for the unit and would be worth slightly more than other assessments during the unit. The results would be used only within the classroom context.

As a unit test, students would be the major stakeholders in the exam results, which would be calculated into their overall grade. However, with more than six units in the course, this single exam would make up only a small portion of that grade. The exam would be criterion-referenced; that is, students would be scored against set criteria rather than against the performance of others (Hughes, 2003). If a student performed extremely poorly on this exam it could affect their grade negatively, but the outcome of the course would depend on many more factors. Parents could also be considered stakeholders, as they usually wish to see their children graduate, and the first level of a language is a requirement for graduation in Michigan. In terms of instructional consequences, this exam focuses instruction on vocabulary acquisition, reading strategies using authentic texts, listening using authentic excerpts, and writing practice (although somewhat formulaic). As a result, backwash would likely be positive and largely focused on interpreting authentic materials. I would like to point out that I am not pleased about using multiple choice for the majority of the exam; nevertheless, my district requested it, and I am following their guidelines. I discuss the advantages and disadvantages of multiple choice later.

Overall Design

As discussed above, the teachers in my district were looking for an exam based on authentic materials. Today many language educators push the use of authentic materials, especially in a communicative-based curriculum. Some studies have found authentic materials to be motivating for learners, while others suggest they can cause difficulty due to specialized vocabulary (Gilmore, 2007). In truth, Gilmore (2007) argues that there is little empirical evidence either way. He also points out that “authentic” has many definitions, ranging from the type of task to materials made by native speakers, for native speakers (Gilmore, 2007). In this case, the district was looking for the latter on the interpretive portion of the test (listening, reading, and vocabulary). Despite the lack of empirical evidence supporting authentic resources, I decided it would be worthwhile to use them to develop an assessment, as they place a strong focus on communicative competence when used correctly (Gilmore, 2007). The largest portion of the time spent developing this test went into searching for appropriate authentic materials. My district had already provided some sources, but I found additional ones through extended Internet searches. Links to the original sources (when available) can be found in the exam document.


There are two portions to this exam: the interpretive portion and the presentational portion. The first assesses vocabulary, reading, and listening, and consists of 25 multiple choice items that would be scored via scantron. This format has both advantages and disadvantages. Scoring of multiple choice is much quicker and more reliable, which is very useful when a test is used by a large group of teachers in different buildings, as ours would be (Hughes, 2003). However, multiple choice questions test only recognition, scores can be affected by guessing, and poorly written items can undermine the test (Hughes, 2003). In creating this portion of the exam I relied heavily on the brown bag lecture on writing effective multiple choice questions (Ohlrogge, 2014). For example, I tried to keep prompts short and to use simple language that students are familiar with, and I did my best to balance the responses and make the distractors plausible (Ohlrogge, 2014). I will now describe each section of the interpretive portion in turn.
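To put the guessing concern in perspective (this exam would not apply such a correction; the formula is offered only as illustration), the classical correction for guessing adjusts a raw score as

$$S = R - \frac{W}{k - 1},$$

where $R$ is the number of correct answers, $W$ the number of incorrect answers, and $k$ the number of options per item. On a 25-item section with four-option items, a student who knows 13 answers and guesses blindly on the remaining 12 would be expected to gain about 3 points by chance ($12 \times \tfrac{1}{4}$); the expected 9 wrong guesses subtract $9/3 = 3$ points, removing roughly that inflation. With three-option items the chance inflation per guessed item rises to $\tfrac{1}{3}$, which is worth keeping in mind when interpreting raw scores.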

Vocabulary: The first two sections of the exam (A and B) are focused on vocabulary and, in a secondary sense, reading. The vocabulary being assessed comes from the first three units of the textbook ¡Avancemos!, with a focus on Unidad 2, which covers school supplies, class schedules, and school activities (McDougal Littell, 2010). Because of the multiple choice format, only recognition can be assessed. Both of these sections could be considered reading tasks as well, since each includes an authentic piece of “text” (a map and a chart); however, vocabulary knowledge is the main construct being measured. Although it could be argued that mixing some reading skill into a vocabulary assessment could compromise validity (Hughes, 2003), I chose this style because it is more authentic and better reflects tasks students would do in real life. For Part A, students must know how to tell time, recognize numbers and class titles, and generally be able to read a schedule in order to complete the tasks. They would also need basic knowledge of question construction in Spanish. Part B contains a school map, and students interpret where various labeled rooms are located using their knowledge of prepositions of location (e.g., to the left of, next to). While basic map-reading knowledge is necessary, the focus here is on the prepositions in a more realistic context. In this section I chose to use three-option rather than four-option multiple choice questions, simply because I had trouble creating additional plausible distractors and the number of options makes little difference in results (Ohlrogge, 2016). Overall, I attempted to use authentic materials in the vocabulary assessment, which could result in a murkier construct but aligns with the curricular objective of students being able to interpret communications such as maps and schedules.

Reading: The next section of the exam focuses on the skill of reading. Some of the main objectives in Spanish 1 are for students to be able to skim texts, scan texts, use context clues to determine unknown words, and identify general ideas: a mixture of expeditious and careful operations (Hughes, 2003). The text I chose is an Internet forum post in which children discuss their opinions of school. The excerpts are very short, are on a familiar topic, and are similar to what students may encounter in their daily lives online. The text does contain many unknown words, but I decided not to simplify it, based on findings that simplifying a text does not make a significant difference in overall comprehension (Young, 1999). Questions are once again multiple choice with four responses and are written in English to avoid comprehension issues interfering with the measurement (Hughes, 2003). I put all of the questions in the same order as the text and included questions on all portions of the text, as recommended by Ohlrogge (2016). I also tried to include a variety of question types: some involve inference (question 13), some target local details (questions 11, 12, and 14), and one involves an inference based on cultural knowledge, namely that Barcelona is in Spain (Hughes, 2003). One major issue with this section is that accurately measuring reading ability would likely require more items and more reading excerpts in order to capture students’ true performance on the construct (Hughes, 2003). While the first two sections do involve some reading skill, in future iterations of this exam, or if it were used as a midterm, I would add more. In this case, time constraints limit what is possible for students in one sitting for a unit test (see the administration section).

Listening: Parts D and E of the exam focus on the assessment of listening, with the objective of students being able to identify key details as well as the overall message of the “text.” Both excerpts are authentic in that native speakers created them, although the second was likely made for language learners; both are monologues. The pacing of the first excerpt, about a school, is fairly quick for Novice-level learners, which is why I included the accompanying video. Vandergrift (2007) discusses how visuals can aid comprehension: “Visuals can provide context and non-linguistic input to activate top-down processing… L2 listeners who view and listen simultaneously appear to use more… strategies to compensate for inadequate linguistic knowledge than those who only listen” (p. 200). It is important to note that while some researchers have found that video does not distract learners (Wagner, 2007), others have found the opposite (Coniam, 2001); I believe the comprehension advantages outweigh the potential for distraction. The video can also be played at a slower speed, and instructors could decide whether this (as well as repetition of the video) is appropriate for their student population. The second excerpt is a teen describing her school schedule. For this text I chose to keep the questions in Spanish so that students focus on listening for key words. The questions for both excerpts follow the order of the listening, are spaced far enough apart that missing one answer does not cause students to miss the next, and include only three responses to aid students (Ohlrogge, 2016). One drawback, shared with the reading section, is the limited number of items, which would reduce validity and reliability (Hughes, 2003).

For all sections of the interpretive portion, students would receive their scores as a total out of 25, although each staff member may choose to go over the questions and answers with their students to provide feedback. In my classroom I always prefer to go over questions with my students to increase self-reflection and metacognition (Anderson, 2012).

The second portion of the exam is the presentational portion, assessing students’ presentational writing skills; in this case, their ability to write a personal letter. This is only one task and not necessarily a representative sample of writing ability; however, it would be only one of many summative and formative assessments during the course. The prompt I included is very specific and offers little choice, as recommended by Hughes (2003), to restrict candidates and make scoring more reliable. I also chose to use an analytic rubric because a “heterogeneous… less well-trained group” (Hughes, 2003) will be scoring the writing samples. The rubric is based on the Jacobs et al. (1981) weighted scoring profile and is out of 100 points total (Hughes, 2003). East (2009) constructed a highly reliable analytic rubric specific to foreign language assessment (not ESL), so I also drew on his rubric in constructing my own. The categories rated include (in order of importance): content, coherence (organization), vocabulary, grammar (language use), and mechanics. For more details, see the rubric instrument.
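To make the weighted analytic scoring concrete, below is a minimal sketch in Python. The category weights here are hypothetical placeholders chosen only to respect the stated ordering of importance; the actual weights are defined in the rubric instrument. The two-rater averaging reflects the multiple-scoring recommendation described in the next paragraph.

```python
# Minimal sketch of weighted analytic scoring for the writing task.
# The weights below are HYPOTHETICAL placeholders ordered by importance
# (content > coherence > vocabulary > grammar > mechanics); the real
# values live in the rubric instrument.
WEIGHTS = {
    "content": 30,
    "coherence": 25,
    "vocabulary": 20,
    "grammar": 15,
    "mechanics": 10,
}  # sums to 100, matching the 100-point profile

def weighted_score(ratings: dict[str, float]) -> float:
    """Combine per-category ratings (each on a 0.0-1.0 scale) into a
    single score out of 100 using the category weights."""
    return sum(WEIGHTS[cat] * ratings[cat] for cat in WEIGHTS)

def averaged_score(rater_ratings: list[dict[str, float]]) -> float:
    """Average the weighted scores from two or more raters, per the
    multiple-scoring recommendation for this portion."""
    return sum(weighted_score(r) for r in rater_ratings) / len(rater_ratings)

# Example: two raters score the same letter.
rater_a = {"content": 0.8, "coherence": 0.7, "vocabulary": 0.9,
           "grammar": 0.6, "mechanics": 1.0}
rater_b = {"content": 0.7, "coherence": 0.8, "vocabulary": 0.8,
           "grammar": 0.7, "mechanics": 0.9}

print(averaged_score([rater_a, rater_b]))  # 77.5 out of 100
```

Teachers who prefer to weight the writing portion at 50 points, as noted below, could simply halve the averaged result.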

Students will respond to the prompt by writing their letter on the sheet provided, using paper and pencil and without a dictionary. Multiple scoring of this assessment would be ideal, with at least two Spanish teachers rating each sample. Feedback for this portion would follow Hughes’ (2003) recommendations to give both writing-specific and non-writing-specific feedback, and teachers would need training on how best to rate samples and provide that feedback. It is likely that some teachers would weight this writing assessment at 50% overall (out of 50 points rather than 100), depending on their course grading scale.

Administration

This test would likely need to be given in one 50-minute class period, which is why there were limitations on how many items could be included. The writing portion could take longer, and repeating the listening could also increase administration time, so some instructors might choose to use two class periods. Each classroom teacher would administer the assessment in his or her own class in a paper-based format. Required materials include test booklets, scantrons, cover sheets, pencils, a video/audio projector for playing the listening excerpts, paper for the writing portion, and a rubric for each student. All of these resources would be easily accessible. Some teachers may choose to have students listen to the excerpts on their own; in that case each student would need a laptop with the video and audio downloaded, as well as headphones.

Conclusion

While it is difficult to ascertain the reliability and validity of this assessment instrument without piloting and analyzing it, by following the recommendations laid out in our assessment course and the input of other researchers, I believe this assessment provides at least a basis for future revisions.


References

American Council on the Teaching of Foreign Languages. (2012). ACTFL Proficiency Guidelines 2012. Retrieved from https://www.actfl.org/publications/guidelines-and-manuals/actfl-proficiency-guidelines-2012

Anderson, N. J. (2012). Student Involvement in Assessment: Healthy Self-Assessment and Effective Peer Assessment. In Coombe, C., Davidson, P., Sullivan, B., & Stoynoff, S. (Eds.), The Cambridge Guide to Second Language Assessment (pp. 187-197). New York: Cambridge University Press.

Coniam, D. (2001). The use of audio or video comprehension as an assessment instrument in the certification of English language teachers: A case study. System, 29, 1-14. doi: 10.1016/S0346-251X(00)00057-9

East, M. (2009). Evaluating the Reliability of a Detailed Analytic Scoring Rubric for Foreign Language Writing. Assessing Writing, 14, 88-115. doi: 10.1016/j.asw.2009.04.001

Gilmore, A. (2007). Authentic Materials and Authenticity in Foreign Language Learning. Language Teaching, 40, 97-118. doi: 10.1017/S0261444807004144

Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge, UK: Cambridge University Press.

McDougal Littell. (2010). Classzone: ¡Avancemos! Level 1. Houghton Mifflin Harcourt. Retrieved from http://classzone.com/cz/books/avancemos_1/book_home.htm?state=KS

Ohlrogge, A. (2014). CeLTA Language Learner Training. Multiple Choice Items: The Art and the Science. Lecture videos retrieved from http://learninglanguages.celta.msu.edu/writing-multiple-choice-items/

Ohlrogge, A. (2016). Module 4 Part 1_Reading and Listening & Module 4 Part 2_Grammar and Vocabulary lecture slides (PowerPoint document). Retrieved from https://d2l.msu.edu/d2l/le/content/423528/Home?itemIdentifier=D2L.LE.Content.ContentObject.ModuleCO-3650854

Vandergrift, L. (2007). Recent Developments in Second and Foreign Language Listening Comprehension Research. Language Teaching, 40, 191-210. doi: 10.1017/S0261444807004338

Wagner, E. (2007). Are they watching? Test-taker viewing behaviour during an L2 video listening test. Language Learning & Technology, 11(1), 67-86. Retrieved from http://llt.msu.edu/vol11num1/wagner/

Young, D. J. (1999). Linguistic Simplification of SL Reading Material: Effective Instructional Practice? The Modern Language Journal, 83, 350-366. doi: 10.1111/0026-7902.00027
