
Brigham Young University

BYU ScholarsArchive

Faculty Publications

2017

Understanding Intermediate-Level Speakers’ Strengths and Weaknesses: An Examination of OPIc Tests From Korean Learners of English
Troy L. Cox
Brigham Young University, troyc@byu.edu


Original Publication Citation


Cox, T. (2017). Understanding Intermediate-level Speakers’ Strengths and Weaknesses: An
Examination of OPIc Tests from Korean Learners of English. Foreign Language Annals, 50(1),
XX-XX

BYU ScholarsArchive Citation


Cox, Troy L., "Understanding Intermediate-Level Speakers’ Strengths and Weaknesses: An Examination of
OPIc Tests From Korean Learners of English" (2017). Faculty Publications. 5884.
https://scholarsarchive.byu.edu/facpub/5884

This Peer-Reviewed Article is brought to you for free and open access by BYU ScholarsArchive. It has been
accepted for inclusion in Faculty Publications by an authorized administrator of BYU ScholarsArchive. For more
information, please contact ellen_amatangelo@byu.edu.
84 SPRING 2017

Understanding Intermediate-Level Speakers’ Strengths and Weaknesses: An Examination of OPIc Tests From Korean Learners of English
Troy L. Cox
Brigham Young University

Abstract: This study profiled Intermediate-level learners in terms of their linguistic characteristics and performance on different proficiency tasks. A stratified random sample of 300 Korean learners of English with holistic ratings of Intermediate Low (IL), Intermediate Mid (IM), and Intermediate High (IH) on Oral Proficiency Interviews-computerized (OPIcs), 100 at each level, was selected, and their OPIcs were analyzed by trained ACTFL raters to determine what the learners needed in order to progress to the next higher sublevel. The findings indicate that while ILs minimally met all the linguistic characteristics required of the Intermediate level, to move to the IM sublevel they needed to improve the quantity and quality of all the linguistic characteristics they employed, as well as their mastery of the types and variety of questions they could use when performing Intermediate tasks. In contrast, IMs demonstrated a pattern of strength when completing Intermediate tasks, but to move to the IH sublevel they needed to improve their ability to perform all Advanced-level tasks, especially in terms of accuracy when using paragraph-length discourse. Similar to the IMs, for the IHs to move to the Advanced Low sublevel, they needed to improve their accuracy with paragraph-length discourse and expand their content mastery beyond the autobiographical.

Key words: English as a foreign/second language, oral proficiency

Troy L. Cox (PhD, Brigham Young University) is Associate Director of Research and Assessment, Center for Language Studies, Brigham Young University, Provo, Utah.
Foreign Language Annals, Vol. 50, Iss. 1, pp. 84–113. © 2017 by American Council on the Teaching of Foreign Languages.
DOI: 10.1111/flan.12258

Introduction
In a recent audit that the author conducted of oral proficiency test results from a large university, it was discovered that a single student had taken either the Oral Proficiency Interview (OPI) or the Oral Proficiency Interview-computerized (OPIc) nine times over a 3-year period. Further examination revealed that this student, a language teaching minor who needed a rating of Advanced Low for instructor licensure, was languishing at the Intermediate level. After an initial OPI score of Intermediate High (IH) in 2013, the next four tests resulted in ratings of Intermediate Mid (IM), while the final four tests were rated IH. Reaching the Advanced level is critical for those pursuing teaching licensure (Brooks & Darhower, 2014; Chambless, 2012). This student’s lack of progression toward higher proficiency on the ACTFL scale is a real-world example of why it matters to understand the characteristics of speech and the types of tasks that are required to progress through the three sublevels that constitute the Intermediate level and move into the Advanced range. However, since the rating is holistic, information on the specific aspects of a test taker’s performance that prevent that person from being rated at the next adjacent level is not documented, nor is it provided in the final rating. Thus, there can be a disconnect between what the examinees see as their rating, the information that instructors provide to students about the assessment and the rating system, and what raters are attending to when assigning ratings. The purpose of this study was to examine information that is not traditionally available to either test takers or instructors. The aim was to provide more detailed information about the specific profiles of speakers who received the same proficiency rating within the Intermediate range. It also sought to determine how a test taker’s skills along four linear axes (function, text type, content, and accuracy) contributed to their final, global rating.

Background
The ACTFL defines proficiency as the “ability to use a language to communicate meaningful information in a spontaneous interaction, and in a manner acceptable and appropriate to native speakers of the language” (ACTFL, 2012d, p. 4). The proficiency guidelines (ACTFL, 2012c) have long been represented as an inverted pyramid. The pyramid illustrates that language learning is not linear; rather, the progression from one level to the next is best represented as a pattern of geometric growth. When envisioning the inverted pyramid, the geometric area in the Novice and Intermediate tiers is much smaller than that of the higher levels. However, the skills that are acquired at those levels form the structural foundation upon which the higher levels are built. For example, while the ability to narrate in the past is a critical characteristic of Advanced-level communication, language learners usually first learn to report events that have taken place in strings of sentences using the simple past. However, the learner who does not develop the ability to use paragraph-length discourse will not be able to progress beyond the Intermediate level (ACTFL, 2012a). Some communicative habits seem to convey meaning appropriately. If they are not corrected and extended, they become ingrained and thus impede progress into and beyond the Advanced level. These fossilized errors in essence become faulty girders and beams. They cannot support the increasing communicative weight when learners are required to carry out more sophisticated functions and address more robust and varied content. Thus, understanding the developmental stages through which learners progress is vital in assisting students in their language-learning journey, both within a particular level and from one level into the next.
Although the ACTFL guidelines were introduced in 1982 (Liskin-Gasparro, 2003) and the ACTFL recently certified the 1,000th OPI tester worldwide (ACTFL, 2016), it is likely that many foreign language educators are still unclear about how exactly to use them to improve student learning outcomes. While there are more than 4,000 institutions of higher education and more than 35,000 high schools (U.S. Department of Education, 2016a,
2016b) across the United States, only a fraction of the secondary and postsecondary institutions (1 in every 390) has certified personnel to assist in assessment and lead proficiency-oriented curricular revisions. Some institutions have instituted curriculum-wide training in proficiency assessment (Brooks & Darhower, 2014; Gouoni & Feyten, 1999). Even so, many foreign language educators must rely on written descriptions of the scale with little understanding of how the descriptors relate to actual language production. As a result, a huge segment of the foreign language education community understands the guidelines cursorily at best. At worst, that understanding is reduced to certain grammatical forms.
In simple terms, each major proficiency level (Novice, Intermediate, Advanced, Superior, and Distinguished) is defined as a confluence of four domains: function, text type, content, and accuracy. These features are defined in more detail in Table 1. Learners may progress in a linear way on each of these characteristics. However, conjoint mastery of multiple linear characteristics is necessary for movement through one major level and into the next. A rating at the next higher level is obtained only through sustained performance at the lower levels (ACTFL, 2012a; Clifford, 2016). Before a rating can be awarded, the speaker must demonstrate a sustained level or “floor” of performance across tasks, text type, content, and accuracy (see Table 2). The speaker must also show a breakdown level or “ceiling” at which the examinee can no longer sustain performance in one or more of the four domains (ACTFL, 2012a). For examinees in the Intermediate range, the floor is the ability to create with language in sentence-length utterances that demonstrate control over the content that is needed in daily life. The ceiling is the ability to use paragraph-length discourse to narrate and describe topics of personal and community interest in all major time frames. An Intermediate-level speaker may exhibit some characteristics of Advanced-level proficiency in certain topic domains or with certain linguistic features. Even so, he or she is unable to sustain this level of performance across the requisite range of topics, tasks, or linguistic features with the required level of accuracy. Thus, he or she does not demonstrate Advanced-level ability.

TABLE 1
Speech Characteristics Analyzed by Area of Focus
Area of Focus | Characteristic of Speech | Description
Function | Focus on topic/task | The degree to which the examinee completed the task presented as defined by the major level
Text type | Text length | The extent to which the amount of language completed the function of the task (words and phrases, sentences, strings of sentences, or connected paragraphs)
Text type | Discourse organization | The extent to which the text was organized appropriately and the use of appropriate cohesive markers to organize speech
Content | Vocabulary use | The quantity and quality of lexicon needed to accomplish the task appropriately
Accuracy/comprehensibility expectations | Fluency | The extent to which the rate of speech, length of runs, pauses, and other timing features affected the comprehensibility of the message for a native listener
Accuracy/comprehensibility expectations | Pronunciation | The extent to which individual words and phrases were articulated in a way that was comprehensible to the listener
Accuracy/comprehensibility expectations | Grammatical/structural accuracy | The degree of control of the grammar/syntax needed to accomplish the task in a way that was comprehensible to the listener

As shown in Table 2, the fundamental difference between speech that is rated at any one of the three Intermediate sublevels (Low, Mid, High) lies in the quality and quantity of the examinee’s language when engaged in at-level tasks (Clifford, 2016). The Low sublevel is indicative of a speaker who just barely demonstrates competence when performing the tasks for the major level. Meanwhile, a rating at the Mid sublevel indicates that the speaker fulfills all the requirements of the major level with sufficient quantity and quality of language across the assessment criteria. There is no doubt that the examinee can perform the functions of that major level; indeed, the response is much more substantial than that of a speaker at the Low sublevel. The High sublevel rating indicates that the speaker demonstrates a robust ability to meet the criteria for the proficiency level in question. He or she also attempts some of the tasks at the next higher (adjacent) major level, in this case Advanced, and executes them with success. Such a speaker can often, but not always, meet that level’s expectations for text type, content, and accuracy. Thus, a rating of IH indicates that the speaker exhibits Advanced-level performance most, but not all, of the time. This can happen in one of two ways. The speaker may exhibit all the traits of the Advanced level in certain topic domains and not others. Alternatively, the speaker may exhibit some Advanced features, such as text type and fluency, but not others, such as pronunciation or grammatical accuracy.

TABLE 2
Floor and Ceiling Performance of Intermediate Speakers
Domain | Floor (or Intermediate Criteria) | Ceiling (or Advanced Criteria)
Function | Create with language; participate in simple conversations; ask and answer questions | Narrate and describe in major time frames (past, present, and future); linguistically negotiate situations with complications
Text type | Sentences | Paragraphs
Content | Self; daily life | Self; daily life; nonautobiographical topics; topics of general interest
Accuracy | Understood by people accustomed to speaking to nonnative speakers | Can be understood without confusion by monolinguals not accustomed to speaking to nonnative speakers

A number of studies have looked at the validity of the OPI and the use of its scale in oral proficiency testing (Dandonoli & Henning, 1990; Halleck, 1996; Surface & Dierdorff, 2003; Thompson, 1995, 1996). However, little empirical research has specifically sought to document examinees’ strengths and weaknesses at each sublevel within a major level. The main reason is that a single holistic rating of the speech sample as a whole is not transparent about exactly what the rating means. Nor does it show on which dimensions a test taker showed strength or weakness. The small percentage of instructors who have received formal OPI training can intuit why their students may have received a particular score. In contrast, the large number of instructors who have less familiarity with the scale may:
• fail to understand the conjunctive nature of a proficiency rating (Clifford, 2016),
• overestimate their own students’ abilities (Levine & Haus, 1987), or
• confound the performance of rehearsed material with proficiency (Cox, Bown, & Burdis, 2015).
In an attempt to help instructors understand differences in levels, Liskin-Gasparro (1996) analyzed the communication strategies of IH and Advanced Low (AL) Spanish speakers. She found that the AL speakers used a broader range of communicative strategies. However, she did not analyze other aspects of the interview samples and did not look at the differences among the Intermediate sublevels. Apart from this study, most of the research has taken one of two approaches. Some studies have focused on learners’ expected proficiency outcomes at particular points in their program of study (Carroll, 1967; Chambless, 2012; Glisan & Foltz, 1998; Gouoni & Feyten, 1999). Others have compared learners’ results on the two alternate forms of the assessment, the OPIc and the OPI (Surface, Poncheri, & Bhavsar, 2008; SWA Consulting, 2009; Thompson, Cox, & Knapp, 2016). In contrast, this

study sought to determine how the scale is operationalized. That is, the purpose of the study was to look into the black box, so to speak, of the Intermediate level. It sought empirical data on the patterns of linguistic strengths and weaknesses of IL, IM, and IH speakers when they carried out different types of tasks. The study addressed the following questions:
1. What are the most common linguistic features of speakers at each Intermediate sublevel (IL, IM, IH)? Which characteristics prevent speakers from being rated at the next higher sublevel?
2. How well do speakers at each Intermediate sublevel (IL, IM, IH) perform on different task types that operationalize the criteria of Intermediate and Advanced proficiency? Which task types prevent speakers from being rated at the next higher sublevel?

Method
To answer the research questions, the author analyzed data from a study that CREDU (a subsidiary of Samsung) had commissioned the ACTFL to conduct. The author then wrote up that study as a technical report (Cox, 2015). The data from that report form the basis for the current article. The study aimed to identify commonly manifested Intermediate-level speech characteristics and discover what prevents a test taker from being rated at the next higher adjacent level. To do so, experienced ACTFL raters were recruited to analyze existing assessment data from the OPIc.

Raters
Nine raters selected from the ACTFL’s certified OPIc rater pool were recruited. Seven were certified ACTFL OPI testers, six were certified ACTFL OPI trainers, three were part of the original OPIc development team, and eight were members of the OPIc Quality Assurance team (these roles overlap). Using trained raters ensured that the feedback came from experts who know the scale intimately. The approach is susceptible to the criticism of confirmation bias: raters’ feedback could have been based on the language in the descriptors rather than on unbiased observation. This susceptibility was deemed acceptable (1) due to the lack of research into what trained raters think as they rate speech samples, and (2)
because only highly experienced raters can provide the necessary analysis of the internal mechanisms that result in ratings across the Intermediate range. Raters typically score OPIcs holistically (ACTFL, 2012b). For this study, however, the raters scored the tests analytically by examining specific linguistic features and tasks. They analyzed in depth the difficulty of the task types associated with the functions of the Intermediate and Advanced levels. They also determined the extent to which different speech features were present in those level-specific tasks at each of the two major levels (Intermediate and Advanced). To gather qualitative data, raters also had the opportunity to share comments on the specific tasks and on the speech samples that they rated.

Examinee Data
To control for variance associated with examinees’ native and target languages, the study was limited to Korean-speaking adults who were learning English, who were taking the English OPIc, and who were at different OPI levels. All exams were chosen from the existing pool of OPIc assessments taken by Korean test takers. To meet the selection criteria, each assessment had to have been previously double- or triple-rated by raters who had been in exact agreement on the sublevel awarded (e.g., all raters independently rated an examinee as IM). From the list of double- and triple-rated exams, stratified random sampling was used to select 100 exams at each sublevel (Low, Mid, and High) for a total of 300 exams.

Design
A connected design was used, in which all the raters analyzed a subset of examinees and tasks from existing OPIcs as a way to verify that all the raters were applying the same criteria. A single rater then analyzed each subsequent examinee/task to allow for the broadest possible survey of examinee response types.
To answer the first research question, the approach differed for each of the three sublevels (IL, IM, and IH).
• For IL speakers, five Intermediate tasks and one Advanced task were analyzed. More Intermediate tasks were examined at this level to determine two things: in which domains the ILs needed to improve both the quantity and quality of their responses, and on which functions they needed to improve in order to reach IM.
• For IM speakers, four Intermediate tasks and three Advanced tasks were analyzed. The Intermediate tasks provided a basis of comparison between the IL’s threshold performance and the IM’s strong performance at the Intermediate level. The Advanced tasks provided direct information on what the examinees needed to do to move up the scale to the IH rating and beyond. It is important to note that one moves from IM to IH primarily by focusing on and improving one’s ability with Advanced-level tasks, although performance on Intermediate-level tasks improves as well.
• For IH speakers, six Advanced tasks were analyzed. A High rating indicates evidence of performance of all Advanced task types most of the time yet an inability to sustain that performance. Therefore, to determine the linguistic features of IH, the most useful information would come from an analysis of learners’ performance with Advanced-level tasks.
To answer the second research question, a subset of task types was selected for detailed analysis from among the 15 items on each form of the OPIc (Novice High to IM or IM to Advanced). This reduced the time needed to analyze the profile of any individual examinee. It thus allowed a broader sampling of different examinees from which generalizations could be drawn. Since each form was individually tailored to the examinee, it was not possible to analyze items at
the question level. However, since each question represented a specific task type (see Table 3), the speech samples across all of the interviews were fundamentally equivalent.

TABLE 3
Descriptions of Task Type Analyzed by Intermediate Sublevel
Task Level | Task Type Description | IL | IM | IH
Intermediate | Talk about thing or place | 2 | 1 | –
Intermediate | Talk about activity or routine | 1 | 1 | –
Intermediate | Ask questions | 1 | 1 | –
Intermediate | Intermediate role-play | 1 | 1 | –
Advanced | Past description | 1 | 1 | 1
Advanced | Past narration | – | 1 | 1
Advanced | Advanced role-play (situation with a complication) | – | 1 | 1
Advanced | Role-play follow-up: narration/description | – | – | 1
Advanced | Narration/description: context beyond personal | – | – | 1
Advanced | Report current event | – | – | 1
Total tasks analyzed | | 6 | 7 | 6

Procedure
The raters were each assigned 30 examinees, including 10 examinees who had been rated at each sublevel (IL, IM, and IH). Raters used a rubric, shown in Figure 1. For each examinee’s speech sample, raters were asked to analyze six or seven individual tasks and were instructed to listen to each task twice. On the first listening, they were to assess the response holistically on a five-point scale. The scale ranged from 1, “does not meet expectations,” to 5, “exceeds expectations.” A rating of 1 indicated, for example, total breakdown, in which the examinee could not produce any language at the intended level. A rating of 5 indicated, for example, language far above the intended task difficulty level. On the second listening, the raters were asked to identify any characteristics that would help explain their holistic assessment of the response. For example, a task that received a global assessment of “almost meets expectations” might include weakness in pronunciation, strength in vocabulary use, and moderate ability with the other skills. A second rubric was adapted from one that had been previously employed in a heritage language study (Swender, Martin, Rivera-Martinez, & Kagan, 2014). It included a comment field in which raters could note issues that were not easily addressed with the five-point scale (e.g., technical difficulties or other speech characteristics). Raters could also offer any comments that would add information to the rating they had awarded.

Data Analysis
This study was guided by two primary research questions. The first investigated the most common linguistic features of examinees at the Intermediate level by task level (Intermediate or Advanced). The second investigated the way in which task type (see Table 3) affected examinee performance at the Intermediate level by task level (Intermediate or Advanced). These questions were answered by looking at the mean overall rating (see Figure 1) by task level. For the first question, the means of the different linguistic features (e.g., fluency, pronunciation, etc.; see Table 1) were compared. In addition, 95% confidence intervals (95% CIs) were calculated and graphed to determine how the features differed. For the
second question, the means of the task types (“talk about thing or place,” “talk about activity or routine,” etc.; see Table 3) were compared, with 95% CIs calculated and graphed as well. The 95% CI is an interval estimate of a population parameter (here, the mean). It is generally represented by an error bar (I) with either a dot or a line in the middle indicating the sample mean. Examining the length of the error bars and the overlap among variables of interest provided a visual representation of the differences among the variables and their effect sizes. Where there was no overlap between error bars, the means were statistically different from one another. Where there was total overlap, there might not be any difference between the variables.

FIGURE 1
Holistic Assessment Grid Example

Findings
IL Speakers
By definition, IL speakers at minimum can accomplish Intermediate-level functions but are not expected to successfully perform the functions that are required to complete Advanced-level tasks. In the analysis of the samples rated IL, a rating of 3 was indicative of minimally meeting the requirements. For the five Intermediate-level tasks, the mean rating of overall task performance was 2.78 (see Table 4).

TABLE 4
Speech Criteria Rating on IL Sample
Criterion | Intermediate Tasks (n = 5): N | Mean | SD | 95% CI | Advanced Tasks (n = 1): N | Mean | SD | 95% CI
Overall | 544 | 2.78 | 0.78 | [2.73, 2.83] | 107 | 1.17 | 0.48 | [1.08, 1.26]
Function: Focus on topic and task | 540 | 2.87 | 1.03 | [2.78, 2.96] | 107 | 1.23 | 0.45 | [1.15, 1.31]
Text type: Length | 541 | 3.06 | 0.72 | [3.00, 3.12] | 107 | 1.34 | 0.57 | [1.23, 1.45]
Text type: Discourse organization | 535 | 3.03 | 0.73 | [2.97, 3.09] | 107 | 1.69 | 0.85 | [1.53, 1.85]
Content: Vocabulary use | 539 | 3.12 | 0.65 | [3.07, 3.17] | 105 | 1.50 | 0.72 | [1.36, 1.64]
Accuracy: Fluency | 541 | 2.97 | 0.65 | [2.92, 3.02] | 105 | 2.12 | 1.09 | [1.91, 2.33]
Accuracy: Pronunciation | 539 | 3.19 | 0.63 | [3.14, 3.24] | 107 | 1.30 | 0.57 | [1.19, 1.41]
Accuracy: Grammatical/structural | 541 | 3.02 | 0.71 | [2.96, 3.08] | 107 | 1.97 | 1.13 | [1.76, 2.18]
Note: In some instances, raters provided an overall rating but, when there was evidence of memorized material, did not rate the individual linguistic features. This is further discussed in the qualitative section of this article.

Linguistic Features
In examining the linguistic features that contributed to overall task performance on the Intermediate tasks, raters found that two of the features did not meet the required threshold (a score of 3): “fluency” and “focus on topic and task.” For the one Advanced-level task, the overall mean was 1.17. All of the linguistic features except fluency (mean = 2.12) were scored between the criteria “does not meet” (a score of 1) and “almost meets” (a score of 2). A MANOVA showed that all of the linguistic features of the

Intermediate-level tasks were statistically different from those of the Advanced-level tasks, F(1, 622) = 26.86, p < 0.0001; Wilks’s Λ = 0.79.
Figure 2 presents the means as well as the 95% CIs as represented by error bars (I). When the speakers were performing Intermediate-level tasks, performances across the seven categories all clustered around the “minimally meets” threshold of 3. With a difference of 0.32 between the highest (“pronunciation”) and lowest (“focus on topic and task”) categories, the profile was relatively even. With the Advanced-level task, scores in only one domain (“fluency”) exceeded the “almost meets” threshold (a score of 2). With a difference of 0.89 between the highest-rated domain (“fluency”) and the lowest (“focus on topic and task”), the profile was more disparate.
This indicates that the profile of an IL speaker was one in which the ratings on the different categories clustered around the “minimally meets” threshold for Intermediate-level tasks. With the Advanced-level task, the developmental profile was less even. “Fluency” and “grammatical/structural” were the strongest areas and were statistically equivalent. The weakest areas were “focus on topic and task,” “pronunciation,” and “text type (length),” with “vocabulary use” and “discourse organization” slightly higher than those three.

FIGURE 2
Holistic Assessment of IL Linguistic Characteristics by Intermediate and Advanced Task Level

Task Types
To determine the effect of the task type on the overall performance and holistic final rating, the mean of examinees’ overall performance was examined by task type (see Table 5). None of the Intermediate-level tasks exceeded the “minimally meets” requirement (a score of 3). Among the Intermediate-level tasks, “intermediate role-play” had the lowest mean (mean = 2.62, SD = 0.90), and “talk about activity or routine” had the highest mean (mean = 2.99, SD = 0.86). The Advanced-level task, “past description,” scored substantially lower than the Intermediate-level tasks (mean = 1.17, SD = 0.45).

TABLE 5
Overall Mean of IL Speakers on Task Type
Task Type | Task Level | N | Mean | SD | 95% CI
Intermediate role-play | Intermediate | 107 | 2.62 | 0.90 | [2.44, 2.80]
Talk about thing or place (two prompts) | Intermediate | 225 | 2.88 | 0.72 | [2.79, 2.97]
Talk about activity or routine | Intermediate | 106 | 2.99 | 0.86 | [2.83, 3.15]
Ask questions | Intermediate | 106 | 2.80 | 0.80 | [2.64, 2.96]
Past description | Advanced | 107 | 1.17 | 0.45 | [1.09, 1.25]

In Figure 3, the means as well as the 95% CIs as represented by error bars (I) showed some interesting trends. With the Intermediate
level, the error bars for the tasks largely overlapped. However, “intermediate role-play” and “ask questions” did appear to be more difficult than the other Intermediate-level task types.

FIGURE 3
Holistic Assessment of IL Speakers on Intermediate and Advanced Task Type

Qualitative Comments
There were approximately 286 comments on the overall performance of the IL speakers. Many of the comments confirmed what was observed in the quantitative analysis (e.g., the need to improve fluency and accuracy). However, one trend that emerged was the effect of rehearsed material or canned/memorized responses on raters’ ability to assess the sample of speech validly. In nine instances, raters gave a holistic rating of “does not meet” but then used the comments field to note why they did not provide numerical ratings for some of the other linguistic features.
Approximately 20% of the rater comments noted that the responses to these specific tasks sounded scripted or rehearsed. This could be an artifact of analyzing Korean examinees, among whom memorization is often employed as a test preparation strategy. Memorization may be effective for success on tests of content. However, sheer memorization and recitation of such responses is a feature of the Novice level of oral proficiency. It therefore would not produce sufficient language for an examinee to be rated at a level higher than Novice. Official ACTFL rating protocols require the raters to listen to the entire speech sample rather than to individual tasks in isolation, which was the approach taken in this study (ACTFL, 2012a). However, when a single task type is learned and rehearsed, it does not provide evidence of an examinee’s spontaneous, productive speech.

IM Speakers
By definition, IM speakers fully meet the requirements needed to accomplish the
functions that are required by the Intermediate-level tasks but are not able to successfully sustain the functions that are required at the Advanced level. This was found to be the case for the IM speakers. For the four Intermediate-level tasks, the mean rating for overall performance was 3.59. This indicates that test takers exceeded the “minimally meets” threshold and were approaching the “fully meets” level (see Table 6).

Linguistic Features
Raters examined the linguistic features that contributed to the overall task performance scores on the Intermediate-level tasks. They found that all of the features exceeded the “minimally meets” threshold (a score of 3), though none exceeded the “fully meets” threshold (a score of 4). For the three Advanced-level tasks, the overall mean was 1.57. Three of the linguistic features exceeded the “almost meets” criterion of 2: “focus on topic and task,” “vocabulary use,” and “pronunciation.”
Figure 4 presents the means as well as the 95% CIs as represented by error bars (I). When the speakers were performing the Intermediate-level tasks, their performance across the seven categories exceeded the “minimally meets” threshold of 3 but did not reach the “fully meets” threshold of 4. With a difference of just 0.22 between the means for the highest (“vocabulary use”) and lowest (“fluency”) characteristics, the profile was relatively even. With the Advanced-level tasks, none of the categories met the “minimally meets” threshold. With a difference of 0.70 between the means for the highest (“focus on topic and task”) and lowest (“grammatical/structural”) domains, the profile was more disparate.
This indicates that the profile of an IM speaker was one in which the different

TABLE 6
Holistic Rating of IM Speakers on Task by Speech Characteristic
                              Intermediate Tasks (n = 4)      Advanced Tasks (n = 3)
                              N    Mean  SD    95% CI         N    Mean  SD    95% CI

Overall                       440  3.59  0.73  [3.52, 3.65]   334  1.57  0.65  [1.50, 1.64]
Function
  Focus on topic and task     421  3.60  0.82  [3.52, 3.68]   347  2.39  1.07  [2.28, 2.50]
Text Type
  Length                      422  3.71  0.57  [3.66, 3.76]   345  1.73  0.74  [1.65, 1.81]
  Discourse organization      422  3.66  0.61  [3.60, 3.72]   347  1.80  0.80  [1.72, 1.88]
Content
  Vocabulary use              420  3.74  0.53  [3.69, 3.79]   343  2.25  0.90  [2.15, 2.35]
Accuracy
  Fluency                     421  3.52  0.60  [3.46, 3.58]   345  1.97  0.89  [1.88, 2.06]
  Pronunciation               421  3.68  0.55  [3.63, 3.73]   346  2.37  1.05  [2.26, 2.48]
  Grammatical/structural      422  3.59  0.62  [3.53, 3.65]   345  1.69  0.75  [1.61, 1.77]

Note: In some instances, raters provided an overall rating but did not rate the individual linguistic features because there was evidence of memorized material. This is discussed further in the qualitative section of this paper.

FIGURE 4
Holistic Assessment of IM Linguistic Characteristics by Intermediate and Advanced Task Level

categories easily exceeded the “minimally meets” threshold for Intermediate-level tasks. On the Advanced-level tasks, the developmental profile across domains was less even. “Focus on topic and task,” “pronunciation,” and “vocabulary use” were the strongest areas and were statistically equivalent. The weakest areas were “grammatical/structural,” “length,” and “discourse organization.”

Task Types
To determine the effect of task type on performance, the mean of overall performance was examined by task type (see Table 7). Among the Intermediate-level tasks, “talk about activity or routine” had the lowest mean (mean = 3.35, SD = 0.85), and “ask questions” had the highest (mean = 3.61, SD = 0.71). All of the Advanced-level tasks were scored below the “minimally meets” requirement level. The lowest mean was for “past description” (mean = 1.49, SD = 0.71) and the highest was for “advanced role-play” (mean = 2.46, SD = 0.68).

In Figure 5, the means as well as the 95% CIs as represented by error bars (I) show some interesting trends. For the Intermediate-level tasks, the error bars overlapped, indicating that performance did not differ significantly across tasks. For the Advanced-level tasks, however, test takers’ scores on “advanced role-play” were significantly higher than their scores on the other two Advanced-level tasks. This could be because resolving situations with complications

TABLE 7
Overall Mean of IM Speakers on Task Type
Task Type                        Task Level     N    Mean  SD    95% CI

Intermediate role-play           Intermediate   111  3.60  0.72  [3.46, 3.74]
Talk about activity or routine   Intermediate   112  3.35  0.85  [3.19, 3.51]
Talk about thing or place        Intermediate   111  3.52  0.76  [3.38, 3.66]
Ask questions                    Intermediate   111  3.61  0.71  [3.47, 3.75]
Past description                 Advanced       111  1.49  0.62  [1.37, 1.61]
Past narration                   Advanced       143  1.53  0.61  [1.43, 1.63]
Advanced role-play               Advanced       105  2.46  0.68  [2.32, 2.60]

at the Advanced level can often be accomplished without paragraph-length discourse.

Qualitative Analysis of IM Speakers
There were approximately 216 comments on the overall performance of the IMs. Many of the comments confirmed what was observed in the quantitative analysis. Test takers provided good quantity and quality of speech on the Intermediate-level tasks. On the Advanced-level tasks, however, they still needed to improve in all areas (e.g., accuracy, text type, and discourse organization). One trend that also emerged with the

FIGURE 5
Holistic Ratings of IM Speakers on Intermediate and Advanced Task Types

IM speakers was the role that “rehearsed material” or “canned/memorized responses” played. With the IL speakers, approximately 20% of the comments indicated that test takers’ responses to these specific tasks sounded rehearsed. With the IM speakers, the rate was much lower: only 12% (26 comments in total) identified instances of rehearsed material. As noted in the IL discussion, memorization may be an effective strategy for tests of content, but sheer memorization and then recitation of such responses is a feature of the Novice level.

IH Speakers
By definition, IH speakers fully meet the requirements needed to accomplish the functions assessed by Intermediate-level tasks (research question 1). They are also able to meet the functions and other criteria of the Advanced level most of the time (research question 2). Because the High sublevel is primarily defined in terms of performance at the next higher major level, only Advanced-level tasks were analyzed. It would have been interesting to analyze performance on some of the Intermediate-level tasks as a point of comparison, but that was beyond the scope of this study. For the six Advanced-level tasks, the mean of the raters’ overall performance scores was 2.13 (see Table 8).

Linguistic Features
As noted earlier, a holistic assessment of 3 indicated minimally meeting the requirements. As shown in Table 8, the IH speakers’ mean ratings did not reach that minimum in any of the seven categories. The lowest score was for “length” (mean = 2.34, SD = 0.69) and the highest was for “focus on topic and task” (mean = 2.85, SD = 0.92).

Figure 6 presents the means as well as the 95% CIs as represented by error bars (I). On the Advanced-level tasks, these speakers’ performance across the seven categories clustered between the thresholds of 2 and 3. The means for the highest domain (“focus on topic and task”) and the lowest (“length”) differed by 0.51, so their profile was relatively even. This indicates that the profile of an IH develops evenly

TABLE 8
Holistic Rating of IH Speakers on Task by Speech Characteristic
                              Advanced Tasks (n = 6)
                              N    Mean  SD    95% CI

Overall                       622  2.13  0.65  [2.08, 2.18]
Function
  Focus on topic and task     594  2.85  0.92  [2.78, 2.92]
Text Type
  Length                      592  2.34  0.70  [2.28, 2.40]
  Discourse organization      594  2.38  0.71  [2.32, 2.44]
Content
  Vocabulary use              588  2.64  0.68  [2.59, 2.69]
Accuracy
  Fluency                     595  2.46  0.77  [2.40, 2.52]
  Pronunciation               594  2.76  0.83  [2.69, 2.83]
  Grammatical/structural      592  2.38  0.71  [2.32, 2.44]

FIGURE 6
Holistic Ratings of IH Speakers on Advanced-Level Tasks

across the required set of expectations but does not yet meet expectations at the next level. “Focus on topic and task” and “pronunciation” were the strongest areas and were statistically equivalent. The weakest areas were “grammatical/structural,” “text type (length),” and “discourse organization.” “Fluency” and “vocabulary use” were slightly higher than those three.

Task Types
To determine the effect of task type on performance, the mean of the raters’ overall performance scores was examined by task type (see Table 9). Scores for all of the Advanced-level tasks fell below the “minimally meets” requirement level. The lowest mean was for “current event” (mean = 1.90, SD = 0.66) and the highest was for “advanced role-play” (mean = 2.46, SD = 0.68).

In Figure 7, the means as well as the 95% CIs as represented by error bars (I) show some interesting trends. Among the Advanced-level tasks, “advanced role-play” and “past narration” had the highest scores, indicating that these were the easiest tasks for the examinees. The next easiest were “past description” and “role-play follow-up.” The most difficult were “narration/description beyond personal” and “current event.”

Qualitative Analysis of IH Speakers
There were approximately 41 comments on the overall performance of the IH speakers. Many of the comments confirmed what had been learned from the quantitative analysis. They also indicated that on Advanced-level tasks there was still a need for improvement in all areas (e.g., accuracy, text type, and discourse

TABLE 9
Overall Mean of IH Speakers on Task Type
Task Type                              Task Level  N    Mean  SD    95% CI

Past description                       Advanced    108  2.11  0.56  [1.93, 2.29]
Past narration                         Advanced    73   2.36  0.63  [2.22, 2.50]
Advanced role-play                     Advanced    105  2.46  0.68  [2.30, 2.62]
Role-play follow-up                    Advanced    106  2.10  0.58  [1.96, 2.24]
Narration/description beyond personal  Advanced    105  2.01  0.64  [1.85, 2.17]
Current event                          Advanced    105  1.90  0.66  [1.82, 1.98]

organization). One point to note regarding task type is that, as with the IM speakers, the raters found “advanced role-play” easier to perform successfully than the other Advanced-level tasks. This is probably because it often does not require paragraph-length speech. This observation was also supported by the quantitative analysis.

Discussion
The purpose of this research project was to provide empirical data on the profiles of

FIGURE 7
Holistic Ratings of IH Speakers on Advanced Task Types

examinees who were rated at the Intermediate level. In this way, it aimed to offer an initial road map for helping students progress from the Intermediate to the Advanced level of proficiency. Furthermore, because OPIc data were used, this study is the first to examine the impact of the different task types that are required at the Intermediate and Advanced levels.

Speech Characteristics
The first research question examined the linguistic characteristics at each of the sublevels. To answer that question, it is necessary to parse the speakers’ performance on Intermediate- and Advanced-level tasks.

Intermediate-Level Tasks
The only way to understand what IL speakers need to do to improve to the IM sublevel is to examine how linguistic characteristics change between IL and IM speakers on Intermediate-level tasks. Figure 8 shows the means and 95% CI error bars of the linguistic characteristics for the five Intermediate-level tasks rated for IL speakers and the four rated for IM speakers.¹ Both groups of learners could meet the linguistic demands of the Intermediate level, but the IM speakers were stronger in all areas. This indicates greater ease in performing Intermediate-level functions. It also provides empirical evidence that the quantity and quality of the language produced increase from one sublevel to the next.

As would be expected, the IL speakers’ speech samples averaged near the “minimally meets” threshold (a score of 3), while

FIGURE 8
Linguistic Characteristic Ratings of IL and IM Speakers on Intermediate-Level Tasks

the IM speakers’ speech samples demonstrated their ability to perform all of the functions associated with the Intermediate level, using both good quantity and quality of language. All speech samples in this study had originally been double- or triple-rated. Even so, when the IL samples were rerated for this study, they fell just below the “minimally meets” border rating of 3. Some might argue that this is evidence that OPIc scoring has a compensatory element: failing to minimally meet the requirements of any single task can be offset by stronger performance on other tasks. The whole speech sample could therefore be rated more highly than some of its individual parts. Alternatively, the result may be an artifact of the use of rehearsed speech.

Table 10 ranks the characteristics by their mean ratings, from weakest to strongest, showing what IL speakers must improve on when moving toward the IM sublevel. Rehearsed material could be subsumed under “focus on topic and task,” so it is not surprising that this characteristic was the lowest for the IL speakers. Rather than memorizing responses, students must understand that they need to be able to spontaneously (1) create with language, (2) perform simple transactions, and (3) ask and answer questions. They should also practice tailoring responses to different circumstances rather than lapsing into the autopilot of rehearsed material. Most speakers have a collection of anecdotes that they share in conversations. However, test takers called attention to their inability to create with language when they offered glibly fluent, memorized responses that did not address the question and were not adapted for the audience.

This issue may be more endemic to the OPIc than to the OPI because it is difficult for OPIc raters to determine whether speech is being spontaneously created or simply recited from memory. For example, an interviewee may struggle to create with the language (e.g., “This uh uh question uh about school uh uh very uh interest. . .”) and then switch to a more fluid response (e.g., “Built in the 1940s, the school I attended was part of the Art Deco movement in which. . .”). An OPI interviewer can interrupt such a soliloquy with follow-up and clarification questions that guide the conversation in a new direction. With OPIcs, however, raters must listen for telltale signs of rehearsed responses and then exclude that sample as evidence that the examinee can create with the language. The opportunity cost of using rehearsed material is that an examinee has fewer chances to show what can be produced spontaneously. These results indicate that, rather than helping examinees to be rated at a higher level, the uneven juxtaposition of rehearsed

TABLE 10
Linguistic Characteristic on Intermediate Task From Weakest to Strongest
Mean Rank Order   IL                                  IM

7th               Function: focus on topic and task   Accuracy: fluency
6th               Accuracy: fluency                   Accuracy: grammatical/structural
5th               Accuracy: grammatical/structural    Function: focus on topic and task
4th               Text type: discourse organization   Text type: discourse organization
3rd               Text type: length                   Accuracy: pronunciation
2nd               Content: vocabulary use             Text type: length
1st               Accuracy: pronunciation             Content: vocabulary use

material with spontaneous language is a hallmark of the IL sublevel.

In addition, to be rated at the next higher sublevel, IL speakers need to reduce disfluencies in sentence-level discourse so as to increase both the quantity and quality of their speech. Disfluencies often arise as learners search for words and self-correct grammatical errors. This indicates that the recall of vocabulary and grammatical structures has not yet been automatized. Learners progress from conceptual control, which often entails conscious effort to produce forms, to full control, in which production is automatized. To make that progress, they must engage in abundant and varied conversational practice. Varied practice lets learners recombine and repurpose rehearsed and memorized material from the Novice level, and adjust and adapt it appropriately to new circumstances. Engaged conversational practice over a wide range of personal topics will enable IL speakers to develop the ease and fluency needed to progress to the IM sublevel.

Advanced-Level Tasks
To understand what IM and IH speakers need to do to move up to the next sublevel, one needs to examine the linguistic characteristics of speakers who performed Advanced-level tasks. Figure 9 shows the means and 95% CI error bars of the linguistic characteristics on the Advanced-level tasks that were rated (one for IL speakers, three for IM speakers, and six for IH

FIGURE 9
Linguistic Characteristic Ratings of IL, IM, and IH Speakers on Advanced-Level Tasks

speakers). None of the groups successfully met the linguistic demands of the Advanced level. However, the higher the sublevel, the stronger the performance on each linguistic characteristic. Progression toward the Advanced level therefore requires systematic improvement across all linguistic characteristics.

To move to the next higher sublevel, IM and IH speakers need to show progress toward accomplishing Advanced-level functions. For both groups, the weakest speech characteristics, and thus those most in need of improvement, were “grammatical/structural,” “length,” and “discourse organization” (see Table 11).

Speakers must move beyond simple sentences to perform the functions that are required at the Advanced level. As IM and IH speakers engage in descriptions and narrations, sentence complexity will naturally increase. In English, this means moving toward complex sentences with embedded clauses (e.g., “The girl over there wearing the red sweater is my cousin”). It also means moving from partial to full control of tense and aspect when narrating or describing in different time frames (e.g., “When I was walking to school this morning, I ran into my cousin”). The text type progresses along the continuum from “sentences” to “strings of sentences” until it develops into paragraphs with discourse markers (e.g., first, next, then, however). Detailed description and narration cannot be achieved without greater length and more organizational markers. Grammatical/structural accuracy, length, and organization, the three characteristics that IM and IH speakers must work on, are therefore all interrelated.

IL speakers need to enlarge their language base and adapt and transfer it to new and varied contexts. In the same way, IM and IH speakers must develop greater breadth and accuracy. In addition, they must fundamentally reconfigure their speech habits, adding another floor on top of the Intermediate-level girders described at the beginning of the article. They need to move beyond conversational exchanges and practice carrying out Advanced-level functions. IM speakers would likely benefit from drawing content from familiar, autobiographical domains and adding more complexity and length to their utterances, while

TABLE 11
Linguistic Characteristic on Advanced Task From Weakest to Strongest
Mean Rank Order   IM                                  IH

7th               Accuracy: grammatical/structural    Text type: length
6th               Text type: length                   Text type: discourse organization
5th               Text type: discourse organization   Accuracy: grammatical/structural
4th               Accuracy: fluency                   Accuracy: fluency
3rd               Content: vocabulary use             Content: vocabulary use
2nd               Accuracy: pronunciation             Accuracy: pronunciation
1st               Function: focus on topic and task   Function: focus on topic and task

IH speakers may benefit from moving beyond the autobiographical by acquiring more content domains. Advanced-level speakers are often compared to news reporters: they can describe the setting and narrate the details of stories over a wide range of topics. IH speakers would therefore benefit from opportunities to practice sharing content by describing settings and narrating stories in many different domains. Describing or narrating in all time frames requires speakers to use enough language (text type) to paint a verbal picture (discourse organization and vocabulary). That picture must be precise enough (accuracy) for a monolingual listener to visualize the scene.

Once again, a speaker’s inability to fulfill these functions at the Advanced level may result from failing to use enough language, to organize it meaningfully, or to speak precisely enough to avoid confusion or misunderstanding. Concentrating on different aspects of the four construct axes can therefore give learners the deliberate practice (Ericsson, 2006) needed for incremental growth. This type of growth is often best achieved during intensive immersion experiences such as study abroad (Pearson, Fonseca-Greber, & Foell, 2006). It can also be facilitated by instructors who assign vocabulary development and grammar learning for completion outside class and devote a very high percentage of class time to extended communication activities. For example, when focusing on past narration (an Advanced-level function), learners could practice spontaneously producing detailed paragraph-length discourse. They would use a variety of sentence structures and connecting devices without penalty for grammatical errors. Then, to improve accuracy, learners could work at the sentence level to correct their recorded speech. The inverse would also help learners focus on improving their language along all four of the required dimensions. In that approach, learners create a written base text and then spontaneously enhance and elaborate on it, adding detail and content and varying the sentence structure.

Task Type Difficulty
The second research question examined the difficulty of the Intermediate- and Advanced-level tasks at each of the sublevels. To answer this question, it is necessary to parse performance by major level.

Intermediate-Level Tasks
The only way to understand what IL speakers need to do to improve to the IM sublevel is to examine the performance differences between IL and IM speakers on the different Intermediate-level tasks. Figure 10 shows the means and 95% CI error bars of the holistic ratings on the four Intermediate-level task types that were rated. The IL speakers were at or just under the “minimally meets” threshold of 3, while the IM speakers could clearly perform the different Intermediate-level tasks. As with the linguistic characteristics, the IL speakers had been expected to reach the threshold. Their falling just short could be another instance in which the whole was greater than the sum of its parts. It could also reflect the many responses that were rehearsed and therefore could not be rated.

It is important to point out that the ordering of task difficulty differed between the IL and IM speakers (see Table 12). The IM speakers’ strengths were “intermediate role-play” and “ask questions,” both of which required transactional language. Yet those same tasks were the most difficult for the IL speakers. Engaging in role-plays is not part of everyday, spontaneous conversation. However, the OPIc’s “intermediate role-play” was designed to let examinees show that they can handle simple transactions or social situations (e.g., make a purchase, accept or propose an invitation). Such situations are not readily elicited through a conversational format. Because the successful completion of “intermediate role-play” required the speaker to

FIGURE 10
Holistic Ratings of IL and IM Speakers on Intermediate-Level Tasks

ask questions, it is not surprising that “ask questions” was the other function most in need of improvement. For IL speakers to move to the next sublevel, improving the ability to ask questions spontaneously in interactional contexts must therefore take priority. They should do this while also working on the linguistic features already discussed.

Advanced-Level Tasks
To understand what IM and IH speakers need to do to move up a sublevel, one

TABLE 12
Ordering of Intermediate Tasks From Weakest to Strongest
Mean Rank Order   IL                               IM

4th               Intermediate role-play           Talk about activity or routine
3rd               Ask questions                    Talk about thing or place
2nd               Talk about thing or place        Intermediate role-play
1st               Talk about activity or routine   Ask questions

needs to examine the differences in their performance on Advanced-level tasks. Figure 11 shows the means and 95% CI error bars for the Advanced-level tasks (one for IL speakers, three for IM speakers, and six for IH speakers). None of the groups successfully performed the Advanced-level tasks. However, the higher the sublevel, the stronger the performance.

The easiest task for both the IMs and the IHs was the “advanced role-play” (see Table 13). The “intermediate role-play” was designed to elicit the language needed for simple conversational exchanges. The Advanced-level role-play, by contrast, added a complication and placed the transaction in a more formal setting. Examinees therefore had to use more precise language and actively negotiate with the interlocutor. As noted above, the “advanced role-play” required test takers to add new girders to their linguistic structure. However, because such encounters are still transactional even at the Advanced level, paragraph-length discourse may not be needed to accomplish the task. The ordering of the “role-play follow-up” for the IH speakers was also somewhat surprising, given how easily they performed the role-play itself. The follow-up gave examinees another opportunity to describe or narrate a personal experience similar to the situation in the role-play, and that does require paragraph-length discourse. This finding provides empirical evidence that resolving complicated situations may be the first trait

FIGURE 11
Holistic Ratings of IL, IM, and IH Speakers on Advanced-Level Tasks

TABLE 13
Ordering of Advanced Tasks From Weakest to Strongest
Mean Rank Order   IM                    IH

6th               n/a                   Current event
5th               n/a                   Narration/description beyond personal
4th               n/a                   Role-play follow-up
3rd               Past description      Past description
2nd               Past narration        Past narration
1st               Advanced role-play    Advanced role-play

that is acquired in the progression toward the Advanced level, but that discussion and narration within the same topic domain are more difficult.

That “past narration” was easier than “past description” was somewhat unexpected. “Past narration” typically requires greater command of grammatical/structural accuracy, so intuitively it would seem more difficult than offering a detailed description. Perhaps autobiographical narrations “sound” better to raters because their rhetorical structure differs from that of a description. It would also have been interesting to examine how IM speakers responded to the other Advanced-level tasks, such as “current event.” This would show whether the ordering of all tasks was the same for IM and IH speakers. Clearly, more research is needed to explore this phenomenon.

When certified testers attempt to gather evidence of Advanced language proficiency, they often employ a three-pronged strategy. Examinees are (1) asked to describe a setting or situation; (2) asked to elaborate on, clarify, or expand the same topic; and (3) prompted to relate the story from the outset to the conclusion (Swender & Vicars, 2012). A similar strategy could be used with IM and IH speakers by having them describe, elaborate, and narrate in all the major time frames, moving from personal topics to more general ones. For IH speakers, the most difficult tasks were “narration/description beyond personal” and “current event.” This suggests that the increased cognitive load of discussing general issues spontaneously may be affecting their linguistic control. One remedy is to have students read or listen to authentic material and then describe, elaborate on, and recount the story. This should give examinees content they can incorporate and use as they gain the language skills and build the girders needed to move from the Intermediate sublevels to the Advanced level.

Limitations and Future Directions
While this study identified common patterns of language growth, some issues must still be taken into consideration. First, human performance is variable, and not every learner will follow the same path through the Intermediate sublevels into the Advanced level. Thus, while this research reports general trends, individual exceptions would not be surprising. Second, using trained raters has both strengths and weaknesses. One strength is that it ensures that those doing the rating understand the scale well and know what to look for. However, that familiarity could also be a weakness. It may lead to confirmation bias, in which those same raters use circular logic to justify their ratings. For example, a rater listening to an examinee who is already known to be an IL
speaker may look only for evidence that supports that rating rather than rating the task on its own merits. However, this does not seem to invalidate the insights gained into test takers’ linguistic characteristics and the way in which task types differentiate among proficiency levels. Finally, because this study was conducted with Korean speakers learning English, it is unknown to what extent these findings generalize to other languages and learners.

Conclusion
Very little attention has been paid to the stages that learners go through as they progress through the Intermediate range into the Advanced level of the ACTFL proficiency framework. Fortunately, the OPIc offers a better view of what may be happening as learners progress through that major level, because both linguistic characteristics and task types can be analyzed.

Consider again the student who languished at the Intermediate level for 3 years across nine attempts to demonstrate Advanced-level proficiency. It might have helped if her instructors had intervened and helped her target the specific linguistic areas and task types at each developmental sublevel. When she started as an IL speaker, she could have been instructed to adapt her memorized language to different circumstances. She could also have practiced the spontaneous back-and-forth that characterizes transactional language, and asking and answering questions about personal experiences and daily life. As she developed into an IM speaker, the transactional language that had been a weakness would become a strength. She could then practice moving beyond simple sentences to more complex strings of sentences. Recording and transcribing what she said could provide the foundation for learning how to combine simple sentences using subordination and how to enrich the content by adding detail. For example, an instructor could ask her to elaborate on what she was speaking about. For every person (or object) she mentioned (e.g., a cousin), she would need to think of three traits (e.g., physical description, hobbies, occupation) to incorporate into the description. Repeatedly rerecording more structurally complex versions that were also richer in content could help the learner establish patterns of more complex grammar use. It could also help her move from shorter to longer text types and increasingly incorporate nonautobiographical and general content. Such continued practice across a variety of contexts would force her to adapt rehearsed material. It would also let her confirm that she could create with the language as she emerged into the IH sublevel.

As an IH speaker, this learner would be using Advanced language most of the time, although she would be unable to sustain it. The learning approach that moved her into the IH sublevel would also help her reach the AL sublevel, with a few caveats. Successful communication at the Advanced level requires that learners demonstrate the ability to create with language in longer text types. They must use discourse markers and show automatized fluency that incorporates more complex grammatical structures. Advanced-level speech requires entirely new girders to be built in the learner’s speech paradigm. As a result, the necessary fluency is often difficult to reach without abundant opportunities to produce paragraph-length discourse. Because classroom time alone is typically insufficient, other opportunities for extensive language practice must be incorporated. These could include study abroad, foreign language housing, speaking partners, or technological solutions that allow consistent partnering with native speakers.

Offering feedback on grammatical and structural errors that cause confusion or misunderstanding is also essential. Perhaps having this learner record and transcribe a response, circle and identify errors that she was aware of, and then rerecord
herself would encourage her to notice and correct errors that might otherwise fossilize and thus limit the conjoint progression needed to reach the next major level. Once again, the learner should seek to communicate with a level of automaticity that would lead to increased fluency and help her move beyond the purely autobiographical into narration and description that extend beyond the personal frame to include topics of general interest as well as current events. As both language learners and instructors come to understand the necessity of simultaneous and interrelated, or conjoint, development in function, text type, content, and accuracy, they can structure learning to scaffold performance on each of these dimensions and help learners progress more easily through the Intermediate level into the Advanced range.

Note
1. No Intermediate tasks were analyzed for IH speakers.

Acknowledgments
This article was based on a research report written for ACTFL, and the original members of that research team (Elvira Swender, Cynthia Martin, and Danielle Tezcan) provided valuable assistance throughout. I am extremely grateful for their generosity and friendship.

References
ACTFL. (2012a). Oral proficiency interview familiarization manual. Alexandria, VA: Author.
ACTFL. (2012b). Oral proficiency interview computerized familiarization manual. Alexandria, VA: Author.
ACTFL. (2012c). Proficiency guidelines 2012. Alexandria, VA: Author.
ACTFL. (2012d). Performance descriptors 2012. Alexandria, VA: Author.
ACTFL. (2016). ACTFL achieves milestone of 1,000 certified ACTFL OPI testers. Retrieved February 21, 2017, from https://www.actfl.org/news/press-releases/actfl-achieves-milestone-1000-certified-actfl-opi-testers
Brooks, F. B., & Darhower, M. A. (2014). It takes a department! A study of the culture of proficiency in three successful foreign language teacher education programs. Foreign Language Annals, 47, 592–613.
Carroll, J. B. (1967). Foreign language proficiency levels attained by language majors near graduation from college. Foreign Language Annals, 1, 131–151.
Chambless, K. S. (2012). Teachers’ oral proficiency in the target language: Research on its role in language teaching and learning. Foreign Language Annals [Supplement], 45, s141–s162.
Clifford, R. (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49, 224–234.
Cox, T. (2015). Findings of the ACTFL-CREDU research project: Linguistic profiles of Korean speakers of English. White paper submitted to ACTFL.
Cox, T. L., Bown, J., & Burdis, J. (2015). Exploring proficiency-based vs. performance-based items with elicited imitation assessment. Foreign Language Annals, 48, 350–371.
Dandonoli, P., & Henning, G. (1990). An investigation of the construct validity of the ACTFL proficiency guidelines and oral interview procedure. Foreign Language Annals, 23, 11–22.
Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. The Cambridge Handbook of Expertise and Expert Performance, 38, 685–705.
Gouoni, J. M., & Feyten, C. M. (1999). Effects of the ACTFL OPI-type training on student performance, instructional methods, and classroom materials in the secondary foreign language classroom. Foreign Language Annals, 32, 189–200.
Glisan, E. W., & Foltz, D. A. (1998). Assessing students’ oral proficiency in an outcome-based curriculum: Student performance and teacher intuitions. Modern Language Journal, 82, 1–18.
Halleck, G. B. (1996). Interrater reliability of the OPI: Using academic trainee raters. Foreign Language Annals, 29, 223–238.
Levine, M. G., & Haus, G. J. (1987). The accuracy of teacher judgment of the oral proficiency of high school foreign language students. Foreign Language Annals, 20, 45–50.
Liskin-Gasparro, J. E. (1996). Circumlocution, communication strategies, and the ACTFL proficiency guidelines: An analysis of student discourse. Foreign Language Annals, 29, 317–330.
Liskin-Gasparro, J. E. (2003). The ACTFL proficiency guidelines and the oral proficiency interview: A brief history and analysis of their survival. Foreign Language Annals, 36, 483–490.
Pearson, L., Fonseca-Greber, B., & Foell, K. (2006). Advanced proficiency for foreign language teacher candidates: What can we do to help them achieve this goal? Foreign Language Annals, 39, 507–519.
Surface, E. A., & Dierdorff, E. C. (2003). Reliability and the ACTFL oral proficiency interview: Reporting indices of interrater consistency and agreement for 19 languages. Foreign Language Annals, 36, 507–519.
Surface, E., Poncheri, R., & Bhavsar, K. (2008). Two studies investigating the reliability and validity of the English ACTFL OPIc with Korean test takers: The ACTFL OPIc validation project technical report. Retrieved August 15, 2015, from http://www.languagetesting.com/wp-content/uploads/2013/08/ACTFL-OPIc-English-Validation-2008.pdf
SWA Consulting Inc. (2009). Brief reliability report 5: Test-retest reliability and absolute agreement rates of English ACTFL OPIc proficiency ratings for double- and single-rated tests within a sample of Korean test takers. Raleigh, NC: Author.
Swender, E., Martin, C. L., Rivera-Martinez, M., & Kagan, O. E. (2014). Exploring oral proficiency profiles of heritage speakers of Russian and Spanish. Foreign Language Annals, 47, 423–446.
Swender, E., & Vicars, R. (2012). Oral proficiency interview training manual. Alexandria, VA: ACTFL.
Thompson, I. (1995). A study of interrater reliability of the ACTFL oral proficiency interview in five European languages: Data from ESL, French, German, Russian, and Spanish. Foreign Language Annals, 28, 407–422.
Thompson, I. (1996). Assessing foreign language skills: Data from Russian. Modern Language Journal, 80, 47–65.
Thompson, G. L., Cox, T. L., & Knapp, N. (2016). Comparing the OPI and the OPIc: The effect of test method on oral proficiency scores and student preference. Foreign Language Annals, 49, 79–92.
U.S. Department of Education. (2016a, December 16). Degree-granting institutions and branches. Retrieved December 15, 2016, from http://nces.ed.gov//programs/digest/d02/dt244.asp
U.S. Department of Education. (2016b, December 16). High school facts at a glance. Retrieved December 15, 2016, from http://www2.ed.gov/about/offices/list/ovae/pi/hs/hsfacts.html

Submitted November 4, 2016
Accepted January 20, 2011
