EFL Learner Perceptions of Differentiated Speaking Assessment Tasks
Hui-Chuan Liaoa
Abstract
Although differentiated instruction and assessment correspond with
various motivation and learning theories, few studies have examined
the use of differentiated assessment in English as a foreign language
(EFL) contexts. Therefore, in this study, the manner in which EFL
learners perceived aspects of differentiated assessment regarding
validity, fairness, backwash, and ways to improve differentiated
assessment task construction was examined. The effects of group
learning orientation (GLO) and English proficiency on learner
perceptions were also investigated. Quantitative and qualitative data
were collected by administering questionnaires to 300 university
sophomores and interviewing 6 participants. Descriptive analyses,
one-way and two-way analyses of variance, simple effect analyses,
and the constant comparative method were used for data analysis.
Overall positive perceptions were observed, thus supporting the
implementation of differentiated assessment to facilitate language
development in mixed-ability second language (L2) speaking classes.
Learner perceptions were found to be affected by the level of English
proficiency and GLO. Interaction effects between proficiency and GLO
were also observed. The findings are discussed in terms of pedagogical
recommendations for using differentiation in L2 contexts and
suggestions for conducting further research concerning differentiated
assessment.
Key Words: differentiated assessment, learner variance, speaking assessment
a Department of Applied Foreign Languages, National Kaohsiung University of Applied Sciences. E-mail: hliao@kuas.edu.tw
INTRODUCTION
Current educational trends reflect the change from homogeneity to
multiplicity in student populations. The increasing diversity of students
has been exemplified by distinct aptitudes, interests, learning styles and
strategies, cultures, experiences, and academic achievements (Chamot,
2012; Kim, 2012; McCoy & Ketterlin-Geller, 2004; Subban, 2006).
Chamot (2012) and Tomlinson (2002) have argued that classes should
be designed and implemented based on the diverse readiness levels and
learning profiles of students, and the strengths and limitations of each
learner should be acknowledged and accommodated.
However, despite the growing achievement gap in contemporary
classrooms and support from theories (Chamot, 2012; Krashen, 1985,
2003; Vygotsky, 1978), teachers in many educational contexts have not
modified their pedagogies according to current trends (Subban, 2006;
Tobin & McInnes, 2008; Valiande, Kyriakides, & Koutselini, 2011).
Similarly, although a growing achievement gap has been observed
in many English language classrooms in Taiwan, few corresponding
modifications have been made to restructure traditional classrooms to
accommodate the learning needs of each student, especially in large-
enrollment classes (Liao & Oescher, 2009).
In contemporary EFL classrooms, numerous instructors use the
teach-to-the-middle approach to meet the learning needs of most
students. Consequently, the learning profiles of higher- and lower-level
learners in a class are often neglected when instructional and assessment
decisions are made. Chamot (2012) argues that L2 teachers should not
assume that every student undergoes the same language development
processes. Although teaching all students in a class by using the same
lesson appears to be a realistic approach, not every student will be
as successful. The instruction and assessment involved may not be
sufficiently challenging to inspire higher-level learners, and may be too
demanding to sustain the learning interest and efforts of lower-level
students (Tomlinson, 2005a). To solve these problems, several educators
have advocated employing differentiation to address learner variances
and maximize learning.
Effects of Differentiation
Ehrman and Oxford (1995) argue that learning is most effectively
achieved when learners are motivated and their needs are addressed;
Figure 1
Model of Differentiated Instruction and Assessment. The model links differentiated instruction and differentiated assessment through goal setting, planning, self-monitoring, self-evaluation, self-regulation, selection among various difficulty levels, and multiple intelligences.
Source: Blaz (2013); Chamot (2012); Liao and Shih (2013); Quiocho and Ulanoff
(2009); Tomlinson (2010).
METHODOLOGY
This study comprised two phases. The first involved developing
differentiated assessment tasks; the second involved developing
instruments and collecting data.
Figure 2
Recursion and Nonlinearity of the Revision Process
Table 1
Summary of the Differentiated Assessment Tasks

Level         Content          Fixed Scenario   Preparation Time   Duration   Maximal Scores
Role Play (Midterm and Final, Mandatory)
Basic         8 expressions    No               30 min             n/a        80
Intermediate  11 expressions   No               30 min             n/a        90
Advanced      11 expressions   Yes              30 min             n/a        100
Topic Talk (Final, Optional)
Basic         1 of 3 topics    n/a              none               1 min      3
Intermediate  1 of 3 topics    n/a              none               1.5 min    5
Advanced      1 of 3 topics    n/a              none               2 min      8
Table 2
Task Characteristics of the Differentiated Speaking Tasks
Dimensions Task Characteristics
Task Orientation Guided: Although the outcomes are guided by the
rubrics, learners can flexibly determine how to
react to the input.
Interactional Relationship Interactional: Two-way
Goal Orientation Convergent between the speakers in the same team
Interlocutor Status No interlocutor
Topics Variable
Situations Variable
The purpose of the optional task was to award extra points to students
if they were willing to exert additional effort.
Instruments
Three instruments were used to address the research questions.
First, a five-construct, 23-item differentiated speaking assessment
learner-perception questionnaire (DSALQ, see Appendix C) was
developed on the basis of a four-construct, 14-item questionnaire in
Liao and Shih (2013) to measure learner perceptions of differentiated
assessment. The DSALQ consists of eighteen 5-point Likert-scaled
items constituting the constructs of validity (four items), fairness (five
items), backwash (three items), and clarity (six items); four open-
ended questions constituting the constructs of clarity (one item) and
task modifications (three items); and a reflection section in which
respondents evaluate the assessment. Table 3 lists the constructs
and characteristics (Likert scaling vs. open-ended questions) of the
questionnaire items. The reverse-coded items are marked with an
asterisk (*) in the table.
Second, a group learning orientation questionnaire (GLOQ, see
Appendix D) was developed on the basis of the Perceptual Learning
Style Preference Questionnaire (Reid, 1998). The GLOQ contains eight
5-point Likert-scaled questionnaire items, all gauging a single construct,
namely group learning orientation. Items 4, 6, 7, and 8 are reverse
coded. These two questionnaires were translated into Chinese by the
researcher, followed by expert reviews to ensure accuracy of meaning.
The Chinese versions were used to collect data to ensure that the
student participants understood the items. Third, the Test of English for
International Communication (TOEIC) was used to measure English
proficiency.
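The GLOQ scoring described above, reverse coding items 4, 6, 7, and 8 on the 5-point scale before averaging and then banding the mean into weak, mild, or strong GLO per Table 4, can be sketched in a few lines. This is an illustrative computation only; the function and variable names are invented here, not taken from the study.

```python
# Illustrative GLOQ scoring sketch: reverse-code items 4, 6, 7, 8,
# average the eight adjusted ratings, and band the mean per Table 4.

REVERSE_CODED = {4, 6, 7, 8}  # 1-based GLOQ item numbers (from the text)

def glo_score(responses):
    """responses: list of eight 1-5 Likert ratings, in item order."""
    adjusted = [
        6 - r if i in REVERSE_CODED else r   # 5<->1, 4<->2, 3 unchanged
        for i, r in enumerate(responses, start=1)
    ]
    return sum(adjusted) / len(adjusted)

def glo_level(mean):
    """Band a GLOQ mean using the Table 4 cut-offs."""
    if mean < 2.50:
        return "W-GLO"   # weak:   1.00-2.49
    elif mean < 3.50:
        return "M-GLO"   # mild:   2.50-3.49
    return "S-GLO"       # strong: 3.50-5.00
```

A respondent who strongly prefers group work would rate the four group-oriented items 5 and the four reverse-coded solo-work items 1, yielding a mean of 5.0 and an S-GLO classification.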
Participants
The participants were 300 sophomore English majors from six
intact English speaking classes offered by various universities in
Table 3
DSALQ Constructs and Question Items
Construct Likert-Scaled Items Open-Ended Items
Validity 3, 6, 11, 17
Fairness 1, 7, 9, 12, 16
Backwash 4, 8, 14*
Clarity 2*, 5*, 10*, 13, 15*, 18 19
Task Modification 20, 21, 22
(Reflection) 23
Number of Items 18 5
Total Number of Items 23
Table 4
Score Interpretation of GLO and Language Proficiency Levels
Score Score Interpretation
GLOQ Score
1.00-2.49 Weak GLO level (W-GLO)
2.50-3.49 Mild GLO level (M-GLO)
3.50-5.00 Strong GLO level (S-GLO)
TOEIC Score
225-549 A2 (Elementary) proficiency level
550-784 B1 (Intermediate) proficiency level
785-945 B2 (Upper Intermediate) proficiency level
Taiwan. The TOEIC scores of the participants ranged from 500 to 910,
with 11.1%, 71.5%, and 17.4% of the participants at the A2, B1, and B2
levels, respectively. The varying language performance levels supported
the rationale for implementing differentiated assessment to maximize
learning.
RESULTS
Reliability of the Instruments
All three instruments exhibited satisfactory reliability (Table 5)
based on the criteria in DeVellis (2012): A Cronbach’s alpha coefficient
between .70 and .80 is respectable; between .80 and .90 is very good; and
above .90 is excellent.
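The reliability coefficients in Table 5 follow the standard Cronbach's alpha formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). A minimal sketch of that computation, using hypothetical data and invented function names rather than the study's instruments:

```python
# Illustrative Cronbach's alpha: k/(k-1) * (1 - sum(item vars) / var(totals)).
# Data shapes and names are hypothetical, not from the study.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding all respondents' ratings."""
    k = len(item_scores)
    totals = [sum(col) for col in zip(*item_scores)]  # per-respondent total
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Perfectly parallel items yield an alpha of 1.0; real scales such as those in Table 5 fall below that ceiling as item responses diverge.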
Table 5
Reliability and Descriptive Statistics of the Scales
Subscale N Mean SD Reliability
GLOQ 300 3.38 .59 .88
TOEIC 288 678.78 98.05 .89
DSALQ
Validity 300 3.88 .60 .81
Fairness 300 3.66 .64 .74
Backwash 279 3.80 .58 .71
Clarity 300 3.45 .62 .81
Overall Satisfaction 279 3.70 .50 .90
Table 6
Differences in Perceived Validity Among GLO Subgroups

Comparisons        Mean Difference   Std. Error   95% CI Lower Bound   95% CI Upper Bound
W-GLO vs. M-GLO    -.56*             .21          -1.08                -.05
W-GLO vs. S-GLO    -.59*             .20          -1.10                -.08
M-GLO vs. S-GLO    -.03              .05          -.15                 .10
*p < .05.
Since my early teens I’ve been very interested in English and spent much
time learning English through films and songs. It turned out that my
proficiency is higher than my peers and the classroom assessments are
Table 7
Differences in Perceived Validity Among Proficiency Subgroups

Comparisons   Mean Difference   Std. Error   95% CI Lower Bound   95% CI Upper Bound
A2 vs. B1     .37*              .09          .13                  .60
A2 vs. B2     .49*              .14          .16                  .82
B1 vs. B2     .12               .11          -.15                 .39
*p < .05.
Table 8
Differences in Perceived Backwash Among GLO Subgroups

Comparisons        Mean Difference   Std. Error   95% CI Lower Bound   95% CI Upper Bound
W-GLO vs. M-GLO    -.49*             .15          -.85                 -.13
W-GLO vs. S-GLO    -.37*             .14          -.73                 -.02
M-GLO vs. S-GLO    .12               .07          -.04                 .28
*p < .05.
Table 9
Differences in Perceived Backwash Among Proficiency Subgroups

Comparisons   Mean Difference   Std. Error   95% CI Lower Bound   95% CI Upper Bound
A2 vs. B1     .63*              .13          .31                  .94
A2 vs. B2     .86*              .20          .36                  1.36
B1 vs. B2     .23               .16          -.18                 .64
*p < .05.
of GLO (weak, mild, strong) and English proficiency (A2, B1, B2) as
the independent variables. Results on validity (F(3, 280) = 1.223, p =
.302) and fairness (F(3, 280) = 1.570, p = .197) indicated no significant
interaction, suggesting that the effect of proficiency on learner
perceptions of validity and fairness did not depend on which GLO level
was considered, and vice versa.
By contrast, a significant interaction on backwash was observed
between the level of proficiency and GLO (F(3, 259) = 8.590, p < .001).
The observed power was excellent at .99, and the effect size was medium
to large (η2 = .09) based on the criteria of Stevens (1999). This statistically
significant interaction indicated that the effect of language proficiency
level on backwash depended on which GLO level was considered. To
examine where the significant proficiency by GLO effect on backwash
occurred, three simple effect analyses were conducted: one compared
the two GLO group means within the A2 proficiency subgroup, another
compared the three GLO means within the B1 subgroup, and the other
compared the three GLO means within the B2 subgroup. The simple
effect analyses revealed that the level of GLO influenced the perceptions
of backwash of the A2 (F(1, 259) = 8.615, p = .004) and B2 students (F(2,
259) = 12.950, p < .001) but not the B1 students (F(2, 259) = .020, p =
.981). Despite the significant results at both
the A2 and B2 levels, the interaction plot illustrated in Figure 3 indicates
that B2 students with weak GLO were the only learner subgroup
that neutrally rather than positively perceived the backwash of the
differentiated assessment.
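Each simple effect analysis above is a one-way comparison of GLO subgroup means within a single proficiency level. The F ratio underlying such a comparison, between-group mean square divided by within-group mean square, can be sketched as follows; the groups and function names here are illustrative, not the study's data.

```python
# Illustrative one-way F ratio, the statistic behind each simple effect
# comparison of GLO subgroup means within one proficiency level.

def one_way_f(groups):
    """groups: list of score lists; returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    # Between-group sum of squares: group size * squared deviation of
    # each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b = len(groups) - 1
    df_w = len(all_scores) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

Groups with identical means yield F = 0; the larger the separation between subgroup means relative to the spread within subgroups, the larger the F, as in the significant A2 and B2 simple effects reported above.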
Figure 3
Interaction Plot for the Effect of Proficiency and GLO on Backwash
Table 10
Written Comments on Enhancing the Assessment Description Clarity

Comments                                                                     n
a. Use examples or charts to make the assessment descriptions easier
   to understand.                                                            7
b. Make the assessment descriptions bilingual (i.e., English and Chinese).   7
c. Continue to supplement the written descriptions with teacher
   explanations and an in-class Q & A session.                               4
d. Highlight keywords that distinguish task levels.                          3
e. Include scoring rubrics that match the performance levels and scores.     5
f. Include rubrics that detail the sub-skills assessed.                      4
g. Use mathematical formulas to show how the scores are calculated.          3
Table 11
Written Comments on Maximal Score Adjustments

Comments                                                               n    Coding
Both Midterm and Final Role Play (55%)
A. Increase the basic-level maximal score to motivate those who
   have limited proficiency.                                           29   B+
B. Decrease the basic-level maximal score to make distinctions
   among the performance levels.                                       12   B-
C. Decrease the basic-level maximal score to encourage
   self-challenge by making the score unappealing.                     11   B-
D. Increase the basic-level but decrease the advanced-level maximal
   score to decrease the differences among the maximal scores at
   varying levels.                                                     4    B+/A-
E. Decrease the advanced-level maximal score to motivate students
   to try performing the bonus task for a higher semester grade.       3    A-
Midterm Role Play Only (10%)
F. Decrease the advanced-level maximal score to motivate students
   to work hard on the final for a higher semester grade.              4    A-
G. Decrease the maximal scores at all levels to make students work
   harder because they have more time to prepare for the midterm
   assessment.                                                         4    E-
H. Increase the basic-level maximal score to enhance the confidence
   level of those who have limited proficiency.                        3    B+
Final Role Play Only (3%)
I. Increase the basic- and intermediate-level maximal scores to
   provide a second chance for the students who did not perform
   satisfactorily on the midterm.                                      3    B+/I+
Bonus Topic Talk Only (32%)
J. Increase the maximal scores at all levels to motivate students
   to perform the bonus task.                                          10   E+
K. Increase the basic- and intermediate-level maximal scores
   because they are currently too low to motivate students to
   perform the bonus task.                                             8    B+/I+
L. Adjust the maximal scores at all levels to make the intervals
   among the three scores equal.                                       6    E
M. Increase the advanced-level maximal score to make distinctions
   among the performance levels.                                       5    A+
N. Increase the advanced-level maximal score to enhance the
   motivation of advanced learners.                                    5    A+

Note. B, I, and A denote the basic, intermediate, and advanced task levels,
respectively; E denotes every level; "+" and "-" indicate the directions for
score adjustment.
DISCUSSION
Perceived Fairness, Validity, and Backwash
Overall learner satisfaction with the differentiated assessment
was high. The participants typically considered the differentiated
assessment a fair practice that reflected the speaking course objectives
and positively affected learning. These positive learner perceptions
are pivotal for classroom assessment implementation because of their
consequent effects. Backwash is “the extent to which the introduction
and use of a test influences language... learners to do things... that
promote or inhibit language learning” (Messick, 1996, p. 241). Learners
who do not perceive beneficial backwash when an assessment is
introduced may attribute low task value to the assessment task. Lacking
task value subsequently causes students to exert little effort in learning
(Dörnyei, 2001). Therefore, instructors should consider beneficial
backwash when constructing an assessment. In addition, relevant
literature indicates that EFL learners consider fairness to be a vital
classroom concern (Lee, 2010; Park & Lee, 2006). Previous studies have
also demonstrated the effects of the beneficial backwash of perceived
fairness (Chory-Assad, 2002) and the harmful backwash of perceived
unfairness (Horan et al., 2010; Nesbit & Burton, 2006) on motivation,
subsequent learning behaviors, and outcomes. Similarly, learner
perceptions of validity (i.e., face validity) are associated with backwash.
Although face validity cannot substitute for the empirical validation of an
assessment, it establishes learner-perceived credibility and may produce
a beneficial backwash effect on language learning (P. S. Green, 1987).
Therefore, the positive learner perceptions of validity, fairness, and
backwash demonstrated in this study are positive indicators of using
the differentiated assessment in a mixed-level L2 speaking class.
policies and the use of multiple assessments instead of only one measure
can foster the perception that the assessment is fair. The speaking assessment
tasks in this study followed both principles. This likely contributed to
the positive perceptions of assessment fairness.
Although positive perceptions of fairness were observed in all
GLO and language proficiency subgroups, the interview data regarding
fairness presented in the Effects of Learner Variances on Learner
Perceptions subsection of the Results section indicates that learner
interpretations of fairness differed. For example, Sharon (B2, S-GLO)
and Stan (A2, M-GLO) considered fairness as the opportunity for
learners at higher and lower proficiency levels to demonstrate acquired
English language skills (Alberta Education, 2010) within their zones of
proximal development (Vygotsky, 1978) and the “i + 1” range (Krashen,
1985, 2003). However, the statement provided by Lulu (B1, M-GLO)
shows her definition of fairness based on her attribution of success and
failure to personal efforts (Weiner, 2000).
Gipps and Stobart (2009) argue that fairness should be examined
considering different learning profiles within a social context. The
Table 12
Summary of Learner Variance Effects on the Perceptions of Differentiated
Assessment
Independent Variables Dependent Variables Significance
Main Effects
GLO Validity Yes (W-GLO < M-GLO/S-GLO)
Fairness No
Backwash Yes (W-GLO < M-GLO/S-GLO)
Proficiency Validity Yes (B1/B2 < A2)
Fairness No
Backwash Yes (B1/B2 < A2)
Interaction Effects                               Simple Effects
Proficiency & GLO   Validity   No
                    Fairness   No
                    Backwash   Yes                A2: S-GLO < M-GLO
                                                  B1: W-GLO / M-GLO / S-GLO
                                                  B2: W-GLO < M-GLO < S-GLO
Note. “<” indicates that the learners listed on the left side of the symbol reported
significantly less positive perceptions than did the learners on the right; “/”
denotes no significant difference between the learners on both sides.
REFERENCES
Alberta Education. (2010). Making a difference: Meeting diverse learning
needs with differentiated instruction. Edmonton, Canada: Author.
Beecher, M., & Sweeny, S. M. (2008). Closing the achievement gap
with curriculum enrichment and differentiation: One school’s
story. Journal of Advanced Academics, 19, 502-530.
Blaz, D. (2008). Differentiated assessment for middle and high school
classrooms. Larchmont, NY: Eye on Education.
APPENDIX A
Length of Performance: Not required. (As long as you use all of the assigned expressions, the length of time does NOT matter.)
Preparation Time: You have 30 minutes to prepare before your performance.
APPENDIX B
APPENDIX C
(1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree)
1. I think the three levels of the assessment tasks correspond to the varying English proficiency levels of my classmates. 1 2 3 4 5
2. I feel confused by this new type of differentiated assessment. 1 2 3 4 5
3. This differentiated assessment tests the students' knowledge and skills taught in class. 1 2 3 4 5
4. Using this differentiated assessment facilitates my English learning. 1 2 3 4 5
5. The way in which the task descriptions are written and organized should be improved. 1 2 3 4 5
6. This speaking assessment matches the course objective listed in Part III of the Appendix. 1 2 3 4 5
7. I think the maximal score that each student can earn accurately represents the time and effort invested. 1 2 3 4 5
8. The differentiation system adopted in this speaking assessment encourages me to study harder. 1 2 3 4 5
Reason: _________________________________________
21. Do you think the maximal scores for the three leveled tasks in the final
impromptu role-play performance should be adjusted? Please check the “as
is” box or fill in a recommended maximal score. If a change is suggested,
please briefly provide a rationale for the adjustment.
Reason: _________________________________________
22. Do you think the maximal scores for the three leveled tasks in the bonus
topic talk project for the final should be adjusted? Please check the “as
is” box or fill in a recommended maximal score. If a change is suggested,
please briefly provide a rationale for the adjustment.
Reason: _________________________________________
23. Please provide additional viewpoints or suggestions regarding the
differentiated assessment in this reflection section. Use the back of this
page if additional space is required.
APPENDIX D
(1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree)
1. I get more work done when I work with others. 1 2 3 4 5
2. I learn more when I study with a group. 1 2 3 4 5
3. In class, I learn best when I work with others. 1 2 3 4 5
4. When I work alone, I learn better. 1 2 3 4 5
5. I enjoy working on an assignment with two or three classmates. 1 2 3 4 5
6. In class, I work better when I work alone. 1 2 3 4 5
7. I prefer working on projects by myself. 1 2 3 4 5
8. I prefer to work by myself. 1 2 3 4 5
EFL Learner Perceptions of Differentiated Speaking Assessment Tasks

Abstract

The principles of differentiated instruction and assessment are consistent with a range of motivation and learning theories, yet few studies have explored the use of differentiated assessment in EFL (English as a foreign language) contexts. This study therefore investigated learner perceptions of differentiated assessment, including its validity, fairness, backwash, and suggestions for improvement, and further analyzed whether learner perceptions varied with group learning orientation and English proficiency. Quantitative and qualitative data were collected by administering questionnaires to 300 university sophomores and interviewing six of the participants. Data analyses included descriptive analysis, one-way and two-way analyses of variance, simple effect analysis, and the constant comparative method. Overall, the students held positive attitudes toward the differentiated assessment; implementing differentiated assessment in mixed-ability English speaking classes should therefore help facilitate language development. The results further indicated that learner perceptions of the differentiated assessment were affected by the individual and interaction effects of English proficiency and group learning orientation. Based on the findings, recommendations are offered for using differentiated instruction and assessment in English classes and for future research directions.

Key Words: differentiated assessment, learner variance, speaking assessment