Azu Etd 14700 Sip1 M

Testing, Assessment, and Evaluation in Language Programs
Item Type text; Electronic Dissertation
Authors Alobaid, Adnan Othman
Publisher The University of Arizona.
Rights Copyright © is held by the author. Digital access to this material

is made possible by the University Libraries, University of Arizona.
Further transmission, reproduction or presentation (such as
public display or performance) of protected items is prohibited
except with permission of the author.
Download date 09/03/2021 02:50:03
Link to Item http://hdl.handle.net/10150/613422

TESTING, ASSESSMENT, AND EVALUATION IN LANGUAGE PROGRAMS
by
Adnan Alobaid
__________________________
Copyright © Adnan Alobaid 2016
A Dissertation Submitted to the Faculty of the
GRADUATE INTERDISCIPLINARY PROGRAMS
IN SECOND LANGUAGE ACQUISITION AND TEACHING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2016
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Dissertation Committee, we certify that we have read the dissertation
prepared by Adnan Alobaid, titled Testing, Assessment, and Evaluation in Language
Programs and recommend that it be accepted as fulfilling the dissertation requirement for the
Degree of Doctor of Philosophy.
____________________________________________________________ Date: 04/14/2016

Suzanne Panferov
____________________________________________________________ Date: 04/14/2016

Beatrice Dupuy
____________________________________________________________ Date: 04/14/2016

Peter Ecke
____________________________________________________________ Date: 04/14/2016

Edmond White
Final approval and acceptance of this dissertation is contingent upon the candidate’s
submission of the final copies of the dissertation to the Graduate College.
I hereby certify that I have read this dissertation prepared under my direction and recommend
that it be accepted as fulfilling the dissertation requirement.
____________________________________________________________ Date: 04/14/2016

Dissertation Director: Suzanne Panferov
2
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of the requirements for an
advanced degree at the University of Arizona and is deposited in the University Library to
be made available to borrowers under rules of the Library.
Brief quotations from this dissertation are allowable without special permission, provided
that an accurate acknowledgement of the source is made. Requests for permission for
extended quotation from or reproduction of this manuscript in whole or in part may be
granted by the copyright holder.
SIGNED: Adnan Alobaid
3
ACKNOWLEDGEMENTS
This dissertation could not have been accomplished without the assistance and
support of many individuals. First and foremost, I would like to express my deep and sincere
gratitude to my supervisor Suzanne Panferov, who has been more than generous with her
assistance, patience, and expertise in addition to guiding me through the entire process of
writing this dissertation.
Moreover, much gratitude goes to my committee members: Drs. Beatrice Dupuy,
Peter Ecke, and Edmond White who have been substantially supportive, encouraging, and a
great source of inspiration. I also would like to thank Dr. Nicholas Ferdinandt with whom I
took program evaluation class. Dr. Ferdinandt has been very supportive and helpful in
providing me with all necessary advice, references, and information pertaining to the program
evaluation industry.
Similarly, I would like to express my appreciation to Dr. Muhammad Al-Fuhaid, who
has been very instrumental in providing me with thoughtful insights in addition to discussing
crucial issues relevant to my dissertation topics. Furthermore, I would like to thank Ms.
Amani Suleiman Al-Samhan who helped me collect data from the study site. She has been
very generous with her time and assistance carrying out the interview part of one of this
dissertation articles.
Last but not least important, my deepest debt of gratitude goes to my family members,
especially my parents, Othman and Lulwa, for their prayers, support, and encouragement. I
also would like to thank my wife, Reem, and my two children, Wissam and Tala, for their
support, patience, and assistance in that without whom I could not have been able to complete
this dissertation.
4
TABLE OF CONTENTS
ACKNOWLEDGEMENTS ............................................................................................................ 4
LIST OF FIGURES ...................................................................................................................... 11
LIST OF TABLES ........................................................................................................................ 12
ABSTRACT .................................................................................................................................. 13
DISSERTATION THEME ........................................................................................................... 14
CHAPTER 1 INTRODUCTION .................................................................................................. 15
CHAPTER 2 SAUDI STUDENT PLACEMENT INTO ESL PROGRAM LEVELS: ISSUES
BEYOND TEST CRITERIA ........................................................................................................ 19
ABSTRACT .............................................................................................................................. 20
INTRODUCTION .................................................................................................................... 21
A Historical Review on ESL Placement Tests: Traditional ESL Placement Tests ..................... 23
ESL Placement Test Item Formats .............................................................................................. 24
ESL Placement Test Delivery Formats........................................................................................ 25
Test-takers Intentionally Failing Exams: Myth or Reality? ........................................................ 26
Student Intentions to Purposefully Underperform on a Language Test ...................................... 27
Saudi Students Studying Abroad ................................................................................................. 30
The Saudi Arabian Cultural Mission (SACM) ............................................................................ 30
METHODOLOGY ................................................................................................................... 31
Anecdotal Evidence and Pilot Study ........................................................................................... 31
Data Collection Background on Site..................................................................................... 32
Positionality of the Researcher .................................................................................................... 34
5
Significance of the Study ............................................................................................................. 35
Research Questions ...................................................................................................................... 36
Research Instruments ................................................................................................................... 36
Participants .................................................................................................................................. 37
Procedures ................................................................................................................................... 38
FINDINGS ................................................................................................................................ 39
Purposefully Underperforming on ESL Placement Tests: A Major or Minor Issue?.................. 48
DISCUSSION AND CONCLUSION....................................................................................... 49
LIMITATIONS ......................................................................................................................... 52
IMPLICATIONS ...................................................................................................................... 54
Pedagogical Implications ............................................................................................................. 54
Implications for Scholarship Agencies ........................................................................................ 55
Implications for Both Scholarship Agencies and ESL Administrators ....................................... 56
CHAPTER 3 INTEGRATING SELF-ASSESSMENT TECHNIQUES INTO L2
CLASSROOM ASSESSMENT PROCEDURES......................................................................... 58
ABSTRACT .............................................................................................................................. 59
INTRODUCTION .................................................................................................................... 60
Definition ..................................................................................................................................... 61
Benefits and Limitations of Self-assessment ............................................................................... 61
LITERATURE REVIEW ......................................................................................................... 62
Impact of Classical Teaching Methods on Self-assessment ........................................................ 62
Impact of Communicative Language Teaching on Self-assessment ........................................... 62
Learner-centeredness ................................................................................................................... 63
6
Self-directed Learning ................................................................................................................. 63
Self-assessment ............................................................................................................................ 64
Accuracy of Self-assessment ....................................................................................................... 64
Correlations ................................................................................................................................. 65
Discrepancies ............................................................................................................................... 65
How Can the Accuracy of Self-assessment Be Achieved? ......................................................... 66
Common European Framework of Reference (CEFR) ............................................................... 67
Validity of the CEFR Self-assessment Rubric ............................................................................ 68
METHODOLOGY ................................................................................................................... 69
Significance of the Study ............................................................................................................. 69
Research Questions ...................................................................................................................... 69
Research Tools............................................................................................................................. 70
Participants .................................................................................................................................. 70
Background on Site ..................................................................................................................... 71
Procedures ................................................................................................................................... 72
FINDINGS ................................................................................................................................ 74
Discussion and Conclusion .......................................................................................................... 83
TOEFL-related Limitations ......................................................................................................... 84
Implications for ESL/EFL Program Administrators.................................................................... 85
Implications for ESL/EFL Teachers ............................................................................................ 86
Implications for Future Research................................................................................................. 86
CHAPTER 4 QUALITY ASSURANCE AND ACCREDITATION AS FORMS FOR
LANGUAGE PROGRAM EVALUATION: A CASE STUDY OF TWO EFL
DEPARTMENTS IN A SAUDI UNIVERSITY .......................................................................... 87
7
ABSTRACT .............................................................................................................................. 88
INTRODUCTION .................................................................................................................... 89
What Is Program Evaluation? ...................................................................................................... 90
LITERATURE REVIEW ......................................................................................................... 91
Foreign Language Program Evaluation ....................................................................................... 92
Evaluation Paradigms .................................................................................................................. 94
Formative vs. Summative ............................................................................................................ 96
External Experts vs. Internal Stakeholders .................................................................................. 97
Field vs. Laboratory Research ..................................................................................................... 98
Ongoing vs. Short-term Evaluations ........................................................................................... 98
Qualitative vs. Quantitative ......................................................................................................... 99
Process vs. Product .................................................................................................................... 100
Program Evaluation through Quality Assurance ....................................................................... 101
Accreditation-based Program Evaluation .................................................................................. 102
Quality Assurance and Accreditation in Saudi Arabia .............................................................. 103
NCAAA ..................................................................................................................................... 104
Significance of the Study ........................................................................................................... 104
METHODOLOGY ................................................................................................................. 105
Background on Site ................................................................................................................... 106
Research Objectives................................................................................................................... 106
Research Questions .................................................................................................................... 107
Participants ................................................................................................................................ 107
Research Tools........................................................................................................................... 108
Integrating NCAAA and CEA Standards .................................................................................. 108
Procedures ................................................................................................................................. 110
8
Data Analysis ............................................................................................................................. 112
FINDINGS .............................................................................................................................. 112
Standard One: Mission/Goals/Objectives .................................................................................. 112
Evaluating the Mission Statements of the Two EFL Departments............................................ 113
Evaluating the Goals and Objectives of the Two EFL Programs .............................................. 116
Students’ Attitudes towards Course Objectives and Requirements .......................................... 116
Standard Two: Curriculum ........................................................................................................ 117
Student Learning Outcomes ...................................................................................................... 118
Teaching Strategies .................................................................................................................... 119
Assessment Methods ................................................................................................................. 121
Standard Three: Student Achievement ...................................................................................... 122
Standard Four: Program Development, Planning, and Review ................................................. 123
Quality Assurance Coordinator of the Male-section Program .................................................. 123
Student Dissatisfaction with the Two EFL Departments .......................................................... 126
Student Satisfaction with the Practices of the Two Departments .............................................. 128
Good Quality Assurance Practices ............................................................................................ 130
Poor Quality Assurance Practices .............................................................................................. 131
Assessment-related Dilemmas ................................................................................................... 132
Quality Assurance-related Dilemmas ........................................................................................ 132
DISCUSSION AND CONCLUSION..................................................................................... 133
Implications .................................................................................................................................... 134
Standard One: Mission/Goals/Objectives. ................................................................................. 134
Standard Two: Curriculum. ........................................................................................................ 135
Standard Three: Student Achievement. ...................................................................................... 135
Standard Four: Program Development. ...................................................................................... 135
9
CHAPTER 5: CONCLUSION ................................................................................................... 137
Appendix A: Pilot Study Findings .............................................................................................. 141
Appendix B: Survey Questions ................................................................................................... 144
Appendix C: Interview Questions ............................................................................................... 150
Appendix D: ESL Administrators & Assessment Coordinators ................................................. 153
Appendix E: Study Participants .................................................................................................. 155
Appendix F: CEFR, TOEFL, and IELTS Equivalency Table .................................................... 156
(Tannenbaum & Wylie, 2007) .................................................................................................... 156
Appendix G: CEFR Self-assessment Rubric Descriptors ........................................................... 157
Appendix H: Interview Questions .............................................................................................. 167
Appendix I .................................................................................................................................. 168
Appendix J: Color-coded CEFR, TOEFL, and IELTS Equivalency Table (Tannenbaum &
Wylie, 2007) ............................................................................................................................... 170
Appendix K: Participants’ Self-rated Scores .............................................................................. 171
Appendix L: Consent Form ........................................................................................................ 172
APPENDIX M: Web-based STUDENT SURVEY .................................................................... 173
APPENDIX O: RESEARCHER’S EVALUATION CHECKLIST ........................................... 181
APPENDIX P: COLLEGE MISSION ........................................................................................ 189
APPENDIX Q: MALE DEPARTMENT OBJECTIVES ........................................................... 190
REFERENCES ........................................................................................................................... 191

10
LIST OF FIGURES
Figure 2.1. Participants' levels of English proficiency by gender. .......................................... 37
Figure 2. Scatterplot (BIVAR)=TOEFL with Ratio by Level of Proficiency for Male and
Female. ..................................................................................................................................... 80
Figure 3. Student learning outcomes cover all language skills. ............................................ 119
Figure 4. Teaching strategies are proper for various learning styles. .................................... 120
Figure 5. Instructors communicate with you in English........................................................ 120
Figure 6. You are engaged presentations & leading discussions .......................................... 120
Figure 7. You are engaged in research projects. ................................................................... 120
Figure 8. Are assessment methods communicated in advance? ............................................ 121
Figure 9. Are these methods used consistently? .................................................................... 122
Figure 10. Students’ responses to how often their work is graded and returned to them
promptly. ................................................................................................................................ 128
Figure 11. Students’ responses to the extent to which SLOs meet their expectations. ......... 129
11
LIST OF TABLES
Table 1 ..................................................................................................................................... 38
Table 2 ..................................................................................................................................... 40
Table 3 ..................................................................................................................................... 41
Table 4 ..................................................................................................................................... 42
Table 5 ..................................................................................................................................... 44
Table 6 ..................................................................................................................................... 71
Table 7 ..................................................................................................................................... 73
Table 8 ..................................................................................................................................... 75
Table 9 ..................................................................................................................................... 76
Table 10 ................................................................................................................................... 77
Table 11 ................................................................................................................................... 82
Table 12 ................................................................................................................................. 110
Table 13 ................................................................................................................................. 110
12
ABSTRACT
This three-article dissertation addresses three different yet interrelated topics:
language testing, assessment, and evaluation. The first article (Saudi Student Placement into
ESL Program Levels: Issues beyond Test Criteria) addresses a crucial yet understudied issue
concerning why lower-level ESL classes typically contain a disproportionate number of
Saudi students. Based on data obtained from different stakeholders, the findings revealed that
one-third of the study students intentionally underperformed on ESL placement tests.
However, ESL administrators participating in this study provided contradicting findings.
The second article explores the efficacy of (Integrating Self-assessment Techniques
into L2 Classroom Assessment Procedures) by examining the accuracy of CEFR self-
assessment rubric compared to students’ TOEFL scores, and the extent to which gender and
levels of language proficiency cause any potential score underestimation. By obtaining data
from 21 ESL students attending the Center for English as a Second Language at University of
Arizona, the findings revealed no statistically significant correlations between participants’
self-assessed scores and their TOEFL scores. However, the participants reported that the
CEFR self-assessment rubric is accurate in measuring their levels of language proficiency.
On the other hand, the third article (Quality Assurance and Accreditation as Forms for
Language Program Evaluation: A Case Study of Two EFL Departments in A Saudi
University) provides a simulated program evaluation based on an integrated set of standards
of the NCAAA (the National Commission for Academic Accreditation and Assessment) and
CEA (the Commission on English Language Program Accreditation). The findings indicated
that the standards of the mission, curriculum, student learning outcomes, and program
development, planning, and review, were partially met, whereas the standards of teaching
strategies, assessment methods, and student achievement were not.

13
DISSERTATION THEME
Program
Evaluation
Classroom
Alternative
Assessment
Placement Tests
14
CHAPTER 1 INTRODUCTION
Testing, assessment, and evaluation are an inseparable component of education
(Fulcher & Davidson, 2007), and they play a fundamental role in fulfilling a number of
functions in the educational process at different levels. Research stresses the important of
these three concepts in planning, guiding, and implementing modern educational processes.
For example, Fulcher (2010) asserts that tests are “vehicles by which society can implement
equality of opportunity or learner empowerment” (p. 1). They function as gatekeepers for
education through a process of evaluating student performance to ensure they satisfy or meet
a set of standards or requirements, respectively. Furthermore, tests are tools, which are used
to verify the extent to which an individual has mastered a specific skill(s). Assessment, on the
other hand, is concerned with documenting the target knowledge, skills, and attitudes in that a
performance is observed, described, documented, and interpreted for improvement purposes.
Lastly, evaluation is a process of making judgments to a performance based on evidence.
These three concepts are an indispensable part of L2 learning. For example, Carroll
(1961) indicates that language tests are key to formal language learning as they “render
information to aid in making intelligent decisions about possible courses of action” (p. 314).
This suggests that verifying the extent to which L2 student learning outcomes are achieved is
unlikely to be made without any form of testing. Moreover, language assessment plays a vital
role not only in L2 learning but also in applied linguistics, by “operationalizing its theories
and supplying its researchers with data for their analysis of language knowledge or use”
(Clapham, 2000, p. 148). Furthermore, program evaluation has become key to many L2
language programs as “no curriculum should be considered complete without some form of
program evaluation” that corresponds with the program emerging developments (Brow, 2007,
p. 158). This three-article dissertation addresses language testing, assessment, and evaluation.
15
The first article, “Saudi Student Placement into ESL Program Levels: Issues beyond
Test Criteria”, attempts to account for why some Saudi ESL students studying in the United
States are placed into lower than-expected levels. In other words, one of the IEP programs in
the Southwestern United States has received some complaints raised by the Saudi Arabia
Cultural Mission (SACM), a Saudi student sponsor in the US, about Saudi students being
placed into lower ESL levels than expected. Of course, this placement may be attributed to
numerous factors including students’ low level of language proficiency, fatigue, lack of
adequate background on ESL placement tests, and so forth. Nevertheless, based on initial
anecdotal evidence found by the researcher, it was found that there are other reasons causing
this issue. That is, many Saudi ESL students indicated that they purposefully underperform
on ESL placement tests for several reasons including but not limited to having more time to
learn English, adapting to the target educational and environmental contexts, preparing well
for admission-related tests (TOEFL, IELTS, and GRE), and so on. To examine whether this
issue exists and therefore represents a dilemma for some or all stakeholders (language
programs, scholarship agencies, students, teachers), a set of different data were collected
from a larger, more diverse group of Saudi and Gulf Cooperation Council (GCC) students.
The article will help solve this issue, to whatever extent it exists, by first obtaining the
opinions of various stakeholders. Then, based on their responses and attitudes, implications
will be provided to help mitigate the issue. The findings of this article; however, are not
intended only for Saudi or GCC students; they are also intended for ESL students of other
nationalities. Thus, the article will be a unique contribution to possibly avoiding any form of
negative impact caused by this issue on stakeholders. That is, having a classroom with a large
number of self-misplaced students may pose greater issues for ESL teachers in particular, for
it would be expected to have severe discrepancies among students (Hughes, 1989).

16
The second article, “Integrating Self-assessment Techniques into L2 Classroom
Assessment Procedures”, in contrast, is an endeavor to explore the integration of alternative
assessment techniques into ESL/EFL classroom assessment measures. In other words, as a
response to a pedagogical paradigm shift from teacher- to student-centered teaching, this
article examines the accuracy of CEFR self-assessment rubric in reflecting L2 students’
levels of language proficiency. To do so, self-assessed scores of a group of ESL students are
compared against their TOEFL/IELTS scores in order to identify the extent to which this self-
assessment measure can be used in L2 contexts for proficiency purposes. The reason for
choosing self-assessment in particular among many other alternative assessment techniques is
twofold. First, as a previous EFL student, the researcher never used any self-assessment
measures during his entire undergraduate studies. This lack of experience led him to
encounter several difficulties when he first pursued his higher studies in the United States.
Second, research has empirically documented the importance of self-assessment in promoting
student-centeredness (Nunan, 1988), enhancing student learning (Boud 1995; Taras, 2010),
and augmenting learner autonomy (Benson, 2012).
This article emphasizes the importance of incorporating self-assessment rubrics into
ESL/EFL classroom assessment processes as an alternative or supplementary tools to
traditional assessment techniques. That is, it proposes endeavors be made to use CEFR or any
other valid self-assessment rubrics not only in L2 programs in Saudi Arabia, but also in many
ESL programs in general. More importantly, the findings of this article are hoped to
encourage university students to develop their autonomy through self-assessment. At the end
of the paper, EFL departments in Saudi Arabia in particular and other L2 practitioners in
general will be provided with implications on how self-assessment techniques can be
effectively utilized in L2 contexts.

17
First, as many Saudi universities seek to obtain institutional or program accreditation
from the National Commission for Academic Accreditation and Assessment (NCAAA), this
article provides a simulated evaluation of two EFL departments at a Saudi University to serve
as an effective model and benchmark for other language programs that seek to obtain
accreditation. Another salient gap that needs to be addressed is that through its Ninth
Development Plan (2014), Saudi Arabia considers accreditation to be a national strategic
aspect of its policies (Ministry of Economy and Planning, 2015). Hence, this article
endeavors to promote this aspect by providing a practical example of program evaluation.
Another main gap that this article attempts to fill is that program evaluation does not
receive substantive attention as opposed to “program design, teacher training, or classroom
techniques”, for it is deemed “as of a lower priority than the more obviously immediate
activities associated with design and planning” (MacKay, 1998, p. 33). In other words, as
MacKay notes, some program administrators argue that once a program is planned, designed,
and implemented, program evaluation processes will become natural and spontaneous as a
response to program changing circumstances. As a consequence, this article aims to raise
program administrators’ awareness of the significance of maintaining periodic program
evaluation and review, and that program evaluation is not, under any circumstances, “an ad
hoc, unprincipled, or arbitrary activity” but rather it is a planned, organized, and guided
process (MacKay, 1988, p. 34). Therefore, by evaluating two university English language
programs in Saudi Arabia through a set of integrated standards of the NCAAA and CEA, this
article explores program evaluation in general yet simulating through other lenses.
18
CHAPTER 2 SAUDI STUDENT PLACEMENT INTO ESL PROGRAM LEVELS:
ISSUES BEYOND TEST CRITERIA
19
ABSTRACT
ESL Placement tests are used “to place students at the stage of the teaching program
most appropriate to their abilities” (Hughes, 1989, p. 14). Placement decisions should thus
reflect students’ actual levels of language proficiency; otherwise, improper placements may
affect their learning achievement (Bachman & Palmer, 1996). Although most ESL placement
tests adhere to testing theoretical underpinnings, another issue may exist. That is, some Saudi
ESL students studying in the United States are placed into lower than-expected levels, which
might be attributed to several factors (e.g., students’ low level of language proficiency); yet,
there are other reasons for this issue. To gain initial insight into this issue, a piloted survey
was circulated to 27 Saudi students studying in the U.S. The findings revealed that 40% of
them purposefully underperformed on university ESL placement tests for various reasons.
To examine the extent to which this issue exists, a survey was circulated to a
randomly selected group of Saudi ESL students attending three ESL programs in the U.S.
This was followed by conducting semi-structured interviews with a randomly selected sample
of participants. The findings showed that 20% of these participants intentionally
underperformed on ESL placement tests. Then, the same survey was circulated to GCC (Gulf
Cooperation Council) students studying in the U.S. The findings showed that 18% of them
also deliberately underperformed on ESL placement tests. Finally, some ESL administrators
were surveyed about this issue and provided contradicting percentages of students
purposefully underperforming on ESL placement tests. This article, therefore, provides some
suggestions that help overcome or mitigate this issue.
Key Words: Test validity, test reliability, test authenticity, test practicality,
purposefully underperforming on ESL placement tests, self-misplaced students, Saudis, GCC,
SACM, KSA, IEP

20
INTRODUCTION
Language testing is an important field in the literature regarding second language
teaching, for it addresses key issues pertaining to language evaluation and assessment such as
test design, test administration, scores interpretation and so forth (Bachman & Palmer, 1996;
Brown, 2010; Hughes & Scott-Clayton, 2011; Kunnan, 1998; McNamara, 2006). ESL
placement tests are widely used by universities, community colleges, and language programs
for the purpose of placing prospective students into homogeneous groups based on accurate
measures of their levels of language proficiency (Bachman, 1990; Bachman & Palmer, 1996;
Fulcher, 1997; Wall, Clapham, & Alderson, 1994). The main purpose of ESL placement tests
is to provide accurate measures of students’ language abilities in order to place them in the
appropriate level classes (Kunnan, 2000). However, failure to report accurate scores that
reflect students’ actual language proficiency can lead to placing them into incorrect levels,
which can impact their current and future learning (Hughes, 1989).
Research has approached language placement testing from different perspectives. For
example, Fulcher (1997) points out that early placement testing publications (e.g., Goodbody,
1993; Schmitz & DelMas, 1990; Schwartz, 1985; Truman, 1992) were “concerned with the
placement of linguistic minority students” into language programs (p. 113). Before the 1990s,
researchers rarely examined the validity of ESL placement tests compared with other studies
conducted on first language and standardized foreign language tests because placement tests
are often low stakes as opposed to other language tests such as exit tests (Brantmeier, 2006).
This paper investigates another, but as of yet understudied, issue concerning the reasons why
many Saudi ESL students are placed into lower-than-expected levels in intensive university
ESL programs abroad and the extent to which this behavior affects program placement
systems.
21
In this paper, it is assumed that the placements of many Saudi ESL students are lower
than expected for numerous reasons. First, the Saudi Arabian Cultural Mission (SACM) has
long complained about placing Saudi ESL students into lower levels (ESL program
administrator, personal communication, March 12, 2015). SACM is concerned with the
reasons why lower-level ESL classes typically contain a disproportionate number of Saudi
students, and SACM ESL coordinators have attempted to identify the factors causing this
issue. They wonder if it is a language issue, a testing issue, a lack of experience in taking
placement tests, or a combination of issues. Second, research has reported that Saudi ESL
students’ oral skills are better than their writing skills. For example, Johnson (2015) found
that Saudi students perform better on the University of Illinois entry interviews compared
with their written placement scores. Saba’s (2014) longitudinal study also found that within a
short period of time, some Saudi students achieved TOEFL scores higher than their scores on
the Virginia Tech Language and Cultural Institute (VTLCI) placement test.
The discrepancies between Saudi ESL students’ placement scores and their actual
level of English proficiency can be ascribed to numerous factors such as lack of testing skills,
test-takers’ varied abilities in taking computerized test, test anxiety, and so on. Nevertheless,
more tangible evidence is needed to support this extrapolation. Thus, the researcher collected
anecdotal evidence from a randomly selected group of Saudi ESL students, all who indicated
that they purposefully performed poorly on ESL placement tests for various reasons, which
will be discussed below. This finding encouraged the researcher to review literature
addressing this issue in order to identify if some ESL students intentionally underperformed
on ESL placement tests, and if so, to what extent? Unfortunately, the researcher was unable
to find adequate studies addressing the issue. Hence, a historical review on ESL placement
tests and an in-depth discussion about test criteria will be provided in detail instead.
22
A Historical Review on ESL Placement Tests: Traditional ESL Placement Tests
Due to the dominance of SLA theories that emerged during the 70’s, 80’s and 90’s,
ESL placement tests have undergone profound changes (Green & Weimer, 2004). For
example, ESL placement tests of earlier years were greatly impacted by grammar-translation
approaches, resulting in tests that focused mainly on language structures (Brown, 1989).
Hughes, Weir, and Porter (1995) argue that traditional ESL placement tests merely test
learners’ grammatical knowledge to “mirror course content” in order to be placed into a
homogeneous grouping (p. 13). Later, the interactionalist paradigms emerged (Gass &
Varonis, 1994; Long, 1981; Mackey, 1999), which emphasized the importance of presenting
test items with rich contextual clues through which “interactional adjustments” could be
triggered (Gass, 2010, p. 221). This contextualized ESL placement tests items, “whose recall
is promoted by information in the discourse” (Bachman, 1990. p. 132). However, traditional
ESL placement tests completely neglected other language skills such as listening and
speaking.
Communicative language teaching (CLT), which emerged in the early 1970s (Hymes,
1972) and 1980s (Wesche, 1983), stressed that oral skills be included in ESL placement tests
(Brown, 1989). Consequently, this shifted testing from focusing on language structure (e.g.,
cloze tests) to addressing test-takers’ communicative competence instead (Brown, 2010).
This paradigm shift resulted in the inclusion of more authentic texts (reading), interactive
tasks (speaking), lecture note-taking tasks (listening), and integrative-based tasks (writing).
The CLT approach has also introduced performance- and task-based test items, which made
ESL placement tests move away from solely testing L2 learners’ grammatical knowledge to
testing their communicative competence.
23
ESL Placement Test Item Formats
Traditional placement test formats include essay-based, cloze, gap-filling, matching,
true/false, error correction, and so forth (Bachman, 1990). Although multiple-choice formats
have been used for quite some time, they were initially integrated into ESL placement tests
during the early 1990s, primarily as a reaction to the widespread and popular item response
theory research (IRT) (Bock, 1997). Researchers have addressed the issue of test format and
the extent to which it affects test-takers’ performance. For example, Bridgeman and Lewis
(1994) argue that some test-takers are “relatively strong on essays and weak on multiple-
choice question and vice versa” (p. 133), implying that the nature of the test construct has a
great impact on test-takers’ scores. This suggests that test format should take into careful
account during designing ESL placement tests.
Another issue that test designers should bear in mind when designing a placement test
is the extent to which ESL placement test format affects test-takers’ performance. Bridgeman
and Morgan (1996) conducted a study to validate an ESL placement test designed with two
different formats (multiple-choice vs essay), which they administered to a randomly selected
sample of ESL students in two separate sessions. The results indicated that some students,
who achieved higher scores in the essay-based test, obtained lower scores in the multiple-
choice format and vice versa. In the same vein, in validating the Lancaster University EAP
test, Wall et al. (1994) found evidence that multiple-choice items were fairly easy for many
test-takers, with a total mean scores ranging from 70% to 76%, whereas the mean scores of
the essay-based test was lower (57.91%). According to Lancaster University’s policies,
students who achieved higher scores met the targeted program’s requirements, while those
who achieved lower scores were required to take pre-sessional or remedial courses to help
them improve their language skills (Wall et al., 1994).

24
ESL Placement Test Delivery Formats
In addition, delivery formats of ESL placement tests (paper-based vs computerized)
may also have an impact on test-takers’ performance. According to Chapelle (2001),
beginning the early 1980s, ESL placement tests were delivered and graded more and more by
computers, known as computer-based tests (CBTs), which slowly replaced human raters.
Brown (1997) lists three key benefits of CBTs: 1) they can be “individually administered,
even on a walk-in basis”, 2) they do not involve many proctors, and 3) they can be kept for
future use, review, and adaptation (p. 45). Despite these benefits, it was questioned whether
the two formats would report equivalent results. Based on a Cronbach’s alpha analysis,
Fulcher (1999) found that the scores of a CBT correlated significantly with those of a PBT,
where the former’s reliability was 0.90, while the latter’s was 0.95. This implies that both
formats can report consistent scores provided test-takers are familiar with using computers.
Currently, many ESL placement tests are being delivered online, a dramatic change in
the placement test industry compared with their PBT and CBT counterparts (Roever, 2001).
That is, the Internet provides impetus opportunities for “test-taking at the learner’s
convenience and providing immediate and specific feedback to learners” (Chapelle, Jamieson
& Hegelheimer, 2003, p. 409). In fact, many ESL placement tests today are effectively
delivered and graded online despite the fact that some technical glitches may occur during
taking the test. One question that arises is to what extent do different delivery test formats
yield consistent scores? Research has revealed conflicting results; however, regardless of
delivery or design formats, the consensus is that all test-takers should have equal
opportunities in taking the test (Brantmeier, 2006; Brown, 1997; Fulcher, 1999; Kirsch,
Jamieson, Taylor & Eignor, 1998). The next section sheds some light on test criteria that
should be taken into account during the design process of ESL placement tests.
25
Test-takers Intentionally Failing Exams: Myth or Reality?
To better understand students’ intentions to fail a test, it is inevitably crucial to
identify similar cases, from any discipline, that show people making conscious attempts to
purposefully perform poorly on an exam. In other words, do people have legitimate motives
to intentionally perform poorly on a test? Literature from various disciplines has reported that
some people fail a test to achieve desired goals. For example, based on neuropsychological
data, Coppel (2011) found that some high school “athletes have been known to deliberately
perform poorly on baseline to create a low comparison point to evaluate change on post-
concussion follow-up” (p. 658). This suggests that purposefully performing poorly on a
baseline exam will enable high school athletes to be excused from practice, take longer
leaves, receive lighter assignments, and be exempt from doing their homework.
Another example of intentionally underperforming on a test is evident in the case of
malingering. That is, malingers have been found to deliberately perform poorly during
clinical examinations by feigning memory problems caused by traumatic events hoping that
the results would lead them to achieve specific external incentives (Sekely, 2014). For
example, their intentional deception of providing false statements is aimed at avoiding
“military duty or work, obtaining financial compensation, evading criminal prosecution, or
obtaining drugs” (Diagnostic & Statistical Manual of Mental Disorders 4th ed. American
Psychiatric Association, 2000, p. 739). In academic contexts, Ahlgren Reddy and Harper
(2013) reported that some prospective mathematics students often “intentionally perform
poorly on a placement examination so as to take a presumably easier course and there is no
easy way to control for this” (p. 688). Based on the examples above, one may realize that
some people seem to be generally motivated to perform poorly on certain tests or
examinations in order to achieve specific external incentives.

26
Student Intentions to Purposefully Underperform on a Language Test
Only one research article was found that addressed the motives of ESL students who
deliberately perform poorly on ESL placement tests. Kahn, Butler, Weigle and Sato (1994)
found that some ESL students tend to “intentionally perform poorly in oral interviews in
order to remain at a lower level...to avoid missing anything, or when they want to stay with
friends or an instructor at a lower level” (p. 38). Yet, no other evidence pertinent to this issue
has been reported, which can be ascribed to the assumption that ESL students might find no
benefit of revealing their intentions to throw any test. Moreover, as a Saudi, I assume that
many Saudi ESL students might be reluctant to acknowledge such behavior due to the fear of
losing their scholarship. Thus, this leads us to seek additional language-related cases of ESL
students’ conscious attempts to deliberately underperform on tests.
Brown and Hudson (2002) found cases of purposefully underperforming on L2
diagnostic tests, so that “some students may actually be motivated to fail or at least to get a
low score on criterion-referenced tests” (p. 287). This indicates that in some cases, students
may intentionally underperform on tests for different reasons. Brown and Hudson (2002)
provide an interesting yet inappropriate choice of students’ attempts to fail a test on purpose
in that some “cynical students may try to outsmart the criterion-referenced testing process
when they understand what is going on” (p. 287). That is, taking the test at the beginning of
the course for diagnostic purposes “and are told by the teacher (or guess)” that they will
retake it at the end of the course for achievement purposes, students tend to deliberately
underperform on the pre-test, and then take the post-test seriously (p. 87). By doing so, they
will be able to make perceived learning progress due to the difference between their pre- and
post-test scores without any learning actually occurring. Such deliberate failures have been
found in criterion-referenced tests, especially in pre-post assessment (Jang, 2008).

27
Furthermore, Reed and Stansfield (2004) also found evidence of students intentionally
failing a test. Similarly, although the Modern Language Aptitude Test (MLAT) is often used
to predict students’ ability to learn a foreign language (Carroll & Sapon, 2002), it has been
recently used to diagnose students who claim that they have learning disabilities that may
impede them from learning a foreign language. The findings indicated that some students
intentionally failed the test, so they could waive or postpone a foreign language course. To
address this issue, some schools hire counselors who conduct interviews with test-takers to
ensure that their MLAT reported scores are accurate. Despite all the efforts exerted by
schools to overcome or at least alleviate the aforementioned behavior, many cases of students
failing the MLAT test still occur (Carroll & Sapon, 2002).
Another interesting example of students failing a test on purpose was mentioned in a
meeting organized by the NY Public Education Reform Commission. Leonardatos (2012),
who was one of the attendees, reported that one of the presenters suggested that a new teacher
evaluation system be imposed as students’ reported scores are inaccurate. That is, since
students know that they will go through a pre- and post-test evaluation process to identify
their progress in a course, they deliberately fail a test to reflect a rapid achievement based on
their post scores. However, their motives for failing a test intentionally are not always for
their own benefit but rather for their teachers’. For example, knowing that their teachers are
evaluated based on their achievement, these high school students intentionally fail a
diagnostic pre-test so that the teacher(s) they like can obtain higher scores on teacher
evaluation (Leonardatos, 2012). This assumes that the students would achieve higher scores
on the achievement post-test. Conversely, students might perform well on the diagnostic pre-
test so that the teacher(s) they dislike can gain lower scores on teacher evaluation.
28
In the previous sections, a number of significant issues pertaining to ESL placement
tests were discussed in detail including how these tests have developed over time, with
particular focus on item formats, delivery modes, theoretical underpinnings, and so forth. It
was also been emphasized that regardless whether ESL placement tests have diagnostic
(institutional-oriented such as Michigan Test) or course-content orientations, they should
ultimately reflect test-takers’ actual levels of English proficiency. This is due to the fact that
they are originally developed “to assess students’ level of language ability so that they can be
placed in the appropriate course or class” (Alderson, Clapham & Wall, 1995, p. 11). Such
purposes can be achieved by ensuring a set of test criteria in order to make the test as useful
as possible. In some cases, however, despite being valid, reliable, practical, and even
authentic, some placement tests may fail to report test-takers’ actual levels of language
proficiency for numerous and possibly test-irrelevant reasons.
As reported by the study’s anecdotal evidence and pilot study, it was discovered that
some Saudi ESL students purposefully underperformed on ESL placement tests for certain
planned or personal reasons. This paper examines the accuracy of these findings by surveying
a larger number of Saudi and GCC ESL students. The reason for choosing Saudi ESL
students in particular is two-fold. First, as a Saudi, the researcher has witnessed a wide range
of Saudis intentionally performing poorly on ESL placement tests. That is, whenever he
attends Saudi gatherings, meets with Saudi friends, or even has some discussions with other
Saudi students, he notices that some of them encourage each other to fail placement tests on
purpose for many reasons, most importantly to have more time to seek university admission.
These students believe that underperforming on a placement test will help them stay longer in
the U.S., for if they are placed into higher ESL levels and finish the ESL program in a short
period of time, they are required to obtain admission sooner or be forced to leave the country.
29
Saudi Students Studying Abroad
The first study abroad scholarships began in 1927, when six students were sent to
Egypt to pursue higher degrees at the Kingdom’s expenses (MOHE1, 2014). Since that time,
study abroad scholarships have grown exponentially. According to Alqahtani (2014), one of
the most prosperous eras of these scholarships was the launch of King Abdullah Scholarship
Program in 2005, which was established based on an agreement between King Abdullah and
George W. Bush (Taylor & Albasri, 2014). The program sends qualified Saudi students to
“the best universities worldwide for further studies leading to academic degrees (Alqahtani,
2014, p. 33). As of 2014, the number of Saudi students studying abroad dramatically
increased to 200,000 students, approximately 111,000 of them in the U.S. (MOHE, 2014).
The Saudi Arabian Cultural Mission (SACM)
The Saudi Arabian Cultural Mission (SACM) is a Saudi agency in the U.S. that
oversees the King Abdullah Scholarship Program, represents Saudi universities, and provides
scholarship Saudi students with academic and financial support (MOHE, 2014). The ESL
Department at SACM functions as an intermediary between ESL programs and ESL Saudi
students (SACM, 2013). There is no required level of English proficiency for obtaining a
scholarship. Instead, students should: 1) be Saudi, 2) be unemployed, and 3) meet age
requirements (MOHE, 2014). Thus, in most cases, Saudi students’ levels of English
proficiency are generally based on their high school English backgrounds, which range from
beginner to intermediate with some exceptions. Some students, like those from ARAMCO
who are taught in English in KSA, generally have higher levels of English proficiency, while
those with very limited exposure to English are at much lower levels (Al Murshidi, 2014).
1
Ministry of Higher Education
30
METHODOLOGY
The key motivation for conducting this study was to identify the reasons why many
Saudi ESL students are placed into lower than-expected ESL levels. This requires carrying
out further investigations, recruiting a large number of participants, or even triangulating the
data from various resources. Therefore, data collection involved several stages. First, an
anecdotal confirmation of this behavior was obtained from some Saudi ESL students. Second,
based on the anecdotal findings, a pilot study was circulated to many Saudi ESL students.
Third, more data was collected from Saudi ESL students attending three ESL programs in the
southwestern United States. Finally, the study survey was circulated to a large number of
GCC students studying in the United States. The findings suggested numerous reasons
accounting for why many Saudi ESL students are placed into lower than-expected levels.
Anecdotal Evidence and Pilot Study
Being uncertain about why many Saudi ESL students are placed into lower than-
expected ESL levels, the researcher decided to collect some anecdotal evidence from a
randomly selected sample of Saudi students studying English in the United States. To
accomplish this, he conducted face-to-face interviews with seven of them and communicated
online via chat and email with nine of them. During the interviews, the researcher noticed
consistent themes about the issue at hand. Some Saudi students indicated that they
deliberately performed poorly on ESL placement tests. Based on these themes, a pilot study
consisting of a twelve-item survey was then administered online through Twitter (See
Appendix A). Then, it was sent via Whatsapp to a group of Saudis who fit into the targeted
category. After asking them about some introductory questions about their experience in
taking ESL placement tests, the participants were then directly asked if they have, in one way
or another, ever intentionally performed poorly on any ESL placement test.

31
The findings of the pilot study showed that 11 out of 27 participants, or nearly 40%,
indicated that they had indeed failed an ESL placement test on purpose for several reasons,
which will be discussed thoroughly in the Findings section. Although this is a pilot study with
a limited number of subjects, the findings are still stunning. That is, almost half of the
randomly selected ESL learners made a conscious attempt to fail a placement test, a problem
worth further investigation, especially given that the results are consistent with those of the
anecdotal evidence. To further explore the issue, in the subsequent data collection stages, a
more developed survey was circulated to a larger number of Saudi and GCC ESL students in
addition to interviewing a random sample of them in order to allow them to expand more on
their motives behind intentionally underperforming on ESL placement tests.
DATA COLLECTION BACKGROUND ON SITE
The IEP programs examined in this study are located in the southwestern United
States. In order to maintain anonymity, they are labeled Program A, Program B, and Program
C throughout this article. Program A consists of six ESL levels: Basic (1 & 2), Intermediate
(1 & 2), and Advanced (1 & 2). If students obtain a GPA of 3.0 and above in Advanced 2,
then they meet the university language requirement. Program B, on the other hand, consists
of seven levels often ending with an optional Bridge Program. Each level takes eight weeks
to complete unless a student fails two or more classes, in which case they must repeat the
level. In the Bridge Program, academic credit-bearing courses are combined with some ESL
courses to develop students’ English skills and to engage them in a university course in their
areas of specialization. If students complete level 7 successfully, they will fulfill the
university’s language requirements for entry into a degree-seeking program. Program C
consists of six levels. Students can obtain an endorsement for university entrance if they earn
grades of A, B, or C in every level 5 or 6 class in which they are enrolled.

32
It can be assumed that the placement tests of these programs are valid for several
reasons. Unfortunately, the researcher did not have access to Program A’s placement test. He
believes that the placement tests of Programs B and C are valid since the former’s was
abstracted from a previously validated test, the International Test of English Proficiency or
iTEP, and students are reassigned a level whenever misplacements are detected (Program B
assessment coordinator, personal communication, November, 23, 2014). As for Program C,
its levels were compared against the TOEFL IBT proficiency descriptors on which placement
decisions are based (Program C Handbook). Hence, prospective students are placed into a
level of instruction based on their TOEFL or in-house placement scores. Moreover, the first
two weeks are a period of provisional placement for all students. If a student is misplaced,
s/he is then reassigned to the proper level. Given that two of the study placement tests are
valid, it would be unusual if some Saudi ESL students are placed into lower than-expected
levels unless there are other external factors causing these persistent misplacements.
At first glance, these misplacements could be attributed to several factors including
but not limited to test-takers’ potential low levels of language proficiency, lack of adequate
experience in taking language placement tests, and even fatigue when taking the test. In cases
of misplacement, students often find themselves in inappropriate levels, leading to boredom
when the classes are too easy or frustration when they are too challenging. Sometimes, these
misplaced students show neither boredom nor frustration to reflect their dissatisfaction with
misplacement. In these cases, one may assume that there are other factors causing
misplacements beyond test criteria, test-takers’ levels of proficiency, inadequate background
in placement tests, or any other test-related factors. To examine this issue prior to conducting
this study, anecdotal evidence was obtained and a pilot study was conducted from which
initial assumptions about misplacements can be drawn, as discussed earlier.

33
Positionality of the Researcher
One of the main principles in practice-based research is the positionality of the
researcher. Positionality is defined as the “self-conscious awareness of the relationship
between a researcher and another” (Bourke, 2014, p. 2). It mainly deals with a researcher’s
insider/outsider relationships with the participants and professionals in the same or
interrelated fields. For purposes of full disclosure, the researcher’s nationality is the same as
the participants, Saudi Arabian. A researcher is expected to collect and analyze data as an
independent researcher without consciously or unconsciously attempting to lead the
participants towards certain desired findings. The researcher’s physical, language, and
cultural access to the participants provided him with an emic position. Both the researcher
and participants are scholarship students sponsored by SACM to pursue their desired degrees.
This enabled him to meet with several Saudi students at the Saudi Student Club, weekend
gatherings, Facebook, and so forth.
The researcher’s emic advantage helped him obtain much information from the
participants, which might not have been accessible by researchers of other nationalities.
Although the participants appeared to be comfortable discussing this issue with him as
opposed discussing it with their ESL teachers or classmates, his objectivity was challenged
during some stages of data collection process. For example, during one interview, the
researcher discussed with a participant some issues about ESL placement tests in order to
allow the issue at focus, purposefully underperforming on ESL placement tests, to enter into
the conversation naturally. However, the participant kept discussing the accuracy of ESL
placement tests in detail, and the researcher repeatedly attempted to redirect the discussion to
the study issue. This may have impacted some of the participants’ responses and led them to
desired findings, which will be discussed fully in the limitations section.

34
Significance of the Study
This study is a substantial endeavor in accounting for why some Middle Eastern,
especially Saudi, ESL students are placed into lower than-expected levels. In addition, based
on the literature, it examines students’ suggestions and ESL administrators’ views to provide
implications to address this issue effectively. According to the anecdotal evidence and pilot
study findings, failing an ESL placement test on purpose, for any reason, is likely to lead to
further negative consequences. Assuming that 40% of the approximately 10,000 Saudi
students, who are granted governmental scholarships on an annually basis to study abroad,
intentionally underscore on placement exams, then 4,000 students will fail their ESL
placement test. This will not only affect the outcomes of Saudi students studying abroad, but
it will also have a potentially negative impact on ESL programs as well.
A large number of misplaced students can pose greater problems for ESL teachers, as
there might be severe discrepancies among students (Hughes, 1989). Furthermore, although
we do not have accurate statistics of scholarship ESL students who were unable to obtain
university admission and thus returned to Saudi Arabia, this study will be helpful to
encourage not only Saudi but all ESL students to take the ESL placement tests seriously to
benefit from the ESL levels into which they are placed. As a consequence, the significance of
this study centers upon the hope that many Saudi scholarship administrators, ESL
administrators and teachers, and Saudi ESL students will find the results very important in
terms of best practices in managing massive scholarship programs, long-term program
effects, and general language pedagogy. Therefore, this study aims to answer the following
key question, do ESL Saudi students purposefully underperform on ESL placement tests?
Based on the findings of this question, the study also investigates any negative consequences
resulting from such behavior.

35
Research Questions
1. Do some Saudi ESL students purposefully underperform on ESL placement tests in
order to be placed into lower levels, and if so, why?
2. What are some reasons that make students deliberately underperform on the English
placement tests?
3. To what extent does this behavior affect the placement system of ESL programs?
4. Is this behavior a major issue worth addressing, or is it a minor issue?
5. What are some possible implications that can contribute to resolving or mitigating this
issue/phenomenon?
Research Instruments
A mixed-methodology approach was used to collect data from the three programs
studied because an understudied issue is being investigated. The research tools are a 20-item
survey (Appendix B) and semi-structured interviews (Appendix C). The survey was designed
based on the findings of the anecdotal evidence and pilot study. The last item of the survey
asked the participants to provide their emails if they wished to participate in the interviews.
On the other hand, the semi-structured interviews, which consisted of ten questions, were
designed in a way that helps the participants expand more fully on their responses to the
survey in order to identify their motives behind deliberately underperforming on their ESL
placement tests. Moreover, the same survey was circulated, with some modifications, to a
wide range of GCC students studying in the U.S. through several platforms to test the initial
findings of the study. The last research tool used in this study was a six-item survey
(Appendix D), which was designed and sent out randomly to some ESL administrators and
assessment coordinators (n= 17) in order to gain their views on the study’s issue.
36
Participants
The participants of this study were divided into three categories (See Appendix E).
The first group consisted of 71 randomly selected Saudi ESL students (50 male and 21
female) attending three university ESL language programs in the southwestern United States.
Their levels of language proficiency were identified in the survey, which first asked students
about their current ESL program levels, range from beginner to advanced (Figure 1).
Figure 2.1. Participants' levels of English proficiency by gender.
Although 127 students participated in the survey, only 71 of them completed the
entire survey, so the remaining participants were excluded from analysis. Regarding their
distribution across the three study programs, 25 participants are attending Program A, 36
attending Program B, 13 attending Program C, and one attending another (Table 1).
37
Table 1
Levels of Proficiency Distribution of Participants in the Three ESL Programs
Lower-
Upper- Advance Passed all
Answer Beginner intermediat Intermediate
intermediate d ESL Levels
e
Program A 3 0 5 5 8 4
Program B 5 2 7 8 8 3
Program B 3 1 1 1 1 6
Total 71 11 3 13 14 17 13
The second group of participants consisted of 216 GCC students studying in the
United States who received the same survey with some modifications. The third group
included 17 ESL administrators and assessment coordinators to obtain their attitudes towards
the study issue.
Procedures
Data was collected by circulating a web-based survey to the university emails of
Saudi students attending the target ESL programs. Unfortunately, the initial number of
participants was too small to draw any conclusions about the issue. Therefore, the researcher
asked some teachers from the study’s programs to encourage Saudi students to participate in
the survey. He also asked many Saudi students with whom he is acquainted to encourage
their friends or any other Saudi students they knew of who satisfied the parameters of the
focus population to take the survey. After three months, 127 survey responses were received,
71 of which were completed in their entirety. The responses were organized into three
themes: 1) general information about the participants, 2) the study’s main question (Have you
ever intentionally performed poorly on an ESL placement test?), and 3), suggestions for
addressing the issue.
38
As for the second round of data collection, those who provided their emails in the
survey were contacted and given the choice to take part in face-to-face, phone, or other voice
protocol interviews. Program B students were interviewed at their program site, whereas the
other students were interviewed via phone, Skype, and Tango2. Each interview session lasted
20 to 25 minutes. Due to cultural constraints, the researcher was unable to interview female
participants face-to-face; instead, the interviews were done via Whatsapp. All interviews
were carried out in English unless some participants found difficulty conveying their
messages, then Arabic was used. To make the findings generalizable to more ESL students
beyond the study population, the survey was sent to a number of GCC students studying in
the U.S. via Student Clubs, Whatsapp, and Twitter. Finally, a six-item survey was sent to a
random group of ESL program administrators to ask them about any impact that purposefully
underperforming on ESL placement tests may have on their program placement system.
FINDINGS
After being asked some introductory questions concerning their experiences in taking
ESL placement tests in order to avoid leading them to any desired findings (i.e., being
influenced by the wording of the study’s main question), the participants were directly asked
the following question: “Did you intentionally, in one way or another, perform poorly on the
ESL Placement Test to be placed in lower levels?” One noteworthy issue is that this key
question was introduced in both English and Arabic to ensure that all participants understood
it fully. The findings showed that 14 out of 71 (20%) of these Saudi participants reported that
they did deliberately perform poorly on an ESL placement test for various reasons (Table 2).
2
A social media platform
39
Table 2
Saudi Participants’ Responses to the Study Main Question
# Answer Response %
20%
1 Yes 14
80%
2 No 57
Total 71 100%
Although 20% represents only half that of the pilot study, it is still high. However, it
is remains difficult to decide whether ESL students’ motives to purposefully underperforming
on placement tests exist at this percentage. In other words, 20% of participants is likely to be
insufficient to make the findings generalizable. Moreover, there is no other supporting
evidence, such as observing participants’ classroom performance right after taking the
placement tests. Another salient issue is that the researcher had no access to the placement
systems used by the three programs that participated in the study in order to examine these
findings. As a consequence, the researcher decided to support these findings with other data
collected from a larger number population that meets the parameters of the target population.
Therefore, the same survey was circulated to numerous GCC students studying in the U.S..
In this round of data collection, almost 336 GCC students responded to the survey, yet
only 216 of them completed the survey. Although the survey targeted GCC students across
the United States, the vast majority of the participants are Saudis (202), with only (6)
Emiratis, (7) Kuwaitis, and (1) Omani as indicated in Table 3 below. This can be attributed to
the researcher’s limited access to GCC students, except for Saudis. In other words, the
40
researcher contacted Saudi students across the Unites States with whom he is acquainted and
asked them to help him distribute the survey.
Table 3
Saudi Participants’ Responses to the Study Main Question
# Answer Male Female

1 Bahraini 0 0
2 Emirati 4 2
3 Kuwaiti 4 3
4 Omani 1 0
5 Qatari 0 0
6 Saudi 153 49
Total 162 54
The findings revealed that 39 out of 216, or 18% of the GCC participants indicated
that they did intentionally underperform on ESL placement tests (Table 4). Although these
percentages (20% & 18%) represent around half of that of the pilot study (40%), they are still
high and concerning for scholarship agencies. Given that 10,000 Saudi scholarship students
are annually sent to study abroad, this means that 1800 to 2000 of them are likely to
deliberately perform poorly on ESL placement tests. These percentages can cost scholarship
agencies tens of millions of dollars, as students will take unnecessarily additional courses.
41
Table 4
GCC Participants’ Responses to the Study Main Question
Did you intentionally, in one way or another, perform poorly on the ESL Placement Test to be placed in lower levels?
Answer Response %
18%
Yes 39
82%
No 177
Total 216 100%
One may argue that these data are insufficient; hence, more evidence is still needed to
gain further insight into the issue. Therefore, a group of ESL administrators and assessment
coordinators were surveyed. They were first asked if they have ever noticed any case of
students purposefully underperforming on ESL placement tests. Ten of the 17 participants
indicated that they had encountered some form of conscious attempts made by students to
perform poorly on ESL placement tests, yet with a very low percentage (<5%). More
specifically, five of these participants stated that the percentage of students intentionally
underperforming on the placement test is very low, probably around 1%, while two of them
indicated that it is likely less than 5%. In contrast, one of them indicated that she has
personally witnessed this behavior only twice in her entire 25-year ESL teaching experience.
Another participant stated that although she found some students deliberately
underperforming on ESL placement tests, she maintained that it is difficult to tell the rate.
Interestingly, only six of these participants indicated that they have never witnessed a student
purposefully underperforming on a placement test.
42
Clearly, there is a large discrepancy between the responses of ESL administrators and
assessment coordinators and those of students. This poses another issue whether this behavior
exists or not. It is possible that some ESL administrators might be unaware of their students’
intentions to underperform on ESL placement tests, which could account for this discrepancy.
Of course, this argument makes sense if the study had a large pool of data collected through a
long-term period, or perhaps data were triangulated. Otherwise, the discrepancy between
these two sets of percentages may question the accuracy of these findings, especially since no
concrete evidence of students’ work was collected such as writing samples, classroom
observations, and subsequent tests. Nevertheless, even with these two sets of data, one may
wonder why these students would still intentionally perform poorly on ESL placement tests.
Only those participants who deliberately underperformed poorly on ESL placement
tests were asked about their motives. That is, if a participant selected no, indicating that s/he
has never intentionally performed poorly on ESL placement tests, the survey automatically
skipped to the next section. The participants’ provided different, yet interrelated, reasons
accounting for their intentions to purposefully underperform on placement tests as shown in
Table (5). Six of the 14 participants indicated that they had failed placements test
purposefully in order to a have longer time to practice English, as the more ESL classes they
take, the more exposure to English they will have. Moreover, five of them indicated that they
had placed themselves into lower ESL levels so that they could have less effort in carrying
out ESL tasks, which would allow them to focus more on preparing for admission-required
standardized tests (e.g., TOEFL, GRE).
43
Table 5
Why Students Purposefully Underperform on a Placement Test
Responses
Answers
6
To have more time for learning English.
To have adequate time to adapt to the program and

2
educational system.
To have adequate time to adapt to the city

0
environment.
To focus more on preparing for standardized tests

5
(TOEFL, IELTS, GRE).
Other reasons. 1
Moreover, two participants justified their reasons by stating that this would help them
spend extra time in the ESL program to have more time to become fully familiar with the
American educational system. However, only one participant ascribed his intentions to “other
reasons”, without providing additional information in the subsequent question. Having seen
these findings, one may consider students’ justifications logical to some extent, provided this
behavior does not affect their current or future learning. However, some qualitative data is
needed to explain the quantitative data more convincingly.
During the interviews, three of the participants who purposefully underperformed on a
placement test argued that if they were placed into classes reflecting their actual levels of
language proficiency (e.g., intermediate, upper-intermediate), this would force them to
complete the English language program in a shorter period of time. In such cases, they are
44
required per SACM policies to obtain immediate admission to a university or return to Saudi
Arabia. These two options are mandatory because once a student completes his/her ESL
classes, their I20 for language studies is cancelled. If a student decides to intentionally
underperform on an ESL the placement test to be placed into a lower level, this will provide
them more time to: 1) learn English, 2) prepare for standardized tests, and 3) correspond with
or even visit several universities before choosing one. Moreover, three of the participants
contend that being placed into a lower ESL level allowed them to start learning English from
scratch.
Who encouraged these 14 participants to deliberately underperform on a placement
test? Surprisingly, eight of them indicated that their friends who had taken ESL placement
tests prior encouraged them to fail the test, suggesting that more focus be given to obtaining
university admission. That is, their friends told them that the higher the ESL level, the more
demanding the class assignments will be. This may not allow them to have sufficient free
time to prepare for the TOEFL, IELTS, or GRE. On the other hand, four participants stated
that no one told them to purposefully underperform on the test; rather, this decision was
based on personal motives. Another interesting finding is that two of these participants
indicated that they had no predetermined intention to intentionally underperform on the
placement test; however, shortly before taking the test, some students with whom they took
the test encouraged them to fail it. As for GCC students, many indicated that they paid more
attention to applying to universities than to engaging in advanced ESL assignments, a motive
that made them deliberately fail the test.
This brings the researcher to another question. If those students had no scholarships,
would they still have intentionally performed poorly on their ESL placement tests? Twelve
out of 14 participants indicated that they would not, under any circumstances, have
45
purposefully underperformed on the ESL placement tests if they had not been granted a
scholarship, as they are unable to afford paying the tuition fees. Nonetheless, only two
participants insisted that they would still fail the test without having a scholarship. Hence,
this suggests that since their tuition fees are paid by SACM, these students are not concerned
with any financial costs resulting from taking unnecessary, additional ESL classes. In
contrast, the findings of GCC participants were considerably consistent with the above
mentioned in that 26 out of the 39 participants, or 67%, indicated they would not
intentionally underperform on a placement test if they were responsible for paying tuition
fees.
Having discussed students’ different motives for deliberately underperforming on
ESL placement tests, one may wonder to what extent this issue affects the placement system
as a whole. To answer this question accurately, statistical evidence is needed whether a
placement test purposefully underperformed by a number of ESL students can affect a
program’s placement system. However, the researcher unfortunately had no access to
replacement rates or any other similar statistics for the three programs involved in the study.
Alternatively, some ESL administrators and assessment coordinators were asked about the
potential impact of having 18 to 20% of their students misplaced in their placement systems.
The findings revealed that only one participant indicated that these percentages would not
impact the entire program but would only impact these particular students’ placements. She
argued that her program has five different measures for placement, including a face-to-face
interview, which makes it difficult for students to consistently underperform on all of the
assessments deliberately.
On the other hand, the remaining 16 ESL administrators and assessment coordinators
expressed their concerns with such high percentages. For example, 13 pointed out that these
46
percentages would be problematic especially if there were several scholarship students
intentionally underperforming on their ESL placement tests. For example, one of these ESL
administrators said, “We would need to move them to the appropriate section and would
possibly need to add a class/take away a class; this would cause changes in the teachers'
assignments, groups of students in each class, etc.” (personal communication, February 11,
2016). This would therefore cause additional work and frustration for all involved.
Furthermore, two participants argued that such high percentages would lead to decreasing
student motivations for the self-misplaced students would not display the proper motivation
and work ethic needed in the classes they attended. In other words, studying with misplaced
students is likely to demotivate other appropriately-placed students, as there would be great
differences between these students.
Another participant indicated that such behavior would change the entire classroom
dynamic and could ultimately affect other students' perception of the seriousness of the class.
In addition, another participant stated that if the content validity of their placement test was
very high, having scholarship students deliberately underperforming on ESL placement tests
would be a serious issue. For example, she pointed out that her program’s placement test was
mostly abstracted from their program curriculum in order to ensure high content validity.
However, she continued, if 18% of prospective scholarship students intentionally perform
poorly on ESL placement tests, this would “create level drift in our program” (personal
communication, February 10, 2016). In addition, the remaining participants argued that if
they had 18% self-misplaced students, they would take immediate action to reexamine the
procedures of the placement test. As a consequence, ESL administrators’ responses suggest
that if this issue exists at these rates, this will create a great challenge for ESL practitioners.
47
Purposefully Underperforming on ESL Placement Tests: A Major or Minor Issue?
One of the key research questions of this study is the extent to which intentionally
underperforming on ESL placement tests is a serious or minor issue. Now that multiple sets
of evidence (i.e., anecdotes, pilot study, Saudi students, GCC students, and ESL
administrators) have been provided, one can, at least based on this study’s population, draw
some conclusions about the study’s issues. One issue is the discrepancy between students’
reported rates (18 to 20%) and those provided by ESL administrators (5% at most). That is,
one may argue that having 20% out of 71 and 18% out of 216 participants claim that they had
once deliberately underperformed on an ESL placement test would be adequate evidence to
consider this behavior a negative widespread phenomenon. Potential proponents of this
position are likely to consider student assertion to be strong evidence that this behavior exists,
perhaps with higher percentages.
Nevertheless, students’ assertion that they purposefully performed poorly on ESL
placement tests might be considered inadequate to make these high percentages
generalizable. For example, without obtaining concrete evidence of participants’ work
collected two weeks past the placement test date, observing students’ performance, or
monitoring the study program placement systems, it is difficult to consider this behavior as
either widespread or detrimental to the students and/or the programs in which they are
enrolled. In fact, both arguments can be valid for two reasons. First, the discrepancy between
the students’ reported rates and those of ESL administrators does not necessarily mean that
the behavior is a minor issue simply because students are unlikely to inform their teachers of
their intentions to fail a test. In other words, these participants find no plausible reasons for
telling ESL administrators that they deliberately underperformed on ESL placement tests.
48
Second, the other argument seems also logical since drawing generalizable conclusions of
such issue requires stronger evidence.
As a compromise between these two positions, the Saudi researcher, who was a
previous ESL student and is currently an EFL teacher, contends that although the issue at
hand exists based on the study data, such data are inadequate to consider it a common
phenomenon. This requires a much larger number of participants to corroborate the data even
though the issue still occurs in several different contexts. For example, during the writing of
this study, the researcher discovered four other cases of Saudi students intentionally
underperforming on ESL placement tests that were not included into this data set; however,
documenting them poses another perplexing problem, student dishonesty. For example, one
of the study’s three ESL programs students stated that although he had completed an
intensive English course at ARAMCO and had obtained a TOEFL score that meets admission
requirements, he decided to fail the test unlike his classmates, who took it seriously and were
subsequently placed in inter-mediate ESL levels. He did this because he plans to prepare for
the GRE so that he can obtain admission from another university.
DISCUSSION AND CONCLUSION
According to the data released by the Institute of International Education (IIE) on
Intensive English Programs (IEP), the total number of Saudi IEP students enrolled in U.S.
programs reached 32,557, ranking them top on the list of places of origin of IEP students in
the country in 2014. Despite this large number, some Saudi ESL students are placed into
lower than-expected levels. This can be attributed to numerous factors such as students’ low
levels of language proficiency, students’ lack of adequate exposure to the target language,
and so forth. However, based initial anecdotal evidence and pilot study, the findings indicated
that 11 out of 27 (40%) Saudi ESL students pointed out that they had intentionally
49
underperformed on university ESL placement tests for several reasons. Both the anecdotal
evidence and pilot study helped the researcher to rationalize the study issue.
Surprised by these findings, the researcher decided to explore more tangible evidence
in the literature of ESL students deliberately performing poorly on tests. At first, he
encountered difficulty locating resources addressing the topic, so he decided to collect more
data from a larger population. The first set of data showed that 14 out of 71 (20%) of Saudi
participants reported that they did purposefully perform poorly on ESL placement tests. To
examine the accuracy of this percentage, 216 GCC ESL students were surveyed about the
same issue of which 39 of them (18%) indicated that they had intentionally underperformed
on ESL placement tests. In order to gain further insight into this issue, 17 ESL administrators
and assessment coordinators were surveyed. Ten out of 14 participants indicated that they had
seen some form of conscious attempts made by some ESL students to deliberately
underperform on ESL placement tests, however, at a very low percentage (<5%).
The present study revealed contradictory findings between students’ reported
percentages of purposefully underperforming on ESL placement tests and those encountered
by some ESL administrators and assessment coordinators. In other words, although the
findings of the pilot study, Saudi ESL students, and GCC students were high (40%, 20%, and
18% respectively), the ESL administrators reported significantly lower percentages. This
discrepancy between the percentages can be ascribed to numerous reasons. First, the behavior
may indeed exist at these high percentages, but the self-misplaced students did not inform
ESL administrators of their motives. Second, these percentages might be exaggerated due to
impacts caused by some of the survey’s statement wording, which may have led students to
desired findings. Such discrepancies suggest that further research studies be conducted in
order to recruit more participants, obtain concrete evidence of participants’ work collected
50
two weeks past the placement test date, and monitor placement systems in order to make data
more valid.
Although the study does not have evidence besides students’ reports to have
purposefully underperformed on ESL placement tests, such findings suggest that immediate
action be taken by SACM for several reasons. First, if such behavior exists at these rates, it
would cost scholarship agencies additional money without gaining the desired outcomes.
Second, this behavior is likely to lead students to not take ESL classes seriously because they
are taking ESL levels far below their English proficiency level in order to focus more time
and energy for obtaining university admission. Third, these students might not improve their
English skills by attending lower ESL levels. Fourth, some ESL programs offer English
language endorsements by which the minimum TOEFL score requirements for university
entry can be waived. In such cases, these students might delay this chance, as they have
placed themselves in lower levels, and it will take longer to reach the higher levels that allow
them to apply for endorsement.
Regardless of the differences between the percentages of the study’s participants, it
can be concluded that deliberately underperforming on ESL placement tests indeed exists and
needs to be addressed. Therefore, this study proposes a number of concrete implications that
will help mitigate or even eradicate the problem. Prior to implementing any of the subsequent
implications, it is highly recommended that SACM, the Saudi Ministry of Education, or any
other scholarship agencies survey their students to identify their language needs upon which
the implications can be implemented. In other words, it might not be effective to take any
action regarding this problem without carefully examining students’ motives behind
deliberately underperforming on ESL placement tests. For example, students should be first
asked about their experiences in taking ESL classes in general and ESL placement tests in
51
particular in addition to any issues they confronted during their ESL experiences. This will
certainly help provide concrete implications based on students’ real issues pertaining to ESL
placement tests.
LIMITATIONS
One of the primary limitations of this study is that it lacks supporting data with
concrete evidence of students’ language work. In other words, one may assume that it may be
insufficient to ask ESL students if they have ever deliberately underperformed on an ESL
placement test, but rather data should have also included samples of students’ classroom
work (e.g., writing samples). This would enable us to compare participants’ responses to their
responses to the survey and interview, provided that these samples were written within a
short period of time of taking the ESL placement tests. Unfortunately, the researcher was
unable to obtain students’ language work due to fear of placing substantial burden on
participants to take part in the study. As a consequence, this option was excluded from the
study. To overcome this limitation, a larger number of participants (GCC students) were
recruited in order to support the existing data.
Another limitation of this study is concerned with the wording of the survey
questions, especially that of the study’s main question, “Did you intentionally, in one way or
another, perform poorly on the ESL Placement Test to be placed in lower levels?” One may
argue that participants should not have been asked directly if they have ever purposefully
underperformed on an ESL placement test since such wording may lead them to the
researcher’s desired findings. Alternatively, the researcher should have let the participants
talk about their experiences so that any naturally arising issues pertaining to the study’s
targeted behavior could be documented and further explored. However, this strategy, albeit
successful, is time-consuming and sometimes difficult. Therefore, in case another researcher

52
decides to conduct the same study in the future, s/he should make sure to avoid directly
asking the participants about this behavior. Instead, s/he should find another effective strategy
that helps his/her yield some answers that arise naturally.
Another limitation is that one of the survey questions asked participants about the
reasons why they intentionally performed poorly on the ESL placement tests. They were
given a close-ended format of options consisting of four main reasons accounting for their
motives to deliberately underperform on a test. The options were as follows:
1) to have more time to learn English
2) to adapt to the program and educational system
3) to adapt to the city environment
4) to prepare for standardized tests (TOEFL, IELTS, GRE)
Although the participants were allowed to select all choices that applied, the wording of this
question, in addition to the limited choices, could still make participants’ responses very
restrictive. Moreover, giving limited choices might implant some ideas that push the
participants towards certain preferred answers. To address this limitation, the participants
were provided with a subsequent question, “If you have any other reasons, please mention
them here,” in order to allow them to provide reasons other than those listed.
In addition to the aforementioned limitations, another limitation was found. That is,
the researcher’s objectivity was challenged at some of the data collection stages. For
example, in one of the interviews, the researcher asked a participant about some introductory
questions about ESL placement tests so that the participant could not be led to any desired
findings. However, the participant was off topic, discussing the accuracy of ESL placement
tests in detail. Thus, the researcher repeatedly attempted to redirect the discussion to the
study’s issue, which may have impacted some of his responses. For example, the researcher
53
asked the participant to assume he had no scholarship, would he still throw the test? The
participant seemed to be very hesitant to respond, and he did not mention that he had ever
deliberately underperformed any ESL placement test with having a scholarship. Hence, the
researcher felt that the participant was trying to find the answer that best fit the question.
IMPLICATIONS
Pedagogical Implications
Based on participants’ responses, one of the most intriguing implications of this study
is the notion that scholarship students should be enrolled in English programs in their home
country prior to studying abroad. There are several successful examples of this approach. For
example, through the College Degree Program for Non-Employees, that Saudi Aramco
Petroleum Company grants scholarships for qualified high school graduates to pursue
bachelor degrees in certain majors. Before studying abroad, the students are enrolled in a ten-
month language preparation program (Saudi Aramco College Preparatory Program), where
they are taught intensive English and some preparatory courses in their intended field of
study (Al Murshidi, 2014). After completing the program successfully, students are then
sponsored to study abroad. Despite the criticisms leveled at ARAMCO for having students
“come directly to US universities without attending language preparation programs in the
US” (Al Murshidi, 2014, p. 41), these preparatory programs have been very helpful for
ARAMCO students before they study abroad.
This implication is consistent with the TESOL position statement that “It is important
that the sponsors either send students already at a high enough proficiency level to progress
sufficiently within the sponsor’s time limit or that the sponsors recognize that additional time
may be necessary for English language study” (TESOL, 2010, p. 2). Thus, it is recommended
that the Saudi Ministry of Education consider engaging scholarship students in intensive
54
English courses prior to studying abroad. If this implication is considered undesirable, Saudi
ESL students should then be given more time to learn English in the target country, which
will allow them adequate time to develop their English skills. Moreover, it will provide them
with sufficient time for preparing for admission-related tests such as the TOEFL, GRE, and
GMAT.
Implications for Scholarship Agencies
One of the survey statements asked students about what the Ministry of Education
should do to help their ESL learning experiences. The vast majority of the Saudi and GCC
students chose the abovementioned two suggestions, providing students with English
programs prior to studying abroad and extending ESL study period. However, one of the
proposed suggestions that many participants unexpectedly selected was that a certain TOEFL
or IELTS score should be required for obtaining a scholarship. At first, the researcher thought
that the participants had selected this suggestion because it had been introduced to them in a
structured-based survey; nevertheless, the participants’ responses to the following textbox
item actually showed the contrary. That is, 12 out of 71 Saudis and 28 out of 216 GCC
students indicated since they found difficulty obtaining the minimum scores on the TOEFL or
IELTS required to meet admission requirements, it would be more useful if a specific score
was required prior to studying abroad. Although one may doubt these students’ willingness to
accept this high-stake resolution, it is still worth considering.
Some of the participating students provided another effective and low-stake
suggestion that could help them overcome some of the difficulties they face during their ESL
learning experiences. They suggested that before studying abroad, they should be provided
with preparation programs for standardized tests (e.g., TOEFL, IELTS, GRE, GMAT). For
example, some participants stated that they lack the basic test-taking skills needed for high-
55
stake standardized tests, as opposed to, for example, Japanese, Korean, and Singaporean high
school students who are required to take exit exams. Other participants stated that it is quite
difficult to focus on ESL program assignments and admission language requirements at the
same time. As a result, they suggested that one of them should be provided in Saudi Arabia
prior to studying abroad so that students can concentrate well on both of them upon arrival in
the host country.
In case none of the aforementioned implications is applicable for one reason or
another, scholarship students should then be given mandatory exam preparation programs
about admission-related tests (e.g., TOEFL, GRE, GMAT). Although Saudi ESL students
have unlimited opportunities to take these tests at the expense of SACM, these exam prep
programs should be part of the scholarship requirements so that students will take them
seriously. Given that these programs are optional, some students indicated that they take them
to avoid visa issues rather than for developing their test taking skills. For example, one of the
participants pointed out that he had completed all ESL levels, so his I20 was about to expire.
To avoid having a hold placed on his scholarship, losing tuition fees and stipend, or worse,
being deported from the U.S., he took a GRE course to extend his I20.
Implications for Both Scholarship Agencies and ESL Administrators
Given that regular ESL placement decisions are not as accurate as desired (Hughes &
Scott-Clayton, 2010), purposefully underperforming on an ESL placement test may therefore
exacerbate this issue and cause dire consequences. To overcome this issue, substantive efforts
should be exerted to raise students’ awareness of the adverse impact of being placed into
improper ESL levels including but not limited to susceptibility to boredom, laziness,
demotivation, uncertainty of obtaining college admission, and delay of achieving academic
goals (Banegas, 2013). The awareness-raising process should be systematic in a way that
56
informs students of the negative consequences of intentionally underperforming on placement
tests. For example, this issue can be included in scholarship regulations, addressed during
orientation programs in Saudi Arabia, or constantly reminded and sent to students’ emails by
their sponsors. In conclusion, future researchers are advised to make sure that students do not
report having purposefully underperformed on ESL placement tests because they do not want
to appear weak in their English skills. Moreover, they should take into account that some
students might refrain from participating in such a study, as it may question their honesty.
57
CHAPTER 3 INTEGRATING SELF-ASSESSMENT TECHNIQUES INTO L2
CLASSROOM ASSESSMENT PROCEDURES
58
ABSTRACT
Traditional assessment instruments are often considered to be “the realm of the
teacher” (Chen, 2008, p. 238). However, to promote students’ learning autonomy and
involvement in classroom activities, other alternative assessment forms have been introduced
to L2 learning contexts including performance assessment, portfolio assessment, peer-
assessment, and journal assessment (Brown, 2010; Shohamy, 2001). One of the more widely
recognized techniques of alternative assessment is student self-assessment (Boud, 1991;
Ekbatani & Pierson, 2000). Unfortunately, most EFL departments in Saudi Arabia lack
integrating alternative assessment techniques into assessment processes (Al Asmari, 2013;
Javid, Al-thubaiti, & Uthman, 2013).
As a result, this article aims to explore the use of alternative assessment forms in
English as a Second Language (ESL) and English as a Foreign Language (EFL) classroom
contexts. It examines the accuracy of ESL students’ self-rated scores using CEFR self-
assessment rubric compared to their recently obtained TOEFL scores. It also explores
whether gender and levels of language proficiency are major influential factors for causing
score underestimation. The participants of this study are 21 ESL students who are attending
the Intensive English Program (IEP) at the Center for English as a Second Language at the
University of Arizona in Tucson. Based on CESL’s class system, their levels of proficiency
are intermediate (n=5), upper-intermediate (n=8), and advanced (n=8). The findings revealed
no statistically significant correlation between the participants’ self-assessed scores and their
TOEFL scores. However, based on the qualitative data, the participants reported that they
find CEFR self-assessment rubric accurate in reflecting their levels of language proficiency.
Key Words: Self-assessment, Common European Framework of Reference (CEFR),
ESL, EFL, language descriptors, ANOVA

59
INTRODUCTION
The past three decades have witnessed a paradigm shift in L2 contexts from teacher-
to student-centered teaching (Benson, 2001, 2012; Brown, 2007; Ellis, 1994; Nunan, 1988;
Tarone & Yule, 1989; Tudor, 1993, 1996). This new pedagogical approach originated as a
reaction to several criticisms raised by some L2 researchers who had suggested that
substantive attention be paid to L2 learners rather to language forms and structures (Bachman
2000; Ellis, 1994; Huba & Freed, 2002; Nunan, 1988). This has led ESL practitioners and
curriculum designers to “take account of learners’ needs or preferences” as a step for
involving L2 students in language learning processes (Benson, 2012, p. 33). Later, self-
directed learning was introduced into L2 learning contexts, as a stronger form of learner-
centered education (Ekbatani & Pierson, 2000).
In parallel with this paradigm shift, other researchers have suggested that learner
involvement in language learning not be limited to learning aspects; rather, it should also
include engaging them in assessment processes (Ekbatani & Pierson, 2000). For example,
Nunan (1988) argues that to make a learner-centered curriculum more effective, “both
teachers and learners need to be involved in evaluation” (p.116). Thus, several alternative
assessment approaches have been widely used in ESL contexts. Among these approaches is
learner’s self-assessment, which refers to involving learners in making decisions about their
own learning. The effectiveness of self-assessment has long been discussed in the literature
(Cassidy, 2001; Ekbatani, 2011; Harris, 1997; LeBlanc & Painchaud, 1985; McNamara,
1995). Hence, this paper investigates the extent to which self-assessment, as a key element of
self-directed learning, can be an accurate and reliable measure for ESL and EFL learners to
identify and monitor their levels of language proficiency and serve as a supplementary
technique to classroom language assessment tools.

60
Definition
Self-assessment refers to “all judgments by learners of their work” (Taras, 2010, p.
200). Similarly, O'Malley and Pierce (1996) define self-assessment as “an appraisal by an
individual of his or her own work or learning process” (p. 240). On the other hand, Andrade
(2007) expanded more on self-assessment and defined it as
a process of formative assessment during which students reflect on and evaluate the
quality of their work and learning, judge the degree to which they reflect explicitly
stated goals or criteria, identify strengths or weaknesses in their work and revise
accordingly. (p. 160)
Self-assessment can be used for placement (Bachman, 2000), achievement (McDonald &
Boud, 2003), and diagnostic (Andrade & Du, 2007) purposes. Moreover, it can be used “to
detect changes and patterns of development over time” (Dörnyei, 2001, p. 194).
Benefits and Limitations of Self-assessment
Research has documented a plethora of benefits and limitations of self-assessment.
For example, benefits of self-assessment include promoting students’ learning autonomy
(Ekbatani, 2011; Harris, 1997; McNamara & Deane, 1995). Moreover, Brown and Hudson
(1998) point out that self-assessment facilitates student involvement in their own learning,
promote their learning responsibility, and increases their motivation. Furthermore, it helps
prevent any form of cheating (LeBlanc & Painchaud, 1985) and enables students to monitor
their learning (Harris 1997). However, there are some limitations of self-assessment one of
which is subjectivity in that students are likely to be “either too harsh on themselves or too
self-flattering” (Brown, 2001, p. 145). Moreover, Tara (2008) argues that when students have
inadequate experience in self-assessment, they may “judge their own work within their own
limitations” and end up considering their performance appropriate (p. 86).

61
LITERATURE REVIEW
Impact of Classical Teaching Methods on Self-assessment
Before discussing self-assessment, it is important to explore how it originated in L2
learning contexts and how early teaching methodologies have significantly influenced its use.
Before the 1970s, most if not all earlier second language teaching methodologies were
primarily concerned with analyzing language forms and structures (Brown, 2007). For
example, grammar-translation method focuses particularly on teaching grammar explicitly,
memorizing isolated words, and translating literary texts (Larsen-Freeman, 1986; Mitchell &
Vidal, 2001). This classical method, in addition to the audio-lingual method, was deemed to
neglect the learners’ role in language learning and merely considered them to be a passive
entity. As a consequence, an alternative approach was needed to empower the learners’ role
in language learning.
Impact of Communicative Language Teaching on Self-assessment
Later, Communicative Language Teaching (CLT) emerged in the U.S. and across
Europe as an alternative approach to previous teaching methodologies (Brown, 2007;
Kumaravadivelu, 2006; Nunan, 1988; Widdowson, 1978). This dynamic approach focuses on
all components of language so that learners have the opportunity to demonstrate their
language abilities more effectively and be more active role players in language learning
(Brown, 2007; Lee & VanPatten, 2003; Nunan, 2003). Benefits of the communicative
language teaching approach led several L2 researchers and theorists to conduct an extensive
number of research studies aimed at promoting the vital role of L2 learners in language
learning processes (Ellis, 2003). Forms of CLT such as task- and process-based have both
indicated that learners are evidently capable of taking charge of their own learning, while
teachers can act as facilitators and model/scaffold providers (Johnson, 2003; Nunan, 2004).
62
Learner-centeredness
During the 1980s, research stressed the need for learner-focused teaching, a
pedagogical approach that places substantive emphasis on “learners and learning in language
teaching, as opposed to a focus on language and instruction” (Benson, 2012, p. 30). Learner-
focused teaching was initially proposed through the work of Nunan (1988), Tarone and Yule
(1989), and Tudor (1993). Their contributions were followed by numerous research studies
that examined the impact of learner-centered teaching on promoting L2 learning, thereby
leading to a radical paradigm shift from teacher-centered to learner-centered education.
Learner-centeredness encompasses several pedagogical approaches including “negotiated
curriculum” (Nunan, 1989), “needs analysis” (Richards, 2001), “learner training” (Wenden,
1995), and “learning styles” (Brown, 2007). Among these types of learner-centeredness is
self-directed learning (Nunan, 1988), a concept that is considered as the theoretical
underpinning of self-assessment (Ekbatani & Pierce, 2000).
Self-directed Learning
Prior to discussing the impact of self-directed learning on introducing self-assessment
into L2 learning, a key issue concerning potential misconceptions between self-centered and
self-directed learning is noteworthy. Self-centered learning “takes account of learners’ needs
or preferences” (Benson, 2012, p. 33); nevertheless, it does not necessarily involve giving
students some control on their learning nor does it include obtaining their consultation about
that learning (Brown, 2007). On the other hand, learner-directed learning involves learners’
planning, monitoring, and evaluating of their learning (Garrison, 1997). Benefits of self-
directed learning include empowering learner autonomy (Benson 1995; McNamara & Deane,
1995), increasing learner motivation (Dörnyei, 1994), and promoting learner self-confidence
(Taylor, 1995), which have introduced self-assessment into language learning (Brown, 2010).
63
Self-assessment
Self-assessment was derived from the theoretical underpinnings of learner autonomy,
a principle that was first coined by Henri Holec (1981). This principle stresses the importance
of learners’ ability to be accountable for their own learning including setting their learning
goals, monitoring their performance, taking learning decisions, and developing their
motivation, which all reflect the characteristics and values of self-assessment (McNamara &
Deane, 1995). Thus, self-assessment has dramatically influenced second language assessment
in many ways including contributing to placing L2 learners into proper levels, reporting their
potential areas of strengths and weaknesses, providing them with feedback, and assessing
their attitudes (Saito, 2003). However, in its initial uses, as Brown (2010) noted, self-
assessment was deemed as “an absurd reversal of politically correct power of relationships”,
contending that some L2 learners, especially novice learners, are unlikely to be capable of
reporting an accurate assessment of their own performance (p. 144). On the contrary, research
has outlined a myriad of empirically proven benefits of self-assessment in second language
learning, which will be discussed thoroughly in this paper. However, what best accounts for
the efficacy of self-assessment instruments depends on examining how accurate they are
compared to traditional assessments (e.g. teacher-made tests).
Accuracy of Self-assessment
Numerous studies have investigated the effectiveness and accuracy of self-assessment
in second language learning (Andrade & Valtcheva, 2009; Bachman & Palmer, 1989;
Blanche & Merino, 1989; Butler, 2010; Cassidy, 2001; Garrison, 1997; Jassen-van Dieten,
1989; LeBlanc & Painchaud, 1985; McDonald & Boud, 2003; Oscarson, 1997; Taras, 1995,
2001). Most of these studies were correlational-oriented, through which two forms of
assessment are compared in order to yield any potential correlation or lack of it.
64
Correlations
Drawing on Raasch’s (1979) study, von Elek (1982) argued that the validity of self-
assessment can be similar to or at least not substantially lower than that of traditional
assessment instruments. In the same vein, LeBlanc and Painchaud (1985) found evidence of a
relatively positive correlation between a self-assessment instrument and a standardized
English proficiency exam (r = .53). Likewise, in her pilot study, Jassen-van Dieten (1989)
compared a self-assessment format with that of a traditional placement test of Dutch as a
second language though she focused only on grammar and reading skills. Results
demonstrated a higher correlation between the two instruments (ranging from .60 to .79). She
concluded that, out of 25 students, 21 were able to place themselves in the same levels into
which the traditional placement test had placed them.
In the same manner, Bachman and Palmer (1989) suggested that self-assessment “can
be reliable and valid measures of communicative language abilities” (p. 22), indicating that
their techniques had demonstrated unexpected high reliability. Moreover, Cassidy (2001)
reported higher correlations (r = .87 to .97) between students’ self-reported scores and their
actual SAT scores. In addition, in his meta-analysis validation study, Ross (1998) found that
self-assessment techniques could provide high validity, suggesting that “the degree of
experience learners bring to the self-assessment context influences the accuracy of the
product” (p. 16). However, he also noticed that the participants were more accurate in
assessing their receptive skills (listening and reading) than those of their productive skills
Discrepancies
Other studies have reported conflicting conclusions. For example, unlike her
aforementioned study that was limited to assessing students’ grammar and reading skills,
Janssen-van Dieten (1989) later conducted a study on all four language skills - listening,
65
speaking, reading, and writing - to obtain a broader perspective of any potential correlation
between the two instruments. She reported little relationship between self-assessment and
previously validated proficiency tests. Moreover, Wesche, Morrison, Ready and Pawley
(1990) found that self-assessment did not show any statistically significant correlation with
traditional tests. Furthermore, Pierce, Swain and Hart (1993) found weak correlations
between self-assessment proficiency tests and traditional assessments of students attending
French immersion programs. In addition, Wesche (1993) found that “placement via self-
assessment was extremely unreliable” under any circumstances (p. 15).
Reasons accounting for the discrepancies between self- and traditional assessment
tools vary depending on certain factors including but certainly not limited to “the linguistic
skills and materials involved in the evaluations” (Blanche & Merino, 1989, p. 315). Other
factors may include the type of language tasks used in both instruments, the degree of task
authenticity, learners’ levels of proficiency, cultural differences, and exposure to self-
assessment (Blanche, 1988; Coombe, 1992; Oscarson, 1989; as cited in Wolochuk, 2009).
How Can the Accuracy of Self-assessment Be Achieved?
Despite the fact that self-assessment has received heated discussions in the literature,
many researchers have proposed effective implications that help validate self-assessment
techniques. Harris (1997), for example, suggests that self-assessment rubrics be clearly
specified in advance so that students can pinpoint their areas of strengths and weaknesses.
Moreover, students should receive adequate training on using self-assessment rubrics in order
for them to gain insight into the target evaluation criteria (Taras, 2003). However, Brown and
Hudson (1998) argue that self-assessment might be more effective when used for research
rather for placement or diagnostic purposes since the former is unlikely to make learners
over- or underestimate their performance.

66
Common European Framework of Reference (CEFR)
In 1996, the Council of Europe published the first version of the Common European
Framework of Reference (CEFR) (Council of Europe, 1996b). CEFR is an international
language standard that describes language proficiency “through a group of scales composed
of ascending level descriptors couched in terms of outcomes” distributed at six different
levels: A1 and A2, B1 and B2, C1 and C2 (Weir, 2005, p. 281). These six levels fall into
three broad levels of proficiency: Basic User (A1, A2), Independent User (B1, B2), and
Proficient User (C1, C2) (See Appendix F). Each of these levels “attempt[s] to specify as full
a range of language knowledge, skills and use as possible” through which CEFR users can
identify their levels of language proficiency (Council of Europe, 2001, p. 7). As of 2014, the
website of Council of Europe indicates that CEFR language proficiency descriptors have
been translated into approximately 39 different languages (Council of Europe, 2014).
In a long-term research project, which is widely recognized as the ‘Can-do’ Project,
the Association of Language Testers in Europe (ALTE) developed and validated user-
oriented and performance-related scales anchored to CEFR levels “to establish a framework
of ‘key levels’ of language performance, within which exams can be objectively described”
(Council of Europe, 2001, p. 244). This led to developing the CEFR self-assessment scales
with six different levels. Each level consists of Listening, Reading, Spoken Interaction,
Spoken Production, Strategies, Language Quality, and Writing. According to the CEFR self-
assessment rubric, in order for CEFR users to reach a given level, they would need to respond
as ‘I-can’ to more than 80% of the given level’s ‘Can-do’ statements. That is, after reading
CEFR self-assessment rubric language descriptive tasks (statements), users should assess
their abilities to perform these tasks by ticking either ‘I-can-do’ or ‘I-can-not-do’. If a user’s
overall number of ticked ‘I-can-dos’ are 80% or above, this places him/her in the given level.
67
Validity of the CEFR Self-assessment Rubric
The validity of the CEFR self-assessment rubric has long been investigated in the
literature. For example, Alderson et al. (2004) contend that although CEFR itself provides
comprehensive language descriptors, its ‘I-can-do’ “scales provide a taxonomy of behaviors
rather than a theory of development in listening and reading activities” (p. 3). In addition,
Huhta et al., (2002) argue that “the theoretical dimensions of people’s skills and language use
which CEFR discusses are on a very high level of abstraction.” (p. 133). In investigating the
validity of CEFR self-assessment used by refugees in Ireland, Little, Lazenby Simpson, and
O’Connor (2002) found that “evidence of the difficulty that learners encounter in using the
CEFR to maintain on-going reflective self-assessment suggests a need for more detailed
descriptions of proficiency relevant to particular domains of language learning” (p. 64).
Moreover, Jones (2002) points out that “different people tend to understand ‘Can-do’
somewhat differently” (p. 181), thereby creating discrepancies in their self-rated scores. All
of the above raise legitimate concerns about the validity of CEFR self-assessment rubric.
On the contrary, Little (2006) argues that one of the advantages of the CEFR is its
ability “to bring curriculum, pedagogy and assessment into much closer interdependence than
has usually been the case” (p. 382). Despite what was reported by Huhta et al. (2002) above,
this multi-tasking ability engages CEFR users in ‘action-oriented’ language scenarios through
which they can demonstrate their language abilities (Council of Europe, 2001). Moreover,
drawn upon the (Basic Interpersonal Communication Skills) and (Cognitive Academic
Language Proficiency), North (2007) argues that the CEFR distinguishes between productive
and receptive language use by dividing the former into interaction and production, resulting
in “34 illustrative scales for listening, reading, oral production, written production, spoken
interaction, written interaction, note-taking, and processing text” (p. 646).

68
METHODOLOGY
The researcher, as a previous EFL student and current Teaching Assistant at a
university EFL department in Saudi Arabia, has first-hand experience regarding the lack of
using alternative assessment techniques in many EFL departments in Saudi Arabia. Thus, this
article emphasizes the importance of incorporating self-assessment rubrics into classroom
assessment processes as an alternative assessment technique. In other words, it recommends
that substantive efforts be exerted to use alternative assessments by not only L2 programs in
Saudi Arabia, but also in many other L2 contexts. The study involved 21 ESL students; 18 of
them are Saudis who represent a sample of a larger population of Saudi L2 learners.
The ultimate purpose of conducting this research was to promote incorporating self-
assessment techniques into EFL classroom assessment processes. Because some EFL
departments in Saudi Arabia fail to integrate alternative assessment techniques into
assessment processes, this paper aims to promote the use of alternative assessment
techniques, in particular self-assessment, not only in EFL contexts in Saudi Arabia, but also
in other L2 contexts. This paper examines the accuracy of ESL students’ self-rated scores
obtained by CEFR rubric compared to recently obtained TOEFL scores. Finally, the paper
will provide EFL departments in Saudi Arabia and other L2 practitioners with implications
on how self-assessment techniques can be effectively implemented in L2 settings.
Research Questions
1. Do ESL students’ self-rated scores correlate with their TOEFL scores? If not, why?
2. Are gender and levels of proficiency major influential factors for causing any potential
score underestimation?
3. What are ESL students’ attitudes towards a ‘Can-do’ self-rating rubric?

69
Research Tools
A mixed-methodology approach was used to collect data for this study using three
main research tools. First, there was a web-based CEFR self-assessment rubric (Appendix B),
which has already been mapped onto the TOEFL proficiency levels. It would, however, be
difficult to use the whole rubric, for it contains 227 ‘I-can-do’ and ‘I-cannot-do’ CEFR
statements, which can be time-consuming to use in its entirety and possibly inapplicable in
some situations. Thus, only levels B1, B2 and C1, which are equivalent to the TOEFL
intermediate, upper-intermediate, and advanced proficiency levels, respectively, were
selected. The second research tool was semi-structured interviews conducted with those who
provided their emails to participate in the second round of data collection (Appendix H). The
third research tool was participants’ most recent TOEFL scores, which were obtained based
on student consent.
Participants
The participants of this study were 24 (14 male and 10 female) ESL students
attending CESL (Center for English as a Second Language), University of Arizona. Their
levels of proficiency were intermediate (n = 7), upper-intermediate, (n = 9) and or advanced
(n = 8). In addition, they were attending CESL levels 3 through 7 in a seven-level program.
Their nationalities, as shown in Table 6, are Saudi (n=18), Chinese (n=3), Mexican (n=2),
and Qatari (n=1). For confidentiality purposes, their names were coded by their initials,
which they were requested at the beginning of the study rubric. Three of the participants were
excluded from the analyses because they did not provide their TOEFL scores, nor did they
provide their emails, leaving 21 participants. Moreover, after contacting those who provided
their emails, ten out of these participants agreed to participate in the interviews in order to
allow them to expand more on their attitudes towards using CEFR self-assessment rubric.
70
Table 6
The Participants in This Study
No. Gender Level of English Proficiency

1 Female Intermediate
2 Male Intermediate
3 Female Intermediate
4 Male Intermediate
5 Male Intermediate
6 Male Upper-intermediate
10 Female Upper-intermediate
11 Female Upper-intermediate
14 Male Advanced
15 Female Advanced
16 Female Advanced
17 Female Advanced
18 Female Advanced
19 Male Advanced
20 Female Advanced
21 Male Advanced
Background on Site
The Center for English as a Second Language (CESL) is a nationally accredited IEP
located at The University of Arizona, in Tucson, Arizona, USA (CESL, 2014). Its IEP
consists of seven levels; each level is comprised of an eight-week session leading to optional
bridge programs for either undergraduate or graduate students It is also one of the ESL
71
programs highly recommended by SACM (SACM, 2015). CESL provides three types of
programs. First, there are the English language programs, which include eleven programs:
ESL graduate and undergraduate bridge programs, evening programs, intensive English (full-
and part-time) programs, local portable classes, online programs, skill classes, skill intensive
workshops, a teen English program, and tutoring. Second, it provides two different teacher
training programs: general teacher training and Content Area Teacher Training (CATT)
(CESL, 2014). Third, it offers customized programs.
Procedures
A randomly selected group of CESL students, who were soon going to take or had
just taken the TOEFL, were recruited. To encourage them to participate in the study, they
were offered a free one-time proofreading for one of their class papers (10 pages or less). The
recruitment flyer (Appendix I) was sent to these students’ emails by CESL the IEP
coordinator. The flyer contained links to the self-assessment rubric, which correlated to the
three levels of English proficiency that the participants could select (intermediate, upper-
intermediate, or advanced) to self-determine their current CESL levels. The participants were
also told that they should use the rubric two weeks at most prior to or right after taking the
TOEFL test in order to avoid any potential effect of learning progress between the study’s
two sessions. Next, they were asked for their permission to provide their TOEFL scores. The
first part of the rubric explained in detail the entire self-rating process in order to ensure that
the participants understood it fully.
After that, participants’ self-rated scores were re-examined to ensure that they
satisfied the study’s conditions. For example, those who did not provide their TOEFL scores
were contacted via email a month later because two weeks were needed as a maximum
timeframe between the two sessions, and two other weeks were required for TOEFL scores to
72
be released by ETS. Fortunately, sixteen participants had taken the TOEFL test on January
30, 31, or February 7, 2016, before they used CEFR self-assessment rubric in mid- and late-
February. Hence, they provided their TOEFL scores before they performed the self-
assessment rubric. As for those who did not provide their TOEFL scores, two of them
responded and provided their TOEFL scores, whereas the remaining three participants did not
respond and were thereby excluded from the study. Finally, those who provided their emails
in the first session were asked via email to participate in the semi-structured interviews. Two
days later, 10 out of 17 participants agreed to participate in the interviews through Skype,
email, or Tango.
Before analyzing the data, the CEFR, TOEFL, and IELTS equivalency table
developed by (Tannenbaum & Wylie, 2007) was used to compare participants’ self-rated
scores against their TOEFL scores (See Appendix F) At first, the researcher attempted to
color-code each level of the equivalency table; nevertheless, this strategy appeared to be very
confusing and even seemingly misleading (see Appendix J). As a result, given that the study
has only three levels of English proficiency (intermediate, upper-intermediate, advanced), the
Basic and Proficient levels were excluded from the study equivalency table in order to make
the table readable and more meaningful. Table 7 illustrates how the range of scores of the
TOEFL and IELTS are constructed to correspond to CEFR different levels of language
proficiency.
Table 7
The Study Adapted CEFR, TOEFL, and IELTS Equivalency Table
Levels of Proficiency CEFR TOEFL IBT IELTS

Intermediate B1 30 – 64 3.0 – 4.0
Upper-intermediate B2 65 – 78 4.5 – 5.0
Advanced C1 79 – 95 5.5 – 6.0
73
After that, the 80% ratio needed for meeting each level was determined. For example,
since intermediate and upper-intermediate levels have 42 statements, their ratio is (36.6). In
the same vein, the advanced level has 37 statements; thus, its ratio is (29.6). After that,
intermediate and upper-intermediate participants’ self-rated scores were divided by (36.6)
and the self-rated scores of advanced participants were divided by (29.6) so that they can be
entered into ANOVA as ratios. One crucial issue is that the participants were not told that if
they have achieved 80% of ‘I-can-do’ statements, this would indicate that they met the level.
This was done to avoid any potential effects of participants focusing on reaching 80% instead
of rating their actual level of proficiency, which might impact their overall self-rating
performance.
FINDINGS
1. Do ESL students’ self-rated scores correlate with their TOEFL scores? If not, why?
Data were analyzed first using a two-factor ANOVA with participants’ gender (male,
female) and level of English proficiency (intermediate, upper-intermediate, advanced) as the
factors (see Table 8). Participants’ TOEFL scores were mainly used as a covariate in each of
the analyses described below. The findings indicate that the participants’ self-rated scores did
not correlate with their TOEFL scores or with their levels of proficiency. For example, the
analysis showed that the gender effect was not significant F(1, 14) = .027, p = .87, and the
level of English proficiency was not significant either: F(2, 14) = 1.81, p = .335. This is
because there were only three participants (two intermediate and one upper-intermediate)
who had over 80% of ‘I-can-do’ statements, suggesting that they had probably reached these
levels.
74
Table 8
Findings of UNIANOVA Ratio by Gender Level of Proficiency with the TOEFL
Type III Sum Observed

Source Df Mean Square F Sig.
of Squares Powerb
Corrected Model .367a 6 .061 1.048 .436 .284
Intercept .083 1 .083 1.419 .253 .199
Gender .002 1 .002 .027 .873 .053
Level of proficiency .138 2 .069 1.182 .335 .217
TOEFL .149 1 .149 2.553 .132 .319
Gender * Level of proficiency .209 2 .104 1.790 .203 .311
Error .816 14 .058
Total 14.337 21
Corrected Total 1.182 20
This statistical analysis did not, however, reveal the extent to which each participant’s
self-rated score was accurate. In other words, although the findings concluded that there is no
correlation between participants’ self-rated and TOEFL scores, they did not show how many
participants reached, were very close, close, somewhat close, far, or very far from reaching
the assigned ratio of 80%. To obtain a fuller explanation of these findings, a descriptive
analysis was applied in order to identify participants’ self-rated performance individually.
Therefore, data were examined descriptively in order to gain further insight into these
findings. As discussed earlier, in order for participants’ self-rated scores to correlate with
their TOEFL scores, the participants should have at least 80% or more of the responses
selected in ‘I-can-do’ column to reach the level in which they participated. As shown in Table
9, only three out of 21 participants were able to obtain over the assigned ratio, achieving
100% (42/42), 85% (36/42), and 95% (40/42) respectively, suggesting that their self-rated
scores correlated with their TOEFL scores.
75
Table 9
Data of the Participants Who Reached or Were Very Close to Reaching the Assigned Ratio
Name Gender Level of English TOEFL/IELTS CEFR Self-rated Score Overall %

Proficiency Score # of I can do % obtained
1 Female Intermediate 90 42 100% 42/33.6
2 Male Intermediate 39 36 85% 36/33.6
3 Male Upper-intermediate 72 40 95% 40/33.6
4 Male Upper-intermediate 36 33 78% 33/33.6
5 Female Advanced 70 29 78% 29/29.6
6 Female Advanced 65 28 75% 28/29.6
Furthermore, the data showed that only three participants (4, 5, and 6) were very close
to 80%: 78% (33/33.6), 78% (29/29.6), and 75% (28/29.6). On the other hand, how close or
far the remaining self-rated scores from the assigned ratio (80%) varied depending on
participants’ self-rated performance. For example, they ranged from close (73%=31/33.6 and
70%=26/29.6), somewhat close (67%=25/29.6), far (59%=25/33.6 and 50%=21/33.6), to very
far (40%=15/29.6 and 28%=12/33.6)3. These percentages suggest that participants’ self-rated
scores were not normally distributed. In other words, they do not reflect participants’
determined levels of language proficiency, nor are they consistent with participants’ TOEFL
scores. In order to account for these discrepancies, a randomly selected sample of the
participants (10) was engaged in semi-structured interviews.
The interviews were designed in a way that could help provide further insight into the
accuracy or inaccuracy of participants’ self-rated performance. For example, any external
factors that could have potentially affected participants’ self-rated scores were included in the
3
For full details of each participant’s self-rated scores, look at Appendix (K)
76
interview questions. This included how many times have participants taken the TOEFL test
(test familiarity), if they ever used a self-assessment rubric before (lack of exposure), and
other questions that could help answer the other part of the first question (If not, why?). It
was found that three out of 10 participants had taken the TOEFL test for the first time,
whereas the remaining seven participants had taken the test more than two times. This
suggests that most of the participants have background knowledge about the TOEFL test,
which is likely to minimize any potential effects on their TOEFL scores.
Moreover, none of these 10 participants had ever used any self-assessment measures
before. This calls into question the extent to which participants’ lack of adequate exposure to
self-assessment rubrics may have led to lack of correlation between their self-rated TOEFL
scores. This is supported by the findings that most of the participants’ TOEFL scores were
consistent with their current CESL levels as shown in Table 10.
Table 10
Consistency of Participants’ TOEFL Scores with Their CESL Levels
Participants’ Current CESL Levels Participants’ Range of TOEFL Scores
Advanced 79 – 95
Upper-intermediate 65 – 78
Intermediate 30 – 64
To identify why the majority of participant’s self-rated scores are not consistent with
their levels of language proficiency or TOEFL scores, the participants were also asked which
measure scores, TOEFL or self-rated, they believe to be more accurate. All 10 participants
argued that their self-rated scores are more accurate in reflecting their actual levels of English
proficiency; nevertheless, this still does not answer the second part of the first question.
77
To account for why 18 out of 21 participants obtained lower self-assessed scores,
empirically derived evidence is needed such as observing participants’ performance on the
three measures: TOEFL, level of proficiency, and self-assessment. Since this study did not
intend to collect such empirical evidence, the participants who were interviewed were asked
to what extent they consider their self-rated scores accurate. All 10 participants indicated that
they feel that their self-rated scores reflect their actual levels of English proficiency. After
that, each participant was provided with a table showing his or her level of proficiency,
TOEFL scores, and self-rated scores and was asked what s/he thought of them. Although all
of the ten participants, except one, obtained lower self-rated scores compared to their TOEFL
scores, they did not show any form of embarrassment or frustration.
However, seven out of 10 participants indicated that they did not expect their self-
rated scores would be very low. After being shown their different scores, three of the
participants attributed their low self-rated scores to the fact they often become uncertain to
choose ‘I-can-do’ or ‘I-can-not-do’ statements, as they find them dichotomous. Therefore,
they decided to underestimate their scores. For example, one of the participants indicated that
once he read the survey instructions, he decided to assess his language abilities accurately
without any form of bias. This, therefore, has led some participants to underrate themselves.
Moreover, another participant argued that when the CEFR self-assessment rubric asked her
about language tasks that she once performed, she found no difficulty choosing ‘I-can-do’ or
‘I-can-not-do’ statements based on her previous background. Nevertheless, when asked about
language tasks that she has never encountered before, she spent some time visualizing herself
performing this task, and then selected ‘I-can-not-do’ statements. Another participant asserted
that when he wanted to select ‘I-can-do’, he remembered his reluctance to interact with native
78
speakers and ended up choosing ‘I-can-not-do’ statements, a CEFR problem that Weir (2005)
called channel for the communication.
Having indicated that none of them had ever used a self-assessment rubric for
proficiency purposes, the participants were asked about any other factors that have could
have potentially impacted their self-rated performance. Although the self-assessment task
was explained to them in writing, the participants pointed out that they encountered some
difficulties using the CEFR self-assessment rubric, especially during the first parts of the
rubric. This suggests that engaging students in performing self-assessment tasks might not be
as effective unless the tasks are demonstrated for them through modeling or scaffolding
(Tara, 2003). For example, McDonald and Boud (2003) divided their study participants into
experimental and treatment groups. The former received formal training on using self-
assessment rubrics, while the latter received no training. The findings revealed that the
experimental group “outperformed their peers who had been exposed to teaching without
such training in all curriculum areas” (McDonald & Boud, 2003, p. 217).
Another interesting finding accounting for participants’ low self-rated scores was the
ambiguity of some statements that some participants faced. For example, one participant
argued that during responding to some of the statements, she found difficulty understanding
the given language task fully. Another participant pointed out that some statements have
unknown or confusing words (especially verbs), which makes it difficult for them to respond
to the statement accurately. This is consistent with previous literature that urges CEFT self-
assessment rubric developers to take the synonyms they use into considerable account. For
example, Alderson et al., (2004) argue that B2 level has several different verbs are used
interchangeably to describe language scenarios relating to comprehension such as scan,
locate, monitor, identify, and so forth. The researchers express some concerns whether these
79
verbs are simply “stylistic synonyms” or they indicate “real differences in cognitive
processes” (p. 9). During the interviews, one intermediate and two upper-intermediate
participants indicated that they were unable to identify the meanings of some verbs.
2. Do gender and level of English proficiency cause any form of score underestimation?
Given that the number of participants was relatively small, the two main effects
(gender and levels of proficiency) were not significant, and that only three participants were
able to do more than 80% of the ‘Can-do’ statements, it would be rather difficult to run an
analysis of variance to identify whether gender and levels of proficiency have a substantial
impact on CEFR users’ score underestimation. However, the only statistical analysis that can
help provide some yet inadequate, explanation about any potential association between these
two factors and CEFR users’ score underestimation is by using a scatter plot.
Figure 2. Scatterplot (BIVAR)=TOEFL with Ratio by Level of Proficiency for Male and
Female.
As shown in Graph 1, the overall pattern of the plot shows that participants’ gender
and levels of proficiency scattered randomly, as the correlation is very weak. For example,
scores of intermediate and upper-intermediate participants scattered everywhere, acting
80
completely different from each other. On the other hand, although the correlation of advanced
participants’ scores is also weak, they are less randomly distributed, suggesting that the more
advanced participants obtain higher TOEFL scores, they more their self-rated scores will
increase. As for gender, the findings suggest that the more female participants obtain higher
TOEFL scores, the more their self-rated scores increase. However, this does not help draw a
clear conclusion about the impact of these two variables on score underestimation.
The other option was to examine data descriptively to identify any potential effect of
gender and levels of proficiency in causing score underestimation. To do so, a cutoff score
should be first established indicating that a user’s self-rated score is underrated compared to
his/her TOEFL scores. For example, if a participant placed herself in advanced level but her
‘I-can-not-do’ statements in level C1 dramatically outnumbered their ‘I-can-do’ counterparts,
what is the cutoff score that suggests that this user should be in a lower level (e.g. B2 or B1).
To the best of the researcher’s knowledge, no cutoff score has ever been identified to indicate
the lowest performance in CEFR self-assessment rubric as opposed to the 80%. Thus, due to
the lack of a previous established minimum cutoff score in addition to not having internal
reliability estimated by Cronbach's alpha, this option was not pursued further.
The following table contains the participants who achieved less than 60%, taking into
account that this is a random rather a cutoff percentage. It can be noticed that there are ten
participants who were either far (59% - 41%) or very far (40%28%) from the assigned ratio
(80%). Even if all external variables such as test anxiety or inadequate experience in using
self-assessment rubrics are controlled, the data in Table 11 still do not reveal any pattern
indicating that gender and levels of proficiency cause any form of score underestimation.
Therefore, it can be concluded that because the aforementioned approaches did not reveal
clear findings, further data are needed so that valid conclusions can be drawn.
81
Table 11
Participants’ Self-rated Scores Lower Than 60% of the Assigned Ratio
Gender Level of English CEFR TOEFL CEFR Self-rated Score Overall %
Proficiency Levels Score # of I can do % obtained
Female Intermediate B1 46 25 59% 25/33.6
Male Intermediate B1 36 23 54% 23/33.6
Male Intermediate B1 32 12 28% 12/33.6
Male Upper-intermediate B2 42 25 59% 25/33.6
Female Upper-intermediate B2 63 13 30% 13/33.6
Male Advanced C1 78 19 51% 19/29.6
Female Advanced C1 78 15 40% 15/29.6
Male Advanced C1 68 14 37% 14/29.6
3. Participants’ Attitudes towards the CEFR Self-assessment Rubric
Finally, the participants were asked about their attitudes towards using the CEFR self-
assessment rubric. In addition to what have been discussed earlier, the participants provided a
wide range of interesting insights from which some implications can be drawn. All ten
participants consensually indicated that they found the CEFR self-assessment rubric more
effective in determining their levels of English proficiency. For example, two participants
argued that the rubric enabled them to pinpoint both their strengths and weaknesses. Another
participant asserted that the rubric allowed him to assess his abilities to demonstrate his
language skills in a performance-based manner as opposed to the TOEFL test. In addition,
another participant pointed out that one of the advantages of the CEFR self-assessment rubric
82
is that it does not only focus on users’ four language skills, but it also includes other skills
including interaction, strategies, and language quality. Such skills, as another participant
noted, helped her visualize herself perform certain tasks inside and outside the classroom.
However, in addition to the limitations discussed at the beginning of this study, some
participants also provided some drawbacks of using self-assessment rubrics. For example,
one participant, as indicated earlier, argued that some of the rubric statements are
dichotomous, which makes her uncertain which ‘Can-do’ choice she should select. Moreover,
another participant indicated that using the rubric for purposes other than grades made her in
some cases do not take the assessment seriously. Furthermore, two intermediate participants
provided interesting comments in that although they can perform some of the rubric language
tasks, they often found themselves not motivated enough to select ‘I-can-do’ choice. In
addition and more importantly, the participants complained about the difficult level of some
words used in the rubric statements. For example, one of the participants argued that he
responded to some prompts without fully understanding the task. Yet, all of the participants
concluded that they are interested in using the CEFR self-assessment in classroom settings.
Discussion and Conclusion
Self-assessment has long been used in L2 learning contexts for various different
purposes (Ekbatani, 2011; Cassidy, 2001; Harris, 1997; LeBlanc & Painchaud, 1985;
McNamara, 1995). However, the extent to which self-assessment rubrics correlate with
traditional measures is still controversial. This study explored the experience of 21 ESL
participants in using the CEFR self-assessment rubric in order to identify whether their self-
rated scores correlate with their TOEFL scores, whether gender and levels of proficiency play
an influential role in causing score underestimation, and the possibility of incorporating this
rubric into L2 classroom assessment procedures. The findings indicated that 18 out of 21
83
participants were unable to rate themselves to the level reflecting their TOEFL scores and
CESL levels, even though the latter were found to be consistent. Moreover, the small number
of participants, lack of significance of the two main effects, and the fact that only three
participants were able to rate themselves to the given levels did not help obtain adequate data
to conclude whether gender and level of proficiency have major effects in score
underestimation.
Nevertheless, based on their responses during the interviews, ten participants showed
great enthusiasm and high motivation to use self-assessment rubrics in L2 learning settings.
In addition, they provided valuable insights about their experience in using the CEFR
assessment rubric, which can be very beneficial for L2 practitioners for promoting the
integration of self-assessment rubrics into classroom contexts. Generally, it can be concluded
that self-assessment rubrics can be very effective if certain conditions are met. For example,
students should be trained on using these rubrics efficiently in order to avoid any potential
discrepancies between their self-rated scores and their actual levels of proficiency. Moreover,
self-assessment rubrics can be a complement to traditional assessment methods until
empirically evidence is obtained indicating their high reliability.
TOEFL-related Limitations
One of the main limitations of this study was that the participants were not asked if
their reported TOEFL scores were obtained the first time they took the test. This is important
because familiarity and more practice typically increase test scores, and we are not certain
about the correlation between the participants’ TOEFL scores and their level of proficiency.
Although seven out of 10 participants who were interviewed indicated that they have taken
the TOEFL test more than two times, this does not necessarily mean that their TOEFL scores
were not impacted by any external factors. For example, test anxiety is a critical factor that
84
affects test-takers’ performance on any test, especially high-stake tests. That is, when taking
the self-assessment rubric, it is unlikely that the subjects had high-test anxiety, as the task
they were performing was for research rather for grade purposes. Thus, concurrent validity is
needed.
Another limitation is concerned with the limited number of participants (21), which
calls into question the appropriateness of drawing valid and generalizable conclusions about
the accuracy and effectiveness of self-assessment rubrics. In addition to the limited number of
participants, only three out of 21 participants were able to select more than 80% of the ‘I-can-
do’ statements, a problematic issue that which poses critical questions. Moreover, in order to
gain valid and generalizable findings, the participants should have been divided into two
groups: an experimental group, which receives training on the CEFR self-assessment rubric,
and a treatment group, which receives no training. This would certainly help draw more
accurate conclusions. Furthermore, time constraints are also another limitation of this study.
That is, the participants were not given certain time constraints to complete the CEFR self-
assessment rubric. This is unlike many assessment measures where time allotment is one of
the key aspects of evaluating students’ performance.
Implications for ESL/EFL Program Administrators
ESL program administrators should first compare their program levels to a previously
validated measure (e.g., TOEFL, IELTS, ITEP) against which students’ self-rated scores can
be compared. In this way, student performance in various measures (traditional assessment
tools, self-assessment rubrics, achievement tests) can be observed and compared. It is also
recommended that self-assessment measures be incorporated into ESL program assessment
procedures from early levels so that students’ progress throughout the subsequent levels can
85
be effectively monitored. This will help ESL practitioners and self-assessment researchers
identify the extent to which level of language proficiency impact users’ self-rated scores.
Implications for ESL/EFL Teachers
One of the key implications for ESL/EFL teachers is training students on using self-
assessment rubrics. Teachers can start by having students use a small version of any self-
assessment measure to rate their performance on a certain language task. As a result, with the
continuous use of self-assessment rubrics, students will be able to identify their areas of
language strengths and weaknesses in that the more they practice using self-assessment
rubrics the more they will become accurate in identifying their levels of proficiency.
Moreover, the CEFR ‘Can-do’ rubric is highly recommended to be used for a pre- post
measure purposes through which students’ learning progress can be identified.
Implications for Future Research
Future researchers are advised to recruit a larger number of participants so that they
can be able to increase statistical power. Moreover, they should also divide participants into
two main groups: 1) an experimental group and 2) a treatment group in order to explore the
impact of training on students’ self-assessment performance. In addition, self-assessment
rubrics can be introduced in participants’ L1 to identify their L2 skills. Finally, researchers
can look at participants’ language skill subsets (reading, listening, speaking, writing).
86
CHAPTER 4 QUALITY ASSURANCE AND ACCREDITATION AS FORMS FOR
LANGUAGE PROGRAM EVALUATION: A CASE STUDY OF TWO EFL
DEPARTMENTS IN A SAUDI UNIVERSITY
87
ABSTRACT
In 2004, the Saudi Ministry of Higher Education, currently named the Ministry of
Education, established the National Commission for Academic Accreditation and Assessment
(NCAAA) to ensure that Saudi higher institutions adhere to predetermined national standards
and frameworks. This led to a paradigm shift from institutional-oriented to outcome-based
education. As a result, many programs in Saudi Arabia seek quality in order to obtain
academic accreditation. As an initial step to assure quality, this paper provides a simulated
evaluation of two EFL departments at a Saudi University, which will be referred to as X
University throughout the paper. The evaluation processes were based on a review using a
combination of integrated standards of the NCAAA and CEA (the Commission on English
Language Program Accreditation). Data were collected qualitatively through surveys and
interviews with students, faculty, and quality assurance coordinators. Moreover, the mission,
goals, objectives, curriculums, teaching strategies, assessment techniques, and learning
outcomes of the two departments were also evaluated.
The findings indicated that the two EFL departments appear to partially meet the
standards based on their the mission, curriculum, student learning outcomes (SLOs), and
program development, planning, and review, whereas the standards for teaching strategies,
assessment methods, and student achievement were not met. At the end of the paper, some
concrete suggestions for improvement are provided that should help the two departments
address areas of weaknesses in order to be ready to obtain academic accreditation from CEA
and NCAAA in the future.
Key words: NCAAA, CEA, quality assurance, academic accreditation, program review,
curriculum, and male and female-section departments, EFL.
88
INTRODUCTION
Brown (2007) points out that “no curriculum should be considered complete without
some form of program evaluation” that corresponds with its changing conditions and
emerging developments (p. 158). Research suggests that academic programs be evaluated on
a constant basis to ensure that their learning outcomes are achieved effectively (Ellis, 1993;
Halliday, 1994; Mercado, 2012; Royse et al., 2009). Pennington and Hoekje (2010) point out
that program evaluation can be undertaken on a formal and informal basis, where the former
may include external review and accreditation, and the latter may involve ongoing internal
monitoring, SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis, and internal
quality control. Royse, Thyer, and Padgett (2009) argue that program evaluation is
characterized as “a practical endeavor, not an academic exercise, and is not primarily an
attempt to build theory” (p. 2). In other words, it is a process of evaluating a program using
research tools to improve its outcomes.
In recent decades, numerous researchers have reported a growing interest in quality
assurance and accreditation at higher education institutions in both developed (MacKay,
1998; Mercado, 2012; Stensaker & Rosa, 2007; Van Damme, 2004; Westerheijden) and
developing countries (Sallis, 2002; Shawer, 2013; Smith, & Abouammoh, 2013). The
relationship between program evaluation on one side and quality assurance and accreditation
on the other has long been recognized in higher education contexts as one of the key
impetuses for program evaluation that “takes the form of accreditation-mandated student
learning outcomes assessment” and quality control (Norris, 2016, p. 173). As a result and in a
response to the rapidly growing quality assurance and accreditation interests, this paper
evaluates quality assurance processes at two EFL departments in a Saudi Arabian university
89
using an integrated set of accreditation standards of NCAAA and CEA to obtain accreditation
from them in the future.
What Is Program Evaluation?
Research has documented a wide range of different, albeit interrelated, definitions of
program evaluation. From the early 1980s to the present, these definitions have revolved
around one central theme even though other considerations have been suggested for the term
program evaluation. For example, Palmer (1992, p. 144) defines program evaluation as a
process of finding out whether a program is “feasible” in terms of its curriculum (practicality
issue), “productive” in terms of producing the intended learning outcomes (validity issue),
and “appealing” in terms of responding to real-life language scenarios (authenticity issue).
Barker (2003) defines program evaluation as a “systematic investigation to determine the
success of a specific program” (p. 149). In a broader view, this investigation should involve
examining the efficiency of “the individual components of [any] program in relation to each
other and to contextual factors, goals, criteria of value” (Pennington & Hoekje, 2010, p. 262).
Thus, program evaluation involves thorough assessments of the targeted program’s
performance.
Allen (2004) defines program evaluation as an ongoing process “for focusing faculty
attention on student learning and for provoking meaningful discussions of program
objectives, curricular organization, pedagogy, and student development” (p. 4). In this study,
the main focus is on the program evaluation form that leads to maintaining quality assurance
and thereby obtaining academic accreditation. This is because program evaluation processes
in Saudi Arabia have recently received substantial attention for accreditation purposes. As a
result, this paper adopted Pennington and Hoekje (2010)’s comprehensive definition of
program evaluation as a systematic investigation of the extent to which a program adheres to

90
nationally and internationally determined standards, which in this case are the standards of
the NCAAA and CEA agencies, respectively.
LITERATURE REVIEW
Given that applied linguistics is a field that began in the 1940s (Kaplan, 2010),
program evaluation literature “within the field is quite scant” as well (Lynch, 1996, p. 12).
However, research has documented some evidence of early systematic evaluations in public
education. For example, the first documented program evaluation dates back to the 19th
century, when many federally-funded schools in the United States, Great Britain, and
Australia began evaluating school curricula (Rea-Dickins, 1994). The premise underpinning
the evaluation processes undertaken during that era was based on evaluating curricula
“through scrutiny of the competence and behavior of the teacher” (Kiely & Rea-Dickins,
2005, p. 18). Then, during the late 19th century, programs were evaluated quantitatively based
on data (e.g., student retention rates, learning outcomes) from a positivistic view (Gitlin &
Smyth, 1989). Nevertheless, program evaluation during these eras remained recognized as a
field inseparable from applied linguistics (Beretta, 1992; Lynch, 1991).
Program evaluation began to emerge as an independent field of applied linguistics
during the 1960s and early 1970s (Beretta, 1992; Cronbach, 1963, Keating, 1963; Lynch,
1990; Sherer & Wertheimer, 1964; Smith, 1970). According to Lynch (1996), the emergence
of program evaluation field was due to some researchers’ calls for more systematic
approaches to program evaluation that could provide us with “what counts as evidence” (p.
9). In other words, program evaluation should be used as an effective tool for providing
concrete evidence of the appropriateness and efficiency of teaching and learning of the
program being evaluated. As a consequence, many funding agencies expect program
administrators to provide them with some statistical information as proof indicating that a
91
program indeed achieves the desired learning outcomes that will allow it to continue to
receive funding (Gitlin & Smyth, 1989).
Foreign Language Program Evaluation
There was a plethora of studies on foreign language program evaluation from the
1960s through the 1990s (Brown, 1995). According to Beretta (1992), until the late 1980s,
most of the studies and discussions on program evaluation were limited to research papers,
and there were very few books addressing this area. This was, as Beretta (1992) puts it, due to
“the seemingly never-ending” quantitative versus qualitative research-method debate (p. 5).
Beretta (1992) reviewed most, if not all, of these studies, and the extent to which they
contributed to shaping the modern field of program evaluation. For example, one of the early
studies on foreign language program evaluation, conducted by Keating (1963), examined the
impact of laboratory-based versus classroom-based teaching of French. The findings showed
that classroom-based teaching achieved better results as opposed to laboratory teaching.
Similarly, Campbell and Stanely (1963) conducted the first true- and quasi-experimental
study on L2 program evaluation, which provoked the research-method debate for a long time.
During the 1970s, several program evaluation studies were conducted (e.g., Asher et
al. 1974; Bushman & Madsen, 1976; Gary, 1975; Levin, 1972; Postovsky, 1974; Smith,
1970; as cited in Beretta, 1992). These studies focused on manipulating the teacher variable
so that variations could be minimized. This variable was controlled by comparing teachers’
performance from different programs (Smith, 1970), comparing the performance of two
program taught by the same teacher (Bushman & Madsen, 1976; Postovsky, 1974), and
finally “[eliminating] teachers altogether and [replacing] them with tape-recorded lessons”
(Beretta, 1992, p. 11). During the 1980s, Tyler’s Model of Evaluation, developed in the
1940’s, was again used by program evaluators by comparing the desired outcomes with
92
achieved outcomes (Beretta & Davies, 1985; Prabhu, 1987; Wagner & Tilney, 1983; Wolfe
& Jones, 1982; as cited in Beretta, 1992).
During the 1990s, program evaluation literature witnessed an increasing focus of
studies addressing many areas of program evaluation. For example, Beretta (1992) examined
‘program-fairness’ of evaluation and concluded that there was a gap between program
evaluation theory and practice that needed to be addressed. Drawing upon this, Lunch (1996)
suggested that data-gathering techniques “provide information that makes sense and counts as
evidence” of program performance (p. 155). To achieve these two implications, standards of
the Joint Committee on Standards for Educational Evaluation (JCSEE) were reviewed in the
mid-1990s, which led to innovative and standard-based program evaluation (Patton, 1997). In
the late 1990s, several researchers advocated the incorporation of political, affective, and
cultural aspects into program evaluation processes, for language learning involves all these
aspects (Rea-Dickins, 1998). This resulted in including language changing environmental
factors into program evaluation, which can be found in Pennington’s (1998) work that studied
dynamic and interactive dimensions of language program evaluation than solely focusing on
quantitative data.
At the beginning of the 21st century, the epistemological and methodological tenets of
program evaluation have changed due to the impact of a wide range of political, economic,
and technological changes (Kiely & Rea-Dickins, 2005). This has resulted in spurring the
demand for compliance with mandates as a basis for language program evaluation.
Evaluation approaches such as self-study, peer review, key performance indicators, and
external audits have become key to verifying program’s compliance with mandates (Kiely,
2001), which have led to conducting language program evaluation for meeting “internal and
external commitments to quality assurance” (Kiely & Rea-Dickins, 2005, p. 52). In

93
conjunction with this mandate compliance, the culture of promoting and assuring quality,
albeit not new, has made language program evaluation key for accreditation purposes (Eaton
2006; Harvey 2004; Norris, 2009).
Evaluation Paradigms
Every evaluation approach is premised on certain theoretical underpinnings, which
makes it crucial at this point to understand how programs have been evaluated. Lynch (1996)
provides a broad discussion of a ‘paradigm dialog’ in social literatures dominated by two
paradigmatic camps: 1) positivistic view and 2) naturalistic view. Debates between these two
paradigms are based on “the epistemological basis of research” (Lynch, 1996, p. 13).
Advocates of the positivistic paradigm prefer traditional, quantitative, and experimental
approaches to conducting inquiry (Kiely & Rea-Dickins, 2005). This paradigm has generated
two key inquiry approaches: 1) true experiments, which assign participants randomly to
experimental and control groups to draw comparisons between them, and 2) quasi-
experiments, which compare the two groups without random assignment (Lynch, 1996).
Many researchers (Beretta, 1986; Long, 1983) criticized the positivistic paradigm, for it
merely focuses on “product or outcome rather than also attending to the process of how the
program was being carried out” (Lynch, 1996, p. 32).
On the other hand, the naturalistic paradigm “requires an emergent or variable design”
where evaluation takes place as the “evaluator proceeds to investigate the program setting,
allowing new information and insights to change how and from whom data will be gathered”
(Lynch, 1996, p. 14). This form of evaluation does not manipulate any conditions or
variables; rather, it observes, describes, and interprets how a program performs in real-life
contexts. In other words, a naturalistic inquiry is carried out in a program’s natural settings,
which is consistent with the naturalistic second language learning approach (Krashen, 1982).
94
Such an approach includes “observations, interviews, questionnaires, tests, and program
documentation” (Lynch, 1996, p. 82). Later, a combination of positivistic and naturalistic
approaches was applied to help us articulate “what counts as evidence for our evaluation”
(Lynch, 1996, p. 40).
Most early program evaluation studies have focused mainly on program design and
evaluation as key to examining “pedagogical innovations” in L2 language programs (Ross,
2003, p. 2). Later, a wide range of program evaluation approaches emerged, changing to
some extent the motives why language programs are evaluated. For example, during the mid-
and late-1990s, program evaluation shifted from merely improving program specific features
to focusing more on “large-scale educational accountability mandates” (Norris, 2009, p. 8).
This impetus for accountability was a result of the necessity of gaining more funds from
funding agencies (Brindley, 1998). Program outcome-based evaluation can therefore have
several motives including but not limited to responding to greater accountability needs,
maintaining quality assurance and control, obtaining academic accreditation, gaining higher
rankings, and so forth. Regardless of the purpose of program evaluation, it is prudent realize
that program evaluation provides valuable conclusions about a program performance as well
as the extent to which strengths and weaknesses should be supported and addressed
respectively.
Having identified different program evaluation paradigms, program evaluators should
then decide how program evaluation processes will be planned, implemented, and even
evaluated in a systematic way. To help program evaluators make a decision with this regard,
Brown (1995) argues that prior to conducting program evaluation, evaluators should first
consider the following six fundamental types of program evaluation:
1) formative or summative
95
2) involves external experts or internal participants
3) field or laboratory research
4) on-going or short-term evaluation
5) quantitative or qualitative data
6) process or product focused.
These illustrate some of the key conceptual and practical approaches to program evaluation
that program evaluators, administrators, teachers, practitioners, and external reviewers should
take into consideration.
Formative vs. Summative
Formative and summative evaluations are two distinct measures that have long been
used in the evaluation industry (Alderson, 1986; Bachman & Palmer, 1996; Black & William,
2009; Brown, 1995). Although formative evaluation was first introduced into educational
psychology literature by Scriven (1967), it was defined in detail by Bloom et al. (1971) as
“the use of systematic evaluation in the process of curriculum construction, teaching, and
learning for the purpose of improving any of these three processes” (p. 117). It is aimed at
improving a course, curriculum, or program as a whole through an ongoing evaluation
process. In other words, it refers to the process of collecting information and data that
contribute to making useful “decisions about a program while it is under development”
(Bachman & Palmer, 1996, p. 62). However, Bailey (2009) defines program level formative
evaluation as a form of appraisal that provides “feedback for program improvement” (p. 707).
It helps stakeholders gain periodic or ongoing guidance that helps them “adjust their activities
accordingly” (Mohr, 1995, p. 33).
In contrast to formative evaluation, summative evaluation refers to any form of
measure that “typically occurs at the end of a program or a funding period for a program”
96
(Bailey, 2009, p. 707). It is typically used to provide “evaluative conclusions for any other
reasons besides development” (Scriven, 1991, p. 21). Brown (1995) argues that summative
evaluations occur at the end of a program “to determine whether the program was successful”
(p. 228). This form of evaluation provides teaching practitioners with an inclusive summary
of the value of their program for future improvement. As for the study contexts, most of the
evaluations undertaken by the NCAAA are summative, while those conducted by program
participants are formative. A useful distinction of this dichotomy was suggested by Alderson
(1986) that “when the cook tastes the soup, that is formative. When the guests taste the soup,
that is summative" (p. 11).
External Experts vs. Internal Stakeholders
Program evaluation may involve bringing in external experts, internal participants, or
both. Some program administrators invite external experts to carry out their program
evaluation processes. For example, during one interview, the quality assurance coordinator of
the male EFL department under study indicated that they once invited a quality assurance
expert from another Saudi university to evaluate their learning outcomes. The NCAAA
requires Saudi higher educational institutions to bring in a group of external experts to help
them conduct the developmental self-study or the mock program review. According to Brown
(1995), external reviewers can provide “a certain amount of impartiality and credibility to the
results” (p. 232). Nevertheless, Alderson and Scott (1992) argue that outside evaluators “are
perceived by insiders as at least threatening to themselves and the future of their project, and
at worst as irrelevant to the interests and perspectives of the project” (26-27). Hence, they
suggest depending more on stakeholders, which is widely known in literature as the
Participatory Evaluation Model (Alderson & Scott, 1992).
97
The Participatory Model includes involving stakeholders in program planning,
implementation, and evaluation processes (Alderson & Scott, 1992). This model helps
provide a dynamic and spectral analysis of program performance by obtaining stakeholders’
different perspectives about the effectiveness of their program. One of the main benefits of
this model is that it helps “develop greater insights not only about the roles that stakeholders
may play in evaluations, but also how learning may take place” (Kiely & Rea-Dickens, 2005,
p. 201). Nonetheless, to overcome any potential bias, Alderson and Scott (1992) suggest that
a consultant be involved in the evaluation processes to provide guidance rather than
judgment. Moreover, Ross (1992) recommends engaging teachers in the evaluation processes
to reduce their anxiety of “being watched” and to promote their roles as “both practitioner
and observer” (1992, p. 172).
Field vs. Laboratory Research
Another aspect that program evaluators should be aware of is whether the evaluation
processes will be undertaken through field or laboratory research. Beretta (1986, p. 296)
defines field evaluation as a “long-term, classroom-based inquiry” aimed at identifying the
effectiveness of a program through an expansive evaluation of its components (curriculum,
teaching strategies, and assessment procedures). In contrast, laboratory research refers to a
short-term evaluation that often “involves the testing of individual components of a theory in
an environment in which extraneous variables are artificially held constant” (Beretta, 1986, p.
296). Reliance on either form depends on the purpose of the evaluation (Brown, 1995).
Alderson and Scott (1992) argued that some program evaluators prefer a field evaluation to
obtain an overall image of a program, while others prefer to conduct a small-scale laboratory
evaluation to avoid wasting time.
Ongoing vs. Short-term Evaluations

98
Similarly, program evaluation can also take the form of ongoing or short-term
evaluations depending on the evaluation purpose and context. According to Brown (1995),
previous studies of program evaluation were primarily longitudinal in that they were
conducted during the implementation of a program. On the other hand, short-term evaluations
(often called after-program evaluations) are carried out at the end of a program in an attempt
to evaluate the program outcomes in a retrospective manner. Some program evaluation
researchers pay substantive attention to the content of the program by observing student
performance constantly (on-going evaluation), while others seek to look for the final output
(short-term evaluation). Brown (1995) suggests that a combination of both ongoing and
short-term evaluations be integrated “during the program, immediately after it, and in a
follow-up as well” so that all potential aspects of the program can be covered effectively (p.
233).
Qualitative vs. Quantitative
Program evaluation data can be gathered and analyzed qualitatively, quantitatively, or
a combination of both. Qualitative methods include observations, interviews, journal entries,
questionnaires, and so forth (Lynch, 1996). Advocates of qualitative approaches adopt the
naturalistic design, which depends mainly on data interpretation (interpretivist perspective)
(Lynch 1990, 1992; McKay, 1991; Rea-Dickens, 1999). Qualitative approaches are useful for
providing “the best array of information types” allowing program evaluators to gain broader
and deeper insight into program processes (Brown, 1995, p. 234). However, such approaches
can be problematic, for data collection and analysis are time-consuming and may include
multi-faceted themes, an issue that encourages many program evaluators to rely more on
quantitative methods.
99
On the other hand, some researchers choose to collect and analyze data quantitatively
(positivist perspective). Brown (1989) defines quantitative data as any form of information
“gathered using those measures which lend themselves to being turned into numbers and
statistics” including course grades, test scores, accountability criteria, faculty qualifications,
ratio of students per teaching faculty and so forth (p. 231). One of the benefits of quantitative
data in program evaluation is that they have been seen as “easier to gather, and more
amenable to analysis and summary” (Alderson & Scott, p. 53). Another benefit is that they
can help alleviate as much stakeholders’ bias as possible, as rarely is there any form of
subjective analysis (Ross, 2003). Nevertheless, recent program evaluation studies have
incorporated eclectic approaches (i.e., measurement, non-measurement) into program
evaluation to provide multiple perspectives of how learning processes are undertaken
(Alderson & Beretta, 1992). According to Ross (2003), this diversity of approaches can help
us “yield richer contextualized data about program processes as well as empirical data about
outcomes” (p. 3).
Process vs. Product
Program evaluators should also adopt a process-based, product-oriented evaluation, or
both. Brown (1989) defines product-oriented evaluation as an evaluation verifying if “the
goals (product) of the program were achieved,” as opposed to process-oriented evaluation
that focuses mainly “on what it is that is going on in the program (process) that helps to arrive
at those goals (product)” (p., 231). During the 1990s, as denoted earlier, program evaluation
was focused more on process-oriented methodologies (Lynch, 1996). That is, program
evaluation advocates called for examining how learning outcomes are achieved effectively
(Beretta, 1986b; Burstein & Guiton, 1984; Campbell, 1978; Elley, 1989; Lynch, 1990, 1996;
Ross, 2003). Process-oriented evaluations can provide internal stakeholders (e.g., students,
100
teachers, administrators) and external stakeholders (e.g., accreditation agencies, funding
agencies) with detailed information about their program to enhance its performance and to
respond to accountability (Gredler, 1996).
Having reviewed how programs were evaluated in different decades, it is crucial to
identify the different motives of educational institutions for evaluating their programs. Over
time, according to Thomas (2003), program evaluation has changed dramatically to include
other considerations such as accreditation, quality assurance, external review, and so forth in
addition to curriculum development, accountability, the value and worth of programs, and
fund gaining (as cited in Kiely & Rea-Dickins, 2005). Hence, purposes for program
evaluation differ depending on the expectations of the stakeholders of the program being
evaluated. That is, program evaluation can be sought for different purposes including
developing “a thorough understanding of the program”, obtaining “information for
organizational development”, and drawing “valid generalizable conclusions” about the
program being evaluated (Posavac, 2015, p. 23). Most of these purposes lead to quality
assurance and accreditation goals.
Program Evaluation through Quality Assurance
During the 1990s, quality assurance became a systematic and comprehensive
component of program evaluation (Van Damme, 2004). The term quality assurance refers to
a systematic evaluation process through which a program’s learning outcomes are evaluated
to ensure that they meet specific predetermined standards (McNaught, 2009). According to
Halliday (1994), quality assurance was first sought in educational contexts during the 1980s
“as a response to political demands for institutional accountability” (p. 36). The term quality
assurance is “derived partly from manufacturing and service industry, partly from health
care” and then was pervasively integrated into educational contexts (Ellis, 1993, p. 3). Norris
101
(1998) listed six approaches to curriculum evaluation of social programs that help manage
quality assurance and comply with mandates, namely “1) Experimentation, 2) Measurement
of Outcomes, 3) Key Performance Indicators (KPIs), 4) Self-study Reports, 5) Expert Review
and 6) Inspection” (p. 209).
Gosling and D'Andrea (2001) went beyond quality management to a higher phase that
they call Quality Development, which is “an integrated educational development model that
incorporates the enhancement of learning and teaching with the quality and standards
monitoring processes in the university” (p. 11). The premise of this model is to assure total
quality of program curriculum rather than solely focusing on achieving other goals, for
example, accreditation. Thus, various new initiatives have been introduced into quality
assurance literature including total quality, quality control, trust, and so forth (Sallis, 2002).
Research reveals many approaches to assuring quality such as self-report (Ellis, 1993), needs
analysis (Richard, 2001), SWOT analysis (Dyson, 2004), benchmarking (Ellis & Moore,
2006), and site visit (Norris, 2009). Moreover, one of the purposes of assuring quality is to
obtain accreditation, a process by which institutions, universities, and programs are validated
(Pennington & Hoekje, 2010).
Accreditation-based Program Evaluation
Accreditation is another essential measure of quality assurance that “leads to the
formal approval of an institution or program that has been found by a legitimate body to meet
predetermined and agreed upon standards” (Van Damme, 2004, p. 129). Accreditation is a
self-regulated process of “external quality review used by higher education to scrutinize
colleges, universities, and educational programs for quality assurance and quality
improvement” (Council for Higher Education Accreditation, 2002, p. 1). That is, it entails
evaluating the extent to which certain standards are met by an institution or program.
102
According to Eaton (2006), accreditation functions as “a reliable authority on academic
quality” whereby quality assurance, federal and state funds, accountability, etc. can all be
maintained effectively (p. 3). There are two types of accreditation in higher education:
institutional accreditation and programmatic accreditation. The former provides an institution
with “a license to operate”, while the latter accredits programs “for their academic standing”
(Harvey, 2004, p. 208).
Many accreditation bodies stress “the role of evaluation within institutions,
departments, and programs” to assure quality in their programs (Norris, 2009). The
exponentially growing demands for accountability, outcome-based education, and quality
assurance have placed substantial emphasis on accreditation. Consistent with this,
postsecondary institutions in many developing countries, including Saudi Arabia, have begun
to seek “to participate and compete in the global economy” in order to ensure that their
programs meet predetermined accreditation standards (Smith & Abouammoh, 2013, p. 104).
In fact, some Saudi universities have imposed strict systems that lead their programs to meet
national accreditation standards. For example, in 2009, King Saud University imposed
probation on its programs, “and unless they satisfy the NCAAA standards by September
2012, they would be shut down” (Shawer, 2013, p. 2884).
Quality Assurance and Accreditation in Saudi Arabia
Education in Saudi Arabia has been undergoing drastic reforms that have resulted in
extraordinary improvements. These improvements were associated with several initiatives,
starting from introducing accreditation-based evaluation in 2004, launching the King
Abdullah Scholarship Program in 2005, establishing more governmentally funded
universities in 2007, and ending with sponsoring several private higher education institutions
in 2010 (Alamri, 2011). This has led the Saudi Ministry of Education to implement quality
103
assurance in higher education institutions. Nonetheless, according to Darandari et al. (2009),
“there was no quality system for higher education at the national level in Saudi Arabia”
before 2004 save some institutional endeavors (p. 40). Since that time, the NCAAA has
become the official government accreditation body that governs national quality assurance
and accreditation standards to which many public and private postsecondary institutions
should adhere in order for them to obtain academic accreditation.
NCAAA
The National Commission for Academic Accreditation and Assessment (NCAAA)
was established in 2004 to oversee quality assurance and accreditation processes in Saudi
Arabia. Although it is a governmentally funded agency operating under the Ministry of
Education, the NCAAA is an independent body that accredits Saudi postsecondary
institutions (NCAAA, 2013). The mission of the NCAAA is “to encourage, support, and
evaluate the quality assurance processes of postsecondary institutions and the programs they
offer” (NCAAA, 2014). In 2005, it became a member of the International Network for
Quality Assurance Agencies in Higher Education (Darandari et al., 2009). The quality
assurance and accreditation trends in conjunction with the expansion of higher education
have revitalized program evaluation in Saudi Arabia.
The Saudi higher education system has recently witnessed a significant expansion
(Smith & Abouammoh, 2013). As of 2015, there are 26 government universities, 18 male and
80 female Primary Teacher Colleges, 37 Health Colleges and Institutes, 12 Technical
Colleges, and 49 Private Universities and Colleges (MOHE, 2015). Although obtaining
accreditation is not mandatory in Saudi Arabia, nor does it impact government-based funding,
many Saudi universities seek to obtain accreditation from the NCAAA. According to the
104
Saudi Ninth Development Plan (2014), accreditation is a national strategic dimension of the
policies of the country (Ministry of Economy and Planning, 2015). Hence, all universities are
required to restructure their academic programs by closing those that do not fulfill job market
needs and promoting those that do. Moreover, no university is allowed to establish new
programs unless high employability rates relating to these programs can be obtained. Thus,
many universities tend to customize their current programs to meet job market needs. One of
the most effective and highly recommended approaches to do so is by obtaining national
accreditation from NCAAA.
Quality assurance and accreditation are therefore a key trend to shaping the future of
Saudi higher education (Darandari et al., 2009). Since quality assurance processes, as
Mercado (2012) noted, are better performed by internal reviewers first (Emic perspective),
this paper serves as a simulated evaluation of two EFL departments in a Saudi university as
an initial step for assuring quality and thereby obtaining academic accreditation. Therefore, in
coordination with the vice-rector and the heads of the two departments, the researcher
conducted a site visit to the two departments to evaluate their quality assurance processes. It
is also hoped that this paper will be the first step toward promoting language program
evaluation in Saudi Arabia in order to assure quality and obtain academic national and
perhaps international accreditation.
METHODOLOGY
The primary motive for conducting this study is to guide two EFL departments
through preliminary program evaluation processes using an integrated model from
accreditation standards of NCAAA and CEA commissions that meet the nature, needs, and
purposes of the 18 EFL departments. The quality assurance practices of two departments (one
for male students and the other for female students) were evaluated to serve as internal
105
benchmarks for the remaining 16 departments to help their administrators assure quality and
obtain accreditation from NCAAA and CEA. Any inapplicable or unnecessary standards
were excluded (e.g., marketing, housing, and others). To evaluate these two EFL departments
effectively, the researcher made a site visit to X University where these departments are
located. The researcher circulated surveys to male and female students, teaching staff, and
department heads, and he also conducted semi-structured interviews with a randomly selected
sample of participants.
Background on Site
X University has 18 male and female EFL undergraduate departments, which are
planning to obtain departmental accreditation from the NCAAA in the next two years and
perhaps from the CEA in the near future. Both the departments have a four-year, full-time
education program divided into eight levels and preceded by a one-semester pre-university
intensive English language program. The total credit hours required for graduation are 137,
22 of which are non-major courses. The departments have two main tracks: 1) EFL and
Translation and 2) EFL and Literature. These two departments were selected in particular
because they are the largest male and female EFL departments at X University, so they can
function as internal benchmarks for their comparable EFL department counterparts at X
University.
Research Objectives
The research objectives of this paper vary but fall under one overarching goal. First, they
aim to obtain stakeholders’ (students, faculty, and department administrators) opinions about
the effectiveness of the two departments being studied. Second, they attempt to identify the
extent to which the two departments meet the integrated standards of NCAAA and CEA in
order to prepare them to obtain academic accreditation from these two commissions. Third,
106
they pinpoint both positive and potentially poor quality assurance practices in the two
departments. Fourth, they identify potential dilemmas and barriers that may delay the two
EFL departments from obtaining academic accreditation from the NCAAA and CEA. Fifth,
they provide some implications that will contribute to helping the two EFL departments
maintain quality assurance and thereby obtain academic accreditation. Finally, the researcher
will provide a concise report describing quality assurance processes of these two EFL
departments. It is hoped that the report will be useful not only for EFL departments at X
University, but also for any similar EFL departments that plan to obtain national or
international academic accreditation.
Research Questions
1. To what extent do the two departments meet the study-integrated model of standards?
2. To what extent are students satisfied with the two departments’ curricula, teaching
strategies, learning outcomes and assessment methods?
3. What are good quality assurance practices in the two departments?
4. What are poor quality assurance practices in the two departments that need to be
improved?
5. Are there any potential dilemmas and barriers that may delay the two EFL
departments from obtaining academic accreditation? If so, what are they?
Participants
The participants of this study are divided into two groups, male and female. Each
group is subdivided into three subgroups: teaching staff, students, and administrators. The
first group consists of a randomly selected sample of teaching faculty, students, and the
quality assurance coordinator of the male EFL department. The second group consists of a
randomly selected female group of the same categories from the female department. The first
107
subgroup of participants is male and female undergraduate students majoring in EFL. Their
levels of English proficiency, which were identified based on their current levels at the two
departments, range from beginner to advanced. Two hundred twenty seven students studying
in the EFL departments participated in the survey, and 15 of them (six male and nine female)
were interviewed. The second subgroup consists of two male and four female teaching
faculty. The third subgroup consists of one male quality assurance coordinator and one
female academic coordinator.
Research Tools
The central research tools used in this study included a web-based survey (Appendix
M), semi-structured interviews (Appendix N), and the researcher’s evaluation checklist
(Appendix O). The survey consisted of six themes: general information (4 items), course
objective (10 items), teaching strategies (10 items), student learning outcomes (10 items), and
student assessment (10 items). There were also three different semi-structured interviews: 1)
student interviews, 2) teaching faculty interviews, and 3) administrator interviews. The third
research tool was the evaluation checklist (Appendix O), which was used to evaluate the
mission, curricula, objectives, and goals of the two departments. Survey items, semi-
structured questions, and evaluation checklist were all designed based on the standards and
substandards of the integrated model of NCAAA and CEA to ensure that the evaluation
emulates those used by the two commissions.
Integrating NCAAA and CEA Standards
It is relatively difficult and perhaps time-consuming to integrate all NCAAA and CEA
standards in Table 12 into one model; thus, only standards suiting the Saudi higher education
contexts were included. Moreover, this paper focuses on the standards that address student
learning experience and ignores those that deal with administrative and/or financial matters.
108
Hence, 1) Mission, Goals, and Objectives, 2) Learning and Teaching, and 3) five sub-
standards of Learning and Teaching standard were selected from NCAAA standards.
Similarly, 1) Mission, 2) Curriculum, 3) Program Development, Planning, and Review, and
5) Student Achievement were selected from CEA standards. The integration process also
included the sub-standards of the selected standards, as it is shown in Table 13 below. One
noteworthy issue is that meeting the integrated model of standard does not mean in any case
that the two departments meet all of the CEA and NCAAA standards; rather, it means that
they appear to meet those in the model.
109
Table 12
NCAAA and CEA Standards
NCAAA CEA
Mission Goals and Objectives Mission
Learning and Teaching Curriculum
Management of Quality Assurance Faculty
Facilities and Equipment Facilities, Equipment and Supplies
Financial Planning and Management Administrative and Fiscal Capacity
Student Administration and Support Services Student Services
Employment Processes Recruiting
Learning Resources Length and Structure of Program of Study
Governance and Administration Student Achievement
Research Student Complaints
Table 13
Integrated Model of Standards and Sub-standards
NCAAA CEA
Mission Goals and Objectives Mission
Learning and Teaching Curriculum
Sub-standard 4.2: Learning Outcomes

Student Achievement
Sub-standard 4.5: Student Assessment
Sub-standard 4.3: Program Development

Processes
Program Development, Planning, and Review
Sub-standard 4.4: Program Evaluation and
Review Processes
Procedures
110
Data were collected circulating surveys and by conducting semi-structured interviews
with a randomly selected sample of the two departments (i.e., students, teaching faculty,
program quality assurance coordinator). That is, having visited the male department, the
researcher attended some classes, where he was allowed to spend time with students to
collect data. Although the survey was delivered via Qualtrics, online survey software, the
participants did not have access to the Internet. At first, the researcher thought a technical
glitch occurred for some reason; nevertheless, it turned out that the department did not have
Internet access. Thus, the researcher had to print 300 copies of the survey. Then, students
were provided with a detailed description of the study’s purpose and procedure, and the
survey was explained to them in order to clarify any potential confusion. Nonetheless, for
ethical considerations, students were given the option of completing the survey via hard copy
or online using their phones.
After that, a randomly selected sample of male students was engaged in semi-
structured interviews to allow them to expand on their attitudes towards the department. Each
interview session took approximately 20-25 minutes. By randomly choosing some students
from each level, the researcher ensured that the interviewed students had different levels of
proficiency to obtain a broader insight into the department. Moreover, the mission, goals,
curriculums, teaching strategies, assessment methods, and learning outcomes of the two
departments under study were evaluated using the evaluation rubric that has been designed
based on the integrated standards model (See Appendix O). As for the female-section
department, the researcher was unable to interview female participants face-to-face due to
cultural constraints; thus, a female faculty was assigned by the department head to carry out
the interviews under the researcher’s supervision.
111
Data Analysis
Data were analyzed systematically and qualitatively as carried out by Mitchell (1989)
for her evaluation of the language program at the University of Stirling, Scotland. That is,
responses of all participants were used as evidence for exploring the extent to which the two
departments meet the target standards and then to provide a comprehensively descriptive and
explanatory report about the two departments’ performance. Based on participants’
responses, all potential themes, explicit or implicit patterns, or emerging trends were sought.
For example, all answers obtained in the second round of data collection (interviews) were
transcribed and then analyzed to identify themes using comparative techniques against CEA
and NCAAA standards. The findings of this study were divided into four major areas: 1) the
two departments’ adherence to the integrated model of standards, 2) student satisfaction with
the departments, 3) good and poor quality practices of the two departments, and finally 4) any
barriers delaying the departments from obtaining academic accreditation.
FINDINGS
The extent to which the practices of the two departments meet the CEA and NCAAA
standards was examined by evaluating them against the study-integrated model of standards.
Standard One: Mission/Goals/Objectives
Research has documented several definitions of a program’s mission statement,
defined by Allen (2004) as a “holistic vision of the values and philosophy” of an institution or
program (p. 28). According Kiely and Rea-Dickens (2005), articulating a mission is key to
organizing a program’s activities and for conforming to institutional norms. Mission
statements are "ubiquitous in higher education” in that “accreditation agencies demand them,
strategic planning is predicated on their formulation, and virtually every college and
university has one available for review” (Morphew & Hartley, 2006, p. 456). Conway,
112
Mackay and Yorke (1994) provide six criteria that a mission statement should fulfill: 1)
institutional mission, 2) target groups, 3) goals and objectives, 4) programs offered, 5)
program context, and 6) values sought by the program over other similar programs.
Evaluating the Mission Statements of the Two EFL Departments
As discussed earlier, the mission statement plays an integral role in determining how a
program operates within a previously determined set of standards. Therefore, both the CEA
and NCAAA commissions emphasize the importance of developing a publicly announced
mission statement that can be accessed by stakeholders at anytime and from anywhere.
Moreover, the CEA mission standard states that a program should have “a written statement
of its mission and goals, which guides its activities, policies, and allocation of resources”
(CEA, 2015, p. 7). Surprisingly, the two EFL departments under investigation have the same
mission statement in writing, which also appears to have areas of weaknesses. For instance, it
was derived from the mission of the college under which the two departments operate (See
Appendix P).
To better evaluate the departments’ mission statement, it is necessary to examine the
college’s mission, which functions as the umbrella for the two departments’ mission.
Although the college’s mission is posted on the college’s website, which is adequate for
meeting one aspect of the NCAAA sub-standard, “Stakeholders should be kept informed
about the mission and any changes in it” (NCAAA, 2013, p. 8) and CEA main standard, “The
mission should be communicated to faculty, students, and staff, as well as to prospective
students, student sponsors, and the public” (CEA 2015, p. 7), it appears to be too wordy. This
may make it difficult to be understood by stakeholders and evaluators. Moreover, it does not
specify the college’s activities, policies, and allocation of resources. Rather, it only describes
its future graduates’ attributes as teachers in humanities. As a result, it would not be

113
surprising that the mission statement of the two departments, though not wordy, is also not
specific enough in summarizing the departments’ educational and service goals.
Paradoxically, the Arabic version of the college’s mission is also posted on the same
website, yet with an entirely different meaning as follows:
Preparing nationally qualified graduates in languages and humanities to cope with
sustainable development, achieving educational and research development by means
of modern technological and scientific advances to mobilize the community towards
the local, national, and international scientific and social fields. (college’s website,
2015)
This inconsistency may create some ambiguity for stakeholders. For example, when the
researcher first visited the college’s website, he was confused as to which mission, the
English or the Arabic one, guides the college’s goals, policies, and allocation of resources, as
each mission has a completely different purpose. Hence, one of these two mission statements
should be selected, and the other one should be removed.
The mission of the two departments (Promoting the overall performance of learners of
language, literature, and translation to achieve the possibly highest quality of learning
outcomes) is posted on the college’s website as well. Nevertheless, having examined the
department specifications report, it was noticed that the two departments have a different
mission, namely, “Preparing highly qualified cadres with skills and expertise in English
language, translation, and literature, and well-trained educated researchers who follow the
scientific approach in thinking and in dealing with technology, multi-faceted thinking, and
problem solving” (Department specifications report, 2015). This mission seems to have been
written clearly as required by CEA and NCAAA. Moreover, it includes “the elements
114
commonly understood to form the basis for a higher education mission” (Morphew, &
Hartley, 2006, p. 458).
The departments’ mission statement appears to fulfill only two criteria of those
mentioned by Conway et al. (1994); it highlights the departments’ goals and objectives, and it
includes the programs offered (i.e., literature, & translation). However, it neglects the
remaining four criteria: 1) relevance to the college mission (though it is almost aligned with
that of X University), 2) the target groups (EFL students), 3) the departments’ geographical
areas, and 4) the added value sought by the departments over other similar EFL departments.
Another drawback is that the mission statement does not have a controlling idea. That is,
when compared with the mission statement from a comparable department, for example, the
Department of English at King Abdulaziz University, which states “Building an interactive
learning environment for the rehabilitation of graduates in English language and translation
able to meet society's expectations for a promising future” (Department website, 2015), it is
clear that the latter has a controlling idea (creating an interactive English learning
environment), whereas the former does not. Furthermore, unlike that of X University, King
Abdulaziz University’s mission statement is developed in a suitable breadth.
Taking the above into account, the two departments’ mission statement fail to
provide a sense of the current identity of X University. In addition, based on data obtained
from the interviews, 11 out of 15 of the students asserted that they have no idea what the
mission statement of these two departments is. More importantly, data showed that this
mission statement was updated only once in the past four years. This may impede the mission
from reflecting “new goals or shifts in the focus of [their] educational programs or services
and whenever activities and policies are conceived, implemented, or revised” (CEA, 2015, p.
7). All of these factors suggest that this mission statement appears to only partially meet the
115
Mission Standard of CEA and NCAAA. A poorly phrased mission statement would make it
difficult to meet other standards such as ensuring alignment of the curriculum with the
mission, which is considered problematic for the CEA Curriculum Standard.
Evaluating the Goals and Objectives of the Two EFL Programs
Since CEA does not have a separate standard for goals and objectives, NCAAA
standards were used to identify whether the goals and objectives of the departments meet
these standards. The goals and objectives of the two departments are almost identical and can
be summarized as preparing students to become seasoned EFL teachers, professional
translators, competent linguists, and excellent researchers. These are clear, attainable, and
“consistent with and support the mission” (NCAAA, 2013, p. 8), which mostly fulfill the
NCAAA sub-standard (Relationship between Mission, Goals & Objectives). However, the
other objectives, such as training students on computer-assisted language learning, preparing
them for seeking higher studies, and developing their presentation and rhetorical skills are not
derived from the mission of the departments. That is, the mission does not indicate a desire to
instill in the students a sense of providing community service, which seems to be irrelevant to
the two departments’ mission.
Students’ Attitudes towards Course Objectives and Requirements
One of an instructor’s main duties is to ensure that students are fully aware of course
objectives (Allen, 2004). Out of the 202 participants who responded to this item, 13 (6%)
students indicated that they strongly agree that most of the departments’ course objectives are
communicated to them at the beginning of each course, while 95 (47%) of them agreed with
this item. On the other hand, 22 (11%) of the participants disagreed, and 12 (6%) strongly
disagreed with this statement, suggesting that only a small number of the students are
dissatisfied with the poor communication of course objectives or lack of them. However, 60
116
(30%) of the participants were uncertain (neither agree nor disagree) about the delivery of
course objectives at the beginning of the course. This high percentage poses serious questions
as to why these participants would be uncertain about whether course objectives were
communicated to them earlier. This can be attributed to several factors detailed in the
following discussion.
Moreover, some of these participants are likely to have “no clear idea of what the
intended outcomes of the course are,” and thus ending up being confused about the ultimate
goal of the course itself (Menges, Weimer, & Associates, 1996, p. 188). Another bewildering,
albeit reasonable, question is why are these students unsure about being informed of these
objectives in advance if, as Allen (2004) argues, articulating course objectives on a course
syllabus “allows students to make informed decision before enrolling, to monitor and direct
their own learning, and to communicate what they have learned to others, such as graduate
schools, employers, or transfer institutions” (p. 44)? To obtain further insight into this issue,
some of the participants were asked during interviews about their uncertainty. Seven out of
15 argued that they are rarely provided with written syllabi that highlight course objectives.
In fact, one of the female participants asserted that the she has never been given a syllabus in
writing.
Standard Two: Curriculum
The second and one of the key standards of CEA is curriculum, which refers to the
goals, methods, and assessment through which an academic program achieves its targeted
instructional objectives (Nation & Macalister, 2009). As for NCAAA, the word curriculum is
not mentioned in their handbook; rather, all curriculum components are distributed across
various standards and sub-standards. To better evaluate the curricula of the two EFL
departments, it is easier to follow CEA classification of curriculum as: 1) an educational goal

117
or purpose, 2) objectives for each course in the curriculum, 3) statements of student learning
outcomes, 4) processes for teaching and learning, and 5) means of assessment. Nonetheless,
the educational goal and purpose as well as the objectives for each course in the curriculum
have been discussed in the Mission, Goals, and Objectives standard, so we are left with the
remaining three components.
The two departments do not have a curriculum in writing, per se; instead, they have a
Study Plan and Program Specifications from which adequate information about the
curriculum components can be obtained. Having compared these two documents against CEA
and NAAA standards, there were some interesting findings. The departments’ study plan is
fairly consistent with their mission. That is, it has “adequate and appropriate levels to meet
the assessed needs of the students through a continuity of learning experiences” (CEA, 2015,
p. 9). Moreover, it has several strengths in that it is designed with a logical progression from
one level to the next through eight levels. It also has consistent objectives across courses and
levels in a way that is appropriate for students’ needs. Nevertheless, one of the drawbacks of
this study plan is that it is not well-organized, nor does it allow evaluators and stakeholders to
monitor and document students’ performance. Despite this, the study plan of the two
departments appears to partially meet the standards with further improvements needed.
Student Learning Outcomes
Student learning outcomes (SLOs) are well-developed statements of “what a learner is
expected to know, understand and/or be able to demonstrate at the end of a period of
learning” (Adam, 2004, p. 2). They are “broad statements of what is achieved and assessed at
the end of a course of study” (Harden, 2002, p. 155). Although SLOs of the two departments
are specified, measurable, attainable, aligned with each other, and relatively consistent with
course objectives, they do not “represent significant progress or accomplishment” (CEA,

118
2015, p. 10). This is because they do not reflect various levels of students’ cognitive,
affective, and interpersonal skills as required by NCAAA. For example, labeling
“autonomous and collaborative learning” and “self-confidence and the responsibility of
directing the classroom management” as cognitive and interpersonal skills respectively is
inadequate for meeting the NCAAA standards.
Figure 3. Student learning outcomes cover all language skills.
However, what concerns the researcher is the extent to which SLOs cover all
language skills. According to students’ responses (Fig. 3), out of 115 participants who
responded to this item, 12 strongly agreed that SLOs cover all language skills, 41 agreed, 26
were uncertain, 30 disagreed, and 6 strongly disagreed. This suggests that SLOs of the two
departments need to be reviewed. For example, three out of six teaching staff participating in
this study indicated that SLOs do not appear to cover all language skills. Based on these
findings, SLOs practices appear to partially meet the standards of both the CEA and NCAAA
due to the abovementioned considerations.
Teaching Strategies
Among various NCAAA standards, Learning and Teaching standard is the largest
standard in length, suggesting that it plays a vital role in helping an institution or program
obtain academic accreditation. Therefore, both students and teaching faculty were asked
about the practices of this standard extensively. The graphs below (Figs. 4-7) show students’
119
responses to four key items of the Teaching Strategies section, which were derived from the
NCAAA fourth standard:
Figure 4. Teaching strategies are proper for various learning styles. Figure 5. Instructors communicate with you in English.
Figure 6. You are engaged presentations & leading discussions Figure 7. You are engaged in research projects.
Moreover, five students strongly agreed, 46 agreed, 23 were uncertain, 30 disagreed,
and 12 strongly disagreed that teaching strategies used in the two departments are proper for
their different learning styles. Yet, the participants complained that rarely do teaching faculty
speak English. Moreover, they stated that due to the large number of students per class, they
are not given the opportunity to demonstrate their learning. On the other hand, their responses
to the extent to which they are engaged in research projects varied in that 10 strongly agreed,
38 agreed, 40 were unsure, seven strongly disagreed, and 21 disagreed. Based on these
results, it can be concluded that teaching strategies at the two departments appear to partially
meet the standards.
120
Assessment Methods
Effective assessment methods help faculty make “informed decisions about
pedagogical or curricular changes” in addition to improving “student mastery of learning
objectives that faculty value” (Allen, 2004, p. 13). Thus, it is crucial to ensure that program
assessment methods are “for learning (i.e., formative in nature) rather than of learning (i.e.,
not summative, as in conventional tests) (Cumming, 2009, p. 93). Despite this, two out of six
teaching faculty indicated that they only use direct assessment tools (e.g., midterm/final
exams, homework). Lack of using other forms of assessment (e.g., performance assessment)
may not help identify students’ strengths and weaknesses (McNamara, 1996). When asked
about the infrequent use of alternative assessment methods, one faculty member noted that
they have “a set schedule to follow,” which does not give them the flexibility to diversify
assessment methods to meet students’ learning differences.
Figure 8. Are assessment methods communicated in advance?
One of the crucial good practices of the CEA and NCAAA assessment substandards is
to inform students of assessment methods in advance. The two departments have achieved a
noticeable progress in this respect. For example, out of 112 students who responded to this
item, 10 students strongly agreed and 34 agreed that assessment methods are communicated
to them in advance. Nevertheless, only 16 students disagreed and 12 strongly disagreed with
this statement, which can be considered a small percentage, yet it still needs to be addressed.
121
Moreover, course assessment methods are set out in student handbook distributed to students
at the orientation day.
Standard Three: Student Achievement
Since assessment methods are closely related to CEA Student Achievement Standard,
it may be better to shed some light on the practices of the two departments relating to this
standard. First, CEA requires that assessment practices be undertaken consistently throughout
the program and aligned with its mission, curriculum, objectives, and student learning
outcomes. As shown in Figure 9, students revealed different, yet profound insights into
assessment methods used in the two departments. That is, 16 out of 186 students strongly
agreed that assessment practices are used consistently, 57 agreed, 38 disagreed, and 19
strongly disagreed. One perplexing finding is that almost 56 participants were unsure about
this, which has raised the researcher’s curiosity to know why. After being asked about their
uncertainty, eight of these students argued that even though assessment methods are
undertaken consistently, they are limited to quizzes, a midterm, and a final exam. They
suggested that assessment methods be diversified so that their understanding and mastery of a
course can be examined effectively.
Figure 9. Are these methods used consistently?
As for NCAAA standards, students’ responses to the assessment methods section suggest that
assessment practices in the two departments do not appear to meet the standards. For
122
example, the majority of the participants indicated that they are either unsure (68), disagree
(32), or strongly disagree (10) that assessment techniques measure aspects on which they
expect to be measured, grade distribution is appropriate to course objectives, and exams and
assignments are graded and returned to them promptly, all of which imply that this standard
is not being met.
Standard Four: Program Development, Planning, and Review
As program evaluation processes can spark investigation across the program, this
form of investigation helps language program administrators document their program
performance, taking into account other considerations such as “the social and political basis
and motivation for language learning and teaching (Lynch, 1996, p. 10). Both the CEA and
NCAAA place extensive emphasis on program evaluation. The last standard of CEA
standards is Program Development, Planning, and Review, while in NCAAA it comes in a
form of two sub-standards of the fourth standard Learning and Teaching as Program
Development Processes and Program Evaluation & Review Processes. The departments’ data
about this standard obtained from teaching faculty did not reveal any findings worth-
mentioning because their responses focused on two main themes: 1) lack adequate training in
applying quality assurance processes, and 2) inability to participate in quality assurance
processes effectively due to their heavy teaching loads (23 hours per week).
Quality Assurance Coordinator of the Male-section Program
In order to gain further insight into the two departments’ evaluation practices, one of
the teaching or administrative staff involved in program evaluation quality assurance
processes was interviewed apart from the department heads. Therefore, this section will be
allocated for discussing the interview responses obtained from the coordinator of the quality
assurance unit, John, with whom the researcher recently spoke using Line, a widely-used
123
communication application. The interview with John, who has been the quality assurance
coordinator for the male student department for three years, took nearly 30 minutes. He
shared his experience in taking over the department’s quality assurance processes. He
indicated that during the past three years, quality assurance evaluation and processes were
unsatisfactorily slow for several reasons to be discussed below.
John indicated that the department does not have a plan in writing for planning,
implementing, or evaluating its performance and quality assurance practices. Instead, he
stated, that the NCAAA handbook has been used as a plan for the department evaluation
processes. However, the NCAAA requires that all academic programs and/or departments
have a plan in writing for managing all “quality assurance processes across the [program],
and the adequacy of responses made to evaluations processes that are made” (NCAAA, 2013,
p. 18). The lack of such plan has resulted in not setting out a route for reviewing the
departments’ components such as teaching strategies, assessment practices, learning
activities, policies, and services. To address this issue, a quality assurance unit was
established at the two departments in 2013. Although this unit takes on all responsibilities of
department evaluation and quality assurance processes, it still does not comply with the
NCAAA standard and sub-standards of Administration of Quality Assurance.
Although all teaching and other staff are engaged in program evaluation processes,
there is a lack of quality assurance and program evaluation literacy. According to the male
department annual report (2014), teaching faculty’s participation in department evaluation
processes were solely limited to filling in the NCAAA course reports without involvement in
department planning, implementing, and monitoring of overall department evaluation. John
complained about the lack of training teaching faculty on quality assurance processes.
Furthermore, John described a lack of effective cooperation with other departments affiliated
124
with the same college regarding exchanging experiences about quality assurance processes.
Another issue concerns the instructional staff’s teaching and administrative workloads. For
example, John is responsible for teaching 23 hours per week in addition to supervising
quality assurance and examination processes. This does not meet the NCAAA substandard
that states “Sufficient time should be given for a senior member to supervise quality
processes” (p. 17).
Unlike many language programs, where a program evaluation director can exercise
authority over all available or needed resources (Westerheijden, et al., 2007), John, along
with his team, was not given adequate authority that meets the standards of the NCAAA and
CEA. Providing a quality assurance or program evaluation coordinator with sufficient
authority to obtain all necessary is a basic requirement of the NCAAA and CEA so that s/he
can close the loop of data feedback (Allen, 2004, p. 17). However, John indicated that he
finds difficulty obtaining program evaluation-related data from various sections such as the
program, the college, and other similar programs, for he is not authorized to do so. This has
resulted in poor practices of the two departments in evaluating, reporting on, and improving
quality assurance arrangements. Moreover, when John asked the college to provide him with
funding for conducting external benchmarking, he said, “the college refused to do so for
unrealistic reasons.”
To promote an effective practice of program evaluation, both the NCAAA and CEA
advocate conducting an independent or even a mock evaluation. Nevertheless, the male
student department has had only one informal external evaluation, which was conducted by a
visiting professor from King Saud University. Although the external reviewer provided
concrete recommendations for improvement, he has different areas of interest, as he is a
professor of Arabic. According to John, despite the external reviewer’s great efforts, he was
125
unable to pinpoint areas of weaknesses from which language-related suggestions could be
made for further improvement and systematic review. On the other hand, the female
department has never been subjected to any evaluation. When asked about this, the
department head stated that they have no access to data needed for internal or external
benchmarking, as their role is only limited to filling in the NCAAA forms. Thus, it appears
that this standard is not being met for either the NCAAA or CEA.
Student Dissatisfaction with the Two EFL Departments
Now that the extent to which the two departments meet the integrated model of
standards has been identified, one can understand the students’ level of satisfaction with the
departments. One noteworthy issue is that the evaluation findings of some standards may help
us identify students’ satisfaction with the two departments (e.g., curriculum, learning
outcomes, teaching strategies, assessment methods), while others may not provide us with
sufficient information regarding students’ satisfaction (e.g., mission statement, program
review). Having conversed with some male students, the researcher noticed that they seemed
dissatisfied with some of the department’s educational activities. For example, six out of 15
students who participated in the interviews indicated that they have not noticed dramatic
progress in their English skills. Specifically, one student argued that some of the courses do
not promote students’ critical thinking skills; instead, they primarily rely upon memorization.
Moreover, nine of the 15 participants expressed anxiety and frustration with their poor
speaking and writing skills in English. One of them argued that most of the courses are
designed in a way that improves their receptive skills (e.g., listening and reading), but
neglects improving their productive skills. Another student indicated that he had to engage
himself in communicative activities outside the classroom, which has helped him speak
English intelligibly. Unfortunately, the NCAAA standards do not include theoretical

126
underpinnings about L2 oral fluency, for these standards are developed for various areas of
specialization, the CEA standards; on the other hand, place substantial emphasis on
improving language skills as a whole. Having examined the male department’s objectives
(See Appendix Q), it can be noticed that instead of focusing on developing students’
language skills, they address other important skills such as technical, research, and
intercultural competence skills.
Furthermore, one of the students complained that although they are “placed in a safe,
clean, livable environment” (CEA, 2015, p. 29), many students complain that it is not a
motivating learning environment. He said, “Some of the department courses are taught in a
high-school manner wherein the instructor dominates the classroom discourse without
engaging students in interactive discussions” (personal communication, April 18, 2015)
When asked about this issue, one of the female faculty members indicated that this could be
true to some extent, pointing out that teaching students who have drastic learning
backgrounds makes it difficult to diversify teaching strategies. Thus, some teaching faculty
members use merely a lecture-based teaching format. Another teaching faculty complained
that having a large number of students per class does not allow them to avoid lecturing and
engaging students in interactive discussions.
One of the students complained that some non-native English speaking teaching
faculty members speak Arabic frequently. To determine if this was true, the researcher asked
some faculty members how often they spoke Arabic inside class. Two of them asserted that
they speak English most of the time; however, when they want to provide students with
important instructions (e.g., about exams), they often repeat what they have said in Arabic.
Another student complaint was that their assignments are not graded and returned in a timely
manner. According to the survey results, 20 students strongly agree that their work is not
127
graded and returned to them, 55 agree, 56 are uncertain, 30 disagree, and 25 strongly disagree
(Fig. 10).
Figure 10. Students’ responses to how often their work is graded and returned to them
promptly.
Student Satisfaction with the Practices of the Two Departments
Although most of the practices discussed earlier were found inadequate to fully meet
the study-integrated model of standards, students seemed satisfied with some of these
practices. For example, it can be noted that the percentages of students who strongly agree
and agree with most of the statements outweigh those who strongly disagree or disagree. This
may suggest that the practices of these statements are relatively satisfying, with further
improvements needed. Another example is that one of the highest overall ratings of student
satisfaction of the practices was about student learning outcomes. When asked about the
extent to which SLOs meet their learning expectations, 95 out of 190 participants (50%)
seemed satisfied with SLOs, while only 37 of them (20%) were dissatisfied that SLOs did not
meet their expectations (Fig. 11).
128
Figure 11. Students’ responses to the extent to which SLOs meet their expectations.
In addition, as a former student and current lecturer at the male EFL department, the
researcher has noticed progress in numerous aspects of the department performance. For
example, compared to its performance in 2008, when the department had its first review, the
male department currently operates according to predetermined standards and KPIs based on
which the achievement of the department outcomes are verified. Teaching strategies, SLOs,
and assessment methods are all evaluated on a semesterly basis and the evaluation results are
included into the NCAAA templates for improvement. For example, each course coordinator
is responsible for preparing course blueprints indicating the consistency between the intended
SLOs with teaching and assessment strategies (Male Department Annual Report, 2014).
Another satisfying practice of both the male and female departments, which was not
included into the survey but naturally arose during the interviews, was concerned with the
academic and career counseling and advice provided for students. Two out of 15 of the
participants commended the efforts of teaching faculty in providing students with effective
advising about their current academic performances and potential career opportunities. They
indicated that the department administrators have allocated separate offices for student
individual counseling services. Moreover, four of the female participants pointed out that
whenever they face any form of learning difficulties, the academic advisors to which they are
assigned provide them with all needed counseling and assistance. These good practices meet
one of the salient substandards of the NCAAA, namely, Educational Assistance for Students.
129
Generally, students appeared to be satisfied with the two departments’ newly
established facilities. In the interviews, three of the male students indicated that unlike the old
building, the environment of the new building is clean, attractive, and well maintained, which
meets Standard 7, Facilities & Equipment (NCAAA, 2013, p. 39). Despite their complaints
about the lack of Internet connectivity, they also noted that the main library provides them
with “efficient access to online databases, research, and journal material” relating to the
department courses, which meets Standard 6, Learning Resources (NCAAA 2013, p. 35). The
female department has also recently transferred to a new building equipped with adequate
technological and learning resources that facilitate the educational process. In the interviews,
having attended classes at both buildings, five out of nine participants indicated that the new
building is well-equipped and adapted to meet their learning requirements as opposed to the
old building.
Good Quality Assurance Practices
Given that the ultimate purpose of this paper is to evaluate the readiness of the two
departments for obtaining academic accreditation from the NCAAA and CEA in the future, it
would be very useful to highlight both good and potentially poor practices so that they can be
considered carefully for future enhancement and improvement respectively. In general, the
two departments have numerous good practices that the researcher has identified in this
study. For example, although the two departments’ mission statement seems to partially meet
the Mission Standard of the CEA and NCAAA, it only needs few improvements with greater
emphasis placed on students’ language skills, and then it will fulfill most of the practices of
the mission standard of the NCAAA and CEA. Therefore, if the problematic issues raised
earlier about the departments’ mission statement are addressed effectively, the mission
standard will be one of the strongest standards in meeting the two commission standards.
130
Another good quality assurance practice of the two departments is collecting data on
student achievement of SLOs, advancement from one level to another, graduation, and so
forth (Male Department Annual Report, 2014), which is highlighted as a positive aspect in
the CEA Program Development, Planning, and Review standard. According to John, most
data is collected on a semester basis and included into the NCAAA evaluation templates for
future use and improvements. In each department, John continued, a coordinator is assigned
to oversee the extent to which course reports and specifications are prepared on a semester
basis according to the NCAAA templates. Another good quality practices was evident in
rating all practices on an annually basis. For example, according to the two departments’
recent annual reports, feedback from stakeholders (e.g., students, teaching staff, employers,
alumni) is obtained for improvement purposes.
Poor Quality Assurance Practices
Despite the good practices discussed above, the two departments still have some poor
quality assurance practices that need to be addressed in order to improve their performance.
First, as stated in their annual report (2014), the two departments fail to conduct internal or
external benchmarking against quality comparable EFL departments from which
recommendations for improvement can be made. Second, most of the quality assurance
processes, as John noted, are based on individual initiatives even though there is a quality
assurance unit in both departments. In other words, since quality assurance processes at the
two departments are not institutionalized, this results in not involving all teaching and other
staff in quality assurance processes. This was noticed in John’s complaints about the lack of
having a plan in writing for quality assurance practices, lack of cooperation among quality
assurance coordinators at different departments, and lack of having access to data necessary
for program evaluation.

131
Another poor quality assurance practice is that the departments’ data is not utilized
well. In other words, although the two departments lack necessary data for conducting
benchmarking, they can still overcome this issue by utilizing any data at hand. Unfortunately,
despite collecting some essential data about student performance (e.g., student achievement
and progression) as discussed earlier, the departments’ annual reports do not indicate that
these data are used for improvement (closing the loop). For example, no department has ever
conducted internal benchmarking against its performance from previous years. Another
observed poor practice of the two departments was that their quality assurance processes are
not themselves evaluated and reported on a regularly basis. This is one of the fundamental
quality assurance practices that the NCAAA highlights in its substandard Administration of
Quality Assurance Processes.
Assessment-related Dilemmas
One of the potential dilemmas that may delay the two EFL departments from
obtaining academic accreditation from the NCAAA and CEA is concerned with their
assessment practices. First, as indicated in their annual reports (2014), the departments lack
using performance-based assessment approaches such as self-assessment, task assessment,
group-project assessment, portfolios and so forth. Second, the two departments fail to engage
students in a culminating experience course through which they can demonstrate their gained
skills throughout their learning experience in the departments. For example, the study plans
of the two departments lack a capstone or research course that verifies students’ overall
achievement. Third, although one of the two departments’ main goals is to graduate EFL
teachers, their study plans lack a teaching practicum course, which is a basic requirement of
NCAAA (i.e., field experience).
Quality Assurance-related Dilemmas

132
As for issues relating to quality assurance practices, the two departments lack training
their teaching and other staff on quality assurance and program evaluation processes. As
noted earlier, this poses a barrier for obtaining academic accreditation especially from the
NCAAA. In other words, one of the NCAAA substandards requires that “all teaching and
other staff participate in self-evaluations and cooperate with reporting and improvement
processes” (p. 16). Thus, it is likely that teaching and other staff find difficulty participating
in self-study evaluations without having adequate training programs about quality assurance
processes. Moreover, another potential dilemma of obtaining accreditation from the CEA is
the lack of having a program evaluation plan in writing that “guides the review of curricular
elements, student assessment practices, and student services policies and activities” (CEA,
2015, p. 46).
DISCUSSION AND CONCLUSION
In general, the two EFL programs are to be commended for their efforts in applying
academic accreditation standards. Nonetheless, it is clear that their efforts lack adequate
experience in program evaluation, preventing them reaching an efficient evaluation. In other
words, the researcher noticed that most of the practices lack essential compliance with the
standards of both CEA and NCAAA. For example, the programs’ mission is lacking for
several reasons. First, the programs have two different missions: one stated in program
specifications and another one posted on the university’s website. This does not meet the
NCAAA sub-standard “The mission should be publicized widely within the institution and
action taken to ensure that it is known about and supported by teaching and other staff and
students” (NCAAA, 2013, p. 8). In addition, neither department has a curriculum in writing,
which made it difficult for me to document evidence as they are scattered in various
133
documents. Nevertheless, the curriculum of the two programs appears to partially meet the
standards of both the CEA and NCAAA.
As for SLOs, they seem to meet the standards partially as they do not “represent
significant progress or accomplishment” (CEA, 2015, p. 10). Another SLO issue is that they
do not reflect various levels of students’ cognitive, affective, and interpersonal skills as
required by the NCAAA. Furthermore, teaching strategies were also found to only partially
meet the standard, for they are not appropriate for various learning styles. In addition, some
students were unsatisfied with why some teaching faculty members persistently speak Arabic.
Furthermore, both assessment practices and student achievement do not meet the standards
for the considerations discussed before. Therefore, this standard requires immediate action
leading to further improvements. Finally, based on the data obtained, the Program
Development, Planning, and Review standard appears to only partially meet the standards of
the CEA and NCAAA.
Implications
Standard One: Mission/Goals/Objectives. First, given that the two departments
have two different mission statements, only one of them should be chosen and posted on the
university website. It is also recommended that the mission statement fulfill all six criteria
proposed by Conway et al. (1994). More specifically, it lacks relevance to the college
mission, the target groups (EFL students), the departments’ geographical areas, and the added
value sought by the departments similar to other EFL departments. In addition, in order to
develop a mission statement that meets the NCAAA and CEA standards, the mission should
have a controlling idea that guides the two departments’ activities, policies, and allocation of
resources. Finally, a plan for regularly reviewing, evaluating and reaffirming the mission
statements of the two departments in light of changing conditions should be developed.

134
Standard Two: Curriculum. As discussed earlier, the two departments do not have
an organized curriculum; instead, they have several documents representing components of a
curriculum such as a study plan, course specifications, and so forth. Therefore, it is
recommended that a separate curriculum be written in a guide and be made clear for all
stakeholders and evaluators. Moreover, SLOs should focus on all learning domains as
proposed by Bloom (1956) and be expressed in observable, measurable terms as required by
the standards. In addition, teaching strategies used in the two departments should be proper
for various types of learning outcomes. Performance assessment techniques should also be
integrated into assessment practices. Furthermore, the departments’ objectives should focus
on developing students’ language skills. Finally, teaching faculty should be trained on “the
theory and practice of student assessment” (NCAAA, 2013, p. 23).
Standard Three: Student Achievement. Closely related to the NCAAA student
assessment standard is the CEA Student Achievement Standard, which needs further
improvement. For example, it is highly recommended that systems be established “for central
recording and analysis of course completion and program progression and completion rates”
(NCAAA, 2013, p. 23) by “timely reporting to students of their progress through a level
and/or completion of the course” (CEA, 2015, p. 41). Another significant implication is
concerned with ensuring that all assessment practices are aligned with program-level
outcomes to ensure that they meet the professions of teaching, translating and interpreting.
Finally, it is highly recommended that an assessment expert be hired at the two departments
as an assessment coordinator to ensure that assessment practices meet the targeted standards.
Standard Four: Program Development. Planning, and Review.
One of the essential implications for promoting the two departments’ development,
planning, and review processes is by regularly reviewing their mission statements,

135
curriculum, teaching strategies, and assessment practices. Moreover, teaching faculty should
be trained on program evaluation and quality assurance processes so that they can obtain
adequate knowledge and skills needed for undertaking these processes. Moreover, to make
quality assurance and program evaluation processes consistent and involve all teaching and
other staff, it is necessary to establish an institutionalized quality assurance system and
include it into the job descriptions of teaching and other staff to ensure that quality assurance
practices are integrated into normal planning and development strategies. Furthermore, all
available data should be utilized for self-evaluation purposes instead of relying on difficult-
to-obtain data. Finally, quality assurance coordinators should be given delegated authority to
perform quality assurance processes.
136
CHAPTER 5: CONCLUSION
The findings of this dissertation will contribute to the literature of L2 testing,
assessment, and evaluation even though some of them still need further research. For
example, although the findings of article one, “Saudi Student Placement into ESL Program
Levels: Issues beyond Test Criteria”, revealed that 11 out of 27 of the pilot study participants
(40%), 14 out of 71 of the Saudi ESL students (20%), and 57 out of 216 of GCC students
(18%) reported that they deliberately underperformed on ESL placement tests, these cannot
be generalizable due to lack of obtaining concrete evidence. In other words, program
placement and replacement rates, Saudi ESL students’ placement rates across different
programs (obtained from SACM), and participants’ language performance were not obtained
as concrete evidence to support the findings. Moreover, the substantial discrepancies between
the responses of ESL administrators and assessment coordinators as well as those of
participating students bring into question the validity of the latter. That is, if the issue at hand
really exists in these percentages, one may wonder why the vast majority of ESL
administrators, at least in this study, are unable to detect the issue.
As a consequence, given that there are 32,557 Saudi IEP students enrolled in US
programs, and perhaps double this number of other IEP scholarship students, the findings of
article one suggest an urgent need for further research studies that better provide more
tangible evidence of ESL students purposefully underperforming on ESL placement tests.
More specifically, future researchers should avoid asking students directly if they have ever
intentionally performed poorly on ESL placement tests. It is recommended that they allow
such findings to occur naturally in order to make the data more valid and reliable. In sum,
regardless of the differences between the percentages of the study’s participants, it can be
concluded that this study’s issue indeed exists and is worth addressing. Therefore, this study
137
proposes some concrete suggestions that may contribute to eradicating or at least mitigating
this issue.
The findings of the second article, “Integrating Self-assessment Techniques into L2
Classroom Assessment Procedures”, encourage many ESL/EFL programs to incorporate self-
assessment techniques into their assessment procedures. That is, although self-assessment
techniques have long been utilized in many L2 classrooms (Ekbatani, 2011), several language
programs, especially in Saudi Arabia, still lack using not only self-assessment rubrics but also
many other alternative assessment techniques. As a result, this study suggests that much more
attention be paid to using alternative assessment techniques, including self-assessment
measures, in order to promote student-centeredness (Nunan, 1988), enhance student learning
(Boud 1995; Taras, 2010), and augment learner autonomy (Benson, 2012). Nonetheless, the
study concluded that a larger number of participants should be recruited in order to identify
any potential correlation between participants’ self-assessed scores and their TOEFL/IELTS
scores.
More specifically, this study investigated the experience of only 21 ESL participants
using the CEFR self-assessment rubric, which was inadequate to draw valid and
generalizable conclusions that answer the research questions and accurately reflect students’
attitudes and experiences in using self-assessment rubrics. However, although the study did
not reveal a statistically significant correlation between the two measures, ten participants
reported that they found self-assessment rubrics very accurate in reflecting their actual levels
of language proficiency. For example, they provided valuable insights about their experience
in using CEFR assessment rubric including but not limited to pinpointing their areas of
strength and weakness, engaging them in higher language skills such as interaction,
strategies, and language quality, and involving them more in their own learning. Generally,
138
self-assessment rubrics can be very effective especially if they are first used as a supplement
to traditional assessment methods, taking into consideration the importance of training
students on using them effectively.
As for the third article, “Quality Assurance and Accreditation as Forms for Language
Program Evaluation: A Case Study of Two EFL Departments in A Saudi University”, it is
hoped that it will contribute to raising language program stakeholders’ awareness of the
importance of program evaluation as a process for enhancing student learning outcomes. In
general, the article concluded that the two EFL programs that participated in the study should
be commended for utilizing many good quality assurance practices. However, they still lack
adequate background regarding program evaluation. In other words, it was found that most of
their practices lack compliance with the standards of the CEA and NCAAA due to lacking
adherence to the two agencies guidelines. For example, the two departments lack a program
evaluation plan in writing, which resulted in dissatisfaction of students with some essential
standards and substandards. Moreover, lack of having a curriculum in writing is another issue
common to the two departments, which may make it difficult to document evidence on their
current performance. However, student achievement standards are not being met due to the
absence of using alternative assessment techniques.
It is important to note that this study is not solely intended for EFL programs, but it
also provides all language programs with a simulated evaluation that can be used as a
reference for future program evaluation projects. In other words, in addition to making this
simulated study an internal benchmark for the remaining 16 EFL departments at X
University, it can also be beneficial to program administrators of other disciplines. For
example, the findings here can serve any program that seeks to obtain programmatic
accreditation from the NCAAA since the study integrated a set of standards can be applicable
139
to any post-secondary programs in Saudi Arabia. In conclusion, administrators of any
program in general but language programs in particular are highly advised to make their
program evaluation processes as systematic as they can in order to meet the standards of the
target accreditation commission.
140
APPENDIX A: PILOT STUDY FINDINGS
141
8. If you have other reasons, please mention them here.
142
11. Do you think the Ministry of Education should establish effective mechanisms to
enhance your English skills before you study abroad?
143
APPENDIX B: SURVEY QUESTIONS
If you are a Saudi ESL student (who has once taken an ESL placement test), would
you please participate in this survey to help me collect some data for my dissertation research
project? My name is Adnan Alobaid, and I am a Ph.D. candidate in Second Language
Acquisition and Teaching (SLAT), University of Arizona, Tucson, USA. For my doctoral
dissertation, I am conducting a research study on the placement of Saudi ESL students into
ESL levels. If you decide to participate in this study, you would have to respond to a 20-item
online survey, which will take you approximately 10-15 minutes. Then, I will, if you provide
me with your email address, interview you either at your ESL program, at any other place
you prefer, over Skype, or via any other online platforms. To participate in the survey, please
respond to the following items. However, at the end of the survey, you will be asked to
provide your email address in case you decide to participate in the interview part. Please
share this email with your friends, classmates, or any other categories that you think fit this
study. I will not collect any video recordings for this research; however, I will, with your
permission, audiotape and take notes during the interview. I might also need to collect one of
your classroom writing samples.
All survey responses, interview transcriptions, audio recordings, and writing samples
will be stored on a password-protected drive. The drive will be kept in a locked departmental
office at the University of Arizona for six years past the date of project completion. However,
no one other than (the primary investigator) will have access to the data. Thus, all of your
information and responses will be kept confidential. Also, your name will not be recorded on
the tape as well as will not be associated with any part of the written reports or oral
presentations. Your participation is voluntary. You may refuse to participate in this study.
You may also leave the study at any time. No matter what decision you make, it will not
144
affect your grades or future benefits. Participation Consent: I have read (or someone has read
to me) this form, and I am aware that I am being asked to participate in a research study. I
have had the opportunity to ask questions and have had them answered to my
satisfaction.
1. Do you voluntarily agree to participate in this study?
- Yes
- No
If No Is Selected, Then Skip To End of Survey
Q2. Do you allow the researcher to publish your responses for a research project and
potential publication? (Note that your name will not appear in the survey).
- Yes
- No
If No Is Selected, Then Skip To End of Survey
Q3. Gender?
- Male
- Female
Q4. Which of the following ESL programs are you currently or have previously
attended?
- (Program A).
- (Program B).
- (Program C).
- Others.
Q5. How long have you been studying English abroad in ESL Classes?
145
- 0 – 3 months.
- 4 – 6 months.
- 7 – 12 months.
- More than a year.
Q6. What is your current ESL level?
- Beginner.
- Lower-intermediate.
- Intermediate.
- Upper-intermediate.
- Advanced.
- Passed all ESL Levels
- Others.
Q7. When did you take the Placement Test?
- 0-2 months ago.
- 3-6 months ago.
- 7-12 months ago.
- More than a year ago.
- Never taken it before
If never taken it before Is Selected, Then Skip To End of Survey.
Q8. Do you think your placement scores reflected your actual level of language
proficiency?
- Definitely yes
- Probably yes
146
- Probably not
- Definitely not.
Q9. Do you think the content of ESL Placement Test reflected real-life language uses?
- Definitely yes
- Probably yes
- Probably not
- Definitely not.
Q10. The time given for completing the test was:
- Too long.
- Relatively long.
- Too short and inadequate to answer the test items.
- Relatively short.
- Appropriate for completing the test.
Q11. Did you intentionally, in one way or another, perform poorly on the ESL
Placement Test to be placed in lower levels?
- Yes.
- No.
If No Is Selected, Then Skip To Do you think the Saudi...
Q12. If yes, what are the reasons that made you intentionally place yourself into lower
levels? Please choose all boxes that apply.
- To have more time for learning English.
- To have adequate time to adapt to the program and educational system.
- To have adequate time to adapt to the city environment.
- To focus more on preparing for standardized tests (TOEFL, IELTS, GRE).

147
- Other reasons.
Q13. If you have any other reasons, please mention them here:
Q14. Who encouraged you to perform poorly in the test?
- A friend of mine who has already taken an ESL Placement Test.
- Friend(s) with whom I took the test.
- Advice obtained from the Internet.
- Myself.
Q15. If you did not have a scholarship, would you still internationally perform poorly?
- Yes.
- No.
- Maybe.
- I don't know.
Q16. Do you think the Saudi Ministry of Education should establish effective
mechanisms to enhance your English skills before you study abroad?
- Definitely yes
- Probably yes
- Probably not
- Definitely not.
Q17. If yes, what are some suggestions that help establish such mechanisms? Please
check all boxes apply.
- We should be given ESL courses prior to studying abroad.
- A TOEFL or IELTS score should be required to obtain a scholarship.
- We should be given more time to learn English in our country.

148
- Others.
Q18. If not mentioned above, what are other suggestions do you have?
Q19. Briefly, provide us with some comments about your experience in taking ESL
placement tests that you would like to share.
Q20. Please provide your email if you would you like the researcher to interview you
(either individually or in groups) to allow you to expand more on your responses? I do
look forward to meeting with you (either face-to-face or online) to obtain your opinions
about ESL Placement Tests. (Note that your information will be kept confidential).
149
APPENDIX C: INTERVIEW QUESTIONS
1. What are the challenges you faced when you took the ESL placement test?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
2. Do you think that the ESL placement tests that you have taken are accurate in reporting
your actual level of language proficiency? Why?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
3. Do you think that the ESL placement test that you had taken placed you in a level
appropriate to your language skills? Please explain.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
4. Did you prefer to be placed into a lower, higher, or the same level of that of your level of
proficiency? Why or why not?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
150
5. If you had intentionally performed poorly in the ESL placement test, did you benefit from
that afterwards? Why or why not?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
6. After you were placed into lower than-expected levels, were you able to improve your
English better more than if you were placed into a class that fits your level of proficiency?
Please explain.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
7. If you had not intentionally performed poorly on ESL placement tests, can you tell me why
you did not do so?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
8. If the Saudi Ministry of Education demands you to take a six-month ESL course in Saudi
Arabia prior to coming to the targeted country, would you commit to attending that class?
Please explain why and why not?
151
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
9. If the Saudi Ministry of Education demands you to obtain a score of (4.5) in the IELTS or
a score of (40) in the TOEFL prior to studying abroad, would you be willing to do so? Why
and why not?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
10. Do you have any comments that you have not mentioned in the survey?
___________________________________________________________________________
___________________________________________________________________________
152
APPENDIX D: ESL ADMINISTRATORS & ASSESSMENT COORDINATORS
My name is Adnan Alobaid. I am a doctoral candidate in Second Language Acquisition and
Teaching (SLAT), University of Arizona, Tucson, USA. For my doctoral dissertation, I am
conducting a research study on the placement of Saudi scholarship students into ESL levels.
Based on my data, I found that some ESL Saudi students intentionally perform poorly on
ESL placement tests to be placed into lower ESL classes. This is because some of them need
to have more time learning English, preparing for standardized tests, applying for
universities, or avoiding returning to Saudi Arabia (as they did not obtain unconditional
admission). Research has reported negative consequences of such misplacements. Based on
his findings, Stewart (1999) for example reported that 22% of ESL placement test-takers
were misplaced. This rate has resulted in complaints among test-takers. Moreover, Crusan
(2002) argued that if misplaced, students would have to pay additional financial expenses. In
addition, Kokhan (2014) pointed out that cases of misplacement are likely to make students
fail to significantly improve their English. In my data, I am missing an important part, which
is concerned with the extent to which such behavior (students throwing a test in purpose) can
affect the whole placement system of an ESL program. Therefore, I would like to obtain your
valuable opinions on this issue. Moreover, if you, based on your permission, provide your
email, I will send you the findings of this study. Note: No names of participants or ESL
programs are required in this survey.
Do you agree to participate in this survey?
Yes (1)
No (2)
153
1. Have you ever noticed any incident of students throwing an ESL placement test in
purpose? If yes, what is the approximate rate?
2. How do you identify cases of misplaced students? Please check all that apply.
Teacher feedback.
Students’ class grades.
Students’ claims of being misplaced.
Others.
3. If you have other means of identifying cases of misplacement, please state them here:
4. How do you usually deal with significant misplacements? Please check all that apply.
A systematic diagnostic assessment is given to students.
Teacher evaluation.
Standardized tests are used as supplementary.
Others.
5. If you have other ways for alleviating misplacements, please state them here:
6. Suppose that almost 20% of scholarship students attending your program
intentionally perform poorly on the placement test, how would this affect the whole
placement system?
7. If you wish to receive the findings of this study, please provide your email address:
154
APPENDIX E: STUDY PARTICIPANTS
Total
Type of Participants Male Female
50 21 71
Saudi ESL Students
GCC Students
162 54 216
ESL Administrators and Assessment

17 17
Coordinators
155
APPENDIX F: CEFR, TOEFL, AND IELTS EQUIVALENCY TABLE
(TANNENBAUM & WYLIE, 2007)
Competency CEFR TOEFL IBT IELTS
0-8 0 - 1.0
Basic A1 9 - 18 1.0 - 1.5
A1 19 - 29 2.0 - 2.5
A2 30 - 40 3.0 - 3.5
Independent B1 (IELTS 3.5)
B1 41 - 52 4.0
B1 (IELTS 4.5) 53 - 64 4.5 - 5.0
B2
B2 65 - 78 5.5 - 6.0
Proficient C1 79 - 95 6.5 - 7.0
C2 96 - 120 7.5 - 9.0
156
APPENDIX G: CEFR SELF-ASSESSMENT RUBRIC DESCRIPTORS
Taken from the Bank of descriptors for self assessment in European Language Portfolios
© Council of Europe, Language Policy Division
1. Your First Name and Middle Initial (Again, your name will not appear in the study).
2. Gender:
m Male (1)
m Female (2)
3. What is your current ESL Level? (Note that your level of proficiency must be
intermediate, upper-intermediate, or advanced).
m Intermediate (1)
m Upper-intermediate (2)
m Advanced (3)
4. What was your most recently obtained TOEFL score? And when did you take the
test? If you are planning to take the TOEFL very soon, please provide the test date.
157
Welcome ESL Intermediate Students (B1)
Instructions:
I can do this
I cannot do this
a. Please try to assess your level of language proficiency by choosing the appropriate (I can do this or I cannot do
this) choice as accurate as you can.
b. If you think you can efficiently do the skill/task that each statement describes, choose the (I can do this) choice.
However, if you think you can't, in a way or another, do it, please select the (I cannot do this) choice.
Listening 1 2
I can follow clearly articulated speech directed at me in everyday conversation, though I sometimes have to ask for repetition of
particular words and phrases.
I can generally follow the main points of extended discussion around me, provided speech is clearly articulated in standard dialect.
I can listen to a short narrative and form hypotheses about what will happen next.
I can understand the main points of radio news bulletins and simpler recorded material on topics of personal interest delivered
relatively slowly and clearly.
I can catch the main points in TV programmes on familiar topics when the delivery is relatively slow and clear.
I can understand simple technical information, such as operating instructions for everyday equipment.
Reading 1 2
I can understand the main points in short newspaper articles about current and familiar topics.
I can read columns or interviews in newspapers and magazines in which someone takes a stand on a current topic or event and
understand the overall meaning of the text.
I can guess the meaning of single unknown words from the context thus deducing the meaning of expressions if the topic is familiar.
I can skim short texts (for example news summaries) and find relevant facts and information (for example who has done what and
where).
I can understand the most important information in short simple everyday information brochures.
I can understand simple messages and standard letters (for example from businesses, clubs or authorities).
In private letters I can understand those parts dealing with events, feelings and wishes well enough to correspond regularly with a
pen friend.
I can understand the plot of a clearly structured story and recognise what the most important episodes and events are and what is
significant about them.

Spoken Interaction 1 2
I can start, maintain and close simple face-to-face conversation on topics that are familiar or of personal interest.
I can maintain a conversation or discussion but may sometimes be difficult to follow when trying to say exactly what I would like
to.
I can deal with most situations likely to arise when making travel arrangements through an agent or when actually travelling.
I can ask for and follow detailed directions.
I can express and respond to feelings such as surprise, happiness, sadness, interest and indifference.
I can give or seek personal views and opinions in an informal discussion with friends.
I can agree and disagree politely.
Spoken Production 1 2
I can narrate a story.
I can give detailed accounts of experiences, describing feelings and reactions.
I can describe dreams, hopes and ambitions.
I can explain and give reasons for my plans, intentions and actions.
I can relate the plot of a book or film and describe my reactions.
I can paraphrase short written passages orally in a simple fashion, using the original text wording and ordering.
Strategies 1 2
I can repeat back part of what someone has said to confirm that we understand each other.
I can ask someone to clarify or elaborate what they have just said.
When I can’t think of the word I want, I can use a simple word meaning something similar and invite ”correction”.
Language Quality 1 2
I can keep a conversation going comprehensibly, but have to pause to plan and correct what I am saying – especially when I talk
freely for longer periods.
I can convey simple information of immediate relevance, getting across which point I feel is most important.
I have a sufficient vocabulary to express myself with some circumlocutions on most topics pertinent to my everyday life such as
family, hobbies and interests, work, travel, and current events.
I can express myself reasonably accurately in familiar, predictable situations.
Writing 1 2
I can write simple connected texts on a range of topics within my field of interest and can express personal views and opinions.
I can write simple texts about experiences or events, for example about a trip, for a school newspaper or a club newsletter.
159
I can write personal letters to friends or acquaintances asking for or giving them news and narrating events.
I can describe in a personal letter the plot of a film or a book or give an account of a concert.
In a letter I can express feelings such as grief, happiness, interest, regret and sympathy.
I can reply in written form to advertisements and ask for more complete or more specific information about products (for example a
car or an academic course).
I can convey – via fax, e-mail or a circular – short simple factual information to friends or colleagues or ask for information in such
a way.
I can write my CV in summary form.
160
Welcome ESL Upper-intermediate Students (B2)
Instructions: Students
I can do this
I cannot do this
Listening 1 2
I can understand in detail what is said to me in standard spoken language even in a noisy environment.
I can follow a lecture or talk within my own field, provided the subject matter is familiar and the presentation straightforward and
clearly structured.
I can understand most radio documentaries delivered in standard language and can identify the speaker’s mood, tone, etc.
I can understand TV documentaries, live interviews, talk shows, plays and the majority of films in standard dialect.
I can understand the main ideas complex speech on both concrete and abstract topics delivered in a standard dialect, including
technical discussions in my field of specialisation.
I can use a variety of strategies to achieve comprehension, including listening for main points; checking comprehension by using
contextual clues.
Reading 1 2
I can rapidly grasp the content and the significance of news, articles and reports on topics connected with my interests or my job,
and decide if a closer reading is worthwhile.
I can read and understand articles and reports on current problems in which the writers express specific attitudes and points of
view.
I can understand in detail texts within my field of interest or the area of my academic or professional speciality.
I can understand specialised articles outside my own field if I can occasionally check with a dictionary.
I can read reviews dealing with the content and criticism of cultural topics (films, theatre, books, concerts) and summarise the
main points.
I can read letters on topics within my areas of academic or professional speciality or interest and grasp the most important points.
I can quickly look through a manual (for example for a computer program) and find and understand the relevant explanations and
help for a specific problem.

I can understand in a narrative or play the motives for the characters’ actions and their consequences for the development of the
plot.
I can initiate, maintain and end discourse naturally with effective turn-taking.
I can exchange considerable quantities of detailed factual information on matters within my fields of interest.
I can convey degrees of emotion and highlight the personal significance of events and experiences.
I can engage in extended conversation in a clearly participatory fashion on most general topics.
I can account for and sustain my opinions in discussion by providing relevant explanations, arguments and comments.
I can help a discussion along on familiar ground confirming comprehension, inviting others in, etc.
I can carry out a prepared interview, checking and confirming information, following up interesting replies.
I can give clear, detailed descriptions on a wide range of subjects related to my fields of interest.
I can understand and summarise orally short extracts from news items, interviews or documentaries containing opinions,
argument and discussion.
I can understand and summarise orally the plot and sequence of events in an extract from a film or play.
I can construct a chain of reasoned argument, linking my ideas logically.
I can explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
I can speculate about causes, consequences, and hypothetical situations.
Strategies 1 2
I can use standard phrases like ”That’s a difficult question to answer” to gain time and keep the turn while formulating what to
say.
I can make a note of ”favourite mistakes” and consciously monitor speech for them.
I can generally correct slips and errors if I become conscious of them or if they have led to misunderstandings.
I can produce stretches of language with a fairly even tempo; although I can be hesitant as I search for patterns and expressions,
there are few noticeably long pauses.
I can pass on detailed information reliably.
I have sufficient vocabulary to express myself on matters concerned to my field and on most general topics.
I can communicate with reasonable accuracy and can correct mistakes if they have led to misunderstandings.
Writing 1 2
162
I can write clear and detailed texts (compositions, reports or texts of presentations) on various topics related to my field of
interest.
I can write summaries of articles on topics of general interest.
I can summarise information from different sources and media.
I can discuss a topic in a composition or “letter to the editor”, giving reasons for or against a specific point of view.
I can develop an argument systematically in a composition or report, emphasising decisive points and including supporting
details.
I can write about events and real or fictional experiences in a detailed and easily readable way.
I can write a short review of a film or a book.
I can express in a personal letter different feelings and attitudes and can report the news of the day making clear what – in my
opinion – are the important aspects of an event.
163
Welcome ESL Advanced Students (C1)
Instructions:
I can do this
I cannot do this
Listening 1 2
I can follow extended speech even when it is not clearly structured and when relationships are only implied and not signalled
explicitly.
I can understand a wide range of idiomatic expressions and colloquialisms, appreciating shifts in style and register.
I can extract specific information from even poor quality, audibly distorted public announcements, e.g. in a station, sports
stadium etc.
I can understand complex technical information, such as operating instructions, specifications for familiar products and services.
I can understand lectures, talks and reports in my field of professional or academic interest even when they are propositionally
and linguistically complex.
I can without too much effort understand films employing a considerable degree of slang and idiomatic usage.
Reading 1 2
I can understand fairly long demanding texts and summarise them orally.
I can read complex reports, analyses and commentaries where opinions, viewpoints and connections are discussed.
I can extract information, ideas and opinions from highly specialised texts in my own field, for example research reports.
I can understand long complex instructions, for example for the use of a new piece of equipment, even if these are not related to
my job or field of interest, provided I have enough time to reread them.
I can read any correspondence with occasional use of a dictionary.
164
I can read contemporary literary texts with ease.
I can go beyond the concrete plot of a narrative and grasp implicit meanings, ideas and connections.
I can recognise the social, political or historical background of a literary work.
I can keep up with an animated conversation between native speakers.
I can use the language fluently, accurately and effectively on a wide range of general, professional or academic topics.
I can use language flexibly and effectively for social purposes, including emotional, allusive and joking usage.
I can express my ideas and opinions clearly and precisely, and can present and respond to complex lines of reasoning
convincingly.
I can give clear, detailed descriptions of complex subjects.
I can orally summarise long, demanding texts.
I can give an extended description or account of something, integrating themes, developing particular points and concluding
appropriately.
I can give a clearly developed presentation on a subject in my fields of personal or professional interest, departing when
necessary from the prepared text and following up spontaneously points raised by members of the audience.
Strategies 1 2
I can use fluently a variety of appropriate expressions to preface my remarks in order to get the floor, or to gain time and keep
the floor while thinking.
I can relate own contribution skilfully to those of other speakers.
I can substitute an equivalent term for a word I can’t recall without distracting the listener.
I can express myself fluently and spontaneously, almost effortlessly. Only a conceptually difficult subject can hinder a natural,
smooth flow of language.
I can produce clear, smoothly-flowing, well-structured speech, showing control over ways of developing what I want to say in
order to link both my ideas and my expression of them into coherent text.
I have a good command of a broad vocabulary allowing gaps to be readily overcome with circumlocutions; I rarely have to
search obviously for expressions or compromise on saying exactly what I want to.
I can consistently maintain a high degree of grammatical accuracy; errors are rare and difficult to spot.
165
Writing 1 2
I can express myself in writing on a wide range of general or professional topics in a clear and user-friendly manner.
I can present a complex topic in a clear and well-structured way, highlighting the most important points, for example in a
composition or a report.
I can present points of view in a comment on a topic or an event, underlining the main ideas and supporting my reasoning with
detailed examples.
I can put together information from different sources and relate it in a coherent summary.
I can give a detailed description of experiences, feelings and events in a personal letter.
I can write formally correct letters, for example to complain or to take a stand in favour of or against something.
I can write texts, which show a high degree of grammatical correctness and vary my vocabulary and style according to the
addressee, the kind of text and the topic.
I can select a style appropriate to the reader in mind.
166
APPENDIX H: INTERVIEW QUESTIONS
(DEVELOPED FROM ANDRADE AND DU (2007)’S SELF-ASSESSMENT RUBRIC)
1. How many times have you taken the TOEFL test?
2. How well did you prepare for the TOEFL test?
3. During learning English, what was/were the most useful/accurate technique(s) that have helped
you identify your levels of proficiency?
4. In this study, you have obtained two different scores (self-rated and TOEFL), which one do you
think is more accurate in reporting your actual levels of proficiency?
5. Have you ever used a self-assessment technique before this study? If yes, how was it?
6. Tell me about your experience in using the self-assessment rubric in this study?
7. Would you like to use this assessment rubric (or a similar one) in classrooms? Why and why not?
8. Do you think you can assess your English skills using self-assessment tools? Why or why not?
9. When you used CEFR self-assessment rubric in this study, were you able to detect your areas of
strengths and weaknesses? How?
10. When you used the CEFR self-assessment rubric in this study, did you find it difficult to use?
11. When you used CEFR self-assessment rubric in this study, did you find any difficulty assessing
yourself in one particular area/skill? Would you explain it?
12. Please provide some advantages and disadvantages of using self-assessment techniques?
13. Finally, if you support using self-assessment techniques in classrooms, do think it should be used
alone? Or it should be supported with other assessment techniques? Why?
167
APPENDIX I
A One-time Free Proofreading of One of your CESL Papers
Dear CESL students:
My name is Adnan Alobaid, a doctoral candidate, in the Second Language Acquisition and Teaching
program at the University of Arizona, Tucson.
I am offering you a one-time free proofreading of one of your CESL papers (10 pages or less) if you
participate in this study. To participate in this study, you would have to:
1) Assess your level of English proficiency using an online CEFR rubric, which will take you
approximately 15-20 minutes.
2) After you assess yourself, you will, upon your permission, provide me with your TOEFL scores
once you obtain them (Note that you have the right to refuse to do so).
If you are in Level (3) or (4), please click on this link:
https://goo.gl/nEFoWY
If you are in Level (5) or (6), please click on this link:
https://goo.gl/0K153y
168
If you are in Level (7) or Bridge Program, please click on this link:
https://goo.gl/EietQm
Important Notes:
1. Your name, TOEFL scores, or any other private information will not appear in the study.
2. Do not forget to include your email so that we can arrange for proofreading.
3. Keep in mind that proofreading will be done for the first 50 participants ONLY. Therefore, it is highly
recommended that you respond to the survey as soon as possible.
If you have any question, please feel free to shoot me an email.
adnanalobaid@email.arizona.edu
Yours,
Adnan Alobaid.
169
APPENDIX J: COLOR-CODED CEFR, TOEFL, AND IELTS EQUIVALENCY TABLE
(TANNENBAUM & WYLIE, 2007)
Competency CEFR TOEFL IBT IELTS

N/A N/A 0–8 0 - 1.0
A1 9 – 18 1.0 - 1.5
Basic A1 19 – 29 2.0 - 2.5
A2
30 – 40 3.0 - 3.5
B1 (IELTS 3.5)
B1 41 – 52 4.0
Independent B1 (IELTS 4.5)
53 – 64 4.5 - 5.0
B2
B2 65 – 78 5.5 - 6.0
C1 79 – 95 6.5 - 7.0
Proficient
C2 96 – 120 7.5 - 9.0
170
APPENDIX K: PARTICIPANTS’ SELF-RATED SCORES
Name Gender Level of English TOEFL/IELTS CEFR Self-rated Score Overall %
Proficiency Score # of I can do % obtained
A. A. Female Intermediate 46 25 59% 25/33.6
H. A. Male Intermediate 36 23 54% 23/33.6
S. A. Female Intermediate 90 42 100% 42/33.6
S. M. Male Intermediate 39 36 85% 36/33.6
M. M. Male Intermediate 32 12 28% 12/33.6
A. A. Male Upper-intermediate 72 40 95% 40/33.6
H. A. Male Upper-intermediate 42 25 59% 25/33.6
A. K. Male Upper-intermediate 66 24 57% 24/33.6
M. A. Male Upper-intermediate 47 21 50% 21/33.6
D. J. Female Upper-intermediate 63 13 30% 13/33.6
C. C. Female Upper-intermediate 62 31 73% 31/33.6
Y. N. Male Upper-intermediate 57 28 66% 28/33.6
Y. C. Male Upper-intermediate 36 33 78% 33/33.6
M.A. Male Advanced 78 19 51% 19/29.6
A. A. Female Advanced 78 15 40% 15/29.6
N. M. Female Advanced 50 25 67% 25/29.6
A. A. Female Advanced 70 29 78% 29/29.6
A.M. Female Advanced 65 28 75% 28/29.6
O. I. Male Advanced 96 26 70% 26/29.6
S. N. Female Advanced 70 25 67% 25/29.6
J. M. Male Advanced 68 14 37% 14/29.6
171
APPENDIX L: CONSENT FORM
My name is Adnan Alobaid, a doctoral candidate, in the Second Language Acquisition and Teaching
program at the University of Arizona, Tucson. I am conducting a research study for using alternative
assessment approaches for proficiency purposes. To participate in this study, you would have to:
1) Assess your level of English proficiency using an online CEFR rubric, which will take you
approximately 15-20 minutes.
2) Upon your permission, provide me with your TOEFL scores once you obtain them (Note that you have
the right to refuse to do so.
3) And finally be engaged in a face-to-face or online interview at your discretion to allow you to expand
more on your experience in and attitudes towards using self-assessment rubrics.
All self-rated scores, interview transcriptions, and audio recordings will be stored on a password-
protected drive that will be kept in a locked departmental office at UA for six years past the date of
project completion (May. 2016). However, no one other than the primary investigator (me) will have
access to the data. In case we are unable to arrange for a face-to-face meeting, we can carry out the
interview over Skype. However, The interview will be audio recorded after obtaining your consents. In
case you choose not to be recorded, you can still participate in the interview part without being recorded
under any circumstances. Moreover, if you refuse to participate in the interviews, your participation in the
first round of data collection is still highly appreciated and adequate for the study.
Your participation is voluntary. You may refuse to participate in this study. You may also leave the study
at any time. No matter what decision you make, it will not affect your grades or future benefits.
Participation Consent
I have read and I am aware that I am being asked to participate in a research study. If you have any
questions about your rights are a research participant, please contact the Human Subjects Protection
Program at 520-626-6721 or at www.orcr.arizona.edu/hspp or you can contact me via: (818) 614-4522,
adnanalobaid@email.arizona.edu
By taking part in the survey, you are agreeing to have your responses used as part of research.
172
APPENDIX M: WEB-BASED STUDENT SURVEY
To what extent do you agree or disagree with the following statements:
Statements4
Strongly To some Strongly

Agree Disagree Comments
agree extent disagree
Practices
Course Objectives
1. Course objectives are communicated
to you at the beginning of the course
2. Course syllabus states course
objectives clearly.
3. Course requirements are consistent
with its objectives.
4. Course objectives are appropriate and
help achieve those of the Program.
5. You are fully aware of the course
requirements in advance.
6. Course objectives clearly determine
the materials that will be taught.
7. Course materials are appropriate and
help achieving course objectives.
8. Instructional activities and
4
The statements have been developed from the NCAAA and CEA standards.
173
assignments help you master the targeted
objectives.
9. Course goals and objectives are
measurable.
10. Course content and assignments are
aligned with course objectives.
Teaching Strategies
1. Teaching strategies are appropriate for
various learning styles.
2. The instructor is always well-prepared
3. The instructor provides you with
feedback to your work constantly.
4. The instructor always communicates
with you in English.
5. The instructor models activity tasks
for you.
6. The instructor promotes groupwork.
7. You often demonstrate your skills
(e.g. presentations, leading discussions).
8. The instructor monitors your
performance constantly.
9. The instructor allows you to assess
your own work.
10. You are engaged in research projects.
174
1. Student learning outcomes are clearly
communicated to you at the beginning of
the course
2. Student learning outcomes are
consistent with course objectives.
3. Student learning outcomes meet your
expectations from the course.
4. Classroom materials, activities, and
procedures help achieve the intended
learning outcomes.
5. Learning outcomes are attainable.
6. Learning outcomes cover all language
skills.
7. Learning outcomes address your
competency rather than course content.
8. Student learning outcomes assessed
through direct assessment tools (e.g.
assignment, exam, term paper, and etc.)
9. Student learning outcomes assessed
through indirect assessment tools (e.g.
surveys, self-reported gains, exit
interviews and etc.)
10. Generally, student learning outcomes
175
have emphases on real-life language uses
Student Assessment
1. Assessment techniques are appropriate
for various learning styles.
2. Assessment techniques are clearly
communicated to you at the beginning of
the course
3. These assessment techniques are used
consistently throughout the Program
courses.
4. Assessment techniques measure things
on which you expect to be measured.
5. Exams questions are written clearly.
6. Your work is assessed fairly.
7. Homework assignments reflect course
objectives?
8. Grade distribution is appropriate to
course objectives.
9. Exams and assignments are graded
and returned to you promptly.
10. You are given an opportunity to
complain about your grades?
176
APPENDIX N: INTERVIEWS5
• Interviews with Students:
a. Curriculum:
1. To what extent does the curriculum fulfill your language needs? Please explain?
2. How well does the curriculum materials reflect real-life language uses and situations?
3. How well is the curriculum structured with regard to students’ level-to-level progression?
4. How well does the curriculum prepare you for your future career as stated in course goals?
b. Teaching Strategies:
1. How effective is your instructor in varying his/her teaching strategies?
2. How often are you engaged in activities such as leading discussions, presentations etc.?
3. What is the overall quality of teaching strategies used in this Program?
4. What suggestions do you have to improve the instructor’s performance?
c. Assistance and Support:
1. To what extent are you provided with professional counseling whenever needed?
2. To what extent are you provided with academic advising during your study?
3. How often do your instructors provide you with formative feedback?
4. How often do your instructors monitor your academic performance?
d. Course Assessment Methods:
1. How accurate are course assessment methods in assessing the course-learning objective?
2. Do assessment methods measure something in which you expect to be measured? Explain?
5
All interview questions have been developed based on the NCAAA and CEA standards.
177
3. To what extent does your instructor use a variety of different assessment methods?
4. Are you given the opportunities to demonstrate your English skills well? If yes, How?
5. Do the assessment results provide you with areas of strengths and weaknesses?
6. Are you provided with a proficiency scale that shows your levels of language proficiency?
• Interviews with Teaching Faculty:
a. Learning Outcomes
1. How are student intended learning outcomes defined? Please explain it in detail?
2. When defining intended learning outcomes, do you ensure that they are aligned with the Saudi
National Qualifications Framework?
4. Are intended learning outcomes communicated to students at the beginning of the course?
5. To what extent do student learning outcomes cover all language skills?
6. How often do you review and update student learning outcomes of your courses
b. Student Assessment
1. How are course assessment methods chosen for each course?
2. Are assessment procedures communicated to students at the beginning of the course?
3. How often do you integrate alternative assessment techniques into assessment processes?
4. How often do you provide students with informative feedback to their performance?
5. What mechanisms do you use to respond to students’ complaints about their grades?
6. Do you use direct and indirect assessment tools to assess student learning outcomes?
c. Teaching Strategies:
1. What mechanisms do you use to ensure that teaching strategies are appropriate for various
learning styles?
2. To what extent do you communicate with students in English?
178
3. What strategies do you use to facilitate learning activities for students (e.g. modeling,
scaffolding etc.)?
4. What types of visual means do you use in delivering the course?
5. How often do you engage students in leading discussions, presentations, and
collaborative/cooperative learning?
6. How is the effectiveness of teaching strategies evaluated?
d. Faculty Development
1. How often does the Program offer you general training programs (e.g. teaching skills)?
2. How often do you integrate technologies into your classroom teaching activities?
3. Do you maintain a portfolio that shows evidence of the course you teach?
4. Do you think the Program administrators provide you with adequate time and opportunity for
faculty development? Please explain?
5. Are you engaged in heavy teaching and/or administrative workloads? If yes, to what extent do
they affect your research activities?
6. Generally, what are some challenges you face in teaching the Program courses?
• Interviews with Program Administrators:
a. Program Mission:
1. Has the program mission been approved by the University Senior Administration?
2. Has the Program mission been developed after obtaining consultation of stakeholders?
3. Is the Program’s mission communicated to all stakeholders?
4. To what extent do you use the Program mission decision-making processes?
5. How often do you review and update the Program mission?
b. Faculty Recruitment and Development:
179
1. How do you undertake faculty recruitment processes?
2. What mechanisms do you use to ensure that prospective faculty members have appropriate
academic credentials and possess knowledge of the needs of students?
3. What strategies do you undertake to encourage teaching faculty members to develop strategies
for improvement of their own teaching?
4. How are faculty members encouraged to participate in professional development activities?
5. What training opportunities are given for teaching faculty members?
6. Are faculty members given the opportunities to attend and participate in conferences?
c. Program Development, Evaluation, and Review:
1. Do you have a plan for the Program development and evaluation? Please explain it?
2. What mechanisms do you use to evaluate the Program (e.g. surveys, employment data)?
3. How often do you do so for the Program courses, learning outcomes, and study plan?
4. Do you invite professional people, from fields relevant to the Program areas of specialization,
to evaluate the Program courses?
5. Does the program have KPIs that include learning outcomes? How are they determined?
6. In addition to regular program evaluations, do you conduct a comprehensive reassessment of
the program performance at least once every five years?
180
APPENDIX O: RESEARCHER’S EVALUATION CHECKLIST
Program__________________________________ Date of Evaluation_________
Standard 1: Mission Goals and Objectives
College Mission:
__________________________________________________________________________________
__________________________________________________________________________________
College Goals
College Objectives
181
Program Mission
__________________________________________________________________________________
__________________________________________________________________________________
Program Goals
Program Objectives
182
RESEARCHER’S EVALUATION CHECKLIST CONT.
Statements6 Yes No To Some Extent Comment
Program Mission
1. Does the program have a written
statement showing its mission and goals?
2. Is the program’s mission consistent
with those of the college and University?
3. Is the Program’s mission clearly
defined in a form that summarizes the
program’s ultimate goals?
4. Is the mission available for
prospective students and employees (e.g.
on the program website)?
5. Does the mission state the needs of the
community to which the Program is
affiliated?
6. Does the program’s mission place
emphasis on research?
Program Goals
1. Are the Program’s goals clearly and
coherently stated in measurable terms?
2. Are the goals related to the Program’s
6
The statements have been developed from the NCAAA and CEA standards.
183
mission?
3. Are the Program’s goals aligned with
the curriculum?
4. Do the goals specify clearly the
Program’s intended outcomes?
5. Are the Program’s goals achievable
within the level of available resources?
Program Objectives
1. Are the Program’s objectives very
specific?
2. Are the Program’s objectives logically
related?
3. Are the objectives consistent with the
Program’s goals?
4. Are the Program’s objectives aligned
with the curriculum?
5. Are the Program’s objectives
assessable?
Statements Yes No To Some Extent Comment
Curriculum
1. Is the curriculum consistent with the
program’s mission?
2. Is the curriculum structured in a
way that ensures a logical progression
184
from one level to the next?
3. Does the curriculum have consistent
objectives of courses and levels?
4. Is the curriculum appropriate for the
needs of students?
5. Are course objectives consistent
with those of the program?
6. Do course syllabi have detailed
information about course
requirements, attendance policies,
topics to be covered, and assessment
techniques?
7. Are courses appropriate for meeting
the program’s needs and purposes?
8. Are course goals and objectives
measurable
9. Are course materials appropriate for
achieving course objectives
10. Are course content and
assignments aligned with its
objectives?
Teaching Strategies
1. Are teaching strategies included into
185
course specifications?
2. Are teaching strategies aligned with
the Program’s learning outcomes?
3. Are students engaged in research
activities?
4. Does the instructor communicate
with students in English?
5. Does the instructor provide students
with feedback to their work?
6. Are teaching strategies appropriate
for various learning styles?
7. Does the instructor models the tasks
for students?
8. Does the instructor promote
groupwork?
9. Does the instructor monitor
students’ performance?
10. Does the instructor use visual
representations of the lessons?
1. Are student learning outcomes
written?
186
measurable?
attainable?
visible (e.g. in textbooks)?
4. Do student learning outcomes
reflect various levels of students’
cognitive skills?
affective skills?
interpersonal skills and responsibility?
Communication, Information
Technology, Numerical skills?
8. Are student learning outcomes a
result of learning?
9. Are course goals, objectives, and
learning outcomes aligned with each
other?
187
Assessment Methods
1. Are various types of assessment
methods used?
2. Do course specifications include
direct and indirect assessment tools?
3. Are course grades distributed
logically in course specifications?
4. Do assessment practices cover all
language skills?
5. Does the instructor allow students to
assess their own work?
6. Are students engaged in papers,
projects, and presentations tasks?
188
APPENDIX P: COLLEGE MISSION
The college aims at preparing the prospective teacher who is religiously committed; the one is
attached strongly to his homeland. This teacher should be a good model for his students in his
work and behavior. He should be at a high level of professionalism. Learning instinct and love
for the career should be part and parcel of him. He should be completely experienced and fully
aware of his role in facilitating learning approaches. He should be continuously developing in his
field of specialty as well as his teaching styles. He should have the traits of the strong leader who
has the ability to convince others and to prove that his opinion is correct. He should attain the
capacity of making a decision and of taking on responsibility. He should be able to plan well and
to put into consideration individual differences among students. He should be a good thinker
possessing all types of thinking skills. He should be able to develop these skills in his students.
He should have the traits of a social pioneer who has the ability to communicate effectively with
his society and to cooperate in solving its problems. He should be ever renewing in how to deal
with modern technology and how to function it properly in all instructional settings. He should
be a guide taking charge of directing the instructional process in fulfilling its targets and solving
its problems.
189
APPENDIX Q: MALE DEPARTMENT OBJECTIVES
Goals and Objectives Major Strategies Measurable Indicators

§ Providing the undergraduate and § Class lectures § Written/Oral feedback
graduate students with well-rounded § Class discussion for assignments
quality education and rigorous training § Assigning presentations § Rubric
in the English language, translation, § Assigning written works § Quizzes/ Exams
and literature and reading texts
§ Providing students with the necessary
linguistic, translation, interpreting, and
literary skills that can enrich their
intellectual, cultural, and artistic
visions
§ Developing the students' English § Class lectures § Written/Oral feedback
language, thinking, and research skills. § Class discussion for assignments
§ Preparing efficient graduates who can § Assigning presentations § Rubric

competently meet the needs of both § Assigning written works § Quizzes/ Exams
public and private sectors in the field and reading texts
of English language, linguistics,
translation, and literature.
§ Conducting community service and
continuing education programs
through the performance of various
forums, workshops, and training
sessions in the field of English
language, linguistics, translation, and
literature.
§ Emphasizing scholarly research and § Class lectures § Written/Oral feedback
community outreach and service, and § Class discussion for assignments
promoting intercultural understanding § Assigning presentations § Rubric
and exchange. § Assigning written works § Quizzes/ Exams
§ Taking an active part in conducting and reading texts
scholarly researches, and relating such
researches to the human and
developmental needs of the Kingdom
of Saudi.
190
REFERENCES
Adam, S. (2004). Using learning outcomes. In Report for United Kingdom Bologna Seminar (pp.
1-2).
Ahlgren Reddy, A., & Harper, M. (2013). Mathematics placement at the University of Illinois.
PRIMUS, 23(8), 683-702.
Alamri, M. (2011). Higher education in Saudi Arabia. Journal of Higher Education Theory and
Practice, 11(4), 88.
Al Asmari, A. (2013). Practices and prospects of learner autonomy: Teachers' perceptions.
English Language Teaching, 6(3), 1.
Alderson, J. C., & Scott, M. (1992). Insiders, outsiders and participatory evaluation. Evaluating
second language education, 25-58.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge: Cambridge University Press.
Alderson, J. C., Figueras, N., Kuijper, H., Nold, G., Takala, S., & Tardieu, C. (2004). The
development of specifications for item development and classification within The
Common European Framework of Reference for Languages: Learning, Teaching,
Assessment: Reading and Listening: Final report of The Dutch CEF Construct Project.
Allen, M. J. (2004). Assessing academic programs. Boston: Anker Publishing.
Andrade, H., & Du, Y. (2007). Student responses to criteria-referenced self-assessment.
Assessment and Evaluation in Higher Education, 32, 159–181.
Andrade, H. & Valtcheva, A. (2009). Promoting learning and achievement through self-
assessment. Theory into Practice, 48(1), 12-19.
Al Murshidi, G. (2014). Participation challenges of Emirati and Saudi students at US
191
universities. International Journal of Research Studies in Language Learning, 3(5).
Alqahtani, A. (2014). Evaluation of King Abdullah Scholarship Program. Journal of Education
and Practice, 5(15), 33-41.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental
disorders (4th ed.,Text Revision). Washington, DC: Author.
Bachman, L. F., & Palmer, A. S. (1989). The construct validation of self-ratings of
communicative language ability. Language Testing, 6(1), 14-29.
Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford
University Press.
Bachman, L. F., Davidson, F., & Ryan, K. (1995). An investigation into the comparability of
two tests of English as a foreign language (Vol. 1). Cambridge University Press.
Bachman, L. & Palmer, A. (1996). Language testing in practice. New York: Oxford
University Press.
Bachman, L. F. (2000). Learner-directed assessment in ESL. In G. Ekbatani & H. Pierson
(Eds.), Learner-directed assessment in ESL (pp. ix-xii). New Jersey: Lawrence Erlbaum
Associates, Inc.
Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment
Quarterly: An International Journal, 2(1), 1-34.
Bailey, K. M. (2009). 37 Issues in Language Teacher Evaluation. The handbook of language
teaching, 706.
Banegas, M. R. (2013). Placement of English as a Second Language (ESL) Student Learners in
Higher Education (Doctoral dissertation). Retrieved from ProQuest Dissertations
Publishing, http://search.proquest.com/docview/1369866651.
192
Barker, R. (2003). The social work dictionary. Washington, DC: NASW Press.
Benson, P. (1995) 'A critical view of learner training'. Learning: JALT Learner
Development N-SIG Forum, 2:2, 2-6.
Benson (2012) in Burns, A., & Richards, J. C. (Eds.). (2012). The Cambridge guide to
pedagogy and practice in second language teaching. New York: Cambridge University
Press.
Beretta, A. (1986). Program‐Fair Language Teaching Evaluation. TESOL Quarterly, 20(3), 431-
444.
Black, P., & D. Wiliam. (2009). Developing the theory of formative assessment. Educational
Assessment, Evaluation and Accountability 21.1: 5–31.
Blanche, P., & Merino, B. J. (1989). Self‐Assessment of Foreign‐Language Skills:
Implications for Teachers and Researchers. Language Learning, 39(3), 313-338.
Bloom, B. S., Madaus, G. F., & Hastings, J. T. (1981). Evaluation to improve learning. New
York: McGraw-Hill.
Bock, R. D. (1997). A brief history of item theory response. Educational Measurement: Issues
and Practice, 16(4), 21-33.
Boud, D. (1991). Implementing student self-assessment. Campbelltown: The Higher Education
Research and Development Society of Australasia (HERDSA).
Bourke, B. (2014). Positionality: Reflecting on the research process. The Qualitative Report,
19(33), 1-9.
Brantmeier, C. (2006). Advanced L2 learners and reading placement: Self-assessment, computer-
based testing, and subsequent performance. System, 34(1), 15-35.
Brindley, G. (1998). Outcomes-based assessment and reporting in language learning
193
programs: A review of the issues. Language Testing, 15(1), 45-85.
Brown, D. (1989). Language program evaluation: A synthesis of existing possibilities. In R. K.
Johnson (ed.) The second language curriculum. Cambridge, UK: Cambridge University
Press, 222-241.
Brown, J. D. (1989). Improving ESL placement test using two perspectives. TESOL
Quarterly, 23(1), 65–83.
Brown, D. (1995). Language Program Evaluation: Decisions, Problems and Solutions. Annual
Review of Applied Linguistics, 15(1), 227-248.
Brown, J. D. (1996). Testing in language programs. New Jersey: Prentice Hall Regents.
Brown, D. (1997). Computers in language testing: present research and some future
directions. Language Learning and Technology, 1(1), 44–59.
Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. UK: Cambridge
University Press.
Brown, D. (2007). Teaching by principles: An integrated approach to language pedagogy.
White Plains, NY: Pearson Education.
Brown, D. (2010). Language Assessment: Principle and Classroom Practices. New
York: Pearson Education.
Butler, Y. G., & Lee, J. (2010). The effects of self-assessment among young learners of
English. Language Testing, 27(1), 5-31.
Byrnes, H. (1991). Issues in Foreign Language Program Articulation. Critical Issues in
Foreign Language Instruction. Boston: Heinle & Heinle, pp. 6–28.
Cargan, L. (2007). Doing social research. Maryland: Rowman & Littlefield Publishers.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Beverly Hills, CA:
194
Sage publications.
CESL (2014). About Center for English a Second Language. Retrieved March 17th, 2016,
from http://www.cesl.arizona.edu/content/about-cesl
Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics,
19, 254-272.
Chapelle, C. (2001). Computer applications in second language acquisition. Cambridge, UK:
Cambridge University Press.
Chapelle, C. A., Jamieson, J., & Hegelheimer, V. (2003). Validation of a web-based ESL test.
Language Testing, 20(4), 409-439.
Chapelle, C. A., Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2011). Building a
validity argument for the Test of English as a Foreign LanguageTM. New York:
Routledge.
Clapham, C. (2000). Assessment and testing. Annual Review of Applied Linguistics, 20, 147-
161.
Conway, T., Mackay, S., & Yorke, D. (1994). Strategic planning in higher education: Who are
the customers. International journal of educational management, 8(6), 29-36.
Coppel, D. B. (2011). Use of neuropsychological evaluations. Physical medicine and
rehabilitation clinics of North America, 22(4), 653-664.
Council for Higher Education Accreditation (CHEA). (2002), The Fundamentals of
Accreditation, CHEA, Washington DC, USA.
Council of Europe (1996b). Modern languages: Learning, teaching, assessment. A Common
European Framework of Reference. Draft 2 of a Framework proposal. Strasbourg:
Council of Europe.
195
Council of Europe (2001). Common European Framework of Reference for Languages:
learning, teaching, assessment. Cambridge: Cambridge University Press.
Council of Europe (2015). Education and Languages, Language Policy. Retrieved March
23rd, 2016, from http://www.coe.int/t/dg4/linguistic/cadre1_en.asp
Cronbach, L. (1963). Course improvements through evaluation. The Teachers College Record,
64(8), 672-672.
Cumming, A. (2009). Language assessment in education: Tests, curricula, and teaching. Annual
Review of Applied Linguistics, 29, 90-100.
Cumming, A. (2008). Assessing oral and literate abilities. In E. Shohamy Encyclopedia of
language and education (pp. 2146-2163). US: Springer.
Darandari, E. Z., Al‐Qahtani, S. A., Allen, I. D., Al‐Yafi, W. A., Al‐Sudairi, A. A., &
Catapang, J. (2009). The quality assurance system for post‐secondary education in Saudi
Arabia: A comprehensive, developmental and unified approach. Quality in Higher
Education, 15(1), 39-50.
Dornyei, Z. (2001). Teaching and researching motivation. London: Pearson Education Ltd.
Development Society of Australasia (HERDSA).
Douglas, D. (2014). Understanding language testing. London: Hodder-Arnold.
Eaton, J. S. (2006). An Overview of US Accreditation. Washington DC. Council for Higher
Education Accreditation.
Eckes, T., & Grotjahn, R. (2006). A closer look at the construct validity of C-tests. Language
Testing, 23(3), 290-325.
Educational Testing Service. (2010). Linking TOEFL iBT scores to IELTS scores: A
196
research report. Princeton, NJ: ETS. Retrieved February 21st, 2016, from
https://www.ets.org/s/toefl/pdf/linking_toefl_ibt_scores_to_ielts_scores.pdf
Edwards, J. R. (2003). Construct validation in organizational behavior research. Chapel Hill:
University of North Carolina.
Ekbatani, G., & Pierson, H. D. (Eds.). (2000). Learner-directed assessment in ESL. Mahwah, NJ:
Erlbaum.
Ekbatani, G. (2000). Moving toward learner-directed assessment. In G. Ekbatani & H. Pierson
(Eds.), Learner-directed assessment in ESL. (pp. 1-11). Mahwah, NJ: Erlbaum.
Ellis, R. (1993). Quality Assurance for University Teaching. Buckingham: Open University
Press.
Elton, L. 1999. Facilitating change through self-assessment. Paper presented at the national
conference teaching and learning for the new millennium: Facilitating change through
self-assessment, September 9-10, in University of Bristol, Bristol, UK.
Fox, J. D. (2009). Moderating top-down policy impact and supporting EAP curricular renewal:
Exploring the potential of diagnostic assessment. Journal of English for Academic
Purposes, 8(1), 26-42.
Fulcher, G. (1997). An English language placement test: issues in reliability and validity.
Fulcher, G. (1999). Computerizing an English language placement test. English Language
Teaching Journal 53, 4, 289 - 299.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. London, NY:
Routledge.
Fulcher, G. (2013). Practical language testing. London: Hodder Education.
197
Garrison, D. R. (1997). Self-directed learning: Toward a comprehensive model. Adult
Education Quarterly, 48(1), 18-33.
Gass, S. M., & Varonis, E. M. (1994). Input, interaction, and second language production.
Studies in second language acquisition, 16(03), 283-302.
Gitlin, A., & Smyth, J. (1989). Teacher evaluation: Educational alternatives. Lewes: Falmer
Press.
Gredler, M. E. (1996). Program evaluation. NJ: Prentice Hall.
Green, A. Weir, J. (2004). Can placement tests inform instructional decisions? Language
Testing, 22(4), 467-494.
Green, C. (2005). Integrating extensive reading in the task-based curriculum. ELT Journal,
59(4), 306-311.
Halliday, J. (1994). Quality in education: Meaning and prospects. Educational Philosophy and
Theory, 26(2), 33-50.
Harden, R. M. (2002). Learning outcomes and instructional objectives: Is there a difference?
Medical teacher, 24(2), 151-155.
Harris, M. (1997). Self-assessment of language learning in formal settings. ELT journal,
51(1), 12-20.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. Educational measurement, 4,
187-220.
Holec, H. (1981). Autonomy and foreign language learning. Oxford: Pergamon Press.
Hughes, A., Weir C. and Porter, D. (1995). The Global Placement Test. Reading: Centre for
Applied Language Studies, University of Reading.
Hughes, A. (2003). Testing for language teachers, (2nd edition). Cambridge: Cambridge
198
University Press.
Hughes, K. L., & Scott-Clayton, J. (2011). Assessing developmental assessment in
community colleges. Community College Review, 39(4), 327-351.
Huhta, A., Luoma, S., Oscarson, M., Sajavaara, K., & Teasdale, A. (2002). DIALANG, A
diagnostic language assessment system for adult learners. Common European
framework of reference for languages: Learning, teaching, assessment. Case studies.
Institute of International Education (2014). Intensive English Programs: Leading Places of
Origin. Retrieved March 16th, 2016, from http://www.iie.org/Research-and-
Publications/Open-Doors/Data/Intensive-English-Programs/Leading-Places-of-
Origin/2013-14
Jang, E. E. (2008). A Framework for Cognitive Diagnostic Assessment. In C. A. Chapelle, Y.-R.
Chung, & J. Xu (Eds.), Towards Adaptive CALL: Natural Language Processing for
Diagnostic Language Assessment (pp. 117-131). Amer, IA: Iowa Scaner University.
Javid, C. Z., Al-thubaiti, T. S., & Uthman, A. (2013). Effects of English language proficiency
on the choice of language learning strategies by Saudi English-major undergraduates.
English Language Teaching, 6(1), 35.
Johnson, D. (2015). Saudi students and IEP teachers: converging and diverging perspectives
(Doctoral dissertation, University of Illinois at Urbana-Champaign).
Jones, N. (2002). Relating the ALTE Framework to the Common European Framework of
Reference, In: Council of Europe (Eds). Case Studies on the use of the Common
European Framework of Reference (pp. 167-183). Cambridge: Cambridge University
Press.
Kahn, A. B., Butler, F. A., Weigle, S. C., & Sato, E. Y. (1994). Adult ESL Placement Procedures
199
in California: A Summary of Survey Results. Adult ESL Assessment Project.
Kane, M. T. (2006). Validation. Educational measurement, 4(2), 17-64.
Kaplan, R. B. (2010). Whence applied linguistics: The twentieth century. Oxford handbook of
applied linguistics, 3-33. Kiely, R., & Rea-Dickens, P. (2005). Program evaluation in
language education. Palgrave: Macmillan.
Kiely, R. (2001). Classroom evaluation-values, interests and teacher development. Language
teaching research, 5(3), 241-261.
Kirsch, I., Jamieson, J., Taylor, C., & Eignor, D. (1998). Computer familiarity among TOEFL
examinees. ETS Research Report Series, 1998(1), 1-23.
Kumaravadivelu, B. (2006). TESOL Methods: Changing Tracks, Challenging Trends.
TESOL Quarterly, 40(1), 59-81.
Kunnan, A. (2000). Fairness and justice for all. Fairness and validation in language
assessment (pp. 1-13). Cambridge, MA: Cambridge University Press.
Kunnan, A. J. (2008). Large-scale language assessments. In E. Shohamy Encyclopedia of
Kunnan, A. J. (1998). Approaches to validation in language assessment. In A. J. Kunnan (Ed.),
Validation in language assessment (pp. 1–16). Mahwah, NJ: Lawrence Erlbaum
Associates.
Lantolf, J. P., & Poehner, M. E. (2008). Dynamic assessment. In Encyclopedia of language and
education (pp. 2406-2417). Springer US.
Larsen-Freeman, D. (1986). Techniques and principles in language teaching. Oxford: Oxford
University Press.
Lee, Y. J., & Greene, J. (2007). The Predictive Validity of an ESL Placement Test A Mixed
200
Methods Approach. Journal of Mixed Methods Research, 1(4), 366-389.
Lewis, J. (1990) 'Self-assessment in the classroom: a case study'. In G. Brindley (ed.) The
Second Language Curriculum in Action. Sydney: National Centre for English Language
Teaching and Research.
Little, D., Lazenby Simpson, B., & O’Connor, F. (2002). Meeting the English language needs
of refugees in Ireland. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Case Studies, Strasbourg: Council of Europe, 53-67.
Little, D. (2006). The Common European Framework of Reference for Languages: Content,
purpose, origin, reception and impact. Language Teaching, 39(03), 167-190.
Long, M. H. (1981). Input, interaction, and second‐language acquisition. Annals of the New
York Academy of Sciences, 379(1), 259-278.
Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge:
MacKay, R. (1998). Program evaluation and quality control. TESL Canada Journal, 5(2), 33-42.
McDonald, B., & Boud, D. (2003). The impact of self-assessment on achievement: the effects
of self-assessment training on performance in external examinations. Assessment in
Education: Principles, Policy & Practice, 10(2), 209-220.
McNamara, M., & Deane, D. (1995). Self-assessment activities toward autonomy in language
learning. TESOL Journal, 5, 18-23.
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.
McNamara, F. & Roever, C. (2006). Language Testing: The Social Dimension. Oxford:
Blackwell.
Menges, R. J., & Weimer, M. (1996). Teaching on Solid Ground: Using Scholarship to Improve
201
Practice. San Francisco: Jossey-Bass Inc.
Mercado, L., A. (2012). Guarantor of quality assurance. In M. A. Christison and F. L. Stoller
(Eds.) A handbook for language program administrators (2nd Edition), (pp. 117-136).
Miami, FL: Alta Book Center Publishers.
Messick, S. (1989). Validity. In Linn, R. L. (ed.), Educational Measurement. New York:
American Council on Education/Macmillan, 13–103.
Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons'
responses and performances as scientific inquiry into score meaning. American
psychologist, 50(9), 741.
Messick, S. (1996). Validity and washback in language testing. Language Testing. 13, 241-
256.
Ministry of Economy and Planning (2015). The Saudi Ninth Development Plan. Retrieved
March 14th, 2016, from
http://services.mep.gov.sa/themes/GoldenCarpet/index.jsp#1457986308681
Mitchell, C. B. & Vidal, K. E. (2001). Weighing the Ways of the Flow: Twentieth Century
Instruction. Modern Language Journal, 85(1), 26-38.
MOHE. (2015). Saudi Higher Education Institutions. Retrieved March 8th, 2016, from
http://www.moe.gov.sa/ar/Pages/default.aspx
MOHE. (2014). The History of Study Abroad Scholarships in the Kingdom. Retrieved March
15th, 2016, from https://www.mohe.gov.sa/en/Ministry/General-administration-for-
Public-relations/BooksList/book2eng.pdf
Mohr, L. B. (1995). Impact analysis for program evaluation. Thousand Oaks, CA: Sage
Publications, Inc.
202
Morphew, C. C., & Hartley, M. (2006). Mission statements: A thematic analysis of rhetoric
across institutional type. The Journal of Higher Education, 77(3), 456-471.
NCAAA. (2013). Standards for Quality Assurance and Accreditation of Higher Education
Institutions. Retrieved February 14, 2015, from
http://www.ncaaa.org.sa/en/Releases/Pages/Handbooks.aspx
NCAAA. (2014). Mission, Vision & Values. Retrieved March 08, 2016, from
http://www.ncaaa.org.sa/en/AboutUs/Pages/Vision.aspx
Norris, J.M., Davis, J.M., Sinicrope, C. & Watanabe, Y. (Eds.). (2009) Toward Useful Program
Evaluation in College Foreign Language Education. Honolulu, HI: National Foreign
Language Resource Center. (TUPE) (Case studies)
Norris, J. M. (2009). Understanding and improving language education through program
evaluation: Introduction to the special issue. Language Teaching Research, 13(1), 7-13.
Norris, J. M. (2016). Language Program Evaluation. The Modern Language Journal, 100(S1),
169-189.
North, B. (2007). The CEFR illustrative descriptor scales. The Modern Language Journal,
91(4), 656-659.
Nunan, D. (1988). The learner-centered curriculum: A study in second language teaching.
Cambridge, England: Cambridge University Press.
Nunan, D. (1989). Designing tasks for the communicative classroom. Cambridge, England:
Oscarson, M. (1989). Self-assessment of language proficiency: Rationale and applications.
Language Testing,6(1), 1-13.
Oscarson M (1997). Self-assessment of foreign and second language proficiency. In C.
203
Clapham and D. Corson (Eds), Encyclopedia of language and education, Volume 7:
Language testing and assessment (pp. 175–187). Dordrecht, Netherlands: Kluwer
Academic.
Pennington, M. C., and Hoekje, B. (2010). Language program leadership in a changing
world: An ecological model. London: Emerald.
Posavac, E. (2015). Program evaluation: Methods and case studies. New York: Pearson Prentice
Hall.
Rea-Dickins, P. (1994). Evaluation and English language teaching. Language Teaching, 27(02),
71-91.
Reed, D. J., & Stansfield, C. W. (2004). Using the Modern Language Aptitude Test to Identify a
Foreign Language Learning Disability: Is it Ethical? Language Assessment Quarterly,
1(2-3), 161-176.
Reid, J. M. (1995). Learning Styles in the ESL/EFL Classroom. Florence, KY: Heinle & Heinle.
Roever, C. (2001). Web-based language testing. Language Learning & Technology, 5(2), 84-94.
Ross, S. (1998). Self-assessment in second language testing: A meta-analysis and analysis of
experiential factors. Language testing, 15(1), 1-20.
Royse, D., Thyer, B., & Padgett, D. (2009). Program evaluation: An introduction. Belmont, CA:
Brooks-Cole.
Saba, M. S. (2014). Writing in a New Environment: Saudi ESL Students Learning Academic
Writing. (Doctoral dissertation). Retrieved from
https://vtechworks.lib.vt.edu/bitstream/handle/10919/54012/Saba_MS_D_2014.pdf?sequen
ce=1&isAllowed=y
SACM (2013). Overview on the ESL Department. Retrieved March 15th, 2016, from
204
http://www.sacm-usa.gov.sa/Departments/ESL/about.aspx
SACM (2013). Recommended ESL Schools in the U.S. Retrieved March 18th, 2016, from
http://esllist.sacm.org/
Sallis, E. (2002). Total quality management in education. London: Kogan Page.
Scriven, M. (1967). The Methodology of Evaluation. Washington, DC: American Educational
Research Association.
Scriven, M. (1991). Evaluation thesaurus. Newbury Park, CA: Sage.
Sekely, A. (2014). Assessing the Validity of the Trauma Symptom Inventory on Military Patients
with PTSD (Doctoral dissertation). Retrieved from http://militarytbi.org/wp-
content/uploads/2014/09/Thesis05.04.2014.pdf
Shawer, S. F. (2013). Accreditation and standards-driven program evaluation: implications for
program quality assurance and stakeholder professional development. Quality &
Quantity, 47(5), 2883-2913.
Shohamy, E., Gordon, C., & Kraemer, R. (1992). The effect of raters’ background and training
on the reliability of direct writing tests. The Modern Language Journal, 76 (1), 27-33.
Smith, L., & Abouammoh, A. (2013). Higher Education in Saudi Arabia. The Netherlands:
Springer.
Spolsky, B. (2008). Language assessment in historical and future perspective. In Encyclopedia of
Tannenbaum, R. J., & Wylie, E. C. (2008). Linking English-Language Test Scores Onto the
Common European Framework of Reference (CEFR): An Application of Standard-
Setting Methodology. ETS Research Report Series, 2008(1), i-75.
Taras, M. (2001). The use of tutor feedback and student self-assessment in summative
205
assessment tasks: Toward transparency for students and for tutors. Assessment and
Evaluation in Higher Education, 26, 605-614.
Taras, M. (2008). Issues of power and equity in two models of self-assessment. Teaching in
Higher Education, 13(1), 81-92.
Taras, M. (2010). Student self-assessment: processes and consequences. Teaching in Higher
Education, 15(2), 199-209.
Taylor, C. Albasri, W. (2014). The Impact of Saudi Arabia King Abdullah’s Scholarship
Program in the US. Open Journal of Social Sciences, 2(10), 109.
TESOL (2010). Position Statement on the Acquisition of Academic Proficiency in English at the
Postsecondary Level. Retrieved from http://www.tesol.org/docs/pdf/13489.pdf?sfvrsn=0
March 1st 2016.
Van Damme, D. (2004). Standards and Indicators in Institutional and Program Accreditation in
Higher Education: A Conceptual Framework and a Proposal. Studies on Higher
Education, 127-159.
van Teijlingen, E., & Hundley, V. (2001). The importance of pilot studies. Social research
update, (35), 1-4.
Wall, D., & Alderson, J. C. (1993). Examining washback: the Sri Lankan impact study.
Wall, D., Clapham, C., & Alderson, J. C. (1994). Evaluating a placement test. Language
Testing, 11(3), 321-344.
Wesche, M. B. (1983). Communicative Testing in a Second Language*. The Modern Language
Journal, 67(1), 41-55.
Weigle, S. C. (1994b). Effects of training on raters of ESL compositions. Language Testing,
206
11(2), 197-223.
Weir, C. J. (2005). Language testing and validation. England: Palgrave Macmillan.
Weir, C. J. (2005). Limitations of the Common European Framework for developing
comparable examinations and tests. Language Testing, 22(3), 281-300.
Wenden, A. L. (1995). Learner training in context: A knowledge-based approach. System,
23(2), 183-194.
Westerheijden, D. F., Stensaker, B., & Rosa, M. J. (2007). Quality assurance in higher
education: Trends in regulation, translation and transformation (Vol. 20). Dordrecht:
Springer Science & Business Media.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan,
75(3), 200-08.
Wolochuk, A. (2009). Adult English learners' self-assessment of second language
proficiency: Contexts and conditions (Doctoral dissertation). Retrieved from ProQuest
Dissertations & Theses - Gradworks http://gateway.proquest.com/openurl?url_ver=Z39.88-
2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdi
ss:3346271
207

Azu Etd 14700 Sip1 M

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Azu Etd 14700 Sip1 M

Uploaded by

Copyright:

Available Formats

Testing, Assessment, and Evaluation in Language Programs

Item Type text; Electronic Dissertation

Authors Alobaid, Adnan Othman

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material

Download date 09/03/2021 02:50:03

Link to Item http://hdl.handle.net/10150/613422

Copyright © Adnan Alobaid 2016

A Dissertation Submitted to the Faculty of the

GRADUATE INTERDISCIPLINARY PROGRAMS

IN SECOND LANGUAGE ACQUISITION AND TEACHING

In Partial Fulfillment of the Requirements

For the Degree of

In the Graduate College

THE UNIVERSITY OF ARIZONA

____________________________________________________________ Date: 04/14/2016

____________________________________________________________ Date: 04/14/2016

____________________________________________________________ Date: 04/14/2016

____________________________________________________________ Date: 04/14/2016

____________________________________________________________ Date: 04/14/2016

be made available to borrowers under rules of the Library.

extended quotation from or reproduction of this manuscript in whole or in part may be

granted by the copyright holder.

SIGNED: Adnan Alobaid

writing this dissertation.

Moreover, much gratitude goes to my committee members: Drs. Beatrice Dupuy,

Similarly, I would like to express my appreciation to Dr. Muhammad Al-Fuhaid, who

LIST OF FIGURES ...................................................................................................................... 11

LIST OF TABLES ........................................................................................................................ 12

DISSERTATION THEME ........................................................................................................... 14

CHAPTER 1 INTRODUCTION .................................................................................................. 15

CHAPTER 2 SAUDI STUDENT PLACEMENT INTO ESL PROGRAM LEVELS: ISSUES

BEYOND TEST CRITERIA ........................................................................................................ 19

ESL Placement Test Item Formats .............................................................................................. 24

ESL Placement Test Delivery Formats........................................................................................ 25

Test-takers Intentionally Failing Exams: Myth or Reality? ........................................................ 26

Student Intentions to Purposefully Underperform on a Language Test ...................................... 27

Saudi Students Studying Abroad ................................................................................................. 30

The Saudi Arabian Cultural Mission (SACM) ............................................................................ 30

Anecdotal Evidence and Pilot Study ........................................................................................... 31

Data Collection Background on Site..................................................................................... 32

Positionality of the Researcher .................................................................................................... 34

Research Questions ...................................................................................................................... 36

Research Instruments ................................................................................................................... 36

Purposefully Underperforming on ESL Placement Tests: A Major or Minor Issue?.................. 48

DISCUSSION AND CONCLUSION....................................................................................... 49

Pedagogical Implications ............................................................................................................. 54

Implications for Scholarship Agencies ........................................................................................ 55

Implications for Both Scholarship Agencies and ESL Administrators ....................................... 56

CHAPTER 3 INTEGRATING SELF-ASSESSMENT TECHNIQUES INTO L2

CLASSROOM ASSESSMENT PROCEDURES......................................................................... 58

Benefits and Limitations of Self-assessment ............................................................................... 61

LITERATURE REVIEW ......................................................................................................... 62

Impact of Classical Teaching Methods on Self-assessment ........................................................ 62

Impact of Communicative Language Teaching on Self-assessment ........................................... 62

Accuracy of Self-assessment ....................................................................................................... 64

How Can the Accuracy of Self-assessment Be Achieved? ......................................................... 66

Common European Framework of Reference (CEFR) ............................................................... 67

Validity of the CEFR Self-assessment Rubric ............................................................................ 68

Significance of the Study ............................................................................................................. 69

Research Questions ...................................................................................................................... 69

Background on Site ..................................................................................................................... 71

Discussion and Conclusion .......................................................................................................... 83