
Testing, Assessment, and Evaluation in Language Programs

Item Type text; Electronic Dissertation

Authors Alobaid, Adnan Othman

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material


is made possible by the University Libraries, University of Arizona.
Further transmission, reproduction or presentation (such as
public display or performance) of protected items is prohibited
except with permission of the author.

Download date 09/03/2021 02:50:03

Link to Item http://hdl.handle.net/10150/613422


TESTING, ASSESSMENT, AND EVALUATION IN LANGUAGE PROGRAMS

by

Adnan Alobaid

__________________________

Copyright © Adnan Alobaid 2016

A Dissertation Submitted to the Faculty of the

GRADUATE INTERDISCIPLINARY PROGRAMS

IN SECOND LANGUAGE ACQUISITION AND TEACHING

In Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2016
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation
prepared by Adnan Alobaid, titled Testing, Assessment, and Evaluation in Language
Programs and recommend that it be accepted as fulfilling the dissertation requirement for the
Degree of Doctor of Philosophy.

____________________________________________________________ Date: 04/14/2016


Suzanne Panferov

____________________________________________________________ Date: 04/14/2016


Beatrice Dupuy

____________________________________________________________ Date: 04/14/2016


Peter Ecke

____________________________________________________________ Date: 04/14/2016


Edmond White

Final approval and acceptance of this dissertation is contingent upon the candidate’s
submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend
that it be accepted as fulfilling the dissertation requirement.

____________________________________________________________ Date: 04/14/2016


Dissertation Director: Suzanne Panferov

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of the requirements for an

advanced degree at the University of Arizona and is deposited in the University Library to

be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided

that an accurate acknowledgement of the source is made. Requests for permission for

extended quotation from or reproduction of this manuscript in whole or in part may be

granted by the copyright holder.

SIGNED: Adnan Alobaid

ACKNOWLEDGEMENTS

This dissertation could not have been accomplished without the assistance and

support of many individuals. First and foremost, I would like to express my deep and sincere

gratitude to my supervisor Suzanne Panferov, who has been more than generous with her

assistance, patience, and expertise in addition to guiding me through the entire process of

writing this dissertation.

Moreover, much gratitude goes to my committee members: Drs. Beatrice Dupuy,

Peter Ecke, and Edmond White who have been substantially supportive, encouraging, and a

great source of inspiration. I also would like to thank Dr. Nicholas Ferdinandt, with whom I

took a program evaluation class. Dr. Ferdinandt has been very supportive and helpful in

providing me with all necessary advice, references, and information pertaining to the program

evaluation industry.

Similarly, I would like to express my appreciation to Dr. Muhammad Al-Fuhaid, who

has been very instrumental in providing me with thoughtful insights in addition to discussing

crucial issues relevant to my dissertation topics. Furthermore, I would like to thank Ms.

Amani Suleiman Al-Samhan who helped me collect data from the study site. She has been

very generous with her time and assistance in carrying out the interview portion of one of this

dissertation's articles.

Last but not least, my deepest debt of gratitude goes to my family members,

especially my parents, Othman and Lulwa, for their prayers, support, and encouragement. I

also would like to thank my wife, Reem, and my two children, Wissam and Tala, for their

support, patience, and assistance; without them I could not have completed

this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ............................................................................................................ 4  

LIST OF FIGURES ...................................................................................................................... 11  

LIST OF TABLES ........................................................................................................................ 12  

ABSTRACT .................................................................................................................................. 13  

DISSERTATION THEME ........................................................................................................... 14  

CHAPTER 1 INTRODUCTION .................................................................................................. 15  

CHAPTER 2 SAUDI STUDENT PLACEMENT INTO ESL PROGRAM LEVELS: ISSUES

BEYOND TEST CRITERIA ........................................................................................................ 19  

ABSTRACT .............................................................................................................................. 20  

INTRODUCTION .................................................................................................................... 21  

A Historical Review on ESL Placement Tests: Traditional ESL Placement Tests ..................... 23  

ESL Placement Test Item Formats .............................................................................................. 24  

ESL Placement Test Delivery Formats........................................................................................ 25  

Test-takers Intentionally Failing Exams: Myth or Reality? ........................................................ 26  

Student Intentions to Purposefully Underperform on a Language Test ...................................... 27  

Saudi Students Studying Abroad ................................................................................................. 30  

The Saudi Arabian Cultural Mission (SACM) ............................................................................ 30  

METHODOLOGY ................................................................................................................... 31  

Anecdotal Evidence and Pilot Study ........................................................................................... 31  

Data Collection Background on Site..................................................................................... 32  

Positionality of the Researcher .................................................................................................... 34  

Significance of the Study ............................................................................................................. 35  

Research Questions ...................................................................................................................... 36  

Research Instruments ................................................................................................................... 36  

Participants .................................................................................................................................. 37  

Procedures ................................................................................................................................... 38  

FINDINGS ................................................................................................................................ 39  

Purposefully Underperforming on ESL Placement Tests: A Major or Minor Issue?.................. 48  

DISCUSSION AND CONCLUSION....................................................................................... 49  

LIMITATIONS ......................................................................................................................... 52  

IMPLICATIONS ...................................................................................................................... 54  

Pedagogical Implications ............................................................................................................. 54  

Implications for Scholarship Agencies ........................................................................................ 55  

Implications for Both Scholarship Agencies and ESL Administrators ....................................... 56  

CHAPTER 3 INTEGRATING SELF-ASSESSMENT TECHNIQUES INTO L2

CLASSROOM ASSESSMENT PROCEDURES......................................................................... 58  

ABSTRACT .............................................................................................................................. 59  

INTRODUCTION .................................................................................................................... 60  

Definition ..................................................................................................................................... 61  

Benefits and Limitations of Self-assessment ............................................................................... 61  

LITERATURE REVIEW ......................................................................................................... 62  

Impact of Classical Teaching Methods on Self-assessment ........................................................ 62  

Impact of Communicative Language Teaching on Self-assessment ........................................... 62  

Learner-centeredness ................................................................................................................... 63  

Self-directed Learning ................................................................................................................. 63  

Self-assessment ............................................................................................................................ 64  

Accuracy of Self-assessment ....................................................................................................... 64  

Correlations ................................................................................................................................. 65  

Discrepancies ............................................................................................................................... 65  

How Can the Accuracy of Self-assessment Be Achieved? ......................................................... 66  

Common European Framework of Reference (CEFR) ............................................................... 67  

Validity of the CEFR Self-assessment Rubric ............................................................................ 68  

METHODOLOGY ................................................................................................................... 69  

Significance of the Study ............................................................................................................. 69  

Research Questions ...................................................................................................................... 69  

Research Tools............................................................................................................................. 70  

Participants .................................................................................................................................. 70  

Background on Site ..................................................................................................................... 71  

Procedures ................................................................................................................................... 72  

FINDINGS ................................................................................................................................ 74  

Discussion and Conclusion .......................................................................................................... 83  

TOEFL-related Limitations ......................................................................................................... 84  

Implications for ESL/EFL Program Administrators.................................................................... 85  

Implications for ESL/EFL Teachers ............................................................................................ 86  

Implications for Future Research................................................................................................. 86  

CHAPTER 4 QUALITY ASSURANCE AND ACCREDITATION AS FORMS FOR

LANGUAGE PROGRAM EVALUATION: A CASE STUDY OF TWO EFL

DEPARTMENTS IN A SAUDI UNIVERSITY .......................................................................... 87  

ABSTRACT .............................................................................................................................. 88  

INTRODUCTION .................................................................................................................... 89  

What Is Program Evaluation? ...................................................................................................... 90  

LITERATURE REVIEW ......................................................................................................... 91  

Foreign Language Program Evaluation ....................................................................................... 92  

Evaluation Paradigms .................................................................................................................. 94  

Formative vs. Summative ............................................................................................................ 96  

External Experts vs. Internal Stakeholders .................................................................................. 97  

Field vs. Laboratory Research ..................................................................................................... 98  

Ongoing vs. Short-term Evaluations ........................................................................................... 98  

Qualitative vs. Quantitative ......................................................................................................... 99  

Process vs. Product .................................................................................................................... 100  

Program Evaluation through Quality Assurance ....................................................................... 101  

Accreditation-based Program Evaluation .................................................................................. 102  

Quality Assurance and Accreditation in Saudi Arabia .............................................................. 103  

NCAAA ..................................................................................................................................... 104  

Significance of the Study ........................................................................................................... 104  

METHODOLOGY ................................................................................................................. 105  

Background on Site ................................................................................................................... 106  

Research Objectives................................................................................................................... 106  

Research Questions .................................................................................................................... 107  

Participants ................................................................................................................................ 107  

Research Tools........................................................................................................................... 108  

Integrating NCAAA and CEA Standards .................................................................................. 108  

Procedures ................................................................................................................................. 110  

Data Analysis ............................................................................................................................. 112  

FINDINGS .............................................................................................................................. 112  

Standard One: Mission/Goals/Objectives .................................................................................. 112  

Evaluating the Mission Statements of the Two EFL Departments............................................ 113  

Evaluating the Goals and Objectives of the Two EFL Programs .............................................. 116  

Students’ Attitudes towards Course Objectives and Requirements .......................................... 116  

Standard Two: Curriculum ........................................................................................................ 117  

Student Learning Outcomes ...................................................................................................... 118  

Teaching Strategies .................................................................................................................... 119  

Assessment Methods ................................................................................................................. 121  

Standard Three: Student Achievement ...................................................................................... 122  

Standard Four: Program Development, Planning, and Review ................................................. 123  

Quality Assurance Coordinator of the Male-section Program .................................................. 123  

Student Dissatisfaction with the Two EFL Departments .......................................................... 126  

Student Satisfaction with the Practices of the Two Departments .............................................. 128  

Good Quality Assurance Practices ............................................................................................ 130  

Poor Quality Assurance Practices .............................................................................................. 131  

Assessment-related Dilemmas ................................................................................................... 132  

Quality Assurance-related Dilemmas ........................................................................................ 132  

DISCUSSION AND CONCLUSION..................................................................................... 133  

Implications .................................................................................................................................... 134  

Standard One: Mission/Goals/Objectives. ................................................................................. 134  

Standard Two: Curriculum. ........................................................................................................ 135  

Standard Three: Student Achievement. ...................................................................................... 135  

Standard Four: Program Development. ...................................................................................... 135  

CHAPTER 5: CONCLUSION ................................................................................................... 137  

Appendix A: Pilot Study Findings .............................................................................................. 141  

Appendix B: Survey Questions ................................................................................................... 144  

Appendix C: Interview Questions ............................................................................................... 150  

Appendix D: ESL Administrators & Assessment Coordinators ................................................. 153  

Appendix E: Study Participants .................................................................................................. 155  

Appendix F: CEFR, TOEFL, and IELTS Equivalency Table .................................................... 156  

(Tannenbaum & Wylie, 2007) .................................................................................................... 156  

Appendix G: CEFR Self-assessment Rubric Descriptors ........................................................... 157  

Appendix H: Interview Questions .............................................................................................. 167  

Appendix I .................................................................................................................................. 168  

Appendix J: Color-coded CEFR, TOEFL, and IELTS Equivalency Table (Tannenbaum &

Wylie, 2007) ............................................................................................................................... 170  

Appendix K: Participants’ Self-rated Scores .............................................................................. 171  

Appendix L: Consent Form ........................................................................................................ 172  

APPENDIX M: Web-based STUDENT SURVEY .................................................................... 173  

APPENDIX O: RESEARCHER’S EVALUATION CHECKLIST ........................................... 181  

APPENDIX P: COLLEGE MISSION ........................................................................................ 189  

APPENDIX Q: MALE DEPARTMENT OBJECTIVES ........................................................... 190  

REFERENCES ........................................................................................................................... 191  


LIST OF FIGURES

Figure 2.1. Participants' levels of English proficiency by gender. .......................................... 37  

Figure 2. Scatterplot (BIVAR)=TOEFL with Ratio by Level of Proficiency for Male and

Female. ..................................................................................................................................... 80  

Figure 3. Student learning outcomes cover all language skills. ............................................ 119  

Figure 4. Teaching strategies are proper for various learning styles. .................................... 120  

Figure 5. Instructors communicate with you in English........................................................ 120  

Figure 6. You are engaged in presentations & leading discussions ......................... 120  

Figure 7. You are engaged in research projects. ................................................................... 120  

Figure 8. Are assessment methods communicated in advance? ............................................ 121  

Figure 9. Are these methods used consistently? .................................................................... 122  

Figure 10. Students’ responses to how often their work is graded and returned to them

promptly. ................................................................................................................................ 128  

Figure 11. Students’ responses to the extent to which SLOs meet their expectations. ......... 129  

LIST OF TABLES

Table 1 ..................................................................................................................................... 38  

Table 2 ..................................................................................................................................... 40  

Table 3 ..................................................................................................................................... 41  

Table 4 ..................................................................................................................................... 42  

Table 5 ..................................................................................................................................... 44  

Table 6 ..................................................................................................................................... 71  

Table 7 ..................................................................................................................................... 73  

Table 8 ..................................................................................................................................... 75  

Table 9 ..................................................................................................................................... 76  

Table 10 ................................................................................................................................... 77  

Table 11 ................................................................................................................................... 82  

Table 12 ................................................................................................................................. 110  

Table 13 ................................................................................................................................. 110  

ABSTRACT

This three-article dissertation addresses three different yet interrelated topics:

language testing, assessment, and evaluation. The first article (Saudi Student Placement into

ESL Program Levels: Issues beyond Test Criteria) addresses a crucial yet understudied issue

concerning why lower-level ESL classes typically contain a disproportionate number of

Saudi students. Based on data obtained from different stakeholders, the findings revealed that

one-third of the students in the study intentionally underperformed on ESL placement tests.

However, ESL administrators participating in this study provided contradictory findings.

The second article (Integrating Self-assessment Techniques into L2 Classroom Assessment Procedures) explores the efficacy of self-assessment by examining the accuracy of the CEFR self-assessment rubric compared to students’ TOEFL scores, and the extent to which gender and level of language proficiency contribute to any potential score underestimation. By obtaining data

from 21 ESL students attending the Center for English as a Second Language at the University of

Arizona, the findings revealed no statistically significant correlations between participants’

self-assessed scores and their TOEFL scores. However, the participants reported that the

CEFR self-assessment rubric is accurate in measuring their levels of language proficiency.

Finally, the third article (Quality Assurance and Accreditation as Forms for

Language Program Evaluation: A Case Study of Two EFL Departments in a Saudi

University) provides a simulated program evaluation based on an integrated set of standards

of the NCAAA (the National Commission for Academic Accreditation and Assessment) and

CEA (the Commission on English Language Program Accreditation). The findings indicated

that the standards of the mission, curriculum, student learning outcomes, and program

development, planning, and review, were partially met, whereas the standards of teaching

strategies, assessment methods, and student achievement were not.


DISSERTATION THEME

[Diagram: the dissertation theme links three interrelated areas: Program Evaluation, Classroom Alternative Assessment, and Placement Tests.]

CHAPTER 1 INTRODUCTION

Testing, assessment, and evaluation are inseparable components of education

(Fulcher & Davidson, 2007), and they play a fundamental role in fulfilling a number of

functions in the educational process at different levels. Research stresses the importance of

these three concepts in planning, guiding, and implementing modern educational processes.

For example, Fulcher (2010) asserts that tests are “vehicles by which society can implement

equality of opportunity or learner empowerment” (p. 1). They function as gatekeepers for

education through a process of evaluating student performance to ensure that students satisfy a

set of standards or meet a set of requirements. Furthermore, tests are tools used

to verify the extent to which an individual has mastered specific skills. Assessment, on the

other hand, is concerned with documenting target knowledge, skills, and attitudes: a

performance is observed, described, documented, and interpreted for improvement purposes.

Lastly, evaluation is a process of making judgments about a performance based on evidence.

These three concepts are an indispensable part of L2 learning. For example, Carroll

(1961) indicates that language tests are key to formal language learning as they “render

information to aid in making intelligent decisions about possible courses of action” (p. 314).

This suggests that verifying the extent to which L2 student learning outcomes are achieved is

unlikely to be accomplished without some form of testing. Moreover, language assessment plays a vital

role not only in L2 learning but also in applied linguistics, by “operationalizing its theories

and supplying its researchers with data for their analysis of language knowledge or use”

(Clapham, 2000, p. 148). Furthermore, program evaluation has become key to many L2

language programs as “no curriculum should be considered complete without some form of

program evaluation” that corresponds with the program’s emerging developments (Brown, 2007,

p. 158). This three-article dissertation addresses language testing, assessment, and evaluation.
The first article, “Saudi Student Placement into ESL Program Levels: Issues beyond

Test Criteria”, attempts to account for why some Saudi ESL students studying in the United

States are placed into lower-than-expected levels. Specifically, one of the intensive English programs (IEPs) in

the southwestern United States has received complaints raised by the Saudi Arabian

Cultural Mission (SACM), a sponsor of Saudi students in the US, about Saudi students being

placed into lower ESL levels than expected. Of course, this placement may be attributed to

numerous factors including students’ low level of language proficiency, fatigue, lack of

adequate background on ESL placement tests, and so forth. Nevertheless, initial

anecdotal evidence gathered by the researcher suggests that there are other reasons causing

this issue. That is, many Saudi ESL students indicated that they purposefully underperform

on ESL placement tests for several reasons including but not limited to having more time to

learn English, adapting to the target educational and environmental contexts, preparing well

for admission-related tests (TOEFL, IELTS, and GRE), and so on. To examine whether this

issue exists and therefore represents a dilemma for some or all stakeholders (language

programs, scholarship agencies, students, teachers), different sets of data were collected

from a larger, more diverse group of Saudi and Gulf Cooperation Council (GCC) students.

The article will help solve this issue, to whatever extent it exists, by first obtaining the

opinions of various stakeholders. Then, based on their responses and attitudes, implications

will be provided to help mitigate the issue. The findings of this article, however, are not

intended only for Saudi or GCC students; they are also intended for ESL students of other

nationalities. Thus, the article offers a unique contribution toward avoiding any form of

negative impact this issue may have on stakeholders. That is, a classroom with a large

number of self-misplaced students may pose particular problems for ESL teachers, since

severe proficiency discrepancies among students would be expected (Hughes, 1989).


The second article, “Integrating Self-assessment Techniques into L2 Classroom

Assessment Procedures”, in contrast, is an endeavor to explore the integration of alternative

assessment techniques into ESL/EFL classroom assessment measures. In other words, as a

response to a pedagogical paradigm shift from teacher- to student-centered teaching, this

article examines the accuracy of the CEFR self-assessment rubric in reflecting L2 students’

levels of language proficiency. To do so, self-assessed scores of a group of ESL students are

compared against their TOEFL/IELTS scores in order to identify the extent to which this self-

assessment measure can be used in L2 contexts for proficiency purposes. The reason for

choosing self-assessment in particular among many other alternative assessment techniques is

twofold. First, as a former EFL student, the researcher never used any self-assessment

measures during his entire undergraduate studies. This lack of experience led him to

encounter several difficulties when he first pursued his higher studies in the United States.

Second, research has empirically documented the importance of self-assessment in promoting

student-centeredness (Nunan, 1988), enhancing student learning (Boud 1995; Taras, 2010),

and augmenting learner autonomy (Benson, 2012).

This article emphasizes the importance of incorporating self-assessment rubrics into

ESL/EFL classroom assessment processes as alternative or supplementary tools to

traditional assessment techniques. That is, it proposes that efforts be made to use the CEFR or any

other valid self-assessment rubrics not only in L2 programs in Saudi Arabia, but also in many

ESL programs in general. More importantly, it is hoped that the findings of this article will

encourage university students to develop their autonomy through self-assessment. At the end

of the paper, EFL departments in Saudi Arabia in particular and other L2 practitioners in

general will be provided with implications on how self-assessment techniques can be

effectively utilized in L2 contexts.


The third article, “Quality Assurance and Accreditation as Forms for Language Program Evaluation: A Case Study of Two EFL Departments in a Saudi University”, addresses several gaps. First, as many Saudi universities seek to obtain institutional or program accreditation

from the National Commission for Academic Accreditation and Assessment (NCAAA), this

article provides a simulated evaluation of two EFL departments at a Saudi university to serve

as an effective model and benchmark for other language programs that seek to obtain

accreditation. Another salient gap that needs to be addressed is that through its Ninth

Development Plan (2014), Saudi Arabia considers accreditation to be a national strategic

aspect of its policies (Ministry of Economy and Planning, 2015). Hence, this article

endeavors to promote this aspect by providing a practical example of program evaluation.

Another main gap that this article attempts to fill is that program evaluation does not

receive as much attention as “program design, teacher training, or classroom

techniques”, for it is deemed “as of a lower priority than the more obviously immediate

activities associated with design and planning” (MacKay, 1998, p. 33). In other words, as

MacKay notes, some program administrators argue that once a program is planned, designed,

and implemented, program evaluation processes will become natural and spontaneous as a

response to the program’s changing circumstances. As a consequence, this article aims to raise

program administrators’ awareness of the significance of maintaining periodic program

evaluation and review, and that program evaluation is not, under any circumstances, “an ad

hoc, unprincipled, or arbitrary activity” but rather a planned, organized, and guided

process (MacKay, 1988, p. 34). Therefore, by evaluating two university English language

programs in Saudi Arabia through a set of integrated standards of the NCAAA and CEA, this

article explores program evaluation in general while approaching it through accreditation-based lenses.

CHAPTER 2 SAUDI STUDENT PLACEMENT INTO ESL PROGRAM LEVELS:

ISSUES BEYOND TEST CRITERIA

ABSTRACT

ESL placement tests are used “to place students at the stage of the teaching program

most appropriate to their abilities” (Hughes, 1989, p. 14). Placement decisions should thus

reflect students’ actual levels of language proficiency; otherwise, improper placements may

affect their learning achievement (Bachman & Palmer, 1996). Although most ESL placement

tests adhere to the theoretical underpinnings of testing, another issue may exist. That is, some Saudi

ESL students studying in the United States are placed into lower-than-expected levels, which

might be attributed to several factors (e.g., students’ low level of language proficiency); yet,

there are other reasons for this issue. To gain initial insight into this issue, a piloted survey

was circulated to 27 Saudi students studying in the U.S. The findings revealed that 40% of

them purposefully underperformed on university ESL placement tests for various reasons.

To examine the extent to which this issue exists, a survey was circulated to a

randomly selected group of Saudi ESL students attending three ESL programs in the U.S.

This was followed by conducting semi-structured interviews with a randomly selected sample

of participants. The findings showed that 20% of these participants intentionally

underperformed on ESL placement tests. Then, the same survey was circulated to GCC (Gulf

Cooperation Council) students studying in the U.S. The findings showed that 18% of them

also deliberately underperformed on ESL placement tests. Finally, some ESL administrators

were surveyed about this issue and provided contradictory percentages of students

purposefully underperforming on ESL placement tests. This article, therefore, provides some

suggestions that help overcome or mitigate this issue.

Key Words: Test validity, test reliability, test authenticity, test practicality,

purposefully underperforming on ESL placement tests, self-misplaced students, Saudis, GCC,

SACM, KSA, IEP


INTRODUCTION

Language testing is an important field in the literature regarding second language

teaching, for it addresses key issues pertaining to language evaluation and assessment such as

test design, test administration, score interpretation, and so forth (Bachman & Palmer, 1996;

Brown, 2010; Hughes & Scott-Clayton, 2011; Kunnan, 1998; McNamara, 2006). ESL

placement tests are widely used by universities, community colleges, and language programs

for the purpose of placing prospective students into homogeneous groups based on accurate

measures of their levels of language proficiency (Bachman, 1990; Bachman & Palmer, 1996;

Fulcher, 1997; Wall, Clapham, & Alderson, 1994). The main purpose of ESL placement tests

is to provide accurate measures of students’ language abilities in order to place them in the

appropriate level classes (Kunnan, 2000). However, failure to report accurate scores that

reflect students’ actual language proficiency can lead to placing them into incorrect levels,

which can impact their current and future learning (Hughes, 1989).

Research has approached language placement testing from different perspectives. For

example, Fulcher (1997) points out that early placement testing publications (e.g., Goodbody,

1993; Schmitz & DelMas, 1990; Schwartz, 1985; Truman, 1992) were “concerned with the

placement of linguistic minority students” into language programs (p. 113). Before the 1990s,

researchers rarely examined the validity of ESL placement tests compared with other studies

conducted on first language and standardized foreign language tests because placement tests

are often low stakes as opposed to other language tests such as exit tests (Brantmeier, 2006).

This paper investigates another, as yet understudied, issue concerning the reasons why

many Saudi ESL students are placed into lower-than-expected levels in intensive university

ESL programs abroad and the extent to which this behavior affects program placement

systems.
In this paper, it is assumed that the placements of many Saudi ESL students are lower

than expected, an assumption based on several observations. First, the Saudi Arabian Cultural Mission (SACM) has

long complained about placing Saudi ESL students into lower levels (ESL program

administrator, personal communication, March 12, 2015). SACM is concerned with the

reasons why lower-level ESL classes typically contain a disproportionate number of Saudi

students, and SACM ESL coordinators have attempted to identify the factors causing this

issue. They wonder if it is a language issue, a testing issue, a lack of experience in taking

placement tests, or a combination of issues. Second, research has reported that Saudi ESL

students’ oral skills are better than their writing skills. For example, Johnson (2015) found

that Saudi students perform better on the University of Illinois entry interviews compared

with their written placement scores. Saba’s (2014) longitudinal study also found that within a

short period of time, some Saudi students achieved TOEFL scores higher than their scores on

the Virginia Tech Language and Cultural Institute (VTLCI) placement test.

The discrepancies between Saudi ESL students’ placement scores and their actual

level of English proficiency can be ascribed to numerous factors such as lack of testing skills,

test-takers’ varied abilities in taking computerized tests, test anxiety, and so on. Nevertheless,

more tangible evidence is needed to support this extrapolation. Thus, the researcher collected

anecdotal evidence from a randomly selected group of Saudi ESL students, all who indicated

that they purposefully performed poorly on ESL placement tests for various reasons, which

will be discussed below. This finding encouraged the researcher to review literature

addressing this issue in order to determine whether some ESL students intentionally underperform

on ESL placement tests and, if so, to what extent. Unfortunately, the researcher was unable

to find adequate studies addressing the issue. Hence, a historical review of ESL placement

tests and an in-depth discussion of test criteria are provided instead.
A Historical Review on ESL Placement Tests: Traditional ESL Placement Tests

Due to the dominance of SLA theories that emerged during the 1970s, 1980s, and 1990s,

ESL placement tests have undergone profound changes (Green & Weimer, 2004). For

example, ESL placement tests of earlier years were greatly impacted by grammar-translation

approaches, resulting in tests that focused mainly on language structures (Brown, 1989).

Hughes, Weir, and Porter (1995) argue that traditional ESL placement tests merely test

learners’ grammatical knowledge to “mirror course content” so that learners can be placed into

homogeneous groupings (p. 13). Later, interactionalist paradigms emerged (Gass &

Varonis, 1994; Long, 1981; Mackey, 1999), which emphasized the importance of presenting

test items with rich contextual clues through which “interactional adjustments” could be

triggered (Gass, 2010, p. 221). This contextualized ESL placement test items, “whose recall

is promoted by information in the discourse” (Bachman, 1990, p. 132). However, traditional

ESL placement tests completely neglected other language skills such as listening and

speaking.

Communicative language teaching (CLT), which emerged in the early 1970s (Hymes,

1972) and 1980s (Wesche, 1983), stressed that oral skills be included in ESL placement tests

(Brown, 1989). Consequently, this shifted testing from focusing on language structure (e.g.,

cloze tests) to addressing test-takers’ communicative competence instead (Brown, 2010).

This paradigm shift resulted in the inclusion of more authentic texts (reading), interactive

tasks (speaking), lecture note-taking tasks (listening), and integrative-based tasks (writing).

The CLT approach has also introduced performance- and task-based test items, which made

ESL placement tests move away from solely testing L2 learners’ grammatical knowledge to

testing their communicative competence.

ESL Placement Test Item Formats

Traditional placement test formats include essay-based, cloze, gap-filling, matching,

true/false, error correction, and so forth (Bachman, 1990). Although multiple-choice formats

have been used for quite some time, they were initially integrated into ESL placement tests

during the early 1990s, primarily as a reaction to the widespread and popular item response

theory (IRT) research (Bock, 1997). Researchers have addressed the issue of test format and

the extent to which it affects test-takers’ performance. For example, Bridgeman and Lewis

(1994) argue that some test-takers are “relatively strong on essays and weak on multiple-

choice question and vice versa” (p. 133), implying that the nature of the test construct has a

great impact on test-takers’ scores. This suggests that test format should be taken carefully into

account when designing ESL placement tests.

Another issue that test designers should bear in mind when designing a placement test

is the extent to which ESL placement test format affects test-takers’ performance. Bridgeman

and Morgan (1996) conducted a study to validate an ESL placement test designed with two

different formats (multiple-choice vs essay), which they administered to a randomly selected

sample of ESL students in two separate sessions. The results indicated that some students,

who achieved higher scores in the essay-based test, obtained lower scores in the multiple-

choice format and vice versa. In the same vein, in validating the Lancaster University EAP

test, Wall et al. (1994) found evidence that multiple-choice items were fairly easy for many

test-takers, with total mean scores ranging from 70% to 76%, whereas the mean score of

the essay-based test was lower (57.91%). According to Lancaster University’s policies,

students who achieved higher scores met the targeted program’s requirements, while those

who achieved lower scores were required to take pre-sessional or remedial courses to help

them improve their language skills (Wall et al., 1994).


ESL Placement Test Delivery Formats

In addition, delivery formats of ESL placement tests (paper-based vs computerized)

may also have an impact on test-takers’ performance. According to Chapelle (2001),

beginning in the early 1980s, ESL placement tests were increasingly delivered and graded by

computers, known as computer-based tests (CBTs), which slowly replaced human raters.

Brown (1997) lists three key benefits of CBTs: 1) they can be “individually administered,

even on a walk-in basis”, 2) they do not involve many proctors, and 3) they can be kept for

future use, review, and adaptation (p. 45). Despite these benefits, it was questioned whether

the two formats would report equivalent results. Based on a Cronbach’s alpha analysis,

Fulcher (1999) found that the scores of a CBT correlated significantly with those of a paper-based test (PBT),

where the former’s reliability was 0.90, while the latter’s was 0.95. This implies that both

formats can report consistent scores provided test-takers are familiar with using computers.
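
The reliability and correlation figures cited above can be illustrated with a brief computational sketch. The following Python snippet is not taken from Fulcher's (1999) study; it simply shows, using hypothetical item-level data, how Cronbach's alpha for each delivery format and a Pearson correlation between CBT and PBT total scores might be computed.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) matrix of item scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 50 examinees answering 40 dichotomously scored items
# on a computer-based test (CBT) and on a paper-based test (PBT).
rng = np.random.default_rng(0)
ability = rng.normal(0, 1, 50)
cbt_items = (rng.normal(ability[:, None], 1.0, (50, 40)) > 0).astype(int)
pbt_items = (rng.normal(ability[:, None], 1.0, (50, 40)) > 0).astype(int)

alpha_cbt = cronbach_alpha(cbt_items)
alpha_pbt = cronbach_alpha(pbt_items)

# Pearson correlation between total scores on the two delivery formats
r = np.corrcoef(cbt_items.sum(axis=1), pbt_items.sum(axis=1))[0, 1]

print(f"alpha (CBT) = {alpha_cbt:.2f}, alpha (PBT) = {alpha_pbt:.2f}, r = {r:.2f}")
```

In practice, a high cross-format correlation alongside comparable alpha values is the kind of evidence that supports the claim that two delivery formats report consistent scores.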

Currently, many ESL placement tests are being delivered online, a dramatic change in

the placement test industry compared with their PBT and CBT counterparts (Roever, 2001).

That is, the Internet provides opportunities for “test-taking at the learner’s

convenience and providing immediate and specific feedback to learners” (Chapelle, Jamieson

& Hegelheimer, 2003, p. 409). In fact, many ESL placement tests today are effectively

delivered and graded online, despite the fact that some technical glitches may occur while

taking the test. One question that arises is the extent to which different test delivery formats

yield consistent scores. Research has revealed conflicting results; however, regardless of

delivery or design formats, the consensus is that all test-takers should have equal

opportunities in taking the test (Brantmeier, 2006; Brown, 1997; Fulcher, 1999; Kirsch,

Jamieson, Taylor & Eignor, 1998). The next section sheds some light on test criteria that

should be taken into account during the design process of ESL placement tests.
Test-takers Intentionally Failing Exams: Myth or Reality?

To better understand students’ intentions to fail a test, it is crucial to

identify similar cases, from any discipline, that show people making conscious attempts to

purposefully perform poorly on an exam. In other words, do people have legitimate motives

to intentionally perform poorly on a test? Literature from various disciplines has reported that

some people fail a test to achieve desired goals. For example, based on neuropsychological

data, Coppel (2011) found that some high school “athletes have been known to deliberately

perform poorly on baseline to create a low comparison point to evaluate change on post-

concussion follow-up” (p. 658). This suggests that purposefully performing poorly on a

baseline exam allows athletes to mask any post-concussion decline, since their follow-up

scores are compared against an artificially low baseline, potentially enabling an earlier return to play.

Another example of intentionally underperforming on a test is evident in the case of

malingering. That is, malingerers have been found to deliberately perform poorly during

clinical examinations by feigning memory problems caused by traumatic events hoping that

the results would lead them to achieve specific external incentives (Sekely, 2014). For

example, their intentional deception of providing false statements is aimed at avoiding

“military duty or work, obtaining financial compensation, evading criminal prosecution, or

obtaining drugs” (Diagnostic and Statistical Manual of Mental Disorders, 4th ed.; American

Psychiatric Association, 2000, p. 739). In academic contexts, Ahlgren Reddy and Harper

(2013) reported that some prospective mathematics students often “intentionally perform

poorly on a placement examination so as to take a presumably easier course and there is no

easy way to control for this” (p. 688). Based on the examples above, one may realize that

some people seem to be generally motivated to perform poorly on certain tests or

examinations in order to obtain specific external incentives.


Student Intentions to Purposefully Underperform on a Language Test

Only one research article was found that addressed the motives of ESL students who

deliberately perform poorly on ESL placement tests. Kahn, Butler, Weigle and Sato (1994)

found that some ESL students tend to “intentionally perform poorly in oral interviews in

order to remain at a lower level...to avoid missing anything, or when they want to stay with

friends or an instructor at a lower level” (p. 38). Yet, no other evidence pertinent to this issue

has been reported, which can be ascribed to the assumption that ESL students might find no

benefit in revealing their intentions to throw a test. Moreover, as a Saudi, the researcher assumes that

many Saudi ESL students might be reluctant to acknowledge such behavior due to the fear of

losing their scholarships. Thus, it is necessary to seek additional language-related cases of ESL

students’ conscious attempts to deliberately underperform on tests.

Brown and Hudson (2002) found cases of purposefully underperforming on L2

diagnostic tests, noting that “some students may actually be motivated to fail or at least to get a

low score on criterion-referenced tests” (p. 287). This indicates that in some cases, students

may intentionally underperform on tests for different reasons. Brown and Hudson (2002)

provide an interesting example of students’ inappropriate attempts to fail a test on purpose,

in that some “cynical students may try to outsmart the criterion-referenced testing process

when they understand what is going on” (p. 287). That is, when students take the test at the beginning of

the course for diagnostic purposes “and are told by the teacher (or guess)” that they will

retake it at the end of the course for achievement purposes, they tend to deliberately

underperform on the pre-test and then take the post-test seriously (p. 287). By doing so, they

will be able to show apparent learning progress due to the difference between their pre- and

post-test scores without any learning actually occurring. Such deliberate failures have been

found in criterion-referenced tests, especially in pre-post assessment (Jang, 2008).


Furthermore, Reed and Stansfield (2004) found evidence of students intentionally

failing a test. Although the Modern Language Aptitude Test (MLAT) is often used

to predict students’ ability to learn a foreign language (Carroll & Sapon, 2002), it has

recently been used to diagnose students who claim that they have learning disabilities that may

impede them from learning a foreign language. The findings indicated that some students

intentionally failed the test, so they could waive or postpone a foreign language course. To

address this issue, some schools hire counselors who conduct interviews with test-takers to

ensure that their reported MLAT scores are accurate. Despite all the efforts exerted by

schools to overcome or at least alleviate the aforementioned behavior, many cases of students

failing the MLAT still occur (Carroll & Sapon, 2002).

Another interesting example of students failing a test on purpose was mentioned in a

meeting organized by the NY Public Education Reform Commission. Leonardatos (2012),

who was one of the attendees, reported that one of the presenters suggested that a new teacher

evaluation system be imposed because students’ reported scores are inaccurate. That is, since

students know that they will go through a pre- and post-test evaluation process to identify

their progress in a course, they deliberately fail the pre-test to create the appearance of rapid achievement based on

their post-test scores. However, their motives for failing a test intentionally are not always for

their own benefit but rather for their teachers’. For example, knowing that their teachers are

evaluated based on their achievement, these high school students intentionally fail a

diagnostic pre-test so that the teacher(s) they like can obtain higher scores on teacher

evaluation (Leonardatos, 2012). This assumes that the students would achieve higher scores

on the achievement post-test. Conversely, students might perform well on the diagnostic pre-

test so that the teacher(s) they dislike receive lower scores on teacher evaluations.

In the previous sections, a number of significant issues pertaining to ESL placement

tests were discussed in detail including how these tests have developed over time, with

particular focus on item formats, delivery modes, theoretical underpinnings, and so forth. It

was also emphasized that regardless of whether ESL placement tests have diagnostic

(institutionally oriented, such as the Michigan Test) or course-content orientations, they should

ultimately reflect test-takers’ actual levels of English proficiency. This is due to the fact that

they were originally developed “to assess students’ level of language ability so that they can be

placed in the appropriate course or class” (Alderson, Clapham & Wall, 1995, p. 11). Such

purposes can be achieved by meeting a set of test criteria in order to make the test as useful

as possible. In some cases, however, despite being valid, reliable, practical, and even

authentic, some placement tests may fail to report test-takers’ actual levels of language

proficiency for numerous and possibly test-irrelevant reasons.

As reported by the study’s anecdotal evidence and pilot study, it was discovered that

some Saudi ESL students purposefully underperformed on ESL placement tests for certain

planned or personal reasons. This paper examines the accuracy of these findings by surveying

a larger number of Saudi and GCC ESL students. The reason for choosing Saudi ESL

students in particular is two-fold. First, as a Saudi, the researcher has witnessed a wide range

of Saudis intentionally performing poorly on ESL placement tests. That is, whenever he

attends Saudi gatherings, meets with Saudi friends, or even has some discussions with other

Saudi students, he notices that some of them encourage each other to fail placement tests on

purpose for many reasons, most importantly to have more time to seek university admission.

These students believe that underperforming on a placement test will help them stay longer in

the U.S., for if they are placed into higher ESL levels and finish the ESL program in a short

period of time, they are required to obtain admission sooner or be forced to leave the country.
Saudi Students Studying Abroad

The first study abroad scholarships began in 1927, when six students were sent to

Egypt to pursue higher degrees at the Kingdom's expense (MOHE¹, 2014). Since that time,

study abroad scholarships have grown exponentially. According to Alqahtani (2014), one of

the most prosperous eras of these scholarships began with the launch of the King Abdullah Scholarship

Program in 2005, which was established based on an agreement between King Abdullah and

George W. Bush (Taylor & Albasri, 2014). The program sends qualified Saudi students to

“the best universities worldwide for further studies leading to academic degrees” (Alqahtani,

2014, p. 33). As of 2014, the number of Saudi students studying abroad dramatically

increased to 200,000 students, approximately 111,000 of them in the U.S. (MOHE, 2014).

The Saudi Arabian Cultural Mission (SACM)

The Saudi Arabian Cultural Mission (SACM) is a Saudi agency in the U.S. that

oversees the King Abdullah Scholarship Program, represents Saudi universities, and provides

Saudi scholarship students with academic and financial support (MOHE, 2014). The ESL

Department at SACM functions as an intermediary between ESL programs and ESL Saudi

students (SACM, 2013). There is no required level of English proficiency for obtaining a

scholarship. Instead, students should: 1) be Saudi, 2) be unemployed, and 3) meet age

requirements (MOHE, 2014). Thus, in most cases, Saudi students’ levels of English

proficiency are generally based on their high school English backgrounds, which range from

beginner to intermediate with some exceptions. Some students, like those from ARAMCO

who are taught in English in KSA, generally have higher levels of English proficiency, while

those with very limited exposure to English are at much lower levels (Al Murshidi, 2014).

¹ Ministry of Higher Education

METHODOLOGY

The key motivation for conducting this study was to identify the reasons why many

Saudi ESL students are placed into lower-than-expected ESL levels. This requires carrying

out further investigations, recruiting a large number of participants, or even triangulating the

data from various sources. Therefore, data collection involved several stages. First, an

anecdotal confirmation of this behavior was obtained from some Saudi ESL students. Second,

based on the anecdotal findings, a pilot study was circulated to many Saudi ESL students.

Third, more data was collected from Saudi ESL students attending three ESL programs in the

southwestern United States. Finally, the study survey was circulated to a large number of

GCC students studying in the United States. The findings suggested numerous reasons

accounting for why many Saudi ESL students are placed into lower-than-expected levels.

Anecdotal Evidence and Pilot Study

Being uncertain about why many Saudi ESL students are placed into lower-than-

expected ESL levels, the researcher decided to collect some anecdotal evidence from a

randomly selected sample of Saudi students studying English in the United States. To

accomplish this, he conducted face-to-face interviews with seven of them and communicated

online via chat and email with nine of them. During the interviews, the researcher noticed

consistent themes about the issue at hand. Some Saudi students indicated that they

deliberately performed poorly on ESL placement tests. Based on these themes, a pilot study

consisting of a twelve-item survey was then administered online through Twitter (See

Appendix A). Then, it was sent via Whatsapp to a group of Saudis who fit into the targeted

category. After answering some introductory questions about their experience in taking ESL placement tests, the participants were directly asked whether they had, in one way or another, ever intentionally performed poorly on any ESL placement test.


The findings of the pilot study showed that 11 out of 27 participants, or nearly 40%,

indicated that they had indeed failed an ESL placement test on purpose for several reasons,

which will be discussed thoroughly in the Findings section. Although this is a pilot study with

a limited number of subjects, the findings are still striking. That is, roughly two out of five of the randomly selected ESL learners made a conscious attempt to fail a placement test, a problem

worth further investigation, especially given that the results are consistent with those of the

anecdotal evidence. To further explore the issue, in the subsequent data collection stages, a

more developed survey was circulated to a larger number of Saudi and GCC ESL students in

addition to interviewing a random sample of them in order to allow them to expand more on

their motives behind intentionally underperforming on ESL placement tests.

DATA COLLECTION BACKGROUND ON SITE

The IEP programs examined in this study are located in the southwestern United

States. In order to maintain anonymity, they are labeled Program A, Program B, and Program

C throughout this article. Program A consists of six ESL levels: Basic (1 & 2), Intermediate

(1 & 2), and Advanced (1 & 2). If students obtain a GPA of 3.0 and above in Advanced 2,

then they meet the university language requirement. Program B, on the other hand, consists

of seven levels, often culminating in an optional Bridge Program. Each level takes eight weeks

to complete unless a student fails two or more classes, in which case they must repeat the

level. In the Bridge Program, academic credit-bearing courses are combined with some ESL

courses to develop students’ English skills and to engage them in a university course in their

areas of specialization. If students complete level 7 successfully, they will fulfill the

university’s language requirements for entry into a degree-seeking program. Program C

consists of six levels. Students can obtain an endorsement for university entrance if they earn

grades of A, B, or C in every level 5 or 6 class in which they are enrolled.


It can be assumed that the placement tests of at least two of these programs are valid for several

reasons. Unfortunately, the researcher did not have access to Program A’s placement test. He

believes that the placement tests of Programs B and C are valid since the former’s was

abstracted from a previously validated test, the International Test of English Proficiency or

iTEP, and students are reassigned a level whenever misplacements are detected (Program B

assessment coordinator, personal communication, November 23, 2014). As for Program C,

its levels were compared against the TOEFL iBT proficiency descriptors on which placement

decisions are based (Program C Handbook). Hence, prospective students are placed into a

level of instruction based on their TOEFL or in-house placement scores. Moreover, the first

two weeks are a period of provisional placement for all students. If a student is misplaced,

s/he is then reassigned to the proper level. Given that two of the study placement tests are

valid, it would be unusual for some Saudi ESL students to be placed into lower-than-expected levels unless other external factors were causing these persistent misplacements.

At first glance, these misplacements could be attributed to several factors including

but not limited to test-takers’ potential low levels of language proficiency, lack of adequate

experience in taking language placement tests, and even fatigue when taking the test. In cases

of misplacement, students often find themselves in inappropriate levels, leading to boredom

when the classes are too easy or frustration when they are too challenging. Sometimes, these

misplaced students show neither the boredom nor the frustration that would reflect dissatisfaction with

misplacement. In these cases, one may assume that there are other factors causing

misplacements beyond test criteria, test-takers’ levels of proficiency, inadequate background

in placement tests, or any other test-related factors. To examine this issue prior to conducting

this study, anecdotal evidence was obtained and a pilot study was conducted from which

initial assumptions about misplacements can be drawn, as discussed earlier.


Positionality of the Researcher

One of the main principles in practice-based research is the positionality of the

researcher. Positionality is defined as the “self-conscious awareness of the relationship

between a researcher and another” (Bourke, 2014, p. 2). It mainly deals with a researcher’s

insider/outsider relationships with the participants and professionals in the same or

interrelated fields. For purposes of full disclosure, the researcher’s nationality is the same as

the participants': Saudi Arabian. A researcher is expected to collect and analyze data independently, without consciously or unconsciously attempting to lead the

participants towards certain desired findings. The researcher’s physical, language, and

cultural access to the participants provided him with an emic position. Both the researcher

and participants are scholarship students sponsored by SACM to pursue their desired degrees.

This enabled him to meet several Saudi students at the Saudi Student Club, at weekend gatherings, on Facebook, and so forth.

The researcher’s emic advantage helped him obtain much information from the

participants, which might not have been accessible to researchers of other nationalities.

Although the participants appeared to be comfortable discussing this issue with him as

opposed to discussing it with their ESL teachers or classmates, his objectivity was challenged

during some stages of data collection process. For example, during one interview, the

researcher discussed with a participant some issues about ESL placement tests in order to

allow the issue at focus, purposefully underperforming on ESL placement tests, to enter into

the conversation naturally. However, the participant kept discussing the accuracy of ESL

placement tests in detail, and the researcher repeatedly attempted to redirect the discussion to

the study issue. This may have impacted some of the participants’ responses and led them to

desired findings, which will be discussed fully in the limitations section.


Significance of the Study

This study is a substantial endeavor in accounting for why some Middle Eastern,

especially Saudi, ESL students are placed into lower-than-expected levels. In addition, based

on the literature, it examines students’ suggestions and ESL administrators’ views to provide

implications to address this issue effectively. According to the anecdotal evidence and pilot

study findings, failing an ESL placement test on purpose, for any reason, is likely to lead to

further negative consequences. If 40% of the approximately 10,000 Saudi students who are granted government scholarships annually to study abroad intentionally underperform on placement exams, then about 4,000 students will fail their ESL placement tests each year. This will not only affect the outcomes of Saudi students studying abroad but will also have a potentially negative impact on ESL programs.

A large number of misplaced students can pose greater problems for ESL teachers, as

there might be severe proficiency discrepancies among students in the same class (Hughes, 1989). Furthermore, although

we do not have accurate statistics of scholarship ESL students who were unable to obtain

university admission and thus returned to Saudi Arabia, this study may help encourage not only Saudi but all ESL students to take ESL placement tests seriously so that they benefit from the levels into which they are placed. As a consequence, the significance of

this study centers upon the hope that many Saudi scholarship administrators, ESL

administrators and teachers, and Saudi ESL students will find the results very important in

terms of best practices in managing massive scholarship programs, long-term program

effects, and general language pedagogy. Therefore, this study aims to answer the following

key question: Do Saudi ESL students purposefully underperform on ESL placement tests? Based on the answer to this question, the study also investigates any negative consequences

resulting from such behavior.


Research Questions

1. Do some Saudi ESL students purposefully underperform on ESL placement tests in

order to be placed into lower levels, and if so, why?

2. What are some reasons that make students deliberately underperform on the English

placement tests?

3. To what extent does this behavior affect the placement system of ESL programs?

4. Is this behavior a major issue worth addressing, or is it a minor issue?

5. What are some possible implications that can contribute to resolving or mitigating this

issue/phenomenon?

Research Instruments

A mixed-methods approach was used to collect data from the three programs studied because the issue under investigation is understudied. The research tools are a 20-item

survey (Appendix B) and semi-structured interviews (Appendix C). The survey was designed

based on the findings of the anecdotal evidence and pilot study. The last item of the survey

asked the participants to provide their emails if they wished to participate in the interviews.

On the other hand, the semi-structured interviews, which consisted of ten questions, were

designed in a way that helped the participants expand more fully on their responses to the

survey in order to identify their motives behind deliberately underperforming on their ESL

placement tests. Moreover, the same survey was circulated, with some modifications, to a

wide range of GCC students studying in the U.S. through several platforms to test the initial

findings of the study. The last research tool used in this study was a six-item survey

(Appendix D), which was designed and sent out randomly to some ESL administrators and

assessment coordinators (n= 17) in order to gain their views on the study’s issue.

Participants

The participants of this study were divided into three categories (See Appendix E).

The first group consisted of 71 randomly selected Saudi ESL students (50 male and 21

female) attending three university ESL language programs in the southwestern United States.

Their levels of language proficiency were identified in the survey, which first asked students

about their current ESL program levels, which range from beginner to advanced (Figure 2.1).

Figure 2.1. Participants' levels of English proficiency by gender.

Although 127 students participated in the survey, only 71 of them completed the

entire survey, so the remaining participants were excluded from analysis. Regarding their

distribution across the three study programs, 25 participants are attending Program A, 36

attending Program B, 13 attending Program C, and one attending another (Table 1).

Table 1

Levels of Proficiency Distribution of Participants in the Three ESL Programs

Answer            Beginner   Lower-intermediate   Intermediate   Upper-intermediate   Advanced   Passed all ESL Levels
Program A         3          0                    5              5                    8          4
Program B         5          2                    7              8                    8          3
Program C         3          1                    1              1                    1          6
Total (n = 71)    11         3                    13             14                   17         13

The second group of participants consisted of 216 GCC students studying in the

United States who received the same survey with some modifications. The third group

included 17 ESL administrators and assessment coordinators, who were surveyed to obtain their attitudes towards

the study issue.

Procedures

Data was collected by circulating a web-based survey to the university emails of

Saudi students attending the target ESL programs. Unfortunately, the initial number of

participants was too small to draw any conclusions about the issue. Therefore, the researcher

asked some teachers from the study’s programs to encourage Saudi students to participate in

the survey. He also asked many Saudi students with whom he is acquainted to encourage

their friends or any other Saudi students they knew of who satisfied the parameters of the

focus population to take the survey. After three months, 127 survey responses were received,

71 of which were completed in their entirety. The responses were organized into three

themes: 1) general information about the participants, 2) the study’s main question (Have you

ever intentionally performed poorly on an ESL placement test?), and 3) suggestions for

addressing the issue.

As for the second round of data collection, those who provided their emails in the

survey were contacted and given the choice to take part in face-to-face, phone, or other voice

protocol interviews. Program B students were interviewed at their program site, whereas the

other students were interviewed via phone, Skype, and Tango.² Each interview session lasted

20 to 25 minutes. Due to cultural constraints, the researcher was unable to interview female

participants face-to-face; instead, the interviews were done via Whatsapp. All interviews

were carried out in English unless participants had difficulty conveying their messages, in which case Arabic was used. To make the findings generalizable to more ESL students

beyond the study population, the survey was sent to a number of GCC students studying in

the U.S. via Student Clubs, Whatsapp, and Twitter. Finally, a six-item survey was sent to a

random group of ESL program administrators to ask them about any impact that purposefully

underperforming on ESL placement tests may have on their program placement system.

FINDINGS

After being asked some introductory questions concerning their experiences in taking

ESL placement tests in order to avoid leading them to any desired findings (i.e., being

influenced by the wording of the study’s main question), the participants were directly asked

the following question: “Did you intentionally, in one way or another, perform poorly on the

ESL Placement Test to be placed in lower levels?” One noteworthy issue is that this key

question was introduced in both English and Arabic to ensure that all participants understood

it fully. The findings showed that 14 out of 71 (20%) of these Saudi participants reported that

they did deliberately perform poorly on an ESL placement test for various reasons (Table 2).

² A social media platform

Table 2

Saudi Participants’ Responses to the Study Main Question

#   Answer   Response   %
1   Yes      14         20%
2   No       57         80%
    Total    71         100%

Although 20% represents only half the rate found in the pilot study, it is still high. However, it remains difficult to determine whether purposeful underperformance on placement tests actually occurs at this rate. In other words, a sample in which 20% of participants report this behavior is likely insufficient to make the findings generalizable. Moreover, there is no other supporting

evidence, such as observing participants’ classroom performance right after taking the

placement tests. Another salient issue is that the researcher had no access to the placement

systems used by the three programs that participated in the study in order to examine these

findings. As a consequence, the researcher decided to support these findings with other data

collected from a larger population that meets the parameters of the target group. Therefore, the same survey was circulated to numerous GCC students studying in the U.S.

In this round of data collection, 336 GCC students responded to the survey, yet

only 216 of them completed the survey. Although the survey targeted GCC students across

the United States, the vast majority of the participants are Saudis (202), with only six Emiratis, seven Kuwaitis, and one Omani, as indicated in Table 3 below. This can be attributed to

the researcher’s limited access to GCC students, except for Saudis. In other words, the

researcher contacted Saudi students across the United States with whom he is acquainted and

asked them to help him distribute the survey.

Table 3

GCC Participants by Nationality and Gender

# Answer Male Female


1 Bahraini 0 0

2 Emirati 4 2

3 Kuwaiti 4 3

4 Omani 1 0

5 Qatari 0 0

6 Saudi 153 49

Total 162 54

The findings revealed that 39 out of 216 (18%) of the GCC participants indicated

that they did intentionally underperform on ESL placement tests (Table 4). Although these

percentages (20% & 18%) represent around half of that of the pilot study (40%), they are still

high and concerning for scholarship agencies. Given that 10,000 Saudi scholarship students

are annually sent to study abroad, this means that 1800 to 2000 of them are likely to

deliberately perform poorly on ESL placement tests. These percentages can cost scholarship

agencies tens of millions of dollars, as students will take unnecessary additional courses.
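
To make the scale of this projection concrete, the short Python sketch below reproduces the back-of-envelope arithmetic. The cohort size and the 18 to 20% rates are taken from the figures cited above; the per-student cost of one additional ESL term is a purely hypothetical placeholder, not SACM or program data.

    # Back-of-envelope projection based on the figures cited in the text.
    # The cohort size and rates come from the survey results; the cost of one
    # extra ESL term per student is a hypothetical placeholder, not SACM data.
    ANNUAL_COHORT = 10_000                    # approximate scholarship students sent abroad per year
    REPORTED_RATES = {"Saudi sample": 0.20, "GCC sample": 0.18}
    EXTRA_TERM_COST_USD = 15_000              # hypothetical tuition and stipend for one additional term

    for label, rate in REPORTED_RATES.items():
        students = round(ANNUAL_COHORT * rate)
        cost = students * EXTRA_TERM_COST_USD
        print(f"{label}: ~{students:,} students per year; "
              f"~${cost:,} if each repeats one extra term")

Under these assumptions, the two reported rates translate into roughly 1,800 to 2,000 self-misplaced students per year and a cost on the order of tens of millions of dollars, consistent with the estimate above.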

Table 4

GCC Participants’ Responses to the Study Main Question

Did you intentionally, in one way or another, perform poorly on the ESL Placement Test to be placed in lower levels?

Answer   Response   %
Yes      39         18%
No       177        82%
Total    216        100%

One may argue that these data are insufficient; hence, more evidence is still needed to

gain further insight into the issue. Therefore, a group of ESL administrators and assessment

coordinators were surveyed. They were first asked if they have ever noticed any case of

students purposefully underperforming on ESL placement tests. Ten of the 17 participants

indicated that they had encountered some form of conscious attempt by students to perform poorly on ESL placement tests, albeit at a very low rate (<5%). More

specifically, five of these participants stated that the percentage of students intentionally

underperforming on the placement test is very low, probably around 1%, while two of them

indicated that it is likely less than 5%. Similarly, one of them indicated that she had personally witnessed this behavior only twice in her entire 25-year ESL teaching career.

Another participant stated that although she found some students deliberately

underperforming on ESL placement tests, she maintained that it is difficult to tell the rate.

Interestingly, only six of these participants indicated that they had never witnessed a student

purposefully underperforming on a placement test.

Clearly, there is a large discrepancy between the responses of ESL administrators and

assessment coordinators and those of students. This raises another question: whether this behavior actually exists at the reported rates. It is possible that some ESL administrators might be unaware of their students'

intentions to underperform on ESL placement tests, which could account for this discrepancy.

Of course, this argument would be more convincing if the study had a large pool of data collected over a long period, or if the data were triangulated. Otherwise, the discrepancy between

these two sets of percentages may question the accuracy of these findings, especially since no

concrete evidence of students’ work was collected such as writing samples, classroom

observations, and subsequent tests. Nevertheless, even with these two sets of data, one may

wonder why these students would still intentionally perform poorly on ESL placement tests.

Only those participants who deliberately underperformed on ESL placement tests were asked about their motives. That is, if a participant selected no, indicating that s/he had never intentionally performed poorly on ESL placement tests, the survey automatically skipped to the next section. The participants provided different, yet interrelated, reasons

accounting for their intentions to purposefully underperform on placement tests as shown in

Table 5. Six of the 14 participants indicated that they had failed placement tests purposefully in order to have more time to practice English, as the more ESL classes they take, the more exposure to English they will have. Moreover, five of them indicated that they had placed themselves into lower ESL levels so that they could expend less effort on ESL tasks, which would allow them to focus more on preparing for admission-required

standardized tests (e.g., TOEFL, GRE).

Table 5

Why Students Purposefully Underperform on a Placement Test

Answers                                                                    Responses
To have more time for learning English.                                    6
To have adequate time to adapt to the program and educational system.     2
To have adequate time to adapt to the city environment.                   0
To focus more on preparing for standardized tests (TOEFL, IELTS, GRE).    5
Other reasons.                                                             1

Moreover, two participants stated that spending extra time in the ESL program would help them become fully familiar with the American educational system. However, only one participant ascribed his intentions to “other

reasons”, without providing additional information in the subsequent question. Having seen

these findings, one may consider students’ justifications logical to some extent, provided this

behavior does not affect their current or future learning. However, some qualitative data is

needed to explain the quantitative data more convincingly.

During the interviews, three of the participants who purposefully underperformed on a

placement test argued that if they were placed into classes reflecting their actual levels of

language proficiency (e.g., intermediate, upper-intermediate), this would force them to

complete the English language program in a shorter period of time. In such cases, they are

required per SACM policies to obtain immediate admission to a university or return to Saudi

Arabia. These two options are mandatory because once a student completes his/her ESL

classes, his/her I-20 for language studies is cancelled. If a student decides to intentionally

underperform on an ESL placement test to be placed into a lower level, this will provide

them more time to: 1) learn English, 2) prepare for standardized tests, and 3) correspond with

or even visit several universities before choosing one. Moreover, three of the participants

contended that being placed into a lower ESL level allowed them to start learning English from

scratch.

Who encouraged these 14 participants to deliberately underperform on a placement

test? Surprisingly, eight of them indicated that their friends who had taken ESL placement

tests previously encouraged them to fail the test and suggested that more focus be given to obtaining

university admission. That is, their friends told them that the higher the ESL level, the more

demanding the class assignments will be. This may not allow them to have sufficient free

time to prepare for the TOEFL, IELTS, or GRE. On the other hand, four participants stated

that no one told them to purposefully underperform on the test; rather, this decision was

based on personal motives. Another interesting finding is that two of these participants

indicated that they had no predetermined intention to underperform on the

placement test; however, shortly before taking the test, some students with whom they took

the test encouraged them to fail it. As for GCC students, many indicated that they paid more

attention to applying to universities than to engaging in advanced ESL assignments, a motive

that made them deliberately fail the test.

This brings the researcher to another question. If those students had no scholarships,

would they still have intentionally performed poorly on their ESL placement tests? Twelve

out of 14 participants indicated that they would not, under any circumstances, have
purposefully underperformed on the ESL placement tests if they had not been granted a

scholarship, as they are unable to afford paying the tuition fees. Nonetheless, only two

participants insisted that they would still fail the test without having a scholarship. Hence,

this suggests that since their tuition fees are paid by SACM, these students are not concerned

with any financial costs resulting from taking unnecessary, additional ESL classes. In

contrast, the findings of GCC participants were considerably consistent with the above

mentioned in that 26 out of the 39 participants, or 67%, indicated they would not

intentionally underperform on a placement test if they were responsible for paying tuition

fees.

Having discussed students’ different motives for deliberately underperforming on

ESL placement tests, one may wonder to what extent this issue affects the placement system

as a whole. To answer this question accurately, statistical evidence is needed to determine whether intentional underperformance by a number of ESL students can affect a

program’s placement system. However, the researcher unfortunately had no access to

re-placement (level reassignment) rates or any other similar statistics for the three programs involved in the study.

Alternatively, some ESL administrators and assessment coordinators were asked about the

potential impact of having 18 to 20% of their students misplaced in their placement systems.

The findings revealed that only one participant indicated that these percentages would not

impact the entire program but would only impact these particular students’ placements. She

argued that her program has five different measures for placement, including a face-to-face

interview, which makes it difficult for students to consistently underperform on all of the

assessments deliberately.

On the other hand, the remaining 16 ESL administrators and assessment coordinators

expressed their concerns with such high percentages. For example, 13 pointed out that these
percentages would be problematic especially if there were several scholarship students

intentionally underperforming on their ESL placement tests. For example, one of these ESL

administrators said, “We would need to move them to the appropriate section and would

possibly need to add a class/take away a class; this would cause changes in the teachers'

assignments, groups of students in each class, etc.” (personal communication, February 11,

2016). This would therefore cause additional work and frustration for all involved.

Furthermore, two participants argued that such high percentages would lead to decreased student motivation, for the self-misplaced students would not display the proper motivation

and work ethic needed in the classes they attended. In other words, studying with misplaced

students is likely to demotivate other appropriately-placed students, as there would be great

differences between these students.

Another participant indicated that such behavior would change the entire classroom

dynamic and could ultimately affect other students' perception of the seriousness of the class.

In addition, another participant stated that if the content validity of their placement test were very high, having scholarship students deliberately underperform on ESL placement tests

would be a serious issue. For example, she pointed out that her program’s placement test was

mostly abstracted from their program curriculum in order to ensure high content validity.

However, she continued, if 18% of prospective scholarship students intentionally perform

poorly on ESL placement tests, this would “create level drift in our program” (personal

communication, February 10, 2016). In addition, the remaining participants argued that if

they had 18% self-misplaced students, they would take immediate action to reexamine the

procedures of the placement test. As a consequence, ESL administrators’ responses suggest

that if this issue exists at these rates, it would create a great challenge for ESL practitioners.

Purposefully Underperforming on ESL Placement Tests: A Major or Minor Issue?

One of the key research questions of this study is the extent to which intentionally

underperforming on ESL placement tests is a serious or minor issue. Now that multiple sets

of evidence (i.e., anecdotes, pilot study, Saudi students, GCC students, and ESL

administrators) have been provided, one can, at least based on this study’s population, draw

some conclusions about the study’s issues. One issue is the discrepancy between students’

reported rates (18 to 20%) and those provided by ESL administrators (5% at most). That is,

one may argue that having 20% of the 71 and 18% of the 216 participants claim that they had

once deliberately underperformed on an ESL placement test would be adequate evidence to

consider this behavior a negative widespread phenomenon. Potential proponents of this

position are likely to consider student assertion to be strong evidence that this behavior exists,

perhaps with higher percentages.

Nevertheless, students’ assertion that they purposefully performed poorly on ESL

placement tests might be considered inadequate to make these high percentages

generalizable. For example, without obtaining concrete evidence of participants’ work

collected two weeks past the placement test date, observing students’ performance, or

monitoring the study program placement systems, it is difficult to consider this behavior as

either widespread or detrimental to the students and/or the programs in which they are

enrolled. In fact, both arguments can be valid for two reasons. First, the discrepancy between

the students’ reported rates and those of ESL administrators does not necessarily mean that

the behavior is a minor issue simply because students are unlikely to inform their teachers of

their intentions to fail a test. In other words, these participants find no plausible reasons for

telling ESL administrators that they deliberately underperformed on ESL placement tests.

Second, the other argument also seems logical, since drawing generalizable conclusions about such an issue requires stronger evidence.

As a compromise between these two positions, the Saudi researcher, who was a

previous ESL student and is currently an EFL teacher, contends that although the issue at

hand exists based on the study data, such data are inadequate to consider it a common

phenomenon. This requires a much larger number of participants to corroborate the data even

though the issue still occurs in several different contexts. For example, during the writing of

this study, the researcher discovered four other cases of Saudi students intentionally

underperforming on ESL placement tests that were not included in this data set; however, documenting them poses another perplexing problem: student dishonesty. For example, a student from one of the study's three ESL programs stated that although he had completed an intensive English course at ARAMCO and had obtained a TOEFL score that met admission requirements, he decided to fail the placement test, unlike his classmates, who took it seriously and were subsequently placed in intermediate ESL levels. He did this because he planned to prepare for the GRE so that he could obtain admission to another university.

DISCUSSION AND CONCLUSION

According to the data released by the Institute of International Education (IIE) on

Intensive English Programs (IEP), the total number of Saudi IEP students enrolled in U.S.

programs reached 32,557, ranking Saudi Arabia first among places of origin of IEP students in

the country in 2014. Despite this large number, some Saudi ESL students are placed into

lower-than-expected levels. This can be attributed to numerous factors such as students' low

levels of language proficiency, students’ lack of adequate exposure to the target language,

and so forth. However, based on the initial anecdotal evidence and pilot study, the findings indicated that 11 of the 27 (40%) Saudi ESL students surveyed reported that they had intentionally
underperformed on university ESL placement tests for several reasons. Both the anecdotal

evidence and the pilot study helped the researcher establish a rationale for investigating the issue.

Surprised by these findings, the researcher decided to explore more tangible evidence

in the literature of ESL students deliberately performing poorly on tests. At first, he

encountered difficulty locating resources addressing the topic, so he decided to collect more

data from a larger population. The first set of data showed that 14 out of 71 (20%) of Saudi

participants reported that they did purposefully perform poorly on ESL placement tests. To

examine the accuracy of this percentage, 216 GCC ESL students were surveyed about the

same issue, of whom 39 (18%) indicated that they had intentionally underperformed

on ESL placement tests. In order to gain further insight into this issue, 17 ESL administrators

and assessment coordinators were surveyed. Ten of the 17 participants indicated that they had seen some form of conscious attempt by some ESL students to deliberately underperform on ESL placement tests, albeit at a very low rate (<5%).

The present study revealed contradictory findings between students’ reported

percentages of purposefully underperforming on ESL placement tests and those encountered

by some ESL administrators and assessment coordinators. In other words, although the

findings of the pilot study, Saudi ESL students, and GCC students were high (40%, 20%, and

18% respectively), the ESL administrators reported significantly lower percentages. This

discrepancy between the percentages can be ascribed to numerous reasons. First, the behavior

may indeed exist at these high percentages, but the self-misplaced students did not inform

ESL administrators of their motives. Second, these percentages might be exaggerated due to

the wording of some of the survey's items, which may have led students toward the desired findings. Such discrepancies suggest that further research studies be conducted in

order to recruit more participants, obtain concrete evidence of participants’ work collected
within two weeks of the placement test date, and monitor placement systems in order to make the data more valid.

Although the study has no evidence beyond students' own reports of having purposefully underperformed on ESL placement tests, such findings suggest that immediate

action be taken by SACM for several reasons. First, if such behavior exists at these rates, it

would cost scholarship agencies additional money without gaining the desired outcomes.

Second, this behavior is likely to lead students to not take ESL classes seriously because they

are taking ESL levels far below their English proficiency level in order to focus more time

and energy on obtaining university admission. Third, these students might not improve their

English skills by attending lower ESL levels. Fourth, some ESL programs offer English

language endorsements by which the minimum TOEFL score requirements for university

entry can be waived. In such cases, these students might delay this chance, as they have

placed themselves in lower levels, and it will take longer to reach the higher levels that allow

them to apply for endorsement.

Regardless of the differences among the participants' reported percentages, it can be concluded that deliberate underperformance on ESL placement tests does occur and needs to be addressed. Therefore, this study proposes a number of concrete implications that may help mitigate or even eradicate the problem. Prior to implementing any of the subsequent

implications, it is highly recommended that SACM, the Saudi Ministry of Education, or any

other scholarship agencies survey their students to identify the language needs on which any intervention should be based. In other words, it might not be effective to take any

action regarding this problem without carefully examining students’ motives behind

deliberately underperforming on ESL placement tests. For example, students should first be

asked about their experiences in taking ESL classes in general and ESL placement tests in
particular in addition to any issues they confronted during their ESL experiences. This will

certainly help provide concrete implications based on students’ real issues pertaining to ESL

placement tests.

LIMITATIONS

One of the primary limitations of this study is that it lacks supporting data with

concrete evidence of students' language work. In other words, one may argue that it is insufficient simply to ask ESL students if they have ever deliberately underperformed on an ESL placement test; the data should also have included samples of students' classroom work (e.g., writing samples). This would have enabled a comparison of participants' classroom work with their responses to the survey and interviews, provided that these samples were produced within a short period of taking the ESL placement tests. Unfortunately, the researcher was

unable to obtain students' language work for fear of placing a substantial burden on

participants to take part in the study. As a consequence, this option was excluded from the

study. To overcome this limitation, a larger number of participants (GCC students) were

recruited in order to support the existing data.

Another limitation of this study concerns the wording of the survey

questions, especially that of the study’s main question, “Did you intentionally, in one way or

another, perform poorly on the ESL Placement Test to be placed in lower levels?” One may

argue that participants should not have been asked directly if they had ever purposefully underperformed on an ESL placement test, since such wording may lead them toward the

researcher’s desired findings. Alternatively, the researcher should have let the participants

talk about their experiences so that any naturally arising issues pertaining to the study’s

targeted behavior could be documented and further explored. However, this strategy, albeit potentially more effective, is time-consuming and sometimes difficult. Therefore, if another researcher


decides to conduct the same study in the future, s/he should make sure to avoid directly

asking the participants about this behavior. Instead, s/he should find another effective strategy

that helps him/her elicit answers that arise naturally.

Another limitation is that one of the survey questions asked participants about the

reasons why they intentionally performed poorly on the ESL placement tests. They were

given a closed-ended set of options consisting of four main reasons accounting for their

motives to deliberately underperform on a test. The options were as follows:

1) to have more time to learn English

2) to adapt to the program and educational system

3) to adapt to the city environment

4) to prepare for standardized tests (TOEFL, IELTS, GRE)

Although the participants were allowed to select all choices that applied, the wording of this

question, in addition to the limited choices, could still restrict participants' responses. Moreover, giving limited choices might plant ideas that push the

participants towards certain preferred answers. To address this limitation, the participants

were provided with a subsequent question, “If you have any other reasons, please mention

them here,” in order to allow them to provide reasons other than those listed.

In addition to the aforementioned limitations, the researcher's objectivity was challenged at some stages of the data collection process. For

example, in one of the interviews, the researcher asked a participant some introductory questions about ESL placement tests so as not to lead him toward any desired

findings. However, the participant was off topic, discussing the accuracy of ESL placement

tests in detail. Thus, the researcher repeatedly attempted to redirect the discussion to the

study’s issue, which may have impacted some of his responses. For example, the researcher
asked the participant whether, had he not had a scholarship, he would still have deliberately failed the test. The participant seemed very hesitant to respond, and he had not mentioned that he had ever deliberately underperformed on any ESL placement test even while holding a scholarship. Hence, the

researcher felt that the participant was trying to find the answer that best fit the question.

IMPLICATIONS

Pedagogical Implications

Based on participants’ responses, one of the most intriguing implications of this study

is the notion that scholarship students should be enrolled in English programs in their home

country prior to studying abroad. There are several successful examples of this approach. For

example, through the College Degree Program for Non-Employees, the Saudi Aramco Petroleum Company grants scholarships to qualified high school graduates to pursue bachelor's degrees in certain majors. Before studying abroad, the students are enrolled in a ten-

month language preparation program (Saudi Aramco College Preparatory Program), where

they are taught intensive English and some preparatory courses in their intended field of

study (Al Murshidi, 2014). After completing the program successfully, students are then

sponsored to study abroad. Despite the criticisms leveled at ARAMCO for having students

“come directly to US universities without attending language preparation programs in the

US” (Al Murshidi, 2014, p. 41), these preparatory programs have been very helpful for

ARAMCO students before they study abroad.

This implication is consistent with the TESOL position statement that “It is important

that the sponsors either send students already at a high enough proficiency level to progress

sufficiently within the sponsor’s time limit or that the sponsors recognize that additional time

may be necessary for English language study” (TESOL, 2010, p. 2). Thus, it is recommended

that the Saudi Ministry of Education consider engaging scholarship students in intensive
English courses prior to studying abroad. If this implication is considered undesirable, Saudi

ESL students should then be given more time to learn English in the target country, which

will allow them adequate time to develop their English skills. Moreover, it will provide them

with sufficient time for preparing for admission-related tests such as the TOEFL, GRE, and

GMAT.

Implications for Scholarship Agencies

One of the survey statements asked students what the Ministry of Education should do to improve their ESL learning experiences. The vast majority of the Saudi and GCC students chose the two suggestions mentioned above: providing students with English programs prior to studying abroad and extending the ESL study period. However, one of the

proposed suggestions that many participants unexpectedly selected was that a certain TOEFL

or IELTS score should be required for obtaining a scholarship. At first, the researcher thought

that the participants had selected this suggestion because it had been introduced to them in a

structured survey; nevertheless, the participants' responses to the following textbox

item actually suggested otherwise. That is, 12 of the 71 Saudis and 28 of the 216 GCC students indicated that, since they had difficulty obtaining the minimum TOEFL or IELTS scores required to meet admission requirements, it would be more useful if a specific score

was required prior to studying abroad. Although one may doubt these students’ willingness to

accept this high-stakes resolution, it is still worth considering.

Some of the participating students provided another effective and low-stakes

suggestion that could help them overcome some of the difficulties they face during their ESL

learning experiences. They suggested that before studying abroad, they should be provided

with preparation programs for standardized tests (e.g., TOEFL, IELTS, GRE, GMAT). For

example, some participants stated that they lack the basic test-taking skills needed for high-
stakes standardized tests, unlike, for example, Japanese, Korean, and Singaporean high school students, who are required to take exit exams. Other participants stated that it is quite

difficult to focus on ESL program assignments and admission language requirements at the

same time. As a result, they suggested that preparation for one of these be provided in Saudi Arabia prior to studying abroad so that students can manage both more effectively upon arrival in

the host country.

In case none of the aforementioned implications is applicable for one reason or

another, scholarship students should then be given mandatory exam preparation programs

for admission-related tests (e.g., TOEFL, GRE, GMAT). Although Saudi ESL students

have unlimited opportunities to take these tests at the expense of SACM, these exam prep

programs should be part of the scholarship requirements so that students will take them

seriously. Given that these programs are optional, some students indicated that they take them

to avoid visa issues rather than to develop their test-taking skills. For example, one of the

participants pointed out that he had completed all ESL levels, so his I-20 was about to expire.

To avoid having a hold placed on his scholarship, losing tuition fees and stipend, or worse,

being deported from the U.S., he took a GRE course to extend his I-20.

Implications for Both Scholarship Agencies and ESL Administrators

Given that regular ESL placement decisions are not as accurate as desired (Hughes &

Scott-Clayton, 2010), purposefully underperforming on an ESL placement test may therefore

exacerbate this issue and cause dire consequences. To overcome this issue, substantive efforts

should be exerted to raise students’ awareness of the adverse impact of being placed into

improper ESL levels, including but not limited to susceptibility to boredom, laziness, demotivation, uncertainty about obtaining college admission, and delays in achieving academic

goals (Banegas, 2013). The awareness-raising process should be systematic in a way that
informs students of the negative consequences of intentionally underperforming on placement

tests. For example, this issue can be included in scholarship regulations, addressed during

orientation programs in Saudi Arabia, or reinforced through regular email reminders sent by students' sponsors. In conclusion, future researchers are advised to make sure that students do not

claim to have purposefully underperformed on ESL placement tests simply because they do not want

to appear weak in their English skills. Moreover, they should take into account that some

students might refrain from participating in such a study, as it may question their honesty.

CHAPTER 3 INTEGRATING SELF-ASSESSMENT TECHNIQUES INTO L2

CLASSROOM ASSESSMENT PROCEDURES

ABSTRACT

Traditional assessment instruments are often considered to be “the realm of the

teacher” (Chen, 2008, p. 238). However, to promote students’ learning autonomy and

involvement in classroom activities, other alternative assessment forms have been introduced

to L2 learning contexts including performance assessment, portfolio assessment, peer-

assessment, and journal assessment (Brown, 2010; Shohamy, 2001). One of the more widely

recognized techniques of alternative assessment is student self-assessment (Boud, 1991;

Ekbatani & Pierson, 2000). Unfortunately, most EFL departments in Saudi Arabia fail to integrate alternative assessment techniques into their assessment processes (Al Asmari, 2013;

Javid, Al-thubaiti, & Uthman, 2013).

As a result, this article aims to explore the use of alternative assessment forms in

English as a Second Language (ESL) and English as a Foreign Language (EFL) classroom

contexts. It examines the accuracy of ESL students' self-rated scores on the CEFR self-assessment rubric compared to their recently obtained TOEFL scores. It also explores

whether gender and levels of language proficiency are major influential factors for causing

score underestimation. The participants of this study are 21 ESL students who are attending

the Intensive English Program (IEP) at the Center for English as a Second Language (CESL) at the

University of Arizona in Tucson. Based on CESL’s class system, their levels of proficiency

are intermediate (n=5), upper-intermediate (n=8), and advanced (n=8). The findings revealed

no statistically significant correlation between the participants’ self-assessed scores and their

TOEFL scores. However, based on the qualitative data, the participants reported that they

find the CEFR self-assessment rubric accurate in reflecting their levels of language proficiency.

Key Words: Self-assessment, Common European Framework of Reference (CEFR),

ESL, EFL, language descriptors, ANOVA
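
As a purely illustrative sketch of the kind of score comparison summarized in this abstract, the short Python example below computes a Pearson correlation between hypothetical self-assessed scores and hypothetical TOEFL totals. The numbers, the scale mapping, and the variable names are invented for demonstration only; this is not the study's data or its analysis script, and a one-way ANOVA across proficiency groups could be sketched analogously with scipy.stats.f_oneway.

    # Illustrative only: hypothetical self-assessed scores (mapped to a 0-120 scale)
    # paired with hypothetical TOEFL iBT totals. These values are invented and are
    # not the study's data.
    from scipy import stats

    self_assessed = [70, 45, 88, 60, 52, 95, 66, 74]   # hypothetical self-ratings
    toefl_scores = [61, 79, 58, 90, 72, 65, 84, 55]    # hypothetical TOEFL iBT totals

    r, p = stats.pearsonr(self_assessed, toefl_scores)
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")
    # An r near zero with p > .05 would mirror the kind of result reported above:
    # no statistically significant correlation between self-rated and TOEFL scores.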


INTRODUCTION

The past three decades have witnessed a paradigm shift in L2 contexts from teacher-

to student-centered teaching (Benson, 2001, 2012; Brown, 2007; Ellis, 1994; Nunan, 1988;

Tarone & Yule, 1989; Tudor, 1993, 1996). This new pedagogical approach originated as a

reaction to several criticisms raised by some L2 researchers who had suggested that

substantive attention be paid to L2 learners rather than to language forms and structures (Bachman,

2000; Ellis, 1994; Huba & Freed, 2002; Nunan, 1988). This has led ESL practitioners and

curriculum designers to “take account of learners’ needs or preferences” as a step for

involving L2 students in language learning processes (Benson, 2012, p. 33). Later, self-

directed learning was introduced into L2 learning contexts, as a stronger form of learner-

centered education (Ekbatani & Pierson, 2000).

In parallel with this paradigm shift, other researchers have suggested that learner

involvement in language learning not be limited to learning aspects; rather, it should also

include engaging them in assessment processes (Ekbatani & Pierson, 2000). For example,

Nunan (1988) argues that to make a learner-centered curriculum more effective, “both

teachers and learners need to be involved in evaluation” (p.116). Thus, several alternative

assessment approaches have been widely used in ESL contexts. Among these approaches is

learner’s self-assessment, which refers to involving learners in making decisions about their

own learning. The effectiveness of self-assessment has long been discussed in the literature

(Cassidy, 2001; Ekbatani, 2011; Harris, 1997; LeBlanc & Painchaud, 1985; McNamara,

1995). Hence, this paper investigates the extent to which self-assessment, as a key element of

self-directed learning, can be an accurate and reliable measure for ESL and EFL learners to

identify and monitor their levels of language proficiency and serve as a supplementary

technique to classroom language assessment tools.


Definition

Self-assessment refers to “all judgments by learners of their work” (Taras, 2010, p.

200). Similarly, O'Malley and Pierce (1996) define self-assessment as “an appraisal by an

individual of his or her own work or learning process” (p. 240). On the other hand, Andrade

(2007) expanded more on self-assessment and defined it as

a process of formative assessment during which students reflect on and evaluate the

quality of their work and learning, judge the degree to which they reflect explicitly

stated goals or criteria, identify strengths or weaknesses in their work and revise

accordingly. (p. 160)

Self-assessment can be used for placement (Bachman, 2000), achievement (McDonald &

Boud, 2003), and diagnostic (Andrade & Du, 2007) purposes. Moreover, it can be used “to

detect changes and patterns of development over time” (Dörnyei, 2001, p. 194).

Benefits and Limitations of Self-assessment

Research has documented a plethora of benefits and limitations of self-assessment.

For example, benefits of self-assessment include promoting students’ learning autonomy

(Ekbatani, 2011; Harris, 1997; McNamara & Deane, 1995). Moreover, Brown and Hudson

(1998) point out that self-assessment facilitates student involvement in their own learning,

promote their learning responsibility, and increases their motivation. Furthermore, it helps

prevent any form of cheating (LeBlanc & Painchaud, 1985) and enables students to monitor

their learning (Harris, 1997). However, there are some limitations of self-assessment, one of

which is subjectivity, in that students are likely to be “either too harsh on themselves or too self-flattering” (Brown, 2001, p. 145). Moreover, Taras (2008) argues that when students have

inadequate experience in self-assessment, they may “judge their own work within their own

limitations” and end up considering their performance appropriate (p. 86).


LITERATURE REVIEW

Impact of Classical Teaching Methods on Self-assessment

Before discussing self-assessment, it is important to explore how it originated in L2

learning contexts and how early teaching methodologies have significantly influenced its use.

Before the 1970s, most if not all earlier second language teaching methodologies were

primarily concerned with analyzing language forms and structures (Brown, 2007). For

example, the grammar-translation method focuses particularly on teaching grammar explicitly,

memorizing isolated words, and translating literary texts (Larsen-Freeman, 1986; Mitchell &

Vidal, 2001). This classical method, along with the audio-lingual method, was deemed to neglect the learners’ role in language learning, treating them as passive recipients. As a consequence, an alternative approach was needed to empower the learners’ role

in language learning.

Impact of Communicative Language Teaching on Self-assessment

Later, Communicative Language Teaching (CLT) emerged in the U.S. and across

Europe as an alternative approach to previous teaching methodologies (Brown, 2007;

Kumaravadivelu, 2006; Nunan, 1988; Widdowson, 1978). This dynamic approach focuses on

all components of language so that learners have the opportunity to demonstrate their

language abilities more effectively and be more active role players in language learning

(Brown, 2007; Lee & VanPatten, 2003; Nunan, 2003). Benefits of the communicative

language teaching approach led several L2 researchers and theorists to conduct an extensive

number of research studies aimed at promoting the vital role of L2 learners in language

learning processes (Ellis, 2003). Forms of CLT such as task-based and process-based instruction have both indicated that learners are capable of taking charge of their own learning, while

teachers can act as facilitators and model/scaffold providers (Johnson, 2003; Nunan, 2004).
62
Learner-centeredness

During the 1980s, research stressed the need for learner-focused teaching, a

pedagogical approach that places substantive emphasis on “learners and learning in language

teaching, as opposed to a focus on language and instruction” (Benson, 2012, p. 30). Learner-

focused teaching was initially proposed through the work of Nunan (1988), Tarone and Yule

(1989), and Tudor (1993). Their contributions were followed by numerous research studies

that examined the impact of learner-centered teaching on promoting L2 learning, thereby

leading to a radical paradigm shift from teacher-centered to learner-centered education.

Learner-centeredness encompasses several pedagogical approaches including “negotiated

curriculum” (Nunan, 1989), “needs analysis” (Richards, 2001), “learner training” (Wenden,

1995), and “learning styles” (Brown, 2007). Among these types of learner-centeredness is

self-directed learning (Nunan, 1988), a concept considered the theoretical underpinning of self-assessment (Ekbatani & Pierson, 2000).

Self-directed Learning

Prior to discussing the impact of self-directed learning on introducing self-assessment

into L2 learning, a key distinction between learner-centered and self-directed learning is worth noting. Learner-centered learning “takes account of learners’ needs or preferences” (Benson, 2012, p. 33); nevertheless, it does not necessarily involve giving students any control over their learning, nor does it include consulting them about that learning (Brown, 2007). Self-directed learning, on the other hand, involves learners’

planning, monitoring, and evaluating of their learning (Garrison, 1997). Benefits of self-

directed learning include empowering learner autonomy (Benson, 1995; McNamara & Deane, 1995), increasing learner motivation (Dörnyei, 1994), and promoting learner self-confidence (Taylor, 1995), benefits that paved the way for introducing self-assessment into language learning (Brown, 2010).
63
Self-assessment

Self-assessment was derived from the theoretical underpinnings of learner autonomy,

a principle first introduced by Henri Holec (1981). This principle stresses the importance

of learners’ ability to be accountable for their own learning including setting their learning

goals, monitoring their performance, taking learning decisions, and developing their

motivation, which all reflect the characteristics and values of self-assessment (McNamara &

Deane, 1995). Thus, self-assessment has dramatically influenced second language assessment

in many ways including contributing to placing L2 learners into proper levels, reporting their

potential areas of strengths and weaknesses, providing them with feedback, and assessing

their attitudes (Saito, 2003). However, in its initial uses, as Brown (2010) noted, self-

assessment was deemed “an absurd reversal of politically correct power of relationships” on the grounds that some L2 learners, especially novice learners, are unlikely to be capable of reporting an accurate assessment of their own performance (p. 144). On the contrary, research

has outlined a myriad of empirically proven benefits of self-assessment in second language

learning, which will be discussed thoroughly in this paper. However, the efficacy of self-assessment instruments is best judged by examining how accurate they are compared to traditional assessments (e.g., teacher-made tests).

Accuracy of Self-assessment

Numerous studies have investigated the effectiveness and accuracy of self-assessment

in second language learning (Andrade & Valtcheva, 2009; Bachman & Palmer, 1989;

Blanche & Merino, 1989; Butler, 2010; Cassidy, 2001; Garrison, 1997; Janssen-van Dieten,

1989; LeBlanc & Painchaud, 1985; McDonald & Boud, 2003; Oscarson, 1997; Taras, 1995,

2001). Most of these studies were correlational in design, comparing two forms of assessment to determine whether, and how strongly, their scores are related.
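To illustrate the basic logic of such correlational comparisons, the following minimal Python sketch computes a Pearson correlation between two sets of scores; the values and variable names are illustrative only and are not drawn from any of the studies cited here.

    from scipy.stats import pearsonr

    # Hypothetical paired scores: one self-rated total and one standardized
    # proficiency-test score per learner (illustrative values only).
    self_rated   = [36, 42, 33, 25, 29, 40, 21, 28]
    standardized = [39, 90, 36, 46, 70, 72, 47, 65]

    r, p = pearsonr(self_rated, standardized)
    print(f"r = {r:.2f}, p = {p:.3f}")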
64
Correlations

Drawing on Raasch’s (1979) study, von Elek (1982) argued that the validity of self-

assessment can be similar to or at least not substantially lower than that of traditional

assessment instruments. In the same vein, LeBlanc and Painchaud (1985) found evidence of a

relatively positive correlation between a self-assessment instrument and a standardized

English proficiency exam (r = .53). Likewise, in her pilot study, Janssen-van Dieten (1989)

compared a self-assessment format with that of a traditional placement test of Dutch as a

second language, though she focused only on grammar and reading skills. Results demonstrated high correlations between the two instruments (ranging from .60 to .79). She

concluded that, out of 25 students, 21 were able to place themselves in the same levels into

which the traditional placement test had placed them.

In the same manner, Bachman and Palmer (1989) suggested that self-assessment “can

be reliable and valid measures of communicative language abilities” (p. 22), indicating that

their techniques had demonstrated unexpectedly high reliability. Moreover, Cassidy (2001) reported high correlations (r = .87 to .97) between students’ self-reported scores and their

actual SAT scores. In addition, in his meta-analysis validation study, Ross (1998) found that

self-assessment techniques could provide high validity, suggesting that “the degree of

experience learners bring to the self-assessment context influences the accuracy of the

product” (p. 16). However, he also noticed that the participants were more accurate in

assessing their receptive skills (listening and reading) than their productive skills.

Discrepancies

Other studies have reported conflicting conclusions. For example, unlike her

aforementioned study that was limited to assessing students’ grammar and reading skills,

Janssen-van Dieten (1989) later conducted a study on all four language skills - listening,
65
speaking, reading, and writing - to obtain a broader perspective of any potential correlation

between the two instruments. She reported little relationship between self-assessment and

previously validated proficiency tests. Moreover, Wesche, Morrison, Ready and Pawley

(1990) found that self-assessment did not show any statistically significant correlation with

traditional tests. Furthermore, Pierce, Swain and Hart (1993) found weak correlations

between self-assessment proficiency tests and traditional assessments of students attending

French immersion programs. In addition, Wesche (1993) found that “placement via self-

assessment was extremely unreliable” under any circumstances (p. 15).

Reasons accounting for the discrepancies between self- and traditional assessment

tools vary depending on certain factors including but certainly not limited to “the linguistic

skills and materials involved in the evaluations” (Blanche & Merino, 1989, p. 315). Other

factors may include the type of language tasks used in both instruments, the degree of task

authenticity, learners’ levels of proficiency, cultural differences, and exposure to self-

assessment (Blanche, 1988; Coombe, 1992; Oscarson, 1989; as cited in Wolochuk, 2009).

How Can the Accuracy of Self-assessment Be Achieved?

Although self-assessment has been the subject of heated discussion in the literature, many researchers have proposed practical measures that help validate self-assessment techniques. Harris (1997), for example, suggests that self-assessment rubrics be clearly

specified in advance so that students can pinpoint their areas of strengths and weaknesses.

Moreover, students should receive adequate training on using self-assessment rubrics in order

for them to gain insight into the target evaluation criteria (Taras, 2003). However, Brown and

Hudson (1998) argue that self-assessment might be more effective when used for research

rather than for placement or diagnostic purposes, since the former is unlikely to make learners

over- or underestimate their performance.


66
Common European Framework of Reference (CEFR)

In 1996, the Council of Europe published the first version of the Common European

Framework of Reference (CEFR) (Council of Europe, 1996b). CEFR is an international

language standard that describes language proficiency “through a group of scales composed

of ascending level descriptors couched in terms of outcomes” distributed at six different

levels: A1 and A2, B1 and B2, C1 and C2 (Weir, 2005, p. 281). These six levels fall into

three broad levels of proficiency: Basic User (A1, A2), Independent User (B1, B2), and

Proficient User (C1, C2) (See Appendix F). Each of these levels “attempt[s] to specify as full

a range of language knowledge, skills and use as possible” through which CEFR users can

identify their levels of language proficiency (Council of Europe, 2001, p. 7). As of 2014, the

website of Council of Europe indicates that CEFR language proficiency descriptors have

been translated into approximately 39 different languages (Council of Europe, 2014).

In a long-term research project, which is widely recognized as the ‘Can-do’ Project,

the Association of Language Testers in Europe (ALTE) developed and validated user-

oriented and performance-related scales anchored to CEFR levels “to establish a framework

of ‘key levels’ of language performance, within which exams can be objectively described”

(Council of Europe, 2001, p. 244). This led to developing the CEFR self-assessment scales

with six different levels. Each level consists of Listening, Reading, Spoken Interaction,

Spoken Production, Strategies, Language Quality, and Writing. According to the CEFR self-

assessment rubric, in order for CEFR users to reach a given level, they would need to respond

as ‘I-can’ to at least 80% of the given level’s ‘Can-do’ statements. That is, after reading the rubric’s descriptive language tasks (statements), users should assess

their abilities to perform these tasks by ticking either ‘I-can-do’ or ‘I-can-not-do’. If a user’s

overall proportion of ticked ‘I-can-dos’ is 80% or above, this places him/her in the given level.
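As an illustration of this scoring rule, the following minimal Python sketch checks whether a set of responses to one level’s statements reaches the 80% benchmark; the function and variable names are illustrative choices, not part of the CEFR documentation.

    def meets_level(responses, threshold=0.80):
        """responses: list of booleans, one per 'Can-do' statement,
        where True means the user ticked 'I-can-do'."""
        return sum(responses) / len(responses) >= threshold

    # e.g., 36 of 42 statements ticked 'I-can-do' -> 85.7%, so the level is met
    print(meets_level([True] * 36 + [False] * 6))   # True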
67
Validity of the CEFR Self-assessment Rubric

The validity of the CEFR self-assessment rubric has long been investigated in the

literature. For example, Alderson et al. (2004) contend that although CEFR itself provides

comprehensive language descriptors, its ‘I-can-do’ “scales provide a taxonomy of behaviors

rather than a theory of development in listening and reading activities” (p. 3). In addition,

Huhta et al. (2002) argue that “the theoretical dimensions of people’s skills and language use which CEFR discusses are on a very high level of abstraction” (p. 133). In investigating the

validity of CEFR self-assessment used by refugees in Ireland, Little, Lazenby Simpson, and

O’Connor (2002) found that “evidence of the difficulty that learners encounter in using the

CEFR to maintain on-going reflective self-assessment suggests a need for more detailed

descriptions of proficiency relevant to particular domains of language learning” (p. 64).

Moreover, Jones (2002) points out that “different people tend to understand ‘Can-do’

somewhat differently” (p. 181), thereby creating discrepancies in their self-rated scores. All

of the above raise legitimate concerns about the validity of CEFR self-assessment rubric.

On the contrary, Little (2006) argues that one of the advantages of the CEFR is its

ability “to bring curriculum, pedagogy and assessment into much closer interdependence than

has usually been the case” (p. 382). Despite what was reported by Huhta et al. (2002) above,

this multi-tasking ability engages CEFR users in ‘action-oriented’ language scenarios through

which they can demonstrate their language abilities (Council of Europe, 2001). Moreover,

drawing upon the distinction between Basic Interpersonal Communication Skills and Cognitive Academic Language Proficiency, North (2007) argues that the CEFR distinguishes between productive

and receptive language use by dividing the former into interaction and production, resulting

in “34 illustrative scales for listening, reading, oral production, written production, spoken

interaction, written interaction, note-taking, and processing text” (p. 646).


68
METHODOLOGY

Significance of the Study

The researcher, as a former EFL student and current Teaching Assistant at a university EFL department in Saudi Arabia, has first-hand experience of the limited use of alternative assessment techniques in many EFL departments in Saudi Arabia. Thus, this

article emphasizes the importance of incorporating self-assessment rubrics into classroom

assessment processes as an alternative assessment technique. In other words, it recommends

that substantive efforts be exerted to use alternative assessments by not only L2 programs in

Saudi Arabia, but also in many other L2 contexts. The study involved 21 ESL students; 18 of

them were Saudis, representing a sample of the larger population of Saudi L2 learners.

The ultimate purpose of conducting this research was to promote incorporating self-

assessment techniques into EFL classroom assessment processes. Because some EFL

departments in Saudi Arabia fail to integrate alternative assessment techniques into

assessment processes, this paper aims to promote the use of alternative assessment

techniques, in particular self-assessment, not only in EFL contexts in Saudi Arabia, but also

in other L2 contexts. This paper examines the accuracy of ESL students’ self-rated scores

obtained by CEFR rubric compared to recently obtained TOEFL scores. Finally, the paper

will provide EFL departments in Saudi Arabia and other L2 practitioners with implications

on how self-assessment techniques can be effectively implemented in L2 settings.

Research Questions

1. Do ESL students’ self-rated scores correlate with their TOEFL scores? If not, why?

2. Are gender and levels of proficiency major influential factors for causing any potential

score underestimation?

3. What are ESL students’ attitudes towards a ‘Can-do’ self-rating rubric?


69
Research Tools

A mixed-methodology approach was used to collect data for this study using three

main research tools. First, there was a web-based CEFR self-assessment rubric (Appendix B),

which has already been mapped onto the TOEFL proficiency levels. It would, however, be

difficult to use the whole rubric, for it contains 227 ‘I-can-do’ and ‘I-cannot-do’ CEFR

statements, which can be time-consuming to use in its entirety and possibly inapplicable in

some situations. Thus, only levels B1, B2 and C1, which are equivalent to the TOEFL

intermediate, upper-intermediate, and advanced proficiency levels, respectively, were

selected. The second research tool was semi-structured interviews conducted with those who

provided their emails to participate in the second round of data collection (Appendix H). The

third research tool was participants’ most recent TOEFL scores, which were obtained based

on student consent.

Participants

The participants of this study were 24 (14 male and 10 female) ESL students

attending CESL (Center for English as a Second Language), University of Arizona. Their

levels of proficiency were intermediate (n = 7), upper-intermediate (n = 9), or advanced

(n = 8). In addition, they were attending CESL levels 3 through 7 in a seven-level program.

Their nationalities, as shown in Table 6, are Saudi (n=18), Chinese (n=3), Mexican (n=2),

and Qatari (n=1). For confidentiality purposes, their names were coded by their initials,

which they were asked to provide at the beginning of the study rubric. Three of the participants were

excluded from the analyses because they did not provide their TOEFL scores, nor did they

provide their emails, leaving 21 participants. Moreover, after contacting those who provided

their emails, ten of these participants agreed to participate in the interviews so that they could expand on their attitudes towards using the CEFR self-assessment rubric.
70
Table 6

The Participants in This Study

No. Gender Level of English Proficiency


1 Female Intermediate
2 Male Intermediate
3 Female Intermediate
4 Male Intermediate
5 Male Intermediate
6 Male Upper-intermediate
7 Male Upper-intermediate
8 Male Upper-intermediate
9 Male Upper-intermediate
10 Female Upper-intermediate
11 Female Upper-intermediate
12 Male Upper-intermediate
13 Male Upper-intermediate
14 Male Advanced
15 Female Advanced
16 Female Advanced
17 Female Advanced
18 Female Advanced
19 Male Advanced
20 Female Advanced
21 Male Advanced

Background on Site

The Center for English as a Second Language (CESL) is a nationally accredited IEP

located at The University of Arizona, in Tucson, Arizona, USA (CESL, 2014). Its IEP

consists of seven levels; each level comprises an eight-week session leading to optional bridge programs for either undergraduate or graduate students. It is also one of the ESL
71
programs highly recommended by SACM (SACM, 2015). CESL provides three types of

programs. First, there are the English language programs, which include eleven programs:

ESL graduate and undergraduate bridge programs, evening programs, intensive English (full-

and part-time) programs, local portable classes, online programs, skill classes, skill intensive

workshops, a teen English program, and tutoring. Second, it provides two different teacher

training programs: general teacher training and Content Area Teacher Training (CATT)

(CESL, 2014). Third, it offers customized programs.

Procedures

A randomly selected group of CESL students, who were soon going to take or had

just taken the TOEFL, were recruited. To encourage them to participate in the study, they

were offered a free one-time proofreading for one of their class papers (10 pages or less). The

recruitment flyer (Appendix I) was sent to these students’ emails by CESL the IEP

coordinator. The flyer contained links to the self-assessment rubric, which correlated to the

three levels of English proficiency that the participants could select (intermediate, upper-

intermediate, or advanced) to self-determine their current CESL levels. The participants were

also told that they should use the rubric two weeks at most prior to or right after taking the

TOEFL test in order to avoid any potential effect of learning progress between the study’s

two sessions. Next, they were asked for their permission to provide their TOEFL scores. The

first part of the rubric explained in detail the entire self-rating process in order to ensure that

the participants understood it fully.

After that, participants’ self-rated scores were re-examined to ensure that they

satisfied the study’s conditions. For example, those who did not provide their TOEFL scores

were contacted via email a month later because two weeks were needed as a maximum

timeframe between the two sessions, and another two weeks were required for TOEFL scores to
72
be released by ETS. Fortunately, sixteen participants had taken the TOEFL test on January

30, 31, or February 7, 2016, before they used the CEFR self-assessment rubric in mid- and late-

February. Hence, they provided their TOEFL scores before they performed the self-

assessment rubric. As for those who did not provide their TOEFL scores, two of them

responded and provided their TOEFL scores, whereas the remaining three participants did not

respond and were thereby excluded from the study. Finally, those who provided their emails

in the first session were asked via email to participate in the semi-structured interviews. Two

days later, 10 out of 17 participants agreed to participate in the interviews through Skype,

email, or Tango.

Before analyzing the data, the CEFR, TOEFL, and IELTS equivalency table

developed by Tannenbaum and Wylie (2007) was used to compare participants’ self-rated scores against their TOEFL scores (see Appendix F). At first, the researcher attempted to

color-code each level of the equivalency table; nevertheless, this strategy appeared to be very

confusing and even seemingly misleading (see Appendix J). As a result, given that the study

has only three levels of English proficiency (intermediate, upper-intermediate, advanced), the

Basic and Proficient levels were excluded from the study equivalency table in order to make

the table readable and more meaningful. Table 7 illustrates how the range of scores of the

TOEFL and IELTS are constructed to correspond to CEFR different levels of language

proficiency.

Table 7

The Study Adapted CEFR, TOEFL, and IELTS Equivalency Table

Levels of Proficiency CEFR TOEFL IBT IELTS


Intermediate B1 30 – 64 3.0 – 4.0
Upper-intermediate B2 65 – 78 4.5 – 5.0
Advanced C1 79 – 95 5.5 – 6.0
73
After that, the 80% benchmark needed for meeting each level was determined. For example, since the intermediate and upper-intermediate levels each contain 42 statements, their benchmark is 33.6 statements (0.8 × 42). In the same vein, the advanced level contains 37 statements; thus, its benchmark is 29.6 (0.8 × 37). After that, intermediate and upper-intermediate participants’ self-rated scores were divided by 33.6, and the self-rated scores of advanced participants were divided by 29.6, so that they could be entered into the ANOVA as ratios. One crucial issue is that the participants were not told that achieving 80% of the ‘I-can-do’ statements would indicate that they had met the level. This was done to avoid any potential effect of participants focusing on reaching 80% instead of rating their actual level of proficiency, which might have impacted their overall self-rating performance.
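For clarity, the following minimal Python sketch reproduces this normalization step under the figures given above (42 statements for the B1/B2 levels, 37 for C1); the function and dictionary names are illustrative and do not come from the study’s materials.

    # 80% benchmarks: 0.8 * 42 = 33.6 (intermediate, upper-intermediate)
    # and 0.8 * 37 = 29.6 (advanced)
    STATEMENTS = {"intermediate": 42, "upper-intermediate": 42, "advanced": 37}

    def anova_ratio(n_can_do, level):
        benchmark = 0.8 * STATEMENTS[level]
        return n_can_do / benchmark        # value entered into the ANOVA as a ratio

    print(round(anova_ratio(42, "intermediate"), 2))   # 1.25 (above the benchmark)
    print(round(anova_ratio(29, "advanced"), 2))       # 0.98 (just below it)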

FINDINGS

1. Do ESL students’ self-rated scores correlate with their TOEFL scores? If not, why?

Data were analyzed first using a two-factor ANOVA with participants’ gender (male,

female) and level of English proficiency (intermediate, upper-intermediate, advanced) as the

factors (see Table 8). Participants’ TOEFL scores were mainly used as a covariate in each of

the analyses described below. The findings indicate that the participants’ self-rated scores did

not correlate with their TOEFL scores or with their levels of proficiency. For example, the

analysis showed that the gender effect was not significant, F(1, 14) = .027, p = .87, and the effect of level of English proficiency was not significant either, F(2, 14) = 1.18, p = .335. This is

because there were only three participants (two intermediate and one upper-intermediate)

who had over 80% of ‘I-can-do’ statements, suggesting that they had probably reached these

levels.

74
Table 8

Findings of UNIANOVA Ratio by Gender Level of Proficiency with the TOEFL

Source    Type III Sum of Squares    df    Mean Square    F    Sig.    Observed Power(b)
Corrected Model .367a 6 .061 1.048 .436 .284
Intercept .083 1 .083 1.419 .253 .199
Gender .002 1 .002 .027 .873 .053
Level of proficiency .138 2 .069 1.182 .335 .217
TOEFL .149 1 .149 2.553 .132 .319
Gender * Level of proficiency .209 2 .104 1.790 .203 .311
Error .816 14 .058
Total 14.337 21
Corrected Total 1.182 20
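For readers who wish to reproduce this kind of analysis, the following minimal Python sketch specifies a comparable two-factor model with the TOEFL score entered as a covariate, using the statsmodels library; the data values and column names are hypothetical placeholders, not the study’s data set.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical data frame: one row per participant (illustrative values only).
    df = pd.DataFrame({
        "ratio":  [1.25, 1.07, 0.74, 0.68, 1.19, 0.71, 0.74, 0.63, 0.98, 0.95, 0.51, 0.40],
        "gender": ["F", "M", "F", "M", "M", "M", "F", "M", "F", "F", "M", "F"],
        "level":  ["Int", "Int", "Int", "Int", "UpInt", "UpInt", "UpInt", "UpInt",
                   "Adv", "Adv", "Adv", "Adv"],
        "toefl":  [90, 39, 46, 36, 72, 66, 63, 47, 70, 65, 78, 78],
    })

    # Two-factor (gender x level) model with TOEFL as a covariate; Sum contrasts
    # are used so that Type III sums of squares are interpretable.
    model = smf.ols("ratio ~ C(gender, Sum) * C(level, Sum) + toefl", data=df).fit()
    print(sm.stats.anova_lm(model, typ=3))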

This statistical analysis did not, however, reveal the extent to which each participant’s

self-rated score was accurate. In other words, although the findings indicated no correlation between participants’ self-rated and TOEFL scores, they did not show how many participants reached the assigned ratio of 80%, or whether they fell very close, close, somewhat close, far, or very far from it. To obtain a fuller explanation of these findings, the data were therefore examined descriptively in order to identify each participant’s self-rated performance individually. As discussed earlier, in order for participants’ self-rated scores to correlate with their TOEFL scores, participants needed to select ‘I-can-do’ for at least 80% of the responses at the level in which they participated. As shown in Table

9, only three out of 21 participants were able to obtain over the assigned ratio, achieving

100% (42/42), 85% (36/42), and 95% (40/42) respectively, suggesting that their self-rated

scores correlated with their TOEFL scores.

75
Table 9

Data of the Participants Who Reached or Were Very Close to Reaching the Assigned Ratio

No.    Gender    Level of English Proficiency    TOEFL/IELTS Score    CEFR Self-rated Score: # of ‘I-can-do’    %    Overall % obtained
1 Female Intermediate 90 42 100% 42/33.6
2 Male Intermediate 39 36 85% 36/33.6
3 Male Upper-intermediate 72 40 95% 40/33.6
4 Male Upper-intermediate 36 33 78% 33/33.6
5 Female Advanced 70 29 78% 29/29.6
6 Female Advanced 65 28 75% 28/29.6

Furthermore, the data showed that only three participants (4, 5, and 6) were very close

to 80%: 78% (33/33.6), 78% (29/29.6), and 75% (28/29.6). On the other hand, how close to or far from the assigned ratio (80%) the remaining self-rated scores fell varied depending on

participants’ self-rated performance. For example, they ranged from close (73%=31/33.6 and

70%=26/29.6), somewhat close (67%=25/29.6), far (59%=25/33.6 and 50%=21/33.6), to very

far (40%=15/29.6 and 28%=12/33.6)3. These percentages suggest that participants’ self-rated scores were widely scattered. In other words, they do not reflect participants’

determined levels of language proficiency, nor are they consistent with participants’ TOEFL

scores. In order to account for these discrepancies, a randomly selected sample of the

participants (10) was engaged in semi-structured interviews.

The interviews were designed in a way that could help provide further insight into the

accuracy or inaccuracy of participants’ self-rated performance. For example, any external

factors that could have potentially affected participants’ self-rated scores were included in the

3 For full details of each participant’s self-rated scores, see Appendix K.

76
interview questions. These included how many times participants had taken the TOEFL test (test familiarity), whether they had ever used a self-assessment rubric before (lack of exposure), and

other questions that could help answer the other part of the first question (If not, why?). It

was found that three out of 10 participants had taken the TOEFL test for the first time,

whereas the remaining seven participants had taken the test more than two times. This

suggests that most of the participants have background knowledge about the TOEFL test,

which is likely to minimize any potential effects on their TOEFL scores.

Moreover, none of these 10 participants had ever used any self-assessment measures

before. This calls into question the extent to which participants’ lack of adequate exposure to

self-assessment rubrics may have led to the lack of correlation between their self-rated and TOEFL

scores. This is supported by the findings that most of the participants’ TOEFL scores were

consistent with their current CESL levels as shown in Table 10.

Table 10

Consistency of Participants’ TOEFL Scores with Their CESL Levels

Participants’ Current CESL Levels Participants’ Range of TOEFL Scores

Advanced 79 – 95

Upper-intermediate 65 – 78

Intermediate 30 – 64

To identify why the majority of participants’ self-rated scores were not consistent with their levels of language proficiency or TOEFL scores, the participants were also asked which measure’s scores, TOEFL or self-rated, they believed to be more accurate. All 10 participants

argued that their self-rated scores are more accurate in reflecting their actual levels of English

proficiency; nevertheless, this still does not answer the second part of the first question.

77
To account for why 18 out of 21 participants obtained lower self-assessed scores,

empirically derived evidence is needed such as observing participants’ performance on the

three measures: TOEFL, level of proficiency, and self-assessment. Since this study did not

intend to collect such empirical evidence, the participants who were interviewed were asked

to what extent they consider their self-rated scores accurate. All 10 participants indicated that

they feel that their self-rated scores reflect their actual levels of English proficiency. After

that, each participant was provided with a table showing his or her level of proficiency,

TOEFL scores, and self-rated scores and was asked what s/he thought of them. Although all

of the ten participants, except one, obtained lower self-rated scores compared to their TOEFL

scores, they did not show any form of embarrassment or frustration.

However, seven out of 10 participants indicated that they did not expect their self-

rated scores would be very low. After being shown their different scores, three of the

participants attributed their low self-rated scores to the fact that they often became uncertain whether to choose ‘I-can-do’ or ‘I-can-not-do’, as they found the statements dichotomous. As a result, they tended to underrate their scores. For example, one of the participants indicated that

once he read the survey instructions, he decided to assess his language abilities accurately

without any form of bias. This caution, in turn, led some participants to underrate themselves.

Moreover, another participant argued that when the CEFR self-assessment rubric asked her

about language tasks that she once performed, she found no difficulty choosing ‘I-can-do’ or

‘I-can-not-do’ statements based on her previous background. Nevertheless, when asked about language tasks that she had never encountered before, she spent some time visualizing herself performing the task and then selected ‘I-can-not-do’. Another participant asserted

that when he wanted to select ‘I-can-do’, he remembered his reluctance to interact with native

78
speakers and ended up choosing ‘I-can-not-do’ statements, a CEFR problem that Weir (2005)

called the channel of communication.

Having indicated that none of them had ever used a self-assessment rubric for

proficiency purposes, the participants were asked about any other factors that could

have potentially impacted their self-rated performance. Although the self-assessment task

was explained to them in writing, the participants pointed out that they encountered some

difficulties using the CEFR self-assessment rubric, especially during the first parts of the

rubric. This suggests that engaging students in performing self-assessment tasks might not be

as effective unless the tasks are demonstrated for them through modeling or scaffolding

(Tara, 2003). For example, McDonald and Boud (2003) divided their study participants into

experimental and treatment groups. The former received formal training on using self-

assessment rubrics, while the latter received no training. The findings revealed that the

experimental group “outperformed their peers who had been exposed to teaching without

such training in all curriculum areas” (McDonald & Boud, 2003, p. 217).

Another interesting finding accounting for participants’ low self-rated scores was the

ambiguity of some statements that some participants faced. For example, one participant

argued that, while responding to some of the statements, she had difficulty understanding

the given language task fully. Another participant pointed out that some statements have

unknown or confusing words (especially verbs), which made it difficult to respond to the statements accurately. This is consistent with previous literature that urges CEFR self-assessment rubric developers to take careful account of the synonyms they use. For example, Alderson et al. (2004) note that at the B2 level several different verbs are used

interchangeably to describe language scenarios relating to comprehension such as scan,

locate, monitor, identify, and so forth. The researchers express some concerns whether these
79
verbs are simply “stylistic synonyms” or they indicate “real differences in cognitive

processes” (p. 9). During the interviews, one intermediate and two upper-intermediate

participants indicated that they were unable to identify the meanings of some verbs.

2. Do gender and level of English proficiency cause any form of score underestimation?

Given that the number of participants was relatively small, the two main effects

(gender and levels of proficiency) were not significant, and that only three participants were

able to do more than 80% of the ‘Can-do’ statements, it would be rather difficult to run an

analysis of variance to identify whether gender and levels of proficiency have a substantial

impact on CEFR users’ score underestimation. However, one exploratory option that can provide some, albeit inadequate, indication of any potential association between these two factors and CEFR users’ score underestimation is a scatter plot.

Figure 2. Scatter plot of TOEFL scores against self-assessment ratios, by level of proficiency and gender.
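A scatter plot of this kind can be produced with a few lines of Python; the sketch below uses matplotlib with purely illustrative values, plotting TOEFL scores against self-assessment ratios and distinguishing levels by color and gender by marker.

    import matplotlib.pyplot as plt

    # Illustrative values only (not the study's data).
    toefl  = [90, 39, 46, 36, 72, 66, 63, 47, 70, 65, 78, 78]
    ratio  = [1.25, 1.07, 0.74, 0.68, 1.19, 0.71, 0.74, 0.63, 0.98, 0.95, 0.51, 0.40]
    level  = ["Int", "Int", "Int", "Int", "UpInt", "UpInt", "UpInt", "UpInt",
              "Adv", "Adv", "Adv", "Adv"]
    gender = ["F", "M", "F", "M", "M", "M", "F", "M", "F", "F", "M", "F"]

    colors  = {"Int": "tab:blue", "UpInt": "tab:orange", "Adv": "tab:green"}
    markers = {"F": "o", "M": "^"}

    for x, y, lv, g in zip(toefl, ratio, level, gender):
        plt.scatter(x, y, color=colors[lv], marker=markers[g])
    plt.xlabel("TOEFL iBT score")
    plt.ylabel("Self-assessment ratio ('I-can-do' count / 80% benchmark)")
    plt.show()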

As shown in Figure 2, the overall pattern of the plot indicates that the points for gender and levels of proficiency scattered randomly, as the correlation is very weak. For example, scores of intermediate and upper-intermediate participants scattered everywhere, behaving

80
completely differently from each other. On the other hand, although the correlation for advanced participants’ scores is also weak, those scores are less randomly distributed, suggesting that the higher advanced participants’ TOEFL scores are, the higher their self-rated scores tend to be. As for gender, the findings suggest that the higher female participants’ TOEFL scores are, the higher their self-rated scores tend to be. However, this does not help draw a

clear conclusion about the impact of these two variables on score underestimation.

The other option was to examine data descriptively to identify any potential effect of

gender and levels of proficiency in causing score underestimation. To do so, a cutoff score

first had to be established indicating that a user’s self-rated score is underrated compared to his/her TOEFL score. For example, if a participant placed herself at the advanced level but her ‘I-can-not-do’ responses at level C1 dramatically outnumbered her ‘I-can-do’ responses, what cutoff score would suggest that this user should be placed at a lower level (e.g., B2 or B1)? To the best of the researcher’s knowledge, no cutoff score has ever been identified to indicate the lowest acceptable performance on the CEFR self-assessment rubric, as opposed to the 80% benchmark. Thus, due to the lack of a previously established minimum cutoff score, in addition to not having internal reliability estimated by Cronbach’s alpha, this option was not pursued further.
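For reference, an internal-reliability check of the kind mentioned here could be computed as in the minimal Python sketch below, which applies the standard Cronbach’s alpha formula to a small illustrative item-response matrix (rows are respondents, columns are ‘Can-do’ items coded 1/0); it is not an analysis of the study’s data.

    import numpy as np

    def cronbach_alpha(item_matrix):
        """Standard Cronbach's alpha: (k / (k - 1)) *
        (1 - sum of item variances / variance of total scores)."""
        items = np.asarray(item_matrix, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    # Illustrative responses: 4 respondents x 5 'Can-do' items (1 = 'I-can-do').
    responses = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
    ]
    print(round(cronbach_alpha(responses), 2))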

The following table lists the participants who achieved less than 60%, bearing in mind that this is an arbitrary rather than an established cutoff percentage. It can be noticed that ten participants were either far (59%–41%) or very far (40%–28%) from the assigned ratio

(80%). Even if all external variables such as test anxiety or inadequate experience in using

self-assessment rubrics are controlled, the data in Table 11 still do not reveal any pattern

indicating that gender and levels of proficiency cause any form of score underestimation.

Therefore, it can be concluded that because the aforementioned approaches did not reveal

clear findings, further data are needed so that valid conclusions can be drawn.
81
Table 11

Participants’ Self-rated Scores Lower Than 60% of the Assigned Ratio

Gender    Level of English Proficiency    CEFR Level    TOEFL Score    CEFR Self-rated Score: # of ‘I-can-do’    %    Overall % obtained

Female Intermediate B1 46 25 59% 25/33.6

Male Intermediate B1 36 23 54% 23/33.6

Male Intermediate B1 32 12 28% 12/33.6

Male Upper-intermediate B2 42 25 59% 25/33.6

Male Upper-intermediate B2 66 24 57% 24/33.6

Male Upper-intermediate B2 47 21 50% 21/33.6

Female Upper-intermediate B2 63 13 30% 13/33.6

Male Advanced C1 78 19 51% 19/29.6

Female Advanced C1 78 15 40% 15/29.6

Male Advanced C1 68 14 37% 14/29.6

3. Participants’ Attitudes towards the CEFR Self-assessment Rubric

Finally, the participants were asked about their attitudes towards using the CEFR self-

assessment rubric. In addition to what has been discussed earlier, the participants provided a wide range of interesting insights from which some implications can be drawn. All ten participants unanimously indicated that they found the CEFR self-assessment rubric effective in determining their levels of English proficiency. For example, two participants

argued that the rubric enabled them to pinpoint both their strengths and weaknesses. Another

participant asserted that the rubric allowed him to assess his abilities to demonstrate his

language skills in a performance-based manner as opposed to the TOEFL test. In addition,

another participant pointed out that one of the advantages of the CEFR self-assessment rubric
82
is that it focuses not only on users’ four language skills but also on other areas, including interaction, strategies, and language quality. Such areas, as another participant noted, helped her visualize herself performing certain tasks inside and outside the classroom.

However, in addition to the limitations discussed at the beginning of this study, some

participants also pointed out some drawbacks of using self-assessment rubrics. For example, one participant, as indicated earlier, argued that some of the rubric statements are dichotomous, which made her uncertain about which ‘Can-do’ choice to select. Moreover, another participant indicated that using the rubric for purposes other than grades made her, in some cases, not take the assessment seriously. Furthermore, two intermediate participants commented that although they could perform some of the rubric language tasks, they often found themselves not motivated enough to select the ‘I-can-do’ choice. In

addition, and more importantly, the participants complained about the difficulty of some words used in the rubric statements. For example, one of the participants said that he responded to some prompts without fully understanding the task. Yet, all of the participants concluded that they were interested in using the CEFR self-assessment rubric in classroom settings.

Discussion and Conclusion

Self-assessment has long been used in L2 learning contexts for various

purposes (Ekbatani, 2011; Cassidy, 2001; Harris, 1997; LeBlanc & Painchaud, 1985;

McNamara, 1995). However, the extent to which self-assessment rubrics correlate with

traditional measures is still controversial. This study explored the experience of 21 ESL

participants in using the CEFR self-assessment rubric in order to identify whether their self-

rated scores correlate with their TOEFL scores, whether gender and levels of proficiency play

an influential role in causing score underestimation, and the possibility of incorporating this

rubric into L2 classroom assessment procedures. The findings indicated that 18 out of 21
83
participants were unable to rate themselves at the level reflecting their TOEFL scores and

CESL levels, even though the latter were found to be consistent. Moreover, the small number

of participants, lack of significance of the two main effects, and the fact that only three

participants were able to rate themselves to the given levels did not help obtain adequate data

to conclude whether gender and level of proficiency have major effects on score

underestimation.

Nevertheless, based on their responses during the interviews, ten participants showed

great enthusiasm and high motivation to use self-assessment rubrics in L2 learning settings.

In addition, they provided valuable insights about their experience in using the CEFR

assessment rubric, which can be very beneficial for L2 practitioners for promoting the

integration of self-assessment rubrics into classroom contexts. Generally, it can be concluded

that self-assessment rubrics can be very effective if certain conditions are met. For example,

students should be trained on using these rubrics efficiently in order to avoid any potential

discrepancies between their self-rated scores and their actual levels of proficiency. Moreover,

self-assessment rubrics can be a complement to traditional assessment methods until

empirical evidence is obtained indicating their high reliability.

TOEFL-related Limitations

One of the main limitations of this study was that the participants were not asked if

their reported TOEFL scores were obtained the first time they took the test. This is important

because familiarity and more practice typically increase test scores, and we are not certain

about the correlation between the participants’ TOEFL scores and their level of proficiency.

Although seven out of 10 participants who were interviewed indicated that they have taken

the TOEFL test more than two times, this does not necessarily mean that their TOEFL scores

were not impacted by any external factors. For example, test anxiety is a critical factor that affects test-takers’ performance on any test, especially high-stakes tests. In contrast, when completing the self-assessment rubric, it is unlikely that the subjects experienced high test anxiety, as the task they were performing was for research rather than for grade purposes. Thus, concurrent validity evidence is

needed.

Another limitation is concerned with the limited number of participants (21), which

calls into question the appropriateness of drawing valid and generalizable conclusions about

the accuracy and effectiveness of self-assessment rubrics. In addition to the limited number of

participants, only three out of 21 participants were able to select more than 80% of the ‘I-can-

do’ statements, a problematic issue that poses critical questions. Moreover, in order to

gain valid and generalizable findings, the participants should have been divided into two

groups: an experimental group, which receives training on the CEFR self-assessment rubric,

and a control group, which receives no training. This would certainly help draw more

accurate conclusions. Furthermore, the absence of time constraints is another limitation of this study. That is, the participants were not given a set time limit to complete the CEFR self-assessment rubric, unlike many assessment measures where time allotment is one of the key aspects of evaluating students’ performance.

Implications for ESL/EFL Program Administrators

ESL program administrators should first map their program levels onto a previously validated measure (e.g., TOEFL, IELTS, ITEP) against which students’ self-rated scores can

be compared. In this way, student performance in various measures (traditional assessment

tools, self-assessment rubrics, achievement tests) can be observed and compared. It is also

recommended that self-assessment measures be incorporated into ESL program assessment

procedures from early levels so that students’ progress throughout the subsequent levels can

85
be effectively monitored. This will help ESL practitioners and self-assessment researchers

identify the extent to which level of language proficiency impacts users’ self-rated scores.

Implications for ESL/EFL Teachers

One of the key implications for ESL/EFL teachers is training students on using self-

assessment rubrics. Teachers can start by having students use a short version of any self-assessment measure to rate their performance on a certain language task. With continuous use of self-assessment rubrics, students will be able to identify their areas of language strength and weakness; the more they practice using self-assessment rubrics, the more accurate they will become in identifying their levels of proficiency. Moreover, the CEFR ‘Can-do’ rubric is highly recommended as a pre- and post-measure through which students’ learning progress can be identified.

Implications for Future Research

Future researchers are advised to recruit a larger number of participants so that they

can increase statistical power. Moreover, they should also divide participants into

two main groups: 1) an experimental group and 2) a control group in order to explore the

impact of training on students’ self-assessment performance. In addition, self-assessment

rubrics can be introduced in participants’ L1 to identify their L2 skills. Finally, researchers

can look at participants’ language skill subsets (reading, listening, speaking, writing).

86
CHAPTER 4 QUALITY ASSURANCE AND ACCREDITATION AS FORMS FOR

LANGUAGE PROGRAM EVALUATION: A CASE STUDY OF TWO EFL

DEPARTMENTS IN A SAUDI UNIVERSITY

87
ABSTRACT

In 2004, the Saudi Ministry of Higher Education, currently named the Ministry of

Education, established the National Commission for Academic Accreditation and Assessment

(NCAAA) to ensure that Saudi higher institutions adhere to predetermined national standards

and frameworks. This led to a paradigm shift from institutional-oriented to outcome-based

education. As a result, many programs in Saudi Arabia seek quality in order to obtain

academic accreditation. As an initial step to assure quality, this paper provides a simulated

evaluation of two EFL departments at a Saudi University, which will be referred to as X

University throughout the paper. The evaluation processes were based on a review using a

combination of integrated standards of the NCAAA and CEA (the Commission on English

Language Program Accreditation). Data were collected qualitatively through surveys and

interviews with students, faculty, and quality assurance coordinators. Moreover, the mission,

goals, objectives, curriculums, teaching strategies, assessment techniques, and learning

outcomes of the two departments were also evaluated.

The findings indicated that the two EFL departments appear to partially meet the

standards related to their mission, curriculum, student learning outcomes (SLOs), and

program development, planning, and review, whereas the standards for teaching strategies,

assessment methods, and student achievement were not met. At the end of the paper, some

concrete suggestions for improvement are provided that should help the two departments

address areas of weaknesses in order to be ready to obtain academic accreditation from CEA

and NCAAA in the future.

Key words: NCAAA, CEA, quality assurance, academic accreditation, program review, curriculum, male- and female-section departments, EFL.

88
INTRODUCTION

Brown (2007) points out that “no curriculum should be considered complete without

some form of program evaluation” that corresponds with its changing conditions and

emerging developments (p. 158). Research suggests that academic programs be evaluated on

a constant basis to ensure that their learning outcomes are achieved effectively (Ellis, 1993;

Halliday, 1994; Mercado, 2012; Royse et al., 2009). Pennington and Hoekje (2010) point out

that program evaluation can be undertaken on a formal and informal basis, where the former

may include external review and accreditation, and the latter may involve ongoing internal

monitoring, SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis, and internal

quality control. Royse, Thyer, and Padgett (2009) argue that program evaluation is

characterized as “a practical endeavor, not an academic exercise, and is not primarily an

attempt to build theory” (p. 2). In other words, it is a process of evaluating a program using

research tools to improve its outcomes.

In recent decades, numerous researchers have reported a growing interest in quality

assurance and accreditation at higher education institutions in both developed (MacKay,

1998; Mercado, 2012; Stensaker & Rosa, 2007; Van Damme, 2004; Westerheijden) and

developing countries (Sallis, 2002; Shawer, 2013; Smith & Abouammoh, 2013). The

relationship between program evaluation, on the one hand, and quality assurance and accreditation, on the other, has long been recognized in higher education contexts, with accreditation serving as one of the key impetuses for program evaluation that “takes the form of accreditation-mandated student learning outcomes assessment” and quality control (Norris, 2016, p. 173). As a result, and in

response to the rapidly growing quality assurance and accreditation interests, this paper

evaluates quality assurance processes at two EFL departments in a Saudi Arabian university

89
using an integrated set of NCAAA and CEA accreditation standards, with a view to helping these departments obtain accreditation from both agencies in the future.

What Is Program Evaluation?

Research has documented a wide range of different, albeit interrelated, definitions of

program evaluation. From the early 1980s to the present, these definitions have revolved

around one central theme even though other considerations have been suggested for the term

program evaluation. For example, Palmer (1992, p. 144) defines program evaluation as a

process of finding out whether a program is “feasible” in terms of its curriculum (practicality

issue), “productive” in terms of producing the intended learning outcomes (validity issue),

and “appealing” in terms of responding to real-life language scenarios (authenticity issue).

Barker (2003) defines program evaluation as a “systematic investigation to determine the

success of a specific program” (p. 149). In a broader view, this investigation should involve

examining the efficiency of “the individual components of [any] program in relation to each

other and to contextual factors, goals, criteria of value” (Pennington & Hoekje, 2010, p. 262).

Thus, program evaluation involves thorough assessments of the targeted program’s

performance.

Allen (2004) defines program evaluation as an ongoing process “for focusing faculty

attention on student learning and for provoking meaningful discussions of program

objectives, curricular organization, pedagogy, and student development” (p. 4). In this study,

the main focus is on the program evaluation form that leads to maintaining quality assurance

and thereby obtaining academic accreditation. This is because program evaluation processes

in Saudi Arabia have recently received substantial attention for accreditation purposes. As a

result, this paper adopted Pennington and Hoekje’s (2010) comprehensive definition of

program evaluation as a systematic investigation of the extent to which a program adheres to


90
nationally and internationally determined standards, which in this case are the standards of

the NCAAA and CEA agencies, respectively.

LITERATURE REVIEW

Given that applied linguistics is a relatively young field, dating only to the 1940s (Kaplan, 2010), the program evaluation literature “within the field is quite scant” as well (Lynch, 1996, p. 12).

However, research has documented some evidence of early systematic evaluations in public

education. For example, the first documented program evaluation dates back to the 19th

century, when many federally-funded schools in the United States, Great Britain, and

Australia began evaluating school curricula (Rea-Dickins, 1994). The premise underpinning

the evaluation processes undertaken during that era was based on evaluating curricula

“through scrutiny of the competence and behavior of the teacher” (Kiely & Rea-Dickins,

2005, p. 18). Then, during the late 19th century, programs were evaluated quantitatively based

on data (e.g., student retention rates, learning outcomes) from a positivistic view (Gitlin &

Smyth, 1989). Nevertheless, program evaluation during these eras remained recognized as a

field inseparable from applied linguistics (Beretta, 1992; Lynch, 1991).

Program evaluation began to emerge as an independent field of applied linguistics

during the 1960s and early 1970s (Beretta, 1992; Cronbach, 1963, Keating, 1963; Lynch,

1990; Sherer & Wertheimer, 1964; Smith, 1970). According to Lynch (1996), the emergence

of the program evaluation field was due to some researchers’ calls for more systematic

approaches to program evaluation that could provide us with “what counts as evidence” (p.

9). In other words, program evaluation should be used as an effective tool for providing

concrete evidence of the appropriateness and efficiency of teaching and learning of the

program being evaluated. As a consequence, many funding agencies expect program

administrators to provide them with some statistical information as proof indicating that a
91
program indeed achieves the desired learning outcomes that will allow it to continue to

receive funding (Gitlin & Smyth, 1989).

Foreign Language Program Evaluation

There was a plethora of studies on foreign language program evaluation from the

1960s through the 1990s (Brown, 1995). According to Beretta (1992), until the late 1980s,

most of the studies and discussions on program evaluation were limited to research papers,

and there were very few books addressing this area. This was, as Beretta (1992) puts it, due to

“the seemingly never-ending” quantitative versus qualitative research-method debate (p. 5).

Beretta (1992) reviewed most, if not all, of these studies, and the extent to which they

contributed to shaping the modern field of program evaluation. For example, one of the early

studies on foreign language program evaluation, conducted by Keating (1963), examined the

impact of laboratory-based versus classroom-based teaching of French. The findings showed

that classroom-based teaching achieved better results as opposed to laboratory teaching.

Similarly, Campbell and Stanley (1963) conducted the first true- and quasi-experimental

study on L2 program evaluation, which provoked the research-method debate for a long time.

During the 1970s, several program evaluation studies were conducted (e.g., Asher et

al. 1974; Bushman & Madsen, 1976; Gary, 1975; Levin, 1972; Postovsky, 1974; Smith,

1970; as cited in Beretta, 1992). These studies focused on manipulating the teacher variable

so that variations could be minimized. This variable was controlled by comparing teachers’

performance from different programs (Smith, 1970), comparing the performance of two

program taught by the same teacher (Bushman & Madsen, 1976; Postovsky, 1974), and

finally “[eliminating] teachers altogether and [replacing] them with tape-recorded lessons”

(Beretta, 1992, p. 11). During the 1980s, Tyler’s Model of Evaluation, developed in the

1940s, was again used by program evaluators to compare the desired outcomes with
92
achieved outcomes (Beretta & Davies, 1985; Prabhu, 1987; Wagner & Tilney, 1983; Wolfe

& Jones, 1982; as cited in Beretta, 1992).

During the 1990s, the program evaluation literature witnessed an increasing number of studies addressing many areas of program evaluation. For example, Beretta (1992) examined

‘program-fairness’ of evaluation and concluded that there was a gap between program

evaluation theory and practice that needed to be addressed. Drawing upon this, Lynch (1996)

suggested that data-gathering techniques “provide information that makes sense and counts as

evidence” of program performance (p. 155). To achieve these two implications, standards of

the Joint Committee on Standards for Educational Evaluation (JCSEE) were reviewed in the

mid-1990s, which led to innovative and standard-based program evaluation (Patton, 1997). In

the late 1990s, several researchers advocated the incorporation of political, affective, and

cultural aspects into program evaluation processes, for language learning involves all these

aspects (Rea-Dickins, 1998). This resulted in the inclusion of the changing environmental factors of language programs in program evaluation, as reflected in Pennington’s (1998) work, which examined the dynamic and interactive dimensions of language program evaluation rather than focusing solely on quantitative data.

At the beginning of the 21st century, the epistemological and methodological tenets of

program evaluation have changed due to the impact of a wide range of political, economic,

and technological changes (Kiely & Rea-Dickins, 2005). This has resulted in spurring the

demand for compliance with mandates as a basis for language program evaluation.

Evaluation approaches such as self-study, peer review, key performance indicators, and

external audits have become key to verifying a program’s compliance with mandates (Kiely, 2001), which has led to language program evaluation being conducted to meet “internal and external commitments to quality assurance” (Kiely & Rea-Dickins, 2005, p. 52). In


conjunction with this mandate compliance, the culture of promoting and assuring quality,

albeit not new, has made language program evaluation key for accreditation purposes (Eaton, 2006; Harvey, 2004; Norris, 2009).

Evaluation Paradigms

Every evaluation approach is premised on certain theoretical underpinnings, which

makes it crucial at this point to understand how programs have been evaluated. Lynch (1996)

provides a broad discussion of a ‘paradigm dialog’ in social literatures dominated by two

paradigmatic camps: 1) positivistic view and 2) naturalistic view. Debates between these two

paradigms are based on “the epistemological basis of research” (Lynch, 1996, p. 13).

Advocates of the positivistic paradigm prefer traditional, quantitative, and experimental

approaches to conducting inquiry (Kiely & Rea-Dickins, 2005). This paradigm has generated

two key inquiry approaches: 1) true experiments, which assign participants randomly to

experimental and control groups to draw comparisons between them, and 2) quasi-

experiments, which compare the two groups without random assignment (Lynch, 1996).

Many researchers (Beretta, 1986; Long, 1983) criticized the positivistic paradigm, for it

merely focuses on “product or outcome rather than also attending to the process of how the

program was being carried out” (Lynch, 1996, p. 32).

On the other hand, the naturalistic paradigm “requires an emergent or variable design”

where evaluation takes place as the “evaluator proceeds to investigate the program setting,

allowing new information and insights to change how and from whom data will be gathered”

(Lynch, 1996, p. 14). This form of evaluation does not manipulate any conditions or

variables; rather, it observes, describes, and interprets how a program performs in real-life

contexts. In other words, a naturalistic inquiry is carried out in a program’s natural settings,

which is consistent with the naturalistic second language learning approach (Krashen, 1982).
Such an approach includes “observations, interviews, questionnaires, tests, and program

documentation” (Lynch, 1996, p. 82). Later, a combination of positivistic and naturalistic

approaches was applied to help us articulate “what counts as evidence for our evaluation”

(Lynch, 1996, p. 40).

Most early program evaluation studies have focused mainly on program design and

evaluation as key to examining “pedagogical innovations” in L2 language programs (Ross,

2003, p. 2). Later, a wide range of program evaluation approaches emerged, changing to

some extent the reasons why language programs are evaluated. For example, during the mid-

and late-1990s, program evaluation shifted from merely improving program specific features

to focusing more on “large-scale educational accountability mandates” (Norris, 2009, p. 8).

This impetus for accountability was a result of the necessity of gaining more funds from

funding agencies (Brindley, 1998). Program outcome-based evaluation can therefore have

several motives including but not limited to responding to greater accountability needs,

maintaining quality assurance and control, obtaining academic accreditation, gaining higher

rankings, and so forth. Regardless of the purpose of program evaluation, it is prudent to realize that program evaluation provides valuable conclusions about a program’s performance as well as insight into which strengths should be supported and which weaknesses should be addressed.

Having identified different program evaluation paradigms, program evaluators should

then decide how program evaluation processes will be planned, implemented, and even

evaluated in a systematic way. To help program evaluators make a decision in this regard,

Brown (1995) argues that prior to conducting program evaluation, evaluators should first

consider the following six fundamental types of program evaluation:

1) formative or summative
2) external experts or internal participants

3) field or laboratory research

4) on-going or short-term evaluation

5) quantitative or qualitative data

6) process or product focused.

These illustrate some of the key conceptual and practical approaches to program evaluation

that program evaluators, administrators, teachers, practitioners, and external reviewers should

take into consideration.

Formative vs. Summative

Formative and summative evaluations are two distinct measures that have long been

used in the evaluation industry (Alderson, 1986; Bachman & Palmer, 1996; Black & Wiliam,

2009; Brown, 1995). Although formative evaluation was first introduced into educational

psychology literature by Scriven (1967), it was defined in detail by Bloom et al. (1971) as

“the use of systematic evaluation in the process of curriculum construction, teaching, and

learning for the purpose of improving any of these three processes” (p. 117). It is aimed at

improving a course, curriculum, or program as a whole through an ongoing evaluation

process. In other words, it refers to the process of collecting information and data that

contribute to making useful “decisions about a program while it is under development”

(Bachman & Palmer, 1996, p. 62). Similarly, Bailey (2009) defines program-level formative

evaluation as a form of appraisal that provides “feedback for program improvement” (p. 707).

It helps stakeholders gain periodic or ongoing guidance that helps them “adjust their activities

accordingly” (Mohr, 1995, p. 33).

In contrast to formative evaluation, summative evaluation refers to any form of

measure that “typically occurs at the end of a program or a funding period for a program”
(Bailey, 2009, p. 707). It is typically used to provide “evaluative conclusions for any other

reasons besides development” (Scriven, 1991, p. 21). Brown (1995) argues that summative

evaluations occur at the end of a program “to determine whether the program was successful”

(p. 228). This form of evaluation provides teaching practitioners with an inclusive summary

of the value of their program for future improvement. As for the study contexts, most of the

evaluations undertaken by the NCAAA are summative, while those conducted by program

participants are formative. A useful illustration of this dichotomy was offered by Alderson (1986): “when the cook tastes the soup, that is formative. When the guests taste the soup, that is summative” (p. 11).

External Experts vs. Internal Stakeholders

Program evaluation may involve bringing in external experts, internal participants, or

both. Some program administrators invite external experts to carry out their program

evaluation processes. For example, during one interview, the quality assurance coordinator of

the male EFL department under study indicated that they once invited a quality assurance

expert from another Saudi university to evaluate their learning outcomes. The NCAAA

requires Saudi higher educational institutions to bring in a group of external experts to help

them conduct the developmental self-study or the mock program review. According to Brown

(1995), external reviewers can provide “a certain amount of impartiality and credibility to the

results” (p. 232). Nevertheless, Alderson and Scott (1992) argue that outside evaluators “are

perceived by insiders as at least threatening to themselves and the future of their project, and

at worst as irrelevant to the interests and perspectives of the project” (pp. 26-27). Hence, they

suggest depending more on stakeholders, which is widely known in literature as the

Participatory Evaluation Model (Alderson & Scott, 1992).

The Participatory Model includes involving stakeholders in program planning,

implementation, and evaluation processes (Alderson & Scott, 1992). This model helps

provide a dynamic and spectral analysis of program performance by obtaining stakeholders’

different perspectives about the effectiveness of their program. One of the main benefits of

this model is that it helps “develop greater insights not only about the roles that stakeholders

may play in evaluations, but also how learning may take place” (Kiely & Rea-Dickens, 2005,

p. 201). Nonetheless, to overcome any potential bias, Alderson and Scott (1992) suggest that

a consultant be involved in the evaluation processes to provide guidance rather than

judgment. Moreover, Ross (1992) recommends engaging teachers in the evaluation processes

to reduce their anxiety of “being watched” and to promote their roles as “both practitioner

and observer” (1992, p. 172).

Field vs. Laboratory Research

Another aspect that program evaluators should be aware of is whether the evaluation

processes will be undertaken through field or laboratory research. Beretta (1986, p. 296)

defines field evaluation as a “long-term, classroom-based inquiry” aimed at identifying the

effectiveness of a program through an expansive evaluation of its components (curriculum,

teaching strategies, and assessment procedures). In contrast, laboratory research refers to a

short-term evaluation that often “involves the testing of individual components of a theory in

an environment in which extraneous variables are artificially held constant” (Beretta, 1986, p.

296). Reliance on either form depends on the purpose of the evaluation (Brown, 1995).

Alderson and Scott (1992) argued that some program evaluators prefer a field evaluation to

obtain an overall image of a program, while others prefer to conduct a small-scale laboratory

evaluation to avoid wasting time.

Ongoing vs. Short-term Evaluations


Similarly, program evaluation can also take the form of ongoing or short-term

evaluations depending on the evaluation purpose and context. According to Brown (1995),

previous studies of program evaluation were primarily longitudinal in that they were

conducted during the implementation of a program. On the other hand, short-term evaluations

(often called after-program evaluations) are carried out at the end of a program in an attempt

to evaluate the program outcomes in a retrospective manner. Some program evaluation

researchers pay substantive attention to the content of the program by observing student

performance constantly (on-going evaluation), while others seek to look for the final output

(short-term evaluation). Brown (1995) suggests that a combination of both ongoing and

short-term evaluations be integrated “during the program, immediately after it, and in a

follow-up as well” so that all potential aspects of the program can be covered effectively (p.

233).

Qualitative vs. Quantitative

Program evaluation data can be gathered and analyzed qualitatively, quantitatively, or

a combination of both. Qualitative methods include observations, interviews, journal entries,

questionnaires, and so forth (Lynch, 1996). Advocates of qualitative approaches adopt the

naturalistic design, which depends mainly on data interpretation (interpretivist perspective)

(Lynch 1990, 1992; McKay, 1991; Rea-Dickens, 1999). Qualitative approaches are useful for

providing “the best array of information types” allowing program evaluators to gain broader

and deeper insight into program processes (Brown, 1995, p. 234). However, such approaches

can be problematic, for data collection and analysis are time-consuming and may include

multi-faceted themes, an issue that encourages many program evaluators to rely more on

quantitative methods.

On the other hand, some researchers choose to collect and analyze data quantitatively

(positivist perspective). Brown (1989) defines quantitative data as any form of information

“gathered using those measures which lend themselves to being turned into numbers and

statistics” including course grades, test scores, accountability criteria, faculty qualifications,

ratio of students per teaching faculty and so forth (p. 231). One of the benefits of quantitative

data in program evaluation is that they have been seen as “easier to gather, and more

amenable to analysis and summary” (Alderson & Scott, 1992, p. 53). Another benefit is that they

can help minimize stakeholder bias, as such data rarely involve any form of subjective analysis (Ross, 2003). Nevertheless, recent program evaluation studies have

incorporated eclectic approaches (i.e., measurement, non-measurement) into program

evaluation to provide multiple perspectives of how learning processes are undertaken

(Alderson & Beretta, 1992). According to Ross (2003), this diversity of approaches can help

us “yield richer contextualized data about program processes as well as empirical data about

outcomes” (p. 3).

Process vs. Product

Program evaluators should also decide whether to adopt a process-oriented evaluation, a product-oriented evaluation, or both. Brown (1989) defines product-oriented evaluation as an evaluation verifying whether “the

goals (product) of the program were achieved,” as opposed to process-oriented evaluation

that focuses mainly “on what it is that is going on in the program (process) that helps to arrive

at those goals (product)” (p. 231). During the 1990s, as noted earlier, program evaluation

was focused more on process-oriented methodologies (Lynch, 1996). That is, program

evaluation advocates called for examining how learning outcomes are achieved effectively

(Beretta, 1986b; Burstein & Guiton, 1984; Campbell, 1978; Elley, 1989; Lynch, 1990, 1996;

Ross, 2003). Process-oriented evaluations can provide internal stakeholders (e.g., students,
teachers, administrators) and external stakeholders (e.g., accreditation agencies, funding

agencies) with detailed information about their program to enhance its performance and to

respond to accountability (Gredler, 1996).

Having reviewed how programs were evaluated in different decades, it is crucial to

identify the different motives of educational institutions for evaluating their programs. Over

time, according to Thomas (2003), program evaluation has changed dramatically to include

other considerations such as accreditation, quality assurance, external review, and so forth in

addition to curriculum development, accountability, the value and worth of programs, and

securing funding (as cited in Kiely & Rea-Dickins, 2005). Hence, purposes for program

evaluation differ depending on the expectations of the stakeholders of the program being

evaluated. That is, program evaluation can be sought for different purposes including

developing “a thorough understanding of the program”, obtaining “information for

organizational development”, and drawing “valid generalizable conclusions” about the

program being evaluated (Posavac, 2015, p. 23). Most of these purposes lead to quality

assurance and accreditation goals.

Program Evaluation through Quality Assurance

During the 1990s, quality assurance became a systematic and comprehensive

component of program evaluation (Van Damme, 2004). The term quality assurance refers to

a systematic evaluation process through which a program’s learning outcomes are evaluated

to ensure that they meet specific predetermined standards (McNaught, 2009). According to

Halliday (1994), quality assurance was first sought in educational contexts during the 1980s

“as a response to political demands for institutional accountability” (p. 36). The term quality

assurance is “derived partly from manufacturing and service industry, partly from health

care” and then was pervasively integrated into educational contexts (Ellis, 1993, p. 3). Norris
(1998) listed six approaches to curriculum evaluation of social programs that help manage

quality assurance and comply with mandates, namely “1) Experimentation, 2) Measurement

of Outcomes, 3) Key Performance Indicators (KPIs), 4) Self-study Reports, 5) Expert Review

and 6) Inspection” (p. 209).

Gosling and D'Andrea (2001) went beyond quality management to a higher phase that

they call Quality Development, which is “an integrated educational development model that

incorporates the enhancement of learning and teaching with the quality and standards

monitoring processes in the university” (p. 11). The premise of this model is to assure total

quality of program curriculum rather than solely focusing on achieving other goals, for

example, accreditation. Thus, various new initiatives have been introduced into quality

assurance literature including total quality, quality control, trust, and so forth (Sallis, 2002).

Research reveals many approaches to assuring quality such as self-report (Ellis, 1993), needs

analysis (Richards, 2001), SWOT analysis (Dyson, 2004), benchmarking (Ellis & Moore,

2006), and site visit (Norris, 2009). Moreover, one of the purposes of assuring quality is to

obtain accreditation, a process by which institutions, universities, and programs are validated

(Pennington & Hoekje, 2010).

Accreditation-based Program Evaluation

Accreditation is another essential measure of quality assurance that “leads to the

formal approval of an institution or program that has been found by a legitimate body to meet

predetermined and agreed upon standards” (Van Damme, 2004, p. 129). Accreditation is a

self-regulated process of “external quality review used by higher education to scrutinize

colleges, universities, and educational programs for quality assurance and quality

improvement” (Council for Higher Education Accreditation, 2002, p. 1). That is, it entails

evaluating the extent to which certain standards are met by an institution or program.
According to Eaton (2006), accreditation functions as “a reliable authority on academic

quality” whereby quality assurance, federal and state funds, accountability, etc. can all be

maintained effectively (p. 3). There are two types of accreditation in higher education:

institutional accreditation and programmatic accreditation. The former provides an institution

with “a license to operate”, while the latter accredits programs “for their academic standing”

(Harvey, 2004, p. 208).

Many accreditation bodies stress “the role of evaluation within institutions,

departments, and programs” to assure quality in their programs (Norris, 2009). The

exponentially growing demands for accountability, outcome-based education, and quality

assurance have placed substantial emphasis on accreditation. Consistent with this,

postsecondary institutions in many developing countries, including Saudi Arabia, have begun

to seek “to participate and compete in the global economy” in order to ensure that their

programs meet predetermined accreditation standards (Smith & Abouammoh, 2013, p. 104).

In fact, some Saudi universities have imposed strict systems that lead their programs to meet

national accreditation standards. For example, in 2009, King Saud University imposed

probation on its programs, “and unless they satisfy the NCAAA standards by September

2012, they would be shut down” (Shawer, 2013, p. 2884).

Quality Assurance and Accreditation in Saudi Arabia

Education in Saudi Arabia has been undergoing drastic reforms that have resulted in

extraordinary improvements. These improvements were associated with several initiatives,

starting from introducing accreditation-based evaluation in 2004, launching the King

Abdullah Scholarship Program in 2005, establishing more governmentally funded

universities in 2007, and ending with sponsoring several private higher education institutions

in 2010 (Alamri, 2011). This has led the Saudi Ministry of Education to implement quality
assurance in higher education institutions. Nonetheless, according to Darandari et al. (2009),

“there was no quality system for higher education at the national level in Saudi Arabia”

before 2004, save for some institutional endeavors (p. 40). Since that time, the NCAAA has

become the official government accreditation body that governs national quality assurance

and accreditation standards to which many public and private postsecondary institutions

should adhere in order for them to obtain academic accreditation.

NCAAA

The National Commission for Academic Accreditation and Assessment (NCAAA)

was established in 2004 to oversee quality assurance and accreditation processes in Saudi

Arabia. Although it is a governmentally funded agency operating under the Ministry of

Education, the NCAAA is an independent body that accredits Saudi postsecondary

institutions (NCAAA, 2013). The mission of the NCAAA is “to encourage, support, and

evaluate the quality assurance processes of postsecondary institutions and the programs they

offer” (NCAAA, 2014). In 2005, it became a member of the International Network for

Quality Assurance Agencies in Higher Education (Darandari et al., 2009). The quality

assurance and accreditation trends in conjunction with the expansion of higher education

have revitalized program evaluation in Saudi Arabia.

Significance of the Study

The Saudi higher education system has recently witnessed a significant expansion

(Smith & Abouammoh, 2013). As of 2015, there are 26 government universities, 18 male and

80 female Primary Teacher Colleges, 37 Health Colleges and Institutes, 12 Technical

Colleges, and 49 Private Universities and Colleges (MOHE, 2015). Although obtaining

accreditation is not mandatory in Saudi Arabia, nor does it impact government-based funding,

many Saudi universities seek to obtain accreditation from the NCAAA. According to the
Saudi Ninth Development Plan (2014), accreditation is a national strategic dimension of the

policies of the country (Ministry of Economy and Planning, 2015). Hence, all universities are

required to restructure their academic programs by closing those that do not fulfill job market

needs and promoting those that do. Moreover, no university is allowed to establish new

programs unless high employability rates relating to these programs can be obtained. Thus,

many universities tend to customize their current programs to meet job market needs. One of

the most effective and highly recommended approaches to do so is by obtaining national

accreditation from NCAAA.

Quality assurance and accreditation are therefore a key trend to shaping the future of

Saudi higher education (Darandari et al., 2009). Since quality assurance processes, as

Mercado (2012) noted, are better performed by internal reviewers first (Emic perspective),

this paper serves as a simulated evaluation of two EFL departments in a Saudi university as

an initial step for assuring quality and thereby obtaining academic accreditation. Therefore, in

coordination with the vice-rector and the heads of the two departments, the researcher

conducted a site visit to the two departments to evaluate their quality assurance processes. It

is also hoped that this paper will be the first step toward promoting language program

evaluation in Saudi Arabia in order to assure quality and obtain academic national and

perhaps international accreditation.

METHODOLOGY

The primary motive for conducting this study is to guide two EFL departments

through preliminary program evaluation processes using an integrated model from

accreditation standards of the NCAAA and CEA commissions that suits the nature, needs, and

purposes of the 18 EFL departments. The quality assurance practices of two departments (one

for male students and the other for female students) were evaluated to serve as internal
benchmarks for the remaining 16 departments to help their administrators assure quality and

obtain accreditation from NCAAA and CEA. Any inapplicable or unnecessary standards

were excluded (e.g., marketing, housing, and others). To evaluate these two EFL departments

effectively, the researcher made a site visit to X University where these departments are

located. The researcher circulated surveys to male and female students, teaching staff, and

department heads, and he also conducted semi-structured interviews with a randomly selected

sample of participants.

Background on Site

X University has 18 male and female EFL undergraduate departments, which are

planning to obtain departmental accreditation from the NCAAA in the next two years and

perhaps from the CEA in the near future. Both the departments have a four-year, full-time

education program divided into eight levels and preceded by a one-semester pre-university

intensive English language program. The total credit hours required for graduation are 137,

22 of which are non-major courses. The departments have two main tracks: 1) EFL and

Translation and 2) EFL and Literature. These two departments were selected in particular

because they are the largest male and female EFL departments at X University, so they can

function as internal benchmarks for their comparable EFL department counterparts at X

University.

Research Objectives

The research objectives of this paper vary but fall under one overarching goal. First, they

aim to obtain stakeholders’ (students, faculty, and department administrators) opinions about

the effectiveness of the two departments being studied. Second, they attempt to identify the

extent to which the two departments meet the integrated standards of NCAAA and CEA in

order to prepare them to obtain academic accreditation from these two commissions. Third,
they pinpoint both positive and potentially poor quality assurance practices in the two

departments. Fourth, they identify potential dilemmas and barriers that may delay the two

EFL departments from obtaining academic accreditation from the NCAAA and CEA. Fifth,

they provide some implications that will contribute to helping the two EFL departments

maintain quality assurance and thereby obtain academic accreditation. Finally, the researcher

will provide a concise report describing quality assurance processes of these two EFL

departments. It is hoped that the report will be useful not only for EFL departments at X

University, but also for any similar EFL departments that plan to obtain national or

international academic accreditation.

Research Questions

1. To what extent do the two departments meet the study-integrated model of standards?

2. To what extent are students satisfied with the two departments’ curricula, teaching

strategies, learning outcomes and assessment methods?

3. What are good quality assurance practices in the two departments?

4. What are poor quality assurance practices in the two departments that need to be

improved?

5. Are there any potential dilemmas and barriers that may delay the two EFL

departments from obtaining academic accreditation? If so, what are they?

Participants

The participants of this study are divided into two groups, male and female. Each

group is subdivided into three subgroups: teaching staff, students, and administrators. The

first group consists of a randomly selected sample of teaching faculty, students, and the

quality assurance coordinator of the male EFL department. The second group consists of a

randomly selected female group of the same categories from the female department. The first
subgroup of participants is male and female undergraduate students majoring in EFL. Their

levels of English proficiency, which were identified based on their current levels at the two

departments, range from beginner to advanced. Two hundred twenty-seven students studying

in the EFL departments participated in the survey, and 15 of them (six male and nine female)

were interviewed. The second subgroup consists of two male and four female teaching

faculty. The third subgroup consists of one male quality assurance coordinator and one

female academic coordinator.

Research Tools

The central research tools used in this study included a web-based survey (Appendix

M), semi-structured interviews (Appendix N), and the researcher’s evaluation checklist

(Appendix O). The survey consisted of five themes: general information (4 items), course objectives (10 items), teaching strategies (10 items), student learning outcomes (10 items), and

student assessment (10 items). There were also three different semi-structured interviews: 1)

student interviews, 2) teaching faculty interviews, and 3) administrator interviews. The third

research tool was the evaluation checklist (Appendix O), which was used to evaluate the

mission, curricula, objectives, and goals of the two departments. Survey items, semi-

structured interview questions, and the evaluation checklist were all designed based on the standards and

substandards of the integrated model of NCAAA and CEA to ensure that the evaluation

emulates those used by the two commissions.

Integrating NCAAA and CEA Standards

It is relatively difficult and perhaps time-consuming to integrate all NCAAA and CEA

standards in Table 12 into one model; thus, only standards suiting the Saudi higher education

contexts were included. Moreover, this paper focuses on the standards that address student

learning experience and ignores those that deal with administrative and/or financial matters.
Hence, 1) Mission, Goals, and Objectives, 2) Learning and Teaching, and 3) five sub-standards of the Learning and Teaching standard were selected from NCAAA standards. Similarly, 1) Mission, 2) Curriculum, 3) Program Development, Planning, and Review, and 4) Student Achievement were selected from CEA standards. The integration process also

included the sub-standards of the selected standards, as shown in Table 13 below. One noteworthy issue is that meeting the integrated model of standards does not mean in any case

that the two departments meet all of the CEA and NCAAA standards; rather, it means that

they appear to meet those in the model.

Table 12

NCAAA and CEA Standards

NCAAA | CEA
Mission, Goals, and Objectives | Mission
Learning and Teaching | Curriculum
Management of Quality Assurance | Faculty
Facilities and Equipment | Facilities, Equipment and Supplies
Financial Planning and Management | Administrative and Fiscal Capacity
Student Administration and Support Services | Student Services
Employment Processes | Recruiting
Learning Resources | Length and Structure of Program of Study
Governance and Administration | Student Achievement
Research | Student Complaints

Table 13

Integrated Model of Standards and Sub-standards

NCAAA | CEA
Mission, Goals, and Objectives | Mission
Learning and Teaching | Curriculum
Sub-standard 4.2: Learning Outcomes; Sub-standard 4.5: Student Assessment | Student Achievement
Sub-standard 4.3: Program Development Processes; Sub-standard 4.4: Program Evaluation and Review Processes | Program Development, Planning, and Review
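Should the integrated model need to be operationalized, for instance when generating an evaluation checklist such as the one in Appendix O, the pairings in Table 13 could be represented as a simple data structure. The sketch below is purely illustrative: the dictionary and function names are hypothetical and are not part of the study’s instruments; it merely mirrors the table above.

# Illustrative sketch only: a minimal representation of the integrated
# NCAAA/CEA model shown in Table 13. Names are hypothetical.
INTEGRATED_MODEL = {
    "Mission": {
        "NCAAA": ["Mission, Goals, and Objectives"],
        "CEA": ["Mission"],
    },
    "Curriculum": {
        "NCAAA": ["Learning and Teaching"],
        "CEA": ["Curriculum"],
    },
    "Student Achievement": {
        "NCAAA": ["Sub-standard 4.2: Learning Outcomes",
                  "Sub-standard 4.5: Student Assessment"],
        "CEA": ["Student Achievement"],
    },
    "Program Development, Planning, and Review": {
        "NCAAA": ["Sub-standard 4.3: Program Development Processes",
                  "Sub-standard 4.4: Program Evaluation and Review Processes"],
        "CEA": ["Program Development, Planning, and Review"],
    },
}

def checklist_items(model):
    """Flatten the integrated model into (area, commission, standard) rows."""
    for area, commissions in model.items():
        for commission, standards in commissions.items():
            for standard in standards:
                yield area, commission, standard

for row in checklist_items(INTEGRATED_MODEL):
    print(" | ".join(row))

A flattened listing of this kind could serve as the row structure of a checklist, with one row per standard or sub-standard to be rated during a site visit.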

Data were collected by circulating surveys and conducting semi-structured interviews

with a randomly selected sample of the two departments (i.e., students, teaching faculty,

program quality assurance coordinator). That is, having visited the male department, the

researcher attended some classes, where he was allowed to spend time with students to

collect data. Although the survey was delivered via Qualtrics, an online survey platform, the participants did not have access to the Internet. At first, the researcher suspected a technical glitch; however, it turned out that the department itself had no Internet access. Thus, the researcher had to print 300 copies of the survey. Then, students

were provided with a detailed description of the study’s purpose and procedure, and the

survey was explained to them in order to clarify any potential confusion. Nonetheless, for

ethical considerations, students were given the option of completing the survey via hard copy

or online using their phones.

After that, a randomly selected sample of male students was engaged in semi-

structured interviews to allow them to expand on their attitudes towards the department. Each

interview session took approximately 20-25 minutes. By randomly choosing some students

from each level, the researcher ensured that the interviewed students had different levels of

proficiency to obtain a broader insight into the department. Moreover, the mission, goals,

curricula, teaching strategies, assessment methods, and learning outcomes of the two

departments under study were evaluated using the evaluation rubric that has been designed

based on the integrated standards model (See Appendix O). As for the female-section

department, the researcher was unable to interview female participants face-to-face due to

cultural constraints; thus, a female faculty member was assigned by the department head to carry out

the interviews under the researcher’s supervision.

Data Analysis

Data were analyzed systematically and qualitatively as carried out by Mitchell (1989)

for her evaluation of the language program at the University of Stirling, Scotland. That is,

responses of all participants were used as evidence for exploring the extent to which the two

departments meet the target standards and then to provide a comprehensively descriptive and

explanatory report about the two departments’ performance. Based on participants’

responses, all potential themes, explicit or implicit patterns, or emerging trends were sought.

For example, all answers obtained in the second round of data collection (interviews) were

transcribed and then analyzed to identify themes using comparative techniques against CEA

and NCAAA standards. The findings of this study were divided into four major areas: 1) the

two departments’ adherence to the integrated model of standards, 2) student satisfaction with

the departments, 3) good and poor quality practices of the two departments, and finally 4) any

barriers delaying the departments from obtaining academic accreditation.
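As a rough illustration of the comparative technique described above, the sketch below shows one possible way of tallying coded interview excerpts against the standards of the integrated model. It is a minimal, hypothetical sketch: the excerpts, theme labels, and function names are invented for illustration and do not reproduce the actual coding scheme used in this study.

# Hypothetical sketch of tallying coded interview excerpts against the
# standards of the integrated model; the data and labels are invented.
from collections import Counter

# Each coded excerpt is tagged with the standard it speaks to and a theme label.
coded_excerpts = [
    {"standard": "Mission", "theme": "students unaware of mission"},
    {"standard": "Curriculum", "theme": "no written syllabus"},
    {"standard": "Student Achievement", "theme": "limited assessment variety"},
    {"standard": "Student Achievement", "theme": "limited assessment variety"},
]

def tally_themes(excerpts):
    """Count how often each (standard, theme) pair occurs across interviews."""
    return Counter((e["standard"], e["theme"]) for e in excerpts)

for (standard, theme), count in tally_themes(coded_excerpts).most_common():
    print(f"{standard}: {theme} ({count} excerpt(s))")

In practice, a tally of this sort would simply support the qualitative reading of the transcripts by showing how frequently each emerging theme recurred under each standard.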

FINDINGS

The extent to which the practices of the two departments meet the CEA and NCAAA

standards was examined by evaluating them against the study-integrated model of standards.

Standard One: Mission/Goals/Objectives

Research has documented several definitions of a program’s mission statement,

defined by Allen (2004) as a “holistic vision of the values and philosophy” of an institution or

program (p. 28). According to Kiely and Rea-Dickins (2005), articulating a mission is key to organizing a program’s activities and conforming to institutional norms. Mission

statements are "ubiquitous in higher education” in that “accreditation agencies demand them,

strategic planning is predicated on their formulation, and virtually every college and

university has one available for review” (Morphew & Hartley, 2006, p. 456). Conway,
Mackay and Yorke (1994) provide six criteria that a mission statement should fulfill: 1)

institutional mission, 2) target groups, 3) goals and objectives, 4) programs offered, 5)

program context, and 6) values sought by the program over other similar programs.

Evaluating the Mission Statements of the Two EFL Departments

As discussed earlier, the mission statement plays an integral role in determining how a

program operates within a previously determined set of standards. Therefore, both the CEA

and NCAAA commissions emphasize the importance of developing a publicly announced

mission statement that can be accessed by stakeholders at anytime and from anywhere.

Moreover, the CEA mission standard states that a program should have “a written statement

of its mission and goals, which guides its activities, policies, and allocation of resources”

(CEA, 2015, p. 7). Surprisingly, the two EFL departments under investigation have the same

mission statement in writing, which also appears to have areas of weakness. For instance, it

was derived from the mission of the college under which the two departments operate (See

Appendix P).

To better evaluate the departments’ mission statement, it is necessary to examine the

college’s mission, which functions as the umbrella for the two departments’ mission.

Although the college’s mission is posted on the college’s website, which is adequate for

meeting one aspect of the NCAAA sub-standard, “Stakeholders should be kept informed

about the mission and any changes in it” (NCAAA, 2013, p. 8) and CEA main standard, “The

mission should be communicated to faculty, students, and staff, as well as to prospective

students, student sponsors, and the public” (CEA, 2015, p. 7), it appears to be too wordy. This

may make it difficult to be understood by stakeholders and evaluators. Moreover, it does not

specify the college’s activities, policies, and allocation of resources. Rather, it only describes

its future graduates’ attributes as teachers in humanities. As a result, it would not be


surprising that the mission statement of the two departments, though not wordy, is also not

specific enough in summarizing the departments’ educational and service goals.

Paradoxically, the Arabic version of the college’s mission is also posted on the same

website, yet with an entirely different meaning as follows:

Preparing nationally qualified graduates in languages and humanities to cope with

sustainable development, achieving educational and research development by means

of modern technological and scientific advances to mobilize the community towards

the local, national, and international scientific and social fields. (college’s website,

2015)

This inconsistency may create some ambiguity for stakeholders. For example, when the

researcher first visited the college’s website, he was confused as to which mission, the

English or the Arabic one, guides the college’s goals, policies, and allocation of resources, as

each mission has a completely different purpose. Hence, one of these two mission statements

should be selected, and the other one should be removed.

The mission of the two departments (Promoting the overall performance of learners of

language, literature, and translation to achieve the possibly highest quality of learning

outcomes) is posted on the college’s website as well. Nevertheless, having examined the

department specifications report, it was noticed that the two departments have a different

mission, namely, “Preparing highly qualified cadres with skills and expertise in English

language, translation, and literature, and well-trained educated researchers who follow the

scientific approach in thinking and in dealing with technology, multi-faceted thinking, and

problem solving” (Department specifications report, 2015). This mission seems to have been

written clearly as required by CEA and NCAAA. Moreover, it includes “the elements

commonly understood to form the basis for a higher education mission” (Morphew, &

Hartley, 2006, p. 458).

The departments’ mission statement appears to fulfill only two of the criteria mentioned by Conway et al. (1994); it highlights the departments’ goals and objectives, and it

includes the programs offered (i.e., literature, & translation). However, it neglects the

remaining four criteria: 1) relevance to the college mission (though it is almost aligned with

that of X University), 2) the target groups (EFL students), 3) the departments’ geographical

areas, and 4) the added value sought by the departments over other similar EFL departments.

Another drawback is that the mission statement does not have a controlling idea. That is,

when compared with the mission statement from a comparable department, for example, the

Department of English at King Abdulaziz University, which states “Building an interactive

learning environment for the rehabilitation of graduates in English language and translation

able to meet society's expectations for a promising future” (Department website, 2015), it is

clear that the latter has a controlling idea (creating an interactive English learning

environment), whereas the former does not. Furthermore, unlike that of X University, King

Abdulaziz University’s mission statement is developed with suitable breadth.

Taking the above into account, the two departments’ mission statement fails to

provide a sense of the current identity of X University. In addition, based on data obtained

from the interviews, 11 out of 15 of the students asserted that they have no idea what the

mission statement of these two departments is. More importantly, data showed that this

mission statement was updated only once in the past four years. This may impede the mission

from reflecting “new goals or shifts in the focus of [their] educational programs or services

and whenever activities and policies are conceived, implemented, or revised” (CEA, 2015, p.

7). All of these factors suggest that this mission statement appears to only partially meet the
Mission Standard of CEA and NCAAA. A poorly phrased mission statement would make it

difficult to meet other standards such as ensuring alignment of the curriculum with the

mission, which is considered problematic for the CEA Curriculum Standard.

Evaluating the Goals and Objectives of the Two EFL Programs

Since CEA does not have a separate standard for goals and objectives, NCAAA

standards were used to identify whether the goals and objectives of the departments meet

these standards. The goals and objectives of the two departments are almost identical and can

be summarized as preparing students to become seasoned EFL teachers, professional

translators, competent linguists, and excellent researchers. These are clear, attainable, and

“consistent with and support the mission” (NCAAA, 2013, p. 8), which mostly fulfill the

NCAAA sub-standard (Relationship between Mission, Goals & Objectives). However, the

other objectives, such as training students in computer-assisted language learning, preparing them to pursue higher studies, and developing their presentation and rhetorical skills, are not derived from the mission of the departments. Likewise, the mission does not indicate a desire to instill in students a sense of providing community service, so objectives of this kind appear unconnected to the two departments’ mission.

Students’ Attitudes towards Course Objectives and Requirements

One of an instructor’s main duties is to ensure that students are fully aware of course

objectives (Allen, 2004). Out of the 202 participants who responded to this item, 13 (6%)

students indicated that they strongly agree that most of the departments’ course objectives are

communicated to them at the beginning of each course, while 95 (47%) of them agreed with

this item. On the other hand, 22 (11%) of the participants disagreed, and 12 (6%) strongly

disagreed with this statement, suggesting that only a small number of the students are dissatisfied with poor communication of course objectives or the lack thereof. However, 60
(30%) of the participants were uncertain (neither agree nor disagree) about the delivery of

course objectives at the beginning of the course. This high percentage poses serious questions

as to why these participants would be uncertain about whether course objectives were

communicated to them earlier. This can be attributed to several factors detailed in the

following discussion.

Moreover, some of these participants are likely to have “no clear idea of what the

intended outcomes of the course are,” and thus end up confused about the ultimate

goal of the course itself (Menges, Weimer, & Associates, 1996, p. 188). Another bewildering,

albeit reasonable, question is why these students are unsure about being informed of these

objectives in advance if, as Allen (2004) argues, articulating course objectives on a course

syllabus “allows students to make informed decision before enrolling, to monitor and direct

their own learning, and to communicate what they have learned to others, such as graduate

schools, employers, or transfer institutions” (p. 44)? To obtain further insight into this issue,

some of the participants were asked during interviews about their uncertainty. Seven out of

15 argued that they are rarely provided with written syllabi that highlight course objectives.

In fact, one of the female participants asserted that she has never been given a syllabus in

writing.
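For transparency, the percentages reported at the beginning of this subsection can be reproduced directly from the raw counts. The short sketch below simply restates that arithmetic (each count divided by the 202 respondents and rounded to the nearest whole percent); it is illustrative only and uses the counts already reported above.

# Illustrative check of the percentages reported in this subsection:
# Likert counts for whether course objectives are communicated at the
# beginning of each course (n = 202 respondents).
counts = {
    "strongly agree": 13,
    "agree": 95,
    "uncertain": 60,
    "disagree": 22,
    "strongly disagree": 12,
}

n = sum(counts.values())          # 202 respondents in total
assert n == 202

for label, count in counts.items():
    pct = round(100 * count / n)  # 6%, 47%, 30%, 11%, 6% as reported above
    print(f"{label}: {count} ({pct}%)")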

Standard Two: Curriculum

The second and one of the key standards of CEA is curriculum, which refers to the

goals, methods, and assessment through which an academic program achieves its targeted

instructional objectives (Nation & Macalister, 2009). As for NCAAA, the word curriculum is

not mentioned in their handbook; rather, all curriculum components are distributed across

various standards and sub-standards. To better evaluate the curricula of the two EFL

departments, it is easier to follow the CEA classification of curriculum as: 1) an educational goal


or purpose, 2) objectives for each course in the curriculum, 3) statements of student learning

outcomes, 4) processes for teaching and learning, and 5) means of assessment. Nonetheless,

the educational goal and purpose as well as the objectives for each course in the curriculum

have been discussed in the Mission, Goals, and Objectives standard, so we are left with the

remaining three components.

The two departments do not have a curriculum in writing, per se; instead, they have a

Study Plan and Program Specifications from which adequate information about the

curriculum components can be obtained. Having compared these two documents against CEA

and NAAA standards, there were some interesting findings. The departments’ study plan is

fairly consistent with their mission. That is, it has “adequate and appropriate levels to meet

the assessed needs of the students through a continuity of learning experiences” (CEA, 2015,

p. 9). Moreover, it has several strengths in that it is designed with a logical progression from

one level to the next through eight levels. It also has consistent objectives across courses and

levels in a way that is appropriate for students’ needs. Nevertheless, one of the drawbacks of

this study plan is that it is not well-organized, nor does it allow evaluators and stakeholders to

monitor and document students’ performance. Despite this, the study plan of the two

departments appears to partially meet the standards with further improvements needed.

Student Learning Outcomes

Student learning outcomes (SLOs) are well-developed statements of “what a learner is

expected to know, understand and/or be able to demonstrate at the end of a period of

learning” (Adam, 2004, p. 2). They are “broad statements of what is achieved and assessed at

the end of a course of study” (Harden, 2002, p. 155). Although SLOs of the two departments

are specified, measurable, attainable, aligned with each other, and relatively consistent with

course objectives, they do not “represent significant progress or accomplishment” (CEA,


2015, p. 10). This is because they do not reflect various levels of students’ cognitive,

affective, and interpersonal skills as required by NCAAA. For example, labeling

“autonomous and collaborative learning” and “self-confidence and the responsibility of

directing the classroom management” as cognitive and interpersonal skills respectively is

inadequate for meeting the NCAAA standards.

Figure 3. Student learning outcomes cover all language skills.

However, what concerns the researcher is the extent to which SLOs cover all

language skills. According to students’ responses (Fig. 3), out of 115 participants who

responded to this item, 12 strongly agreed that SLOs cover all language skills, 41 agreed, 26

were uncertain, 30 disagreed, and 6 strongly disagreed. This suggests that SLOs of the two

departments need to be reviewed. For example, three out of six teaching staff participating in

this study indicated that SLOs do not appear to cover all language skills. Based on these

findings, SLOs practices appear to partially meet the standards of both the CEA and NCAAA

due to the abovementioned considerations.

Teaching Strategies

Among the various NCAAA standards, the Learning and Teaching standard is the longest, suggesting that it plays a vital role in helping an institution or program

obtain academic accreditation. Therefore, both students and teaching faculty were asked

about the practices of this standard extensively. The graphs below (Figs. 4-7) show students’

responses to four key items of the Teaching Strategies section, which were derived from the

NCAAA fourth standard:

Figure 4. Teaching strategies are proper for various learning styles.
Figure 5. Instructors communicate with you in English.
Figure 6. You are engaged in presentations & leading discussions.
Figure 7. You are engaged in research projects.

Moreover, five students strongly agreed, 46 agreed, 23 were uncertain, 30 disagreed,

and 12 strongly disagreed that teaching strategies used in the two departments are proper for

their different learning styles. Yet, the participants complained that rarely do teaching faculty

speak English. Moreover, they stated that due to the large number of students per class, they

are not given the opportunity to demonstrate their learning. On the other hand, their responses

to the extent to which they are engaged in research projects varied in that 10 strongly agreed,

38 agreed, 40 were unsure, seven strongly disagreed, and 21 disagreed. Based on these

results, it can be concluded that teaching strategies at the two departments appear to partially

meet the standards.

Assessment Methods

Effective assessment methods help faculty make “informed decisions about

pedagogical or curricular changes” in addition to improving “student mastery of learning

objectives that faculty value” (Allen, 2004, p. 13). Thus, it is crucial to ensure that program

assessment methods are “for learning (i.e., formative in nature) rather than of learning (i.e.,

not summative, as in conventional tests)” (Cumming, 2009, p. 93). Despite this, two out of six

teaching faculty indicated that they only use direct assessment tools (e.g., midterm/final

exams, homework). The lack of other forms of assessment (e.g., performance assessment) may make it difficult to identify students’ strengths and weaknesses (McNamara, 1996). When asked

about the infrequent use of alternative assessment methods, one faculty member noted that

they have “a set schedule to follow,” which does not give them the flexibility to diversify

assessment methods to meet students’ learning differences.

Figure 8. Are assessment methods communicated in advance?

One of the crucial good practices of the CEA and NCAAA assessment substandards is

to inform students of assessment methods in advance. The two departments have achieved noticeable progress in this respect. For example, out of 112 students who responded to this

item, 10 students strongly agreed and 34 agreed that assessment methods are communicated

to them in advance. Nevertheless, only 16 students disagreed and 12 strongly disagreed with

this statement, which can be considered a small percentage, yet it still needs to be addressed.

Moreover, course assessment methods are set out in the student handbook distributed to students on orientation day.

Standard Three: Student Achievement

Since assessment methods are closely related to the CEA Student Achievement Standard,

it may be better to shed some light on the practices of the two departments relating to this

standard. First, CEA requires that assessment practices be undertaken consistently throughout

the program and aligned with its mission, curriculum, objectives, and student learning

outcomes. As shown in Figure 9, students revealed different, yet profound insights into

assessment methods used in the two departments. That is, 16 out of 186 students strongly

agreed that assessment practices are used consistently, 57 agreed, 38 disagreed, and 19

strongly disagreed. One perplexing finding is that 56 participants were unsure about this, which prompted the researcher to ask why. After being asked about their

uncertainty, eight of these students argued that even though assessment methods are

undertaken consistently, they are limited to quizzes, a midterm, and a final exam. They

suggested that assessment methods be diversified so that their understanding and mastery of a

course can be examined effectively.

Figure 9. Are these methods used consistently?

As for NCAAA standards, students’ responses to the assessment methods section suggest that

assessment practices in the two departments do not appear to meet the standards. For

example, the majority of the participants indicated that they are either unsure (68), disagree

(32), or strongly disagree (10) that assessment techniques measure aspects on which they

expect to be measured, grade distribution is appropriate to course objectives, and exams and

assignments are graded and returned to them promptly, all of which imply that this standard

is not being met.

Standard Four: Program Development, Planning, and Review

As program evaluation processes can spark investigation across the program, this

form of investigation helps language program administrators document their program

performance, taking into account other considerations such as “the social and political basis

and motivation for language learning and teaching” (Lynch, 1996, p. 10). Both the CEA and

NCAAA place extensive emphasis on program evaluation. The last standard of CEA

standards is Program Development, Planning, and Review, while in NCAAA it comes in a

form of two sub-standards of the fourth standard Learning and Teaching as Program

Development Processes and Program Evaluation & Review Processes. The departments’ data

about this standard obtained from teaching faculty did not reveal any findings worth-

mentioning because their responses focused on two main themes: 1) lack adequate training in

applying quality assurance processes, and 2) inability to participate in quality assurance

processes effectively due to their heavy teaching loads (23 hours per week).

Quality Assurance Coordinator of the Male-section Program

In order to gain further insight into the two departments’ evaluation practices, a member of the teaching or administrative staff involved in program evaluation and quality assurance processes was interviewed in addition to the department heads. Therefore, this section discusses the interview responses obtained from the coordinator of the quality assurance unit, John, with whom the researcher recently spoke using Line, a widely used
communication application. The interview with John, who has been the quality assurance

coordinator for the male student department for three years, took nearly 30 minutes. He

shared his experience in taking over the department’s quality assurance processes. He

indicated that during the past three years, quality assurance evaluation and processes were

unsatisfactorily slow for several reasons to be discussed below.

John indicated that the department does not have a plan in writing for planning,

implementing, or evaluating its performance and quality assurance practices. Instead, he

stated that the NCAAA handbook has been used as a plan for the department’s evaluation

processes. However, the NCAAA requires that all academic programs and/or departments

have a plan in writing for managing all “quality assurance processes across the [program],

and the adequacy of responses made to evaluations processes that are made” (NCAAA, 2013,

p. 18). The lack of such a plan means that no clear route has been set out for reviewing the departments’ components, such as teaching strategies, assessment practices, learning

activities, policies, and services. To address this issue, a quality assurance unit was

established at the two departments in 2013. Although this unit takes on all responsibilities of

department evaluation and quality assurance processes, it still does not comply with the

NCAAA standard and sub-standards of Administration of Quality Assurance.

Although all teaching and other staff are engaged in program evaluation processes,

there is a lack of quality assurance and program evaluation literacy. According to the male

department’s annual report (2014), teaching faculty’s participation in department evaluation processes was limited solely to filling in the NCAAA course reports, without involvement in planning, implementing, and monitoring the overall department evaluation. John also complained about the lack of training for teaching faculty on quality assurance processes.

Furthermore, John described a lack of effective cooperation with other departments in the same college when it comes to exchanging experiences with quality assurance processes.

Another issue concerns the instructional staff’s teaching and administrative workloads. For

example, John is responsible for teaching 23 hours per week in addition to supervising

quality assurance and examination processes. This does not meet the NCAAA substandard

that states “Sufficient time should be given for a senior member to supervise quality

processes” (p. 17).

Unlike many language programs, where a program evaluation director can exercise

authority over all available or needed resources (Westerheijden et al., 2007), John, along with his team, was not given authority adequate to meet the standards of the NCAAA and CEA. Providing a quality assurance or program evaluation coordinator with sufficient authority to obtain all necessary data is a basic requirement of the NCAAA and CEA so that s/he can close the loop of data feedback (Allen, 2004, p. 17). However, John indicated that he has difficulty obtaining program evaluation-related data from various sources such as the program, the college, and other similar programs, for he is not authorized to do so. This has resulted in the two departments’ poor practices in evaluating, reporting on, and improving

quality assurance arrangements. Moreover, when John asked the college to provide him with

funding for conducting external benchmarking, he said, “the college refused to do so for

unrealistic reasons.”

To promote an effective practice of program evaluation, both the NCAAA and CEA

advocate conducting an independent or even a mock evaluation. Nevertheless, the male

student department has had only one informal external evaluation, which was conducted by a

visiting professor from King Saud University. Although the external reviewer provided

concrete recommendations for improvement, his areas of interest differ, as he is a professor of Arabic. According to John, despite the external reviewer’s great efforts, he was unable to pinpoint areas of weakness from which language-related suggestions could be made for further improvement and systematic review. The female department, on the other hand, has never been subjected to any evaluation. When asked about this, the department head stated that they have no access to the data needed for internal or external benchmarking, as their role is limited to filling in the NCAAA forms. Thus, it appears that this standard is not being met for either the NCAAA or the CEA.

Student Dissatisfaction with the Two EFL Departments

Now that the extent to which the two departments meet the integrated model of

standards has been identified, one can understand the students’ level of satisfaction with the

departments. One noteworthy issue is that the evaluation findings of some standards may help

us identify students’ satisfaction with the two departments (e.g., curriculum, learning

outcomes, teaching strategies, assessment methods), while others may not provide us with

sufficient information regarding students’ satisfaction (e.g., mission statement, program

review). Having conversed with some male students, the researcher noticed that they seemed

dissatisfied with some of the department’s educational activities. For example, six out of 15

students who participated in the interviews indicated that they have not noticed dramatic

progress in their English skills. Specifically, one student argued that some of the courses do

not promote students’ critical thinking skills; instead, they primarily rely upon memorization.

Moreover, nine of the 15 participants expressed anxiety and frustration with their poor

speaking and writing skills in English. One of them argued that most of the courses are

designed in a way that improves their receptive skills (e.g., listening and reading), but

neglects improving their productive skills. Another student indicated that he had to engage

himself in communicative activities outside the classroom, which has helped him speak

English intelligibly. Unfortunately, the NCAAA standards do not include theoretical underpinnings about L2 oral fluency, for these standards are developed for various areas of specialization. The CEA standards, on the other hand, place substantial emphasis on improving language skills as a whole. Having examined the male department’s objectives

(See Appendix Q), it can be noticed that instead of focusing on developing students’

language skills, they address other important skills such as technical, research, and

intercultural competence skills.

Furthermore, one of the students noted that although they are “placed in a safe, clean, livable environment” (CEA, 2015, p. 29), many students complain that it is not a motivating learning environment. He said, “Some of the department courses are taught in a high-school manner wherein the instructor dominates the classroom discourse without engaging students in interactive discussions” (personal communication, April 18, 2015). When asked about this issue, one of the female faculty members indicated that this could be true to some extent, pointing out that teaching students with drastically different learning backgrounds makes it difficult to diversify teaching strategies. Thus, some teaching faculty members rely solely on a lecture-based teaching format. Another teaching faculty member complained that the large number of students per class makes it difficult to avoid lecturing and to engage students in interactive discussions.

One of the students complained that some non-native English speaking teaching

faculty members speak Arabic frequently. To determine if this was true, the researcher asked

some faculty members how often they spoke Arabic inside class. Two of them asserted that

they speak English most of the time; however, when they want to provide students with

important instructions (e.g., about exams), they often repeat what they have said in Arabic.

Another student complaint was that their assignments are not graded and returned in a timely

manner. According to the survey results, 20 students strongly agree that their work is not graded and returned to them promptly, 55 agree, 56 are uncertain, 30 disagree, and 25 strongly disagree (Fig. 10).

Figure 10. Students’ responses to how often their work is graded and returned to them

promptly.
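As a brief illustration of how Likert-scale counts such as those above can be summarized, the following is a minimal Python sketch that collapses the five response categories reported for Figure 10 into agree, uncertain, and disagree proportions. The counts are taken from the survey results cited above; the variable names and the three-way grouping are illustrative choices, not part of the study’s instruments.

# Minimal sketch: summarizing the Likert counts reported for Figure 10
# (20 strongly agree, 55 agree, 56 uncertain, 30 disagree, 25 strongly disagree).
counts = {
    "strongly agree": 20,
    "agree": 55,
    "uncertain": 56,
    "disagree": 30,
    "strongly disagree": 25,
}
total = sum(counts.values())                                  # 186 responses to this item
agree = counts["strongly agree"] + counts["agree"]            # 75
disagree = counts["disagree"] + counts["strongly disagree"]   # 55
print(f"Agree: {agree / total:.0%}, "
      f"Uncertain: {counts['uncertain'] / total:.0%}, "
      f"Disagree: {disagree / total:.0%}")
# Expected output: Agree: 40%, Uncertain: 30%, Disagree: 30%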

Student Satisfaction with the Practices of the Two Departments

Although most of the practices discussed earlier were found inadequate to fully meet

the study-integrated model of standards, students seemed satisfied with some of these

practices. For example, it can be noted that the percentages of students who strongly agree

and agree with most of the statements outweigh those who strongly disagree or disagree. This

may suggest that the practices addressed by these statements are relatively satisfactory, though further improvements are needed. Another example is that one of the highest overall ratings of student satisfaction concerned student learning outcomes. When asked about the extent to which SLOs meet their learning expectations, 95 out of 190 participants (50%) seemed satisfied with the SLOs, while only 37 of them (20%) were dissatisfied, indicating that the SLOs did not meet their expectations (Fig. 11).

Figure 11. Students’ responses to the extent to which SLOs meet their expectations.

In addition, as a former student and current lecturer at the male EFL department, the

researcher has noticed progress in numerous aspects of the department’s performance. For example, compared to its performance in 2008, when the department had its first review, the male department currently operates according to predetermined standards and KPIs against which the achievement of the department’s outcomes is verified. Teaching strategies, SLOs, and assessment methods are all evaluated each semester, and the evaluation results are incorporated into the NCAAA templates for improvement. For example, each course coordinator is responsible for preparing course blueprints indicating the alignment of the intended SLOs with teaching and assessment strategies (Male Department Annual Report, 2014).

Another satisfying practice of both the male and female departments, which was not included in the survey but arose naturally during the interviews, concerned the academic and career counseling and advice provided for students. Two of the 15 participants commended the efforts of teaching faculty in providing students with effective advising about their current academic performance and potential career opportunities. They indicated that the department administrators have allocated separate offices for individual student counseling services. Moreover, four of the female participants pointed out that whenever they face any form of learning difficulty, the academic advisors to whom they are assigned provide them with all needed counseling and assistance. These good practices meet

one of the salient substandards of the NCAAA, namely, Educational Assistance for Students.
Generally, students appeared to be satisfied with the two departments’ newly

established facilities. In the interviews, three of the male students indicated that unlike the old

building, the environment of the new building is clean, attractive, and well maintained, which

meets Standard 7, Facilities & Equipment (NCAAA, 2013, p. 39). Despite their complaints

about the lack of Internet connectivity, they also noted that the main library provides them

with “efficient access to online databases, research, and journal material” relating to the

department courses, which meets Standard 6, Learning Resources (NCAAA, 2013, p. 35). The female department has also recently moved to a new building equipped with adequate technological and learning resources that facilitate the educational process. In the interviews, five of the nine participants who had attended classes in both buildings indicated that the new building is well equipped and better adapted to their learning requirements than the old building.

Good Quality Assurance Practices

Given that the ultimate purpose of this paper is to evaluate the readiness of the two

departments for obtaining academic accreditation from the NCAAA and CEA in the future, it

would be very useful to highlight both good and potentially poor practices so that they can be

considered carefully for future enhancement and improvement respectively. In general, the

two departments have numerous good practices that the researcher has identified in this

study. For example, although the two departments’ mission statement seems to partially meet

the Mission Standard of the CEA and NCAAA, it needs only a few improvements, with greater emphasis placed on students’ language skills, before it will fulfill most of the practices of the mission standard of the NCAAA and CEA. Therefore, if the problematic issues raised earlier about the departments’ mission statement are addressed effectively, the mission standard will be one of the strongest areas in meeting the two commissions’ standards.
Another good quality assurance practice of the two departments is collecting data on

student achievement of SLOs, advancement from one level to another, graduation, and so

forth (Male Department Annual Report, 2014), which is highlighted as a positive aspect in

the CEA Program Development, Planning, and Review standard. According to John, most

data is collected on a semester basis and incorporated into the NCAAA evaluation templates for future use and improvement. In each department, John continued, a coordinator is assigned to oversee the extent to which course reports and specifications are prepared each semester according to the NCAAA templates. Another good quality assurance practice was evident in rating all practices annually. For example, according to the two departments’

recent annual reports, feedback from stakeholders (e.g., students, teaching staff, employers,

alumni) is obtained for improvement purposes.

Poor Quality Assurance Practices

Despite the good practices discussed above, the two departments still have some poor

quality assurance practices that need to be addressed in order to improve their performance.

First, as stated in their annual report (2014), the two departments fail to conduct internal or

external benchmarking against comparable EFL departments from which recommendations for improvement could be made. Second, most quality assurance processes, as John noted, are based on individual initiative even though there is a quality assurance unit in both departments. In other words, because quality assurance processes at the two departments are not institutionalized, not all teaching and other staff are involved in them. This was evident in John’s complaints about the lack of a written plan for quality assurance practices, the lack of cooperation among quality assurance coordinators in different departments, and the lack of access to the data necessary for program evaluation.


Another poor quality assurance practice is that the departments’ data is not utilized

well. In other words, although the two departments lack necessary data for conducting

benchmarking, they can still overcome this issue by utilizing any data at hand. Unfortunately,

despite collecting some essential data about student performance (e.g., student achievement

and progression) as discussed earlier, the departments’ annual reports do not indicate that

these data are used for improvement (closing the loop). For example, neither department has ever conducted internal benchmarking against its own performance from previous years. Another observed poor practice of the two departments is that their quality assurance processes are not themselves evaluated and reported on a regular basis. This is one of the fundamental

quality assurance practices that the NCAAA highlights in its substandard Administration of

Quality Assurance Processes.

Assessment-related Dilemmas

One of the potential dilemmas that may delay the two EFL departments in obtaining academic accreditation from the NCAAA and CEA concerns their assessment practices. First, as indicated in their annual reports (2014), the departments do not use performance-based assessment approaches such as self-assessment, task assessment, group-project assessment, portfolios, and so forth. Second, the two departments fail to engage students in a culminating experience course through which they can demonstrate the skills they have gained throughout their learning experience in the departments. For example, the study plans

of the two departments lack a capstone or research course that verifies students’ overall

achievement. Third, although one of the two departments’ main goals is to graduate EFL

teachers, their study plans lack a teaching practicum course, which is a basic requirement of

NCAAA (i.e., field experience).

Quality Assurance-related Dilemmas


As for issues relating to quality assurance practices, the two departments do not train their teaching and other staff on quality assurance and program evaluation processes. As noted earlier, this poses a barrier to obtaining academic accreditation, especially from the NCAAA, since one of the NCAAA substandards requires that “all teaching and other staff participate in self-evaluations and cooperate with reporting and improvement processes” (p. 16). Thus, teaching and other staff are likely to find it difficult to participate in self-study evaluations without adequate training programs on quality assurance processes. Moreover, another potential dilemma in obtaining accreditation from the CEA is the lack of a written program evaluation plan that “guides the review of curricular

elements, student assessment practices, and student services policies and activities” (CEA,

2015, p. 46).

DISCUSSION AND CONCLUSION

In general, the two EFL programs are to be commended for their efforts in applying

academic accreditation standards. Nonetheless, it is clear that their efforts are hampered by a lack of adequate experience in program evaluation, preventing them from conducting an efficient evaluation. In other

words, the researcher noticed that most of the practices lack essential compliance with the

standards of both the CEA and NCAAA. For example, the programs’ mission statement falls short for several reasons. First, the programs have two different missions: one stated in program

specifications and another one posted on the university’s website. This does not meet the

NCAAA sub-standard “The mission should be publicized widely within the institution and

action taken to ensure that it is known about and supported by teaching and other staff and

students” (NCAAA, 2013, p. 8). In addition, neither department has a written curriculum, which made it difficult for the researcher to document evidence, as curricular components are scattered across various documents. Nevertheless, the curriculum of the two programs appears to partially meet the

standards of both the CEA and NCAAA.

As for the SLOs, they seem to meet the standards only partially, as they do not “represent

significant progress or accomplishment” (CEA, 2015, p. 10). Another SLO issue is that they

do not reflect various levels of students’ cognitive, affective, and interpersonal skills as

required by the NCAAA. Furthermore, teaching strategies were also found to only partially

meet the standard, for they are not appropriate for various learning styles. In addition, some

students were dissatisfied that some teaching faculty members persistently speak Arabic. Furthermore, neither assessment practices nor student achievement meets the standards, for the reasons discussed earlier. Therefore, these standards require immediate action leading to further improvement. Finally, based on the data obtained, the departments’ practices under the Program Development, Planning, and Review standard appear to only partially meet the requirements of the CEA and NCAAA.

Implications

Standard One: Mission/Goals/Objectives. First, given that the two departments

have two different mission statements, only one of them should be chosen and posted on the

university website. It is also recommended that the mission statement fulfill all six criteria

proposed by Conway et al. (1994). More specifically, the current statement lacks reference to the college mission, the target groups (EFL students), the departments’ geographical areas, and the added value the departments seek to offer relative to other EFL departments. In addition, in order to

develop a mission statement that meets the NCAAA and CEA standards, the mission should

have a controlling idea that guides the two departments’ activities, policies, and allocation of

resources. Finally, a plan for regularly reviewing, evaluating and reaffirming the mission

statements of the two departments in light of changing conditions should be developed.


Standard Two: Curriculum. As discussed earlier, the two departments do not have

an organized curriculum; instead, they have several documents representing components of a

curriculum such as a study plan, course specifications, and so forth. Therefore, it is

recommended that the curriculum be documented in a single written guide and made clear to all stakeholders and evaluators. Moreover, SLOs should address all learning domains as proposed by Bloom (1956) and be expressed in observable, measurable terms as required by the standards. In addition, the teaching strategies used in the two departments should be appropriate for various types of learning outcomes. Performance assessment techniques should also be

integrated into assessment practices. Furthermore, the departments’ objectives should focus

on developing students’ language skills. Finally, teaching faculty should be trained on “the

theory and practice of student assessment” (NCAAA, 2013, p. 23).

Standard Three: Student Achievement. Closely related to the NCAAA student

assessment standard is the CEA Student Achievement Standard, which needs further

improvement. For example, it is highly recommended that systems be established “for central

recording and analysis of course completion and program progression and completion rates”

(NCAAA, 2013, p. 23) by “timely reporting to students of their progress through a level

and/or completion of the course” (CEA, 2015, p. 41). Another significant implication concerns aligning all assessment practices with program-level outcomes so that they meet the needs of the professions of teaching, translating, and interpreting.

Finally, it is highly recommended that an assessment expert be hired at the two departments

as an assessment coordinator to ensure that assessment practices meet the targeted standards.

Standard Four: Program Development, Planning, and Review.

One of the essential implications for promoting the two departments’ development, planning, and review processes is to regularly review their mission statements, curriculum, teaching strategies, and assessment practices. Moreover, teaching faculty should be trained on program evaluation and quality assurance processes so that they can obtain the knowledge and skills needed for undertaking these processes. In addition, to make quality assurance and program evaluation processes consistent and involve all teaching and other staff, it is necessary to establish an institutionalized quality assurance system and incorporate it into the job descriptions of teaching and other staff to ensure that quality assurance

practices are integrated into normal planning and development strategies. Furthermore, all

available data should be utilized for self-evaluation purposes instead of relying on difficult-

to-obtain data. Finally, quality assurance coordinators should be given delegated authority to

perform quality assurance processes.

CHAPTER 5: CONCLUSION

The findings of this dissertation will contribute to the literature on L2 testing,

assessment, and evaluation even though some of them still need further research. For

example, although the findings of article one, “Saudi Student Placement into ESL Program

Levels: Issues beyond Test Criteria”, revealed that 11 out of 27 of the pilot study participants

(40%), 14 out of 71 of the Saudi ESL students (20%), and 57 out of 216 of GCC students

(18%) reported that they deliberately underperformed on ESL placement tests, these findings cannot be generalized due to the lack of concrete supporting evidence. In other words, program

placement and replacement rates, Saudi ESL students’ placement rates across different

programs (obtained from SACM), and participants’ language performance were not obtained

as concrete evidence to support the findings. Moreover, the substantial discrepancies between

the responses of ESL administrators and assessment coordinators as well as those of

participating students bring into question the validity of the latter. That is, if the issue at hand

really exists at these rates, one may wonder why the vast majority of ESL administrators, at least in this study, were unable to detect it.

As a consequence, given that there are 32,557 Saudi IEP students enrolled in US

programs, and perhaps double this number of other IEP scholarship students, the findings of

article one suggest an urgent need for further research studies that provide more tangible evidence of ESL students purposefully underperforming on ESL placement tests.

More specifically, future researchers should avoid asking students directly if they have ever

intentionally performed poorly on ESL placement tests. It is recommended that they allow such findings to emerge naturally in order to make the data more valid and reliable. In sum, regardless of the differences among the participant groups’ percentages, it can be concluded that the issue addressed in this study indeed exists and is worth addressing. Therefore, this study proposes some concrete suggestions that may contribute to eradicating or at least mitigating

this issue.

The findings of the second article, “Integrating Self-assessment Techniques into L2

Classroom Assessment Procedures”, encourage many ESL/EFL programs to incorporate self-

assessment techniques into their assessment procedures. That is, although self-assessment

techniques have long been utilized in many L2 classrooms (Ekbatani, 2011), several language

programs, especially in Saudi Arabia, still use neither self-assessment rubrics nor many other alternative assessment techniques. As a result, this study suggests that much more attention be paid to using alternative assessment techniques, including self-assessment measures, in order to promote student-centeredness (Nunan, 1988), enhance student learning (Boud, 1995; Taras, 2010), and augment learner autonomy (Benson, 2012). Nonetheless, the

study concluded that a larger number of participants should be recruited in order to identify

any potential correlation between participants’ self-assessed scores and their TOEFL/IELTS

scores.
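Although the article reports its correlation analysis only in summary form, the following is a minimal sketch, using entirely hypothetical paired scores, of how a relationship between self-assessed ratings and TOEFL scores might be examined with a rank-based correlation in Python. The data values and variable names are illustrative assumptions rather than the study’s data, and the use of Spearman’s rho here is simply a common choice for small, ordinal-like samples, not the study’s stated procedure.

# Minimal sketch (hypothetical data): rank-based correlation between
# self-assessed scores and TOEFL scores.
from scipy.stats import spearmanr

# Illustrative values only -- not the study's actual data.
self_assessed = [42, 55, 61, 48, 70, 66, 52, 59, 63, 45]   # e.g., share of "I can do this" items endorsed
toefl_scores  = [58, 72, 80, 61, 95, 88, 64, 75, 79, 60]   # TOEFL iBT totals (0-120)

rho, p_value = spearmanr(self_assessed, toefl_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# With a small sample (n = 21 in the study), such estimates carry wide
# uncertainty, which is why the article recommends recruiting more participants.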

More specifically, this study investigated the experience of only 21 ESL participants using the CEFR self-assessment rubric, a sample inadequate for drawing valid and generalizable conclusions that answer the research questions and accurately reflect students’

attitudes and experiences in using self-assessment rubrics. However, although the study did

not reveal a statistically significant correlation between the two measures, ten participants

reported that they found self-assessment rubrics very accurate in reflecting their actual levels

of language proficiency. For example, they provided valuable insights about their experience in using the CEFR assessment rubric, including but not limited to pinpointing their areas of strength and weakness, engaging them with higher-order language skills such as interaction, strategies, and language quality, and involving them more in their own learning. Generally, self-assessment rubrics can be very effective, especially if they are first used as a supplement

to traditional assessment methods, taking into consideration the importance of training

students on using them effectively.

As for the third article, “Quality Assurance and Accreditation as Forms for Language

Program Evaluation: A Case Study of Two EFL Departments in A Saudi University”, it is

hoped that it will contribute to raising language program stakeholders’ awareness of the

importance of program evaluation as a process for enhancing student learning outcomes. In

general, the article concluded that the two EFL programs that participated in the study should

be commended for utilizing many good quality assurance practices. However, they still lack

adequate background regarding program evaluation. In other words, it was found that most of

their practices lack compliance with the standards of the CEA and NCAAA due to a lack of adherence to the two agencies’ guidelines. For example, the two departments lack a written program evaluation plan, which has resulted in student dissatisfaction with some essential standards and substandards. Moreover, the lack of a written curriculum is another issue common to the two departments, which may make it difficult to document evidence of their current performance. In addition, the student achievement standards are not being met due to the absence of alternative assessment techniques.

It is important to note that this study is not solely intended for EFL programs, but it

also provides all language programs with a simulated evaluation that can be used as a

reference for future program evaluation projects. In other words, in addition to making this

simulated study an internal benchmark for the remaining 16 EFL departments at X

University, it can also be beneficial to program administrators of other disciplines. For

example, the findings here can serve any program that seeks to obtain programmatic

accreditation from the NCAAA, since the set of standards integrated in this study is applicable to any post-secondary program in Saudi Arabia. In conclusion, administrators of any program in general, and of language programs in particular, are highly advised to make their program evaluation processes as systematic as they can in order to meet the standards of the

target accreditation commission.

APPENDIX A: PILOT STUDY FINDINGS

8. If you have other reasons, please mention them here.

11. Do you think the Ministry of Education should establish effective mechanisms to

enhance your English skills before you study abroad?

APPENDIX B: SURVEY QUESTIONS

If you are a Saudi ESL student (who has once taken an ESL placement test), would

you please participate in this survey to help me collect some data for my dissertation research

project? My name is Adnan Alobaid, and I am a Ph.D. candidate in Second Language

Acquisition and Teaching (SLAT), University of Arizona, Tucson, USA. For my doctoral

dissertation, I am conducting a research study on the placement of Saudi ESL students into

ESL levels. If you decide to participate in this study, you would have to respond to a 20-item

online survey, which will take you approximately 10-15 minutes. Then, I will, if you provide

me with your email address, interview you either at your ESL program, at any other place

you prefer, over Skype, or via any other online platform. To participate in the survey, please respond to the following items. At the end of the survey, you will be asked to provide your email address in case you decide to participate in the interview part. Please also share this email with your friends, classmates, or anyone else you think fits this

study. I will not collect any video recordings for this research; however, I will, with your

permission, audiotape and take notes during the interview. I might also need to collect one of

your classroom writing samples.

All survey responses, interview transcriptions, audio recordings, and writing samples

will be stored on a password-protected drive. The drive will be kept in a locked departmental

office at the University of Arizona for six years past the date of project completion. No one other than (the primary investigator) will have access to the data. Thus, all of your information and responses will be kept confidential. Also, your name will not be recorded on the tape, nor will it be associated with any part of the written reports or oral

presentations. Your participation is voluntary. You may refuse to participate in this study.

You may also leave the study at any time. No matter what decision you make, it will not
affect your grades or future benefits. Participation Consent: I have read (or someone has read

to me) this form, and I am aware that I am being asked to participate in a research study. I

have had the opportunity to ask questions and have had them answered to my

satisfaction.

1. Do you voluntarily agree to participate in this study?

- Yes

- No

If No Is Selected, Then Skip To End of Survey

Q2. Do you allow the researcher to publish your responses for a research project and

potential publication? (Note that your name will not appear in the survey).

- Yes

- No

If No Is Selected, Then Skip To End of Survey

Q3. Gender?

- Male

- Female

Q4. Which of the following ESL programs are you currently attending or have you previously attended?

- (Program A).

- (Program B).

- (Program C).

- Others.

Q5. How long have you been studying English abroad in ESL Classes?
- 0 – 3 months.

- 4 – 6 months.

- 7 – 12 months.

- More than a year.

Q6. What is your current ESL level?

- Beginner.

- Lower-intermediate.

- Intermediate.

- Upper-intermediate.

- Advanced.

- Passed all ESL Levels

- Others.

Q7. When did you take the Placement Test?

- 0-2 months ago.

- 3-6 months ago.

- 7-12 months ago.

- More than a year ago.

- Never taken it before

If never taken it before Is Selected, Then Skip To End of Survey.

Q8. Do you think your placement scores reflected your actual level of language

proficiency?

- Definitely yes

- Probably yes
- Probably not

- Definitely not.

Q9. Do you think the content of ESL Placement Test reflected real-life language uses?

- Definitely yes

- Probably yes

- Probably not

- Definitely not.

Q10. The time given for completing the test was:

- Too long.

- Relatively long.

- Too short and inadequate to answer the test items.

- Relatively short.

- Appropriate for completing the test.

Q11. Did you intentionally, in one way or another, perform poorly on the ESL

Placement Test to be placed in lower levels?

- Yes.

- No.

If No Is Selected, Then Skip To Do you think the Saudi...

Q12. If yes, what are the reasons that made you intentionally place yourself into lower

levels? Please choose all boxes that apply.

- To have more time for learning English.

- To have adequate time to adapt to the program and educational system.

- To have adequate time to adapt to the city environment.

- To focus more on preparing for standardized tests (TOEFL, IELTS, GRE).


- Other reasons.

Q13. If you have any other reasons, please mention them here:

Q14. Who encouraged you to perform poorly in the test?

- A friend of mine who has already taken an ESL Placement Test.

- Friend(s) with whom I took the test.

- Advice obtained from the Internet.

- Myself.

Q15. If you did not have a scholarship, would you still intentionally perform poorly?

- Yes.

- No.

- Maybe.

- I don't know.

Q16. Do you think the Saudi Ministry of Education should establish effective

mechanisms to enhance your English skills before you study abroad?

- Definitely yes

- Probably yes

- Probably not

- Definitely not.

Q17. If yes, what are some suggestions that would help establish such mechanisms? Please check all boxes that apply.

- We should be given ESL courses prior to studying abroad.

- A TOEFL or IELTS score should be required to obtain a scholarship.

- We should be given more time to learn English in our country.


- Others.

Q18. If not mentioned above, what other suggestions do you have?

Q19. Briefly, provide us with some comments about your experience in taking ESL

placement tests that you would like to share.

Q20. Please provide your email if you would like the researcher to interview you (either individually or in groups) to allow you to expand on your responses. I do

look forward to meeting with you (either face-to-face or online) to obtain your opinions

about ESL Placement Tests. (Note that your information will be kept confidential).

APPENDIX C: INTERVIEW QUESTIONS

1. What are the challenges you faced when you took the ESL placement test?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

2. Do you think that the ESL placement tests that you have taken are accurate in reporting

your actual level of language proficiency? Why?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

3. Do you think that the ESL placement test that you had taken placed you in a level

appropriate to your language skills? Please explain.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

4. Did you prefer to be placed into a lower, higher, or the same level as your actual level of proficiency? Why or why not?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________
5. If you had intentionally performed poorly in the ESL placement test, did you benefit from

that afterwards? Why or why not?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

6. After you were placed into a lower-than-expected level, were you able to improve your English more than if you had been placed into a class that fits your level of proficiency? Please explain.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

7. If you had not intentionally performed poorly on ESL placement tests, can you tell me why

you did not do so?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

8. If the Saudi Ministry of Education required you to take a six-month ESL course in Saudi Arabia prior to coming to the target country, would you commit to attending that class? Please explain why or why not.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

9. If the Saudi Ministry of Education required you to obtain a score of 4.5 on the IELTS or a score of 40 on the TOEFL prior to studying abroad, would you be willing to do so? Why or why not?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

10. Do you have any comments that you have not mentioned in the survey?

___________________________________________________________________________

___________________________________________________________________________

APPENDIX D: ESL ADMINISTRATORS & ASSESSMENT COORDINATORS

My name is Adnan Alobaid. I am a doctoral candidate in Second Language Acquisition and

Teaching (SLAT), University of Arizona, Tucson, USA. For my doctoral dissertation, I am

conducting a research study on the placement of Saudi scholarship students into ESL levels.

Based on my data, I found that some ESL Saudi students intentionally perform poorly on

ESL placement tests to be placed into lower ESL classes. This is because some of them want more time for learning English, preparing for standardized tests, applying to universities, or avoiding a return to Saudi Arabia (as they did not obtain unconditional

admission). Research has reported negative consequences of such misplacements. Based on

his findings, Stewart (1999) for example reported that 22% of ESL placement test-takers

were misplaced. This rate has resulted in complaints among test-takers. Moreover, Crusan

(2002) argued that if misplaced, students would have to pay additional financial expenses. In

addition, Kokhan (2014) pointed out that cases of misplacement are likely to make students

fail to significantly improve their English. In my data, I am missing an important part, which

is concerned with the extent to which such behavior (students throwing a test on purpose) can

affect the whole placement system of an ESL program. Therefore, I would like to obtain your

valuable opinions on this issue. Moreover, if you choose to provide your

email, I will send you the findings of this study. Note: No names of participants or ESL

programs are required in this survey.

Do you agree to participate in this survey?

Yes (1)

No (2)

1. Have you ever noticed any incident of students throwing an ESL placement test on purpose? If yes, what is the approximate rate?

2. How do you identify cases of misplaced students? Please check all that apply.

Teacher feedback.

Students’ class grades.

Students’ claims of being misplaced.

Others.

3. If you have other means of identifying cases of misplacement, please state them here:

4. How do you usually deal with significant misplacements? Please check all that apply.

A systematic diagnostic assessment is given to students.

Teacher evaluation.

Standardized tests are used as supplementary.

Others.

5. If you have other ways for alleviating misplacements, please state them here:

6. Suppose that almost 20% of the scholarship students attending your program intentionally performed poorly on the placement test; how would this affect the whole placement system?

7. If you wish to receive the findings of this study, please provide your email address:

APPENDIX E: STUDY PARTICIPANTS

Type of Participants                               Male    Female    Total
Saudi ESL Students                                  50       21        71
GCC Students                                       162       54       216
ESL Administrators and Assessment Coordinators           17            17

APPENDIX F: CEFR, TOEFL, AND IELTS EQUIVALENCY TABLE

(TANNENBAUM & WYLIE, 2007)

Competency     CEFR              TOEFL iBT    IELTS
                                 0 - 8        0 - 1.0
Basic          A1                9 - 18       1.0 - 1.5
               A1                19 - 29      2.0 - 2.5
               A2                30 - 40      3.0 - 3.5
Independent    B1 (IELTS 3.5)    41 - 52      4.0
               B1 (IELTS 4.5)    53 - 64      4.5 - 5.0
               B2                65 - 78      5.5 - 6.0
Proficient     C1                79 - 95      6.5 - 7.0
               C2                96 - 120     7.5 - 9.0
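As an illustration of how the equivalency table above could be operationalized (for instance, when converting a participant’s TOEFL iBT total into a CEFR band for comparison with a self-assessed level), the following is a minimal Python sketch. The band boundaries are taken directly from the table; the “Below A1” label for the 0-8 range, the merging of the split A1 and B1 rows, and the function name are illustrative choices, not part of the source table.

# Minimal sketch: mapping a TOEFL iBT total (0-120) to a CEFR band using the
# boundaries in the equivalency table above (Tannenbaum & Wylie, 2007).
TOEFL_TO_CEFR = [
    (0, 8, "Below A1"),   # the table lists no CEFR level for 0-8
    (9, 29, "A1"),        # combines the table's two A1 rows (9-18, 19-29)
    (30, 40, "A2"),
    (41, 64, "B1"),       # combines the B1 (IELTS 3.5) and B1 (IELTS 4.5) rows
    (65, 78, "B2"),
    (79, 95, "C1"),
    (96, 120, "C2"),
]

def cefr_band(toefl_ibt_total):
    """Return the CEFR band for a TOEFL iBT total score."""
    for low, high, band in TOEFL_TO_CEFR:
        if low <= toefl_ibt_total <= high:
            return band
    raise ValueError("TOEFL iBT total must be between 0 and 120")

print(cefr_band(72))  # -> B2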

APPENDIX G: CEFR SELF-ASSESSMENT RUBRIC DESCRIPTORS

Taken from the Bank of descriptors for self assessment in European Language Portfolios

© Council of Europe, Language Policy Division

1. Your First Name and Middle Initial (Again, your name will not appear in the study).

2. Gender:

m Male (1)

m Female (2)

3. What is your current ESL Level? (Note that your level of proficiency must be

intermediate, upper-intermediate, or advanced).

m Intermediate (1)

m Upper-intermediate (2)

m Advanced (3)

4. What was your most recently obtained TOEFL score? And when did you take the

test? If you are planning to take the TOEFL very soon, please provide the test date.

Welcome ESL Intermediate Students (B1)

Instructions (response options: 1 = I can do this; 2 = I cannot do this):

a. Please try to assess your level of language proficiency by choosing the appropriate option (I can do this or I cannot do this) as accurately as you can.

b. If you think you can efficiently do the skill/task that each statement describes, choose the (I can do this) option. However, if you think you cannot do it, please select the (I cannot do this) option.

Listening 1 2

I can follow clearly articulated speech directed at me in everyday conversation, though I sometimes have to ask for repetition of

particular words and phrases.

I can generally follow the main points of extended discussion around me, provided speech is clearly articulated in standard dialect.

I can listen to a short narrative and form hypotheses about what will happen next.

I can understand the main points of radio news bulletins and simpler recorded material on topics of personal interest delivered

relatively slowly and clearly.

I can catch the main points in TV programmes on familiar topics when the delivery is relatively slow and clear.

I can understand simple technical information, such as operating instructions for everyday equipment.

Reading 1 2

I can understand the main points in short newspaper articles about current and familiar topics.

I can read columns or interviews in newspapers and magazines in which someone takes a stand on a current topic or event and

understand the overall meaning of the text.

I can guess the meaning of single unknown words from the context thus deducing the meaning of expressions if the topic is familiar.

I can skim short texts (for example news summaries) and find relevant facts and information (for example who has done what and

where).

I can understand the most important information in short simple everyday information brochures.

I can understand simple messages and standard letters (for example from businesses, clubs or authorities).

In private letters I can understand those parts dealing with events, feelings and wishes well enough to correspond regularly with a

pen friend.

I can understand the plot of a clearly structured story and recognise what the most important episodes and events are and what is

significant about them.


Spoken Interaction 1 2

I can start, maintain and close simple face-to-face conversation on topics that are familiar or of personal interest.

I can maintain a conversation or discussion but may sometimes be difficult to follow when trying to say exactly what I would like

to.

I can deal with most situations likely to arise when making travel arrangements through an agent or when actually travelling.

I can ask for and follow detailed directions.

I can express and respond to feelings such as surprise, happiness, sadness, interest and indifference.

I can give or seek personal views and opinions in an informal discussion with friends.

I can agree and disagree politely.

Spoken Production 1 2

I can narrate a story.

I can give detailed accounts of experiences, describing feelings and reactions.

I can describe dreams, hopes and ambitions.

I can explain and give reasons for my plans, intentions and actions.

I can relate the plot of a book or film and describe my reactions.

I can paraphrase short written passages orally in a simple fashion, using the original text wording and ordering.

Strategies 1 2

I can repeat back part of what someone has said to confirm that we understand each other.

I can ask someone to clarify or elaborate what they have just said.

When I can’t think of the word I want, I can use a simple word meaning something similar and invite ”correction”.

Language Quality 1 2

I can keep a conversation going comprehensibly, but have to pause to plan and correct what I am saying – especially when I talk

freely for longer periods.

I can convey simple information of immediate relevance, getting across which point I feel is most important.

I have a sufficient vocabulary to express myself with some circumlocutions on most topics pertinent to my everyday life such as

family, hobbies and interests, work, travel, and current events.

I can express myself reasonably accurately in familiar, predictable situations.

Writing 1 2

I can write simple connected texts on a range of topics within my field of interest and can express personal views and opinions.

I can write simple texts about experiences or events, for example about a trip, for a school newspaper or a club newsletter.

I can write personal letters to friends or acquaintances asking for or giving them news and narrating events.

I can describe in a personal letter the plot of a film or a book or give an account of a concert.

In a letter I can express feelings such as grief, happiness, interest, regret and sympathy.

I can reply in written form to advertisements and ask for more complete or more specific information about products (for example a

car or an academic course).

I can convey – via fax, e-mail or a circular – short simple factual information to friends or colleagues or ask for information in such

a way.

I can write my CV in summary form.

Welcome ESL Upper-intermediate Students (B2)

Instructions (response options: 1 = I can do this; 2 = I cannot do this):

a. Please try to assess your level of language proficiency by choosing the appropriate option (I can do this or I cannot do this) as accurately as you can.

b. If you think you can efficiently do the skill/task that each statement describes, choose the (I can do this) option. However, if you think you cannot do it, please select the (I cannot do this) option.

Listening 1 2

I can understand in detail what is said to me in standard spoken language even in a noisy environment.

I can follow a lecture or talk within my own field, provided the subject matter is familiar and the presentation straightforward and

clearly structured.

I can understand most radio documentaries delivered in standard language and can identify the speaker’s mood, tone, etc.

I can understand TV documentaries, live interviews, talk shows, plays and the majority of films in standard dialect.

I can understand the main ideas of complex speech on both concrete and abstract topics delivered in a standard dialect, including

technical discussions in my field of specialisation.

I can use a variety of strategies to achieve comprehension, including listening for main points; checking comprehension by using

contextual clues.

Reading 1 2

I can rapidly grasp the content and the significance of news, articles and reports on topics connected with my interests or my job,

and decide if a closer reading is worthwhile.

I can read and understand articles and reports on current problems in which the writers express specific attitudes and points of

view.

I can understand in detail texts within my field of interest or the area of my academic or professional speciality.

I can understand specialised articles outside my own field if I can occasionally check with a dictionary.

I can read reviews dealing with the content and criticism of cultural topics (films, theatre, books, concerts) and summarise the

main points.

I can read letters on topics within my areas of academic or professional speciality or interest and grasp the most important points.

I can quickly look through a manual (for example for a computer program) and find and understand the relevant explanations and

help for a specific problem.


I can understand in a narrative or play the motives for the characters’ actions and their consequences for the development of the

plot.

Spoken Interaction 1 2

I can initiate, maintain and end discourse naturally with effective turn-taking.

I can exchange considerable quantities of detailed factual information on matters within my fields of interest.

I can convey degrees of emotion and highlight the personal significance of events and experiences.

I can engage in extended conversation in a clearly participatory fashion on most general topics.

I can account for and sustain my opinions in discussion by providing relevant explanations, arguments and comments.

I can help a discussion along on familiar ground confirming comprehension, inviting others in, etc.

I can carry out a prepared interview, checking and confirming information, following up interesting replies.

Spoken Production 1 2

I can give clear, detailed descriptions on a wide range of subjects related to my fields of interest.

I can understand and summarise orally short extracts from news items, interviews or documentaries containing opinions,

argument and discussion.

I can understand and summarise orally the plot and sequence of events in an extract from a film or play.

I can construct a chain of reasoned argument, linking my ideas logically.

I can explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.

I can speculate about causes, consequences, and hypothetical situations.

Strategies 1 2

I can use standard phrases like ”That’s a difficult question to answer” to gain time and keep the turn while formulating what to

say.

I can make a note of ”favourite mistakes” and consciously monitor speech for them.

I can generally correct slips and errors if I become conscious of them or if they have led to misunderstandings.

Language Quality 1 2

I can produce stretches of language with a fairly even tempo; although I can be hesitant as I search for patterns and expressions,

there are few noticeably long pauses.

I can pass on detailed information reliably.

I have sufficient vocabulary to express myself on matters concerned to my field and on most general topics.

I can communicate with reasonable accuracy and can correct mistakes if they have led to misunderstandings.

Writing 1 2

I can write clear and detailed texts (compositions, reports or texts of presentations) on various topics related to my field of

interest.

I can write summaries of articles on topics of general interest.

I can summarise information from different sources and media.

I can discuss a topic in a composition or “letter to the editor”, giving reasons for or against a specific point of view.

I can develop an argument systematically in a composition or report, emphasising decisive points and including supporting

details.

I can write about events and real or fictional experiences in a detailed and easily readable way.

I can write a short review of a film or a book.

I can express in a personal letter different feelings and attitudes and can report the news of the day making clear what – in my

opinion – are the important aspects of an event.

Welcome ESL Advanced Students (C1)

Instructions (response options: 1 = I can do this; 2 = I cannot do this):

a. Please try to assess your level of language proficiency by choosing the appropriate option (I can do this or I cannot do this) as accurately as you can.

b. If you think you can efficiently do the skill/task that each statement describes, choose the (I can do this) option. However, if you think you cannot do it, please select the (I cannot do this) option.

Listening 1 2

I can follow extended speech even when it is not clearly structured and when relationships are only implied and not signalled

explicitly.

I can understand a wide range of idiomatic expressions and colloquialisms, appreciating shifts in style and register.

I can extract specific information from even poor quality, audibly distorted public announcements, e.g. in a station, sports

stadium etc.

I can understand complex technical information, such as operating instructions, specifications for familiar products and services.

I can understand lectures, talks and reports in my field of professional or academic interest even when they are propositionally

and linguistically complex.

I can without too much effort understand films employing a considerable degree of slang and idiomatic usage.

Reading 1 2

I can understand fairly long demanding texts and summarise them orally.

I can read complex reports, analyses and commentaries where opinions, viewpoints and connections are discussed.

I can extract information, ideas and opinions from highly specialised texts in my own field, for example research reports.

I can understand long complex instructions, for example for the use of a new piece of equipment, even if these are not related to

my job or field of interest, provided I have enough time to reread them.

I can read any correspondence with occasional use of a dictionary.

I can read contemporary literary texts with ease.

I can go beyond the concrete plot of a narrative and grasp implicit meanings, ideas and connections.

I can recognise the social, political or historical background of a literary work.

Spoken Interaction 1 2

I can keep up with an animated conversation between native speakers.

I can use the language fluently, accurately and effectively on a wide range of general, professional or academic topics.

I can use language flexibly and effectively for social purposes, including emotional, allusive and joking usage.

I can express my ideas and opinions clearly and precisely, and can present and respond to complex lines of reasoning

convincingly.

Spoken Production 1 2

I can give clear, detailed descriptions of complex subjects.

I can orally summarise long, demanding texts.

I can give an extended description or account of something, integrating themes, developing particular points and concluding

appropriately.

I can give a clearly developed presentation on a subject in my fields of personal or professional interest, departing when

necessary from the prepared text and following up spontaneously points raised by members of the audience.

Strategies 1 2

I can use fluently a variety of appropriate expressions to preface my remarks in order to get the floor, or to gain time and keep

the floor while thinking.

I can relate my own contribution skilfully to those of other speakers.

I can substitute an equivalent term for a word I can’t recall without distracting the listener.

Language Quality 1 2

I can express myself fluently and spontaneously, almost effortlessly. Only a conceptually difficult subject can hinder a natural,

smooth flow of language.

I can produce clear, smoothly-flowing, well-structured speech, showing control over ways of developing what I want to say in

order to link both my ideas and my expression of them into coherent text.

I have a good command of a broad vocabulary allowing gaps to be readily overcome with circumlocutions; I rarely have to

search obviously for expressions or compromise on saying exactly what I want to.

I can consistently maintain a high degree of grammatical accuracy; errors are rare and difficult to spot.

165
Writing 1 2

I can express myself in writing on a wide range of general or professional topics in a clear and user-friendly manner.

I can present a complex topic in a clear and well-structured way, highlighting the most important points, for example in a

composition or a report.

I can present points of view in a comment on a topic or an event, underlining the main ideas and supporting my reasoning with

detailed examples.

I can put together information from different sources and relate it in a coherent summary.

I can give a detailed description of experiences, feelings and events in a personal letter.

I can write formally correct letters, for example to complain or to take a stand in favour of or against something.

I can write texts which show a high degree of grammatical correctness and vary my vocabulary and style according to the

addressee, the kind of text and the topic.

I can select a style appropriate to the reader in mind.

166
APPENDIX H: INTERVIEW QUESTIONS

(DEVELOPED FROM ANDRADE AND DU’S (2007) SELF-ASSESSMENT RUBRIC)

1. How many times have you taken the TOEFL test?

2. How well did you prepare for the TOEFL test?

3. While learning English, what was/were the most useful/accurate technique(s) that helped you identify your level of proficiency?

4. In this study, you obtained two different scores (self-rated and TOEFL). Which one do you think reports your actual level of proficiency more accurately?

5. Have you ever used a self-assessment technique before this study? If yes, how was it?

6. Tell me about your experience using the self-assessment rubric in this study.

7. Would you like to use this assessment rubric (or a similar one) in classrooms? Why or why not?

8. Do you think you can assess your English skills using self-assessment tools? Why or why not?

9. When you used the CEFR self-assessment rubric in this study, were you able to detect your areas of strength and weakness? How?

10. When you used the CEFR self-assessment rubric in this study, did you find it difficult to use?

11. When you used the CEFR self-assessment rubric in this study, did you find any difficulty assessing yourself in one particular area/skill? Could you explain?

12. Please provide some advantages and disadvantages of using self-assessment techniques.

13. Finally, if you support using self-assessment techniques in classrooms, do you think they should be used alone, or should they be supported with other assessment techniques? Why?

167
APPENDIX I

A One-time Free Proofreading of One of your CESL Papers

Dear CESL students:

My name is Adnan Alobaid, a doctoral candidate in the Second Language Acquisition and Teaching

program at the University of Arizona, Tucson.

I am offering you a one-time free proofreading of one of your CESL papers (10 pages or less) if you

participate in this study. To participate in this study, you would have to:

1) Assess your level of English proficiency using an online CEFR rubric, which will take you

approximately 15-20 minutes.

2) After you assess yourself, you will, upon your permission, provide me with your TOEFL scores

once you obtain them (Note that you have the right to refuse to do so).

If you are in Level (3) or (4), please click on this link:

https://goo.gl/nEFoWY

If you are in Level (5) or (6), please click on this link:

https://goo.gl/0K153y
168
If you are in Level (7) or Bridge Program, please click on this link:

https://goo.gl/EietQm

Important Notes:

1. Your name, TOEFL scores, and any other private information will not appear in the study.

2. Do not forget to include your email so that we can arrange for proofreading.

3. Keep in mind that proofreading will be done for the first 50 participants ONLY. Therefore, it is highly

recommended that you respond to the survey as soon as possible.

If you have any questions, please feel free to shoot me an email.

adnanalobaid@email.arizona.edu

Yours,

Adnan Alobaid.

169
APPENDIX J: COLOR-CODED CEFR, TOEFL, AND IELTS EQUIVALENCY TABLE

(TANNENBAUM & WYLIE, 2007)

Competency      CEFR                      TOEFL iBT    IELTS

N/A             N/A                       0 – 8        0 – 1.0
Basic           A1                        9 – 18       1.0 – 1.5
Basic           A1                        19 – 29      2.0 – 2.5
Basic           A2 / B1 (IELTS 3.5)       30 – 40      3.0 – 3.5
Independent     B1                        41 – 52      4.0
Independent     B1 (IELTS 4.5) / B2       53 – 64      4.5 – 5.0
Independent     B2                        65 – 78      5.5 – 6.0
Proficient      C1                        79 – 95      6.5 – 7.0
Proficient      C2                        96 – 120     7.5 – 9.0

170
APPENDIX K: PARTICIPANTS’ SELF-RATED SCORES

Name   Gender   Level of English Proficiency   TOEFL/IELTS Score   CEFR Self-rated Score (# of I can do)   %   Overall % obtained

A. A. Female Intermediate 46 25 59% 25/33.6

H. A. Male Intermediate 36 23 54% 23/33.6

S. A. Female Intermediate 90 42 100% 42/33.6

S. M. Male Intermediate 39 36 85% 36/33.6

M. M. Male Intermediate 32 12 28% 12/33.6

A. A. Male Upper-intermediate 72 40 95% 40/33.6

H. A. Male Upper-intermediate 42 25 59% 25/33.6

A. K. Male Upper-intermediate 66 24 57% 24/33.6

M. A. Male Upper-intermediate 47 21 50% 21/33.6

D. J. Female Upper-intermediate 63 13 30% 13/33.6

C. C. Female Upper-intermediate 62 31 73% 31/33.6

Y. N. Male Upper-intermediate 57 28 66% 28/33.6

Y. C. Male Upper-intermediate 36 33 78% 33/33.6

M.A. Male Advanced 78 19 51% 19/29.6

A. A. Female Advanced 78 15 40% 15/29.6

N. M. Female Advanced 50 25 67% 25/29.6

A. A. Female Advanced 70 29 78% 29/29.6

A.M. Female Advanced 65 28 75% 28/29.6

O. I. Male Advanced 96 26 70% 26/29.6

S. N. Female Advanced 70 25 67% 25/29.6

J. M. Male Advanced 68 14 37% 14/29.6

171
APPENDIX L: CONSENT FORM

My name is Adnan Alobaid, a doctoral candidate in the Second Language Acquisition and Teaching

program at the University of Arizona, Tucson. I am conducting a research study on the use of alternative assessment approaches for proficiency purposes. To participate in this study, you would have to:

1) Assess your level of English proficiency using an online CEFR rubric, which will take you
approximately 15-20 minutes.
2) Upon your permission, provide me with your TOEFL scores once you obtain them (Note that you have
the right to refuse to do so.)
3) Finally, participate in a face-to-face or online interview, at your discretion, to allow you to elaborate
on your experience with and attitudes towards using self-assessment rubrics.
All self-rated scores, interview transcriptions, and audio recordings will be stored on a password-
protected drive that will be kept in a locked departmental office at UA for six years past the date of
project completion (May 2016). No one other than the primary investigator (me) will have access to the
data. If we are unable to arrange a face-to-face meeting, we can carry out the interview over Skype. The
interview will be audio recorded only after obtaining your consent. If you choose not to be recorded, you
can still participate in the interview, and you will not be recorded under any circumstances. Moreover, if
you prefer not to participate in the interviews, your participation in the first round of data collection is
still highly appreciated and adequate for the study.
Your participation is voluntary. You may refuse to participate in this study. You may also leave the study
at any time. No matter what decision you make, it will not affect your grades or future benefits.

Participation Consent
I have read and I am aware that I am being asked to participate in a research study. If you have any
questions about your rights as a research participant, please contact the Human Subjects Protection
Program at 520-626-6721 or at www.orcr.arizona.edu/hspp or you can contact me via: (818) 614-4522,
adnanalobaid@email.arizona.edu

By taking part in the survey, you are agreeing to have your responses used as part of research.

172
APPENDIX M: WEB-BASED STUDENT SURVEY

To what extent do you agree or disagree with the following statements:

Statements [4]     Strongly agree     Agree     To some extent     Disagree     Strongly disagree     Comments

Practices

Course Objectives

1. Course objectives are communicated

to you at the beginning of the course.

2. Course syllabus states course

objectives clearly.

3. Course requirements are consistent

with course objectives.

4. Course objectives are appropriate and

help achieve those of the Program.

5. You are fully aware of the course

requirements in advance.

6. Course objectives clearly determine

the materials that will be taught.

7. Course materials are appropriate and

help achieve course objectives.

8. Instructional activities and

[4] The statements have been developed from the NCAAA and CEA standards.

173
assignments help you master the targeted

objectives.

9. Course goals and objectives are

measurable.

10. Course content and assignments are

aligned with course objectives.

Teaching Strategies

1. Teaching strategies are appropriate for

various learning styles.

2. The instructor is always well-prepared.

3. The instructor provides you with

feedback on your work constantly.

4. The instructor always communicates

with you in English.

5. The instructor models activity tasks

for you.

6. The instructor promotes groupwork.

7. You often demonstrate your skills

(e.g. presentations, leading discussions).

8. The instructor monitors your

performance constantly.

9. The instructor allows you to assess

your own work.

10. You are engaged in research projects.

174
Student Learning Outcomes

1. Student learning outcomes are clearly

communicated to you at the beginning of

the course.

2. Student learning outcomes are

consistent with course objectives.

3. Student learning outcomes meet your

expectations from the course.

4. Classroom materials, activities, and

procedures help achieve the intended

learning outcomes.

5. Learning outcomes are attainable.

6. Learning outcomes cover all language

skills.

7. Learning outcomes address your

competency rather than course content.

8. Student learning outcomes are assessed through direct assessment tools (e.g., assignments, exams, term papers, etc.).

9. Student learning outcomes are assessed through indirect assessment tools (e.g., surveys, self-reported gains, exit interviews, etc.).

10. Generally, student learning outcomes

175
emphasize real-life language use.

Student Assessment

1. Assessment techniques are appropriate

for various learning styles.

2. Assessment techniques are clearly

communicated to you at the beginning of

the course.

3. These assessment techniques are used

consistently throughout the Program

courses.

4. Assessment techniques measure things

on which you expect to be measured.

5. Exams questions are written clearly.

6. Your work is assessed fairly.

7. Homework assignments reflect course

objectives.

8. Grade distribution is appropriate to

course objectives.

9. Exams and assignments are graded

and returned to you promptly.

10. You are given an opportunity to

complain about your grades.

176
APPENDIX N: INTERVIEWS [5]

• Interviews with Students:

a. Curriculum:

1. To what extent does the curriculum fulfill your language needs? Please explain.

2. How well do the curriculum materials reflect real-life language uses and situations?

3. How well is the curriculum structured with regard to students’ level-to-level progression?

4. How well does the curriculum prepare you for your future career as stated in course goals?

b. Teaching Strategies:

1. How effective is your instructor in varying his/her teaching strategies?

2. How often are you engaged in activities such as leading discussions, presentations etc.?

3. What is the overall quality of teaching strategies used in this Program?

4. What suggestions do you have to improve the instructor’s performance?

c. Assistance and Support:

1. To what extent are you provided with professional counseling whenever needed?

2. To what extent are you provided with academic advising during your study?

3. How often do your instructors provide you with formative feedback?

4. How often do your instructors monitor your academic performance?

d. Course Assessment Methods:

1. How accurate are course assessment methods in assessing the course learning objectives?

2. Do assessment methods measure things on which you expect to be measured? Please explain.

[5] All interview questions have been developed based on the NCAAA and CEA standards.

177
3. To what extent does your instructor use a variety of different assessment methods?

4. Are you given opportunities to demonstrate your English skills well? If yes, how?

5. Do the assessment results provide you with your areas of strength and weakness?

6. Are you provided with a proficiency scale that shows your levels of language proficiency?

• Interviews with Teaching Faculty:

a. Learning Outcomes

1. How are intended student learning outcomes defined? Please explain in detail.

2. When defining intended learning outcomes, do you ensure that they are aligned with the Saudi

National Qualifications Framework?

3. Are intended learning outcomes communicated to students at the beginning of the course?

4. To what extent do student learning outcomes cover all language skills?

5. How often do you review and update the student learning outcomes of your courses?

b. Student Assessment

1. How are course assessment methods chosen for each course?

2. Are assessment procedures communicated to students at the beginning of the course?

3. How often do you integrate alternative assessment techniques into assessment processes?

4. How often do you provide students with informative feedback on their performance?

5. What mechanisms do you use to respond to students’ complaints about their grades?

6. Do you use direct and indirect assessment tools to assess student learning outcomes?

c. Teaching Strategies:

1. What mechanisms do you use to ensure that teaching strategies are appropriate for various

learning styles?

2. To what extent do you communicate with students in English?

178
3. What strategies do you use to facilitate learning activities for students (e.g. modeling,

scaffolding etc.)?

4. What types of visual aids do you use in delivering the course?

5. How often do you engage students in leading discussions, presentations, and

collaborative/cooperative learning?

6. How is the effectiveness of teaching strategies evaluated?

d. Faculty Development

1. How often does the Program offer you general training programs (e.g. teaching skills)?

2. How often do you integrate technologies into your classroom teaching activities?

3. Do you maintain a portfolio that shows evidence of the course you teach?

4. Do you think the Program administrators provide you with adequate time and opportunity for

faculty development? Please explain.

5. Are you engaged in heavy teaching and/or administrative workloads? If yes, to what extent do

they affect your research activities?

6. Generally, what are some challenges you face in teaching the Program courses?

• Interviews with Program Administrators:

a. Program Mission:

1. Has the program mission been approved by the University Senior Administration?

2. Has the Program mission been developed after consultation with stakeholders?

3. Is the Program’s mission communicated to all stakeholders?

4. To what extent do you use the Program mission in decision-making processes?

5. How often do you review and update the Program mission?

b. Faculty Recruitment and Development:

179
1. How do you undertake faculty recruitment processes?

2. What mechanisms do you use to ensure that prospective faculty members have appropriate

academic credentials and possess knowledge of the needs of students?

3. What strategies do you undertake to encourage teaching faculty members to develop strategies

for improvement of their own teaching?

4. How are faculty members encouraged to participate in professional development activities?

5. What training opportunities are given for teaching faculty members?

6. Are faculty members given the opportunities to attend and participate in conferences?

c. Program Development, Evaluation, and Review:

1. Do you have a plan for Program development and evaluation? Please explain.

2. What mechanisms do you use to evaluate the Program (e.g. surveys, employment data)?

3. How often do you evaluate the Program courses, learning outcomes, and study plan?

4. Do you invite professional people, from fields relevant to the Program areas of specialization,

to evaluate the Program courses?

5. Does the program have KPIs that include learning outcomes? How are they determined?

6. In addition to regular program evaluations, do you conduct a comprehensive reassessment of

the program performance at least once every five years?

180
APPENDIX O: RESEARCHER’S EVALUATION CHECKLIST

Program__________________________________ Date of Evaluation_________

Standard 1: Mission Goals and Objectives

College Mission:

__________________________________________________________________________________

__________________________________________________________________________________

College Goals

College Objectives

181
Program Mission

__________________________________________________________________________________

__________________________________________________________________________________

Program Goals

Program Objectives

182
RESEARCHER’S EVALUATION CHECKLIST CONT.

Statements [6]          Yes     No     To Some Extent     Comment

Program Mission

1. Does the program have a written

statement showing its mission and goals?

2. Is the program’s mission consistent

with those of the college and University?

3. Is the Program’s mission clearly

defined in a form that summarizes the

program’s ultimate goals?

4. Is the mission available for

prospective students and employees (e.g.

on the program website)?

5. Does the mission state the needs of the

community to which the Program is

affiliated?

6. Does the program’s mission place

emphasis on research?

Program Goals

1. Are the Program’s goals clearly and

coherently stated in measurable terms?

2. Are the goals related to the Program’s

[6] The statements have been developed from the NCAAA and CEA standards.

183
mission?

3. Are the Program’s goals aligned with

the curriculum?

4. Do the goals specify clearly the

Program’s intended outcomes?

5. Are the Program’s goals achievable

within the level of available resources?

Program Objectives

1. Are the Program’s objectives very

specific?

2. Are the Program’s objectives logically

related?

3. Are the objectives consistent with the

Program’s goals?

4. Are the Program’s objectives aligned

with the curriculum?

5. Are the Program’s objectives

assessable?

Statements Yes No To Some Extent Comment

Curriculum

1. Is the curriculum consistent with the

program’s mission?

2. Is the curriculum structured in a

way that ensures a logical progression

184
from one level to the next?

3. Does the curriculum have consistent

objectives of courses and levels?

4. Is the curriculum appropriate for the

needs of students?

5. Are course objectives consistent

with those of the program?

6. Do course syllabi have detailed

information about course

requirements, attendance policies,

topics to be covered, and assessment

techniques?

7. Are courses appropriate for meeting

the program’s needs and purposes?

8. Are course goals and objectives

measurable?

9. Are course materials appropriate for

achieving course objectives?

10. Are course content and

assignments aligned with course

objectives?

Statements Yes No To Some Extent Comment

Teaching Strategies

1. Are teaching strategies included into

185
course specifications?

2. Are teaching strategies aligned with

the Program’s learning outcomes?

3. Are students engaged in research

activities?

4. Does the instructor communicate

with students in English?

5. Does the instructor provide students

with feedback to their work?

6. Are teaching strategies appropriate

for various learning styles?

7. Does the instructor model the tasks

for students?

8. Does the instructor promote

groupwork?

9. Does the instructor monitor

students’ performance?

10. Does the instructor use visual

representations of the lessons?

Statements Yes No To Some Extent Comment

Student Learning Outcomes

1. Are student learning outcomes

written?

2. Are student learning outcomes

186
measurable?

3. Are student learning outcomes

attainable?

4. Are student learning outcomes

visible (e.g. in textbooks)?

5. Do student learning outcomes

reflect various levels of students’

cognitive skills?

6. Do student learning outcomes

reflect various levels of students’

affective skills?

7. Do student learning outcomes

reflect various levels of students’

interpersonal skills and responsibility?

8. Do student learning outcomes

reflect various levels of students’

Communication, Information

Technology, Numerical skills?

9. Are student learning outcomes a

result of learning?

10. Are course goals, objectives, and

learning outcomes aligned with each

other?

Statements Yes No To Some Extent Comment

187
Assessment Methods

1. Are various types of assessment

methods used?

2. Do course specifications include

direct and indirect assessment tools?

3. Are course grades distributed

logically in course specifications?

4. Do assessment practices cover all

language skills?

5. Does the instructor allow students to

assess their own work?

6. Are students engaged in papers,

projects, and presentation tasks?

188
APPENDIX P: COLLEGE MISSION

The college aims at preparing a prospective teacher who is religiously committed and strongly

attached to his homeland. This teacher should be a good model for his students in his

work and behavior. He should be at a high level of professionalism. Learning instinct and love

for the career should be part and parcel of him. He should be completely experienced and fully

aware of his role in facilitating learning approaches. He should be continuously developing in his

field of specialty as well as his teaching styles. He should have the traits of the strong leader who

has the ability to convince others and to prove that his opinion is correct. He should attain the

capacity of making a decision and of taking on responsibility. He should be able to plan well and

to put into consideration individual differences among students. He should be a good thinker

possessing all types of thinking skills. He should be able to develop these skills in his students.

He should have the traits of a social pioneer who has the ability to communicate effectively with

his society and to cooperate in solving its problems. He should be ever renewing in how to deal

with modern technology and how to function it properly in all instructional settings. He should

be a guide taking charge of directing the instructional process in fulfilling its targets and solving

its problems.

189
APPENDIX Q: MALE DEPARTMENT OBJECTIVES

Goals and Objectives:
§ Providing the undergraduate and graduate students with well-rounded quality education and rigorous training in the English language, translation, and literature
§ Providing students with the necessary linguistic, translation, interpreting, and literary skills that can enrich their intellectual, cultural, and artistic visions
Major Strategies: § Class lectures § Class discussion § Assigning presentations § Assigning written works and reading texts
Measurable Indicators: § Written/Oral feedback for assignments § Rubric § Quizzes/Exams

Goals and Objectives:
§ Developing the students' English language, thinking, and research skills.
§ Preparing efficient graduates who can competently meet the needs of both public and private sectors in the field of English language, linguistics, translation, and literature.
§ Conducting community service and continuing education programs through the performance of various forums, workshops, and training sessions in the field of English language, linguistics, translation, and literature.
Major Strategies: § Class lectures § Class discussion § Assigning presentations § Assigning written works and reading texts
Measurable Indicators: § Written/Oral feedback for assignments § Rubric § Quizzes/Exams

Goals and Objectives:
§ Emphasizing scholarly research and community outreach and service, and promoting intercultural understanding and exchange.
§ Taking an active part in conducting scholarly research, and relating such research to the human and developmental needs of the Kingdom of Saudi Arabia.
Major Strategies: § Class lectures § Class discussion § Assigning presentations § Assigning written works and reading texts
Measurable Indicators: § Written/Oral feedback for assignments § Rubric § Quizzes/Exams

190
REFERENCES

Adam, S. (2004). Using learning outcomes. In Report for United Kingdom Bologna Seminar (pp.

1-2).

Ahlgren Reddy, A., & Harper, M. (2013). Mathematics placement at the University of Illinois.

PRIMUS, 23(8), 683-702.

Alamri, M. (2011). Higher education in Saudi Arabia. Journal of Higher Education Theory and

Practice, 11(4), 88.

Al Asmari, A. (2013). Practices and prospects of learner autonomy: Teachers' perceptions.

English Language Teaching, 6(3), 1.

Alderson, J. C., & Scott, M. (1992). Insiders, outsiders and participatory evaluation. Evaluating

second language education, 25-58.

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.

Cambridge: Cambridge University Press.

Alderson, J. C., Figueras, N., Kuijper, H., Nold, G., Takala, S., & Tardieu, C. (2004). The

development of specifications for item development and classification within The

Common European Framework of Reference for Languages: Learning, Teaching,

Assessment: Reading and Listening: Final report of The Dutch CEF Construct Project.

Allen, M. J. (2004). Assessing academic programs. Boston: Anker Publishing.

Andrade, H., & Du, Y. (2007). Student responses to criteria-referenced self-assessment.

Assessment and Evaluation in Higher Education, 32, 159–181.

Andrade, H. & Valtcheva, A. (2009). Promoting learning and achievement through self-

assessment. Theory into Practice, 48(1), 12-19.

Al Murshidi, G. (2014). Participation challenges of Emirati and Saudi students at US

191
universities. International Journal of Research Studies in Language Learning, 3(5).

Alqahtani, A. (2014). Evaluation of King Abdullah Scholarship Program. Journal of Education

and Practice, 5(15), 33-41.

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental

disorders (4th ed.,Text Revision). Washington, DC: Author.

Bachman, L. F., & Palmer, A. S. (1989). The construct validation of self-ratings of

communicative language ability. Language Testing, 6(1), 14-29.

Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford

University Press.

Bachman, L. F., Davidson, F., & Ryan, K. (1995). An investigation into the comparability of

two tests of English as a foreign language (Vol. 1). Cambridge University Press.

Bachman, L. & Palmer, A. (1996). Language testing in practice. New York: Oxford

University Press.

Bachman, L. F. (2000). Learner-directed assessment in ESL. In G. Ekbatani & H. Pierson

(Eds.), Learner-directed assessment in ESL (pp. ix-xii). New Jersey: Lawrence Erlbaum

Associates, Inc.

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment

Quarterly: An International Journal, 2(1), 1-34.

Bailey, K. M. (2009). Issues in language teacher evaluation. The handbook of language

teaching, 706.

Banegas, M. R. (2013). Placement of English as a Second Language (ESL) Student Learners in

Higher Education (Doctoral dissertation). Retrieved from ProQuest Dissertations

Publishing, http://search.proquest.com/docview/1369866651.

192
Barker, R. (2003). The social work dictionary. Washington, DC: NASW Press.

Benson, P. (1995) 'A critical view of learner training'. Learning: JALT Learner

Development N-SIG Forum, 2:2, 2-6.

Benson, P. (2012). In A. Burns & J. C. Richards (Eds.), The Cambridge guide to pedagogy and

practice in second language teaching. New York: Cambridge University Press.

Beretta, A. (1986). Program‐Fair Language Teaching Evaluation. TESOL Quarterly, 20(3), 431-

444.

Black, P., & D. Wiliam. (2009). Developing the theory of formative assessment. Educational

Assessment, Evaluation and Accountability 21.1: 5–31.

Blanche, P., & Merino, B. J. (1989). Self‐Assessment of Foreign‐Language Skills:

Implications for Teachers and Researchers. Language Learning, 39(3), 313-338.

Bloom, B. S., Madaus, G. F., & Hastings, J. T. (1981). Evaluation to improve learning. New

York: McGraw-Hill.

Bock, R. D. (1997). A brief history of item response theory. Educational Measurement: Issues

and Practice, 16(4), 21-33.

Boud, D. (1991). Implementing student self-assessment. Campbelltown: The Higher Education

Research and Development Society of Australasia (HERDSA).

Bourke, B. (2014). Positionality: Reflecting on the research process. The Qualitative Report,

19(33), 1-9.

Brantmeier, C. (2006). Advanced L2 learners and reading placement: Self-assessment, computer-

based testing, and subsequent performance. System, 34(1), 15-35.

Brindley, G. (1998). Outcomes-based assessment and reporting in language learning

193
programs: A review of the issues. Language Testing, 15(1), 45-85.

Brown, D. (1989). Language program evaluation: A synthesis of existing possibilities. In R. K.

Johnson (ed.) The second language curriculum. Cambridge, UK: Cambridge University

Press, 222-241.

Brown, J. D. (1989). Improving ESL placement test using two perspectives. TESOL

Quarterly, 23(1), 65–83.

Brown, D. (1995). Language Program Evaluation: Decisions, Problems and Solutions. Annual

Review of Applied Linguistics, 15(1), 227-248.

Brown, J. D. (1996). Testing in language programs. New Jersey: Prentice Hall Regents.

Brown, D. (1997). Computers in language testing: present research and some future

directions. Language Learning and Technology, 1(1), 44–59.

Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. UK: Cambridge

University Press.

Brown, D. (2007). Teaching by principles: An integrated approach to language pedagogy.

White Plains, NY: Pearson Education.

Brown, D. (2010). Language Assessment: Principle and Classroom Practices. New

York: Pearson Education.

Butler, Y. G., & Lee, J. (2010). The effects of self-assessment among young learners of

English. Language Testing, 27(1), 5-31.

Byrnes, H. (1991). Issues in Foreign Language Program Articulation. Critical Issues in

Foreign Language Instruction. Boston: Heinle & Heinle, pp. 6–28.

Cargan, L. (2007). Doing social research. Maryland: Rowman & Littlefield Publishers.

Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Beverly Hills, CA:

194
Sage publications.

CESL (2014). About Center for English a Second Language. Retrieved March 17th, 2016,

from http://www.cesl.arizona.edu/content/about-cesl

Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics,

19, 254-272.

Chapelle, C. (2001). Computer applications in second language acquisition. Cambridge, UK:

Cambridge University Press.

Chapelle, C. A., Jamieson, J., & Hegelheimer, V. (2003). Validation of a web-based ESL test.

Language Testing, 20(4), 409-439.

Chapelle, C. A., Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2011). Building a

validity argument for the Test of English as a Foreign LanguageTM. New York:

Routledge.

Clapham, C. (2000). Assessment and testing. Annual Review of Applied Linguistics, 20, 147-

161.

Conway, T., Mackay, S., & Yorke, D. (1994). Strategic planning in higher education: Who are

the customers. International journal of educational management, 8(6), 29-36.

Coppel, D. B. (2011). Use of neuropsychological evaluations. Physical medicine and

rehabilitation clinics of North America, 22(4), 653-664.

Council for Higher Education Accreditation (CHEA). (2002), The Fundamentals of

Accreditation, CHEA, Washington DC, USA.

Council of Europe (1996b). Modern languages: Learning, teaching, assessment. A Common

European Framework of Reference. Draft 2 of a Framework proposal. Strasbourg:

Council of Europe.

195
Council of Europe (2001). Common European Framework of Reference for Languages:

learning, teaching, assessment. Cambridge: Cambridge University Press.

Council of Europe (2015). Education and Languages, Language Policy. Retrieved March

23rd, 2016, from http://www.coe.int/t/dg4/linguistic/cadre1_en.asp

Cronbach, L. (1963). Course improvements through evaluation. The Teachers College Record,

64(8), 672-672.

Cumming, A. (2009). Language assessment in education: Tests, curricula, and teaching. Annual

Review of Applied Linguistics, 29, 90-100.

Cumming, A. (2008). Assessing oral and literate abilities. In E. Shohamy (Ed.), Encyclopedia of

language and education (pp. 2146-2163). US: Springer.

Darandari, E. Z., Al‐Qahtani, S. A., Allen, I. D., Al‐Yafi, W. A., Al‐Sudairi, A. A., &

Catapang, J. (2009). The quality assurance system for post‐secondary education in Saudi

Arabia: A comprehensive, developmental and unified approach. Quality in Higher

Education, 15(1), 39-50.

Dornyei, Z. (2001). Teaching and researching motivation. London: Pearson Education Ltd.

Douglas, D. (2014). Understanding language testing. London: Hodder-Arnold.

Eaton, J. S. (2006). An Overview of US Accreditation. Washington DC. Council for Higher

Education Accreditation.

Eckes, T., & Grotjahn, R. (2006). A closer look at the construct validity of C-tests. Language

Testing, 23(3), 290-325.

Educational Testing Service. (2010). Linking TOEFL iBT scores to IELTS scores: A

196
research report. Princeton, NJ: ETS. Retrieved February 21st, 2016, from

https://www.ets.org/s/toefl/pdf/linking_toefl_ibt_scores_to_ielts_scores.pdf

Edwards, J. R. (2003). Construct validation in organizational behavior research. Chapel Hill:

University of North Carolina.

Ekbatani, G., & Pierson, H. D. (Eds.). (2000). Learner-directed assessment in ESL. Mahwah, NJ:

Erlbaum.

Ekbatani, G. (2000). Moving toward learner-directed assessment. In G. Ekbatani & H. Pierson

(Eds.), Learner-directed assessment in ESL. (pp. 1-11). Mahwah, NJ: Erlbaum.

Ellis, R. (1993). Quality Assurance for University Teaching. Buckingham: Open University

Press.

Elton, L. (1999). Facilitating change through self-assessment. Paper presented at the national

conference teaching and learning for the new millennium: Facilitating change through

self-assessment, September 9-10, in University of Bristol, Bristol, UK.

Fox, J. D. (2009). Moderating top-down policy impact and supporting EAP curricular renewal:

Exploring the potential of diagnostic assessment. Journal of English for Academic

Purposes, 8(1), 26-42.

Fulcher, G. (1997). An English language placement test: issues in reliability and validity.

Language Testing, 14(2), 113-139.

Fulcher, G. (1999). Computerizing an English language placement test. English Language

Teaching Journal 53, 4, 289 - 299.

Fulcher, G., & Davidson, F. (2007). Language testing and assessment. London, NY:

Routledge.

Fulcher, G. (2013). Practical language testing. London: Hodder Education.

197
Garrison, D. R. (1997). Self-directed learning: Toward a comprehensive model. Adult

Education Quarterly, 48(1), 18-33.

Gass, S. M., & Varonis, E. M. (1994). Input, interaction, and second language production.

Studies in second language acquisition, 16(03), 283-302.

Gitlin, A., & Smyth, J. (1989). Teacher evaluation: Educational alternatives. Lewes: Falmer

Press.

Gredler, M. E. (1996). Program evaluation. NJ: Prentice Hall.

Green, A., & Weir, C. J. (2004). Can placement tests inform instructional decisions? Language

Testing, 22(4), 467-494.

Green, C. (2005). Integrating extensive reading in the task-based curriculum. ELT Journal,

59(4), 306-311.

Halliday, J. (1994). Quality in education: Meaning and prospects. Educational Philosophy and

Theory, 26(2), 33-50.

Harden, R. M. (2002). Learning outcomes and instructional objectives: Is there a difference?

Medical teacher, 24(2), 151-155.

Harris, M. (1997). Self-assessment of language learning in formal settings. ELT journal,

51(1), 12-20.

Holland, P. W., & Dorans, N. J. (2006). Linking and equating. Educational measurement, 4,

187-220.

Holec, H. (1981). Autonomy and foreign language learning. Oxford: Pergamon Press.

Hughes, A., Weir, C., & Porter, D. (1995). The Global Placement Test. Reading: Centre for

Applied Language Studies, University of Reading.

Hughes, A. (2003). Testing for language teachers, (2nd edition). Cambridge: Cambridge

198
University Press.

Hughes, K. L., & Scott-Clayton, J. (2011). Assessing developmental assessment in

community colleges. Community College Review, 39(4), 327-351.

Huhta, A., Luoma, S., Oscarson, M., Sajavaara, K., & Teasdale, A. (2002). DIALANG, A

diagnostic language assessment system for adult learners. Common European

framework of reference for languages: Learning, teaching, assessment. Case studies.

Institute of International Education (2014). Intensive English Programs: Leading Places of

Origin. Retrieved March 16th, 2016, from http://www.iie.org/Research-and-

Publications/Open-Doors/Data/Intensive-English-Programs/Leading-Places-of-

Origin/2013-14

Jang, E. E. (2008). A Framework for Cognitive Diagnostic Assessment. In C. A. Chapelle, Y.-R.

Chung, & J. Xu (Eds.), Towards Adaptive CALL: Natural Language Processing for

Diagnostic Language Assessment (pp. 117-131). Ames, IA: Iowa State University.

Javid, C. Z., Al-thubaiti, T. S., & Uthman, A. (2013). Effects of English language proficiency

on the choice of language learning strategies by Saudi English-major undergraduates.

English Language Teaching, 6(1), 35.

Johnson, D. (2015). Saudi students and IEP teachers: converging and diverging perspectives

(Doctoral dissertation, University of Illinois at Urbana-Champaign).

Jones, N. (2002). Relating the ALTE Framework to the Common European Framework of

Reference, In: Council of Europe (Eds). Case Studies on the use of the Common

European Framework of Reference (pp. 167-183). Cambridge: Cambridge University

Press.

Kahn, A. B., Butler, F. A., Weigle, S. C., & Sato, E. Y. (1994). Adult ESL Placement Procedures

199
in California: A Summary of Survey Results. Adult ESL Assessment Project.

Kane, M. T. (2006). Validation. Educational measurement, 4(2), 17-64.

Kaplan, R. B. (2010). Whence applied linguistics: The twentieth century. Oxford handbook of

applied linguistics, 3-33.

Kiely, R., & Rea-Dickins, P. (2005). Program evaluation in language education. Palgrave Macmillan.

Kiely, R. (2001). Classroom evaluation-values, interests and teacher development. Language

teaching research, 5(3), 241-261.

Kirsch, I., Jamieson, J., Taylor, C., & Eignor, D. (1998). Computer familiarity among TOEFL

examinees. ETS Research Report Series, 1998(1), 1-23.

Kumaravadivelu, B. (2006). TESOL Methods: Changing Tracks, Challenging Trends.

TESOL Quarterly, 40(1), 59-81.

Kunnan, A. (2000). Fairness and justice for all. Fairness and validation in language

assessment (pp. 1-13). Cambridge, MA: Cambridge University Press.

Kunnan, A. J. (2008). Large-scale language assessments. In E. Shohamy (Ed.), Encyclopedia of

language and education (pp. 2275-2295). US: Springer.

Kunnan, A. J. (1998). Approaches to validation in language assessment. In A. J. Kunnan (Ed.),

Validation in language assessment (pp. 1–16). Mahwah, NJ: Lawrence Erlbaum

Associates.

Lantolf, J. P., & Poehner, M. E. (2008). Dynamic assessment. In Encyclopedia of language and

education (pp. 2406-2417). Springer US.

Larsen-Freeman, D. (1986). Techniques and principles in language teaching. Oxford: Oxford

University Press.

Lee, Y. J., & Greene, J. (2007). The Predictive Validity of an ESL Placement Test A Mixed

200
Methods Approach. Journal of Mixed Methods Research, 1(4), 366-389.

Lewis, J. (1990) 'Self-assessment in the classroom: a case study'. In G. Brindley (ed.) The

Second Language Curriculum in Action. Sydney: National Centre for English Language

Teaching and Research.

Little, D., Lazenby Simpson, B., & O’Connor, F. (2002). Meeting the English language needs

of refugees in Ireland. Common European Framework of Reference for Languages:

Learning, Teaching, Assessment. Case Studies, Strasbourg: Council of Europe, 53-67.

Little, D. (2006). The Common European Framework of Reference for Languages: Content,

purpose, origin, reception and impact. Language Teaching, 39(03), 167-190.

Long, M. H. (1981). Input, interaction, and second‐language acquisition. Annals of the New

York Academy of Sciences, 379(1), 259-278.

Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge:

Cambridge University Press.

MacKay, R. (1998). Program evaluation and quality control. TESL Canada Journal, 5(2), 33-42.

McDonald, B., & Boud, D. (2003). The impact of self-assessment on achievement: the effects

of self-assessment training on performance in external examinations. Assessment in

Education: Principles, Policy & Practice, 10(2), 209-220.

McNamara, M., & Deane, D. (1995). Self-assessment activities toward autonomy in language

learning. TESOL Journal, 5, 18-23.

McNamara, T. F. (1996). Measuring second language performance. New York: Longman.

McNamara, T., & Roever, C. (2006). Language Testing: The Social Dimension. Oxford:

Blackwell.

Menges, R. J., & Weimer, M. (1996). Teaching on Solid Ground: Using Scholarship to Improve

201
Practice. San Francisco: Jossey-Bass Inc.

Mercado, L., A. (2012). Guarantor of quality assurance. In M. A. Christison and F. L. Stoller

(Eds.) A handbook for language program administrators (2nd Edition), (pp. 117-136).

Miami, FL: Alta Book Center Publishers.

Messick, S. (1989). Validity. In Linn, R. L. (ed.), Educational Measurement. New York:

American Council on Education/Macmillan, 13–103.

Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons'

responses and performances as scientific inquiry into score meaning. American

psychologist, 50(9), 741.

Messick, S. (1996). Validity and washback in language testing. Language Testing. 13, 241-

256.

Ministry of Economy and Planning (2015). The Saudi Ninth Development Plan. Retrieved

March 14th, 2016, from

http://services.mep.gov.sa/themes/GoldenCarpet/index.jsp#1457986308681

Mitchell, C. B. & Vidal, K. E. (2001). Weighing the Ways of the Flow: Twentieth Century

Instruction. Modern Language Journal, 85(1), 26-38.

MOHE. (2015). Saudi Higher Education Institutions. Retrieved March 8th, 2016, from

http://www.moe.gov.sa/ar/Pages/default.aspx

MOHE. (2014). The History of Study Abroad Scholarships in the Kingdom. Retrieved March

15th, 2016, from https://www.mohe.gov.sa/en/Ministry/General-administration-for-

Public-relations/BooksList/book2eng.pdf

Mohr, L. B. (1995). Impact analysis for program evaluation. Thousand Oaks, CA: Sage

Publications, Inc.

202
Morphew, C. C., & Hartley, M. (2006). Mission statements: A thematic analysis of rhetoric

across institutional type. The Journal of Higher Education, 77(3), 456-471.

NCAAA. (2013). Standards for Quality Assurance and Accreditation of Higher Education

Institutions. Retrieved February 14, 2015, from

http://www.ncaaa.org.sa/en/Releases/Pages/Handbooks.aspx

NCAAA. (2014). Mission, Vision & Values. Retrieved March 08, 2016, from

http://www.ncaaa.org.sa/en/AboutUs/Pages/Vision.aspx

Norris, J.M., Davis, J.M., Sinicrope, C. & Watanabe, Y. (Eds.). (2009) Toward Useful Program

Evaluation in College Foreign Language Education. Honolulu, HI: National Foreign

Language Resource Center. (TUPE) (Case studies)

Norris, J. M. (2009). Understanding and improving language education through program

evaluation: Introduction to the special issue. Language Teaching Research, 13(1), 7-13.

Norris, J. M. (2016). Language Program Evaluation. The Modern Language Journal, 100(S1),

169-189.

North, B. (2007). The CEFR illustrative descriptor scales. The Modern Language Journal,

91(4), 656-659.

Nunan, D. (1988). The learner-centered curriculum: A study in second language teaching.

Cambridge, England: Cambridge University Press.

Nunan, D. (1989). Designing tasks for the communicative classroom. Cambridge, England:

Cambridge University Press.

Oscarson, M. (1989). Self-assessment of language proficiency: Rationale and applications.

Language Testing,6(1), 1-13.

Oscarson, M. (1997). Self-assessment of foreign and second language proficiency. In C.

203
Clapham and D. Corson (Eds), Encyclopedia of language and education, Volume 7:

Language testing and assessment (pp. 175–187). Dordrecht, Netherlands: Kluwer

Academic.

Pennington, M. C., and Hoekje, B. (2010). Language program leadership in a changing

world: An ecological model. London: Emerald.

Posavac, E. (2015). Program evaluation: Methods and case studies. New York: Pearson Prentice

Hall.

Rea-Dickins, P. (1994). Evaluation and English language teaching. Language Teaching, 27(02),

71-91.

Reed, D. J., & Stansfield, C. W. (2004). Using the Modern Language Aptitude Test to Identify a

Foreign Language Learning Disability: Is it Ethical? Language Assessment Quarterly,

1(2-3), 161-176.

Reid, J. M. (1995). Learning Styles in the ESL/EFL Classroom. Florence, KY: Heinle & Heinle.

Roever, C. (2001). Web-based language testing. Language Learning & Technology, 5(2), 84-94.

Ross, S. (1998). Self-assessment in second language testing: A meta-analysis and analysis of

experiential factors. Language testing, 15(1), 1-20.

Royse, D., Thyer, B., & Padgett, D. (2009). Program evaluation: An introduction. Belmont, CA:

Brooks-Cole.

Saba, M. S. (2014). Writing in a New Environment: Saudi ESL Students Learning Academic

Writing. (Doctoral dissertation). Retrieved from

https://vtechworks.lib.vt.edu/bitstream/handle/10919/54012/Saba_MS_D_2014.pdf?sequen

ce=1&isAllowed=y

SACM (2013). Overview on the ESL Department. Retrieved March 15th, 2016, from

204
http://www.sacm-usa.gov.sa/Departments/ESL/about.aspx

SACM (2013). Recommended ESL Schools in the U.S. Retrieved March 18th, 2016, from

http://esllist.sacm.org/

Sallis, E. (2002). Total quality management in education. London: Kogan Page.

Scriven, M. (1967). The Methodology of Evaluation. Washington, DC: American Educational

Research Association.

Scriven, M. (1991). Evaluation thesaurus. Newbury Park, CA: Sage.

Sekely, A. (2014). Assessing the Validity of the Trauma Symptom Inventory on Military Patients

with PTSD (Doctoral dissertation). Retrieved from http://militarytbi.org/wp-

content/uploads/2014/09/Thesis05.04.2014.pdf

Shawer, S. F. (2013). Accreditation and standards-driven program evaluation: implications for

program quality assurance and stakeholder professional development. Quality &

Quantity, 47(5), 2883-2913.

Shohamy, E., Gordon, C., & Kraemer, R. (1992). The effect of raters’ background and training

on the reliability of direct writing tests. The Modern Language Journal, 76 (1), 27-33.

Smith, L., & Abouammoh, A. (2013). Higher Education in Saudi Arabia. The Netherlands:

Springer.

Spolsky, B. (2008). Language assessment in historical and future perspective. In Encyclopedia of

language and education (pp. 2570-2579). US: Springer.

Tannenbaum, R. J., & Wylie, E. C. (2008). Linking English-Language Test Scores Onto the

Common European Framework of Reference (CEFR): An Application of Standard-

Setting Methodology. ETS Research Report Series, 2008(1), i-75.

Taras, M. (2001). The use of tutor feedback and student self-assessment in summative

205
assessment tasks: Toward transparency for students and for tutors. Assessment and

Evaluation in Higher Education, 26, 605-614.

Taras, M. (2008). Issues of power and equity in two models of self-assessment. Teaching in

Higher Education, 13(1), 81-92.

Taras, M. (2010). Student self-assessment: processes and consequences. Teaching in Higher

Education, 15(2), 199-209.

Taylor, C., & Albasri, W. (2014). The Impact of Saudi Arabia King Abdullah’s Scholarship

Program in the US. Open Journal of Social Sciences, 2(10), 109.

TESOL (2010). Position Statement on the Acquisition of Academic Proficiency in English at the

Postsecondary Level. Retrieved from http://www.tesol.org/docs/pdf/13489.pdf?sfvrsn=0

March 1st 2016.

Van Damme, D. (2004). Standards and Indicators in Institutional and Program Accreditation in

Higher Education: A Conceptual Framework and a Proposal. Studies on Higher

Education, 127-159.

van Teijlingen, E., & Hundley, V. (2001). The importance of pilot studies. Social research

update, (35), 1-4.

Wall, D., & Alderson, J. C. (1993). Examining washback: the Sri Lankan impact study.

Language Testing, 10(1), 41-69.

Wall, D., Clapham, C., & Alderson, J. C. (1994). Evaluating a placement test. Language

Testing, 11(3), 321-344.

Wesche, M. B. (1983). Communicative Testing in a Second Language. The Modern Language

Journal, 67(1), 41-55.

Weigle, S. C. (1994b). Effects of training on raters of ESL compositions. Language Testing,

206
11(2), 197-223.

Weir, C. J. (2005). Language testing and validation. England: Palgrave Macmillan.

Weir, C. J. (2005). Limitations of the Common European Framework for developing

comparable examinations and tests. Language Testing, 22(3), 281-300.

Wenden, A. L. (1995). Learner training in context: A knowledge-based approach. System,

23(2), 183-194.

Westerheijden, D. F., Stensaker, B., & Rosa, M. J. (2007). Quality assurance in higher

education: Trends in regulation, translation and transformation (Vol. 20). Dordrecht:

Springer Science & Business Media.

Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan,

75(3), 200-08.

Wolochuk, A. (2009). Adult English learners' self-assessment of second language

proficiency: Contexts and conditions (Doctoral dissertation). Retrieved from ProQuest

Dissertations & Theses - Gradworks http://gateway.proquest.com/openurl?url_ver=Z39.88-

2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdi

ss:3346271

207
