
Saturday, October 16, 2021

Principles of Language Assessment
Group 1:
Dian Savitri
Akhofullah
Muhamad Argi Afriandi
Why "Principles of Language
Assessment" are important?

Because they serve effective


useful
a guideline to
ensure that a test
is... appropriate plausible
1. Practicality
• A test that is easy to design, easy to administer, and easy to score can be defined as "practical".
• Practicality, according to Brown (2004, p. 19), includes:
Cost
Time
Administration
Scoring / Evaluation
Cost
Should not be too expensive to conduct.
The cost has to stay within the budget.

-> What do you think: if a student had to pay Rp10.000,- for a daily test, would it be practical or not?

Time
Should stay within appropriate time constraints.
Should not be too short or too long.

-> What do you think: if a teacher conducted a test that takes a student five hours to complete, would it be practical or not?
Administration
Should be relatively easy to administer.
Should not be a complex test to conduct.

-> What do you think: if a teacher conducted an online test, but the students did not have any devices, would it be practical or not?

Scoring/Evaluation
Has a scoring/evaluation procedure that is specific and time-efficient.
Should fit into the time allocation.

-> What do you think: if a student completed the test in a few minutes, but it took an hour to score, would it be practical or not?
2. Reliability
If the test is administered to the same student on different occasions, it produces the same result.
A reliable test is consistent and dependable.
Factors that may contribute to the unreliability of a test:
• Student-Related Reliability
• Rater Reliability
• Test Administration Reliability
• Test Reliability
Student-Related Reliability
• Temporary illness
• Fatigue
• A "bad" day
• Anxiety
• The test-taker's "test-wiseness"
• Strategies for doing the test
Rater Reliability

It may be affected by human error, subjectivity, and bias.

Two categories of rater reliability:
1. Inter-rater reliability
2. Intra-rater reliability
Inter-Rater Reliability
Two or more raters yield inconsistent scores on the same test.
Possibly due to lack of attention to scoring criteria, inexperience, and preconceived biases.
Intra-Rater Reliability
A common occurrence for classroom teachers.
Possibly due to unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, and simple carelessness.
Test Administration Reliability
Concerns the conditions in which the test is administered.
Example: badly photocopied sheets -> unclear test -> bad results
Test Reliability
The nature of the test itself can cause measurement error, for example:
• Test items that are not clear -> create ambiguity.
• A test that takes too long or is too short.
3. Validity
Validity is "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226).
E.g., to measure writing ability:
• Writing as many words as possible in 15 minutes would not constitute a valid test.
• A valid test requires some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
How is the validity of a test established?
1. Content-Related Evidence
2. Criterion-Related Evidence
3. Construct-Related Evidence
4. Consequential Validity
5. Face Validity
1. Content-Related Evidence
You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring (Mousavi, 2002; Hughes, 2003).
E.g., assessing a person's ability to speak a second language in a conversational setting:
• Answering multiple-choice questions requiring grammatical judgments does not achieve content validity.
• A test that requires the student to speak within some sort of authentic context does.
2. Criterion-Related Evidence
The extent to which the "criterion" of the test has actually been reached.

Concurrent validity
A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself.
E.g., the validity of a high score on the final exam of a foreign language course will be substantiated/strengthened by actual proficiency in the language.
2. Criterion-Related Evidence
Predictive validity
The assessment criterion in such cases is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success.
3. Construct-Related Evidence
A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions.
E.g., conducting an oral interview: several factors in the final score are pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness.
4. Consequential Validity
Consequential validity refers to the positive or negative social consequences of a particular test.
E.g., the consequential validity of standardized tests includes many positive attributes, such as improved student learning and motivation and ensuring that all students have access to equal classroom content. Standardized tests also have several negative consequences, including inappropriate use of the tests to re-allocate state funds and teaching students to pass tests (instead of actually understanding the material).
5. Face Validity
"Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it" (Mousavi, 2002, p. 244).
4. Authenticity
Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task."

In a test, authenticity may be present in the following ways:
• The language is as natural as possible.
• Items are contextualized rather than isolated.
• Some thematic organization to items is provided, such as through a story line or episode.
• Tasks represent, or closely approximate, real-world tasks.
5. Washback
The influence of testing on teaching and learning (especially high-stakes tests).

Positive impact: a match between what is taught and what is tested.
Negative impact: a discrepancy between the goals of the curriculum and the focus of the testing.
1. Positive Washback
Students use language authentically and communicatively.
(Authentic, interactive)
2. Negative Washback
Teaching for the test.
Focusing on language accuracy rather than actual language use.
Applying Principles to the Evaluation

• Are the test procedures practical?
• Is the test reliable?
• Does the procedure demonstrate content validity?
• Is the procedure face valid and "biased for best"?
• Are the test tasks as authentic as possible?
• Does the test offer beneficial washback to the learner?
Are the test procedures practical?
Is the test reliable?
1. Use consistent sets of criteria for a correct response.
2. Give attention to those sets throughout the evaluation time.
3. Read the tests twice to check consistency.
4. Apply the same standards to all responses after any "mid-stream" modification.
5. Avoid fatigue.
Does the procedure demonstrate content validity?
• Are classroom objectives identified and appropriately framed?
• Are lesson objectives represented in the form of test specifications?
Is the procedure face valid and "biased for best"?
- Directions are clear
- The structure of the test is organized logically
- Its difficulty level is appropriately pitched
- The test has no "surprises", and
- Timing is appropriate
Are the test tasks as authentic as possible?
• Is the language of the test natural?
• Are items contextualized as much as possible rather than isolated?
• Are the topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided?
• Do tasks represent "real-world tasks"?
Does the test offer beneficial washback to the learner?
Demonstrate relevance to the curriculum and set the stage for washback.
References
Cheng, L., Watanabe, Y., & Curtis, A. Washback in Language Testing: Research Contexts and Methods.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices.
Thank you!
