
Saturday, October 16, 2021

Principles of Language Assessment
Group 1:
Dian Savitri
Akhofullah
Muhamad Argi Afriandi
Why "Principles of Language
Assessment" are important?

Because they serve effective


useful
a guideline to
ensure that a test
is... appropriate plausible
1. Practicality
• A test that is easy to design, easy to administer, and easy to score can be defined as "practical".
• Practicality, according to Brown (2004, p. 19), includes:
Cost
Time
Administration
Scoring / Evaluation
Cost
Should not be too expensive to conduct.
The cost has to stay within the budget.

-> What do you think: if a student had to pay Rp10.000,- for a daily test, would it be practical or not?

Time
Should stay within appropriate time constraints.
Should not be too short or too long.

-> What do you think: if a teacher conducted a test that takes a student five hours to complete, would it be practical or not?
Administration
Should be relatively easy to administer.
Should not be a complex test to conduct.

-> What do you think: if a teacher conducted an online test, but the students did not have any devices, would it be practical or not?

Scoring/Evaluation
Has a scoring/evaluation procedure that is specific and time-efficient.
Should fit into the time allocation.

-> What do you think: if a student completed the test in a few minutes, but it took an hour to score, would it be practical or not?
2. Reliability
If the test is administered to the same student on different occasions, it produces the same result.
A reliable test is consistent and dependable.
Factors that may contribute to the unreliability of a test:
• Student-Related Reliability
• Rater Reliability
• Test Administration Reliability
• Test Reliability
Student-Related Reliability
• Temporary illness
• Fatigue
• A "bad" day
• Anxiety
• The test-taker's "test-wiseness"
• Strategies for doing the test
Rater Reliability

It may be affected by human error, subjectivity, and bias.

Two categories of rater reliability:
1. Inter-rater reliability
2. Intra-rater reliability
Inter-Rater Reliability
Two or more raters yield inconsistent scores on the same test.
Possibly due to lack of attention to scoring criteria, inexperience, and preconceived biases.
Intra-Rater Reliability
A common occurrence for classroom teachers.
Possibly due to unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, and simple carelessness.
Test Administration Reliability
Concerns the conditions in which the test is administered.
Example: badly photocopied sheets -> unclear test -> bad results
Test Reliability
The nature of the test itself can cause measurement error, for example:
• Test items that are not clear -> create ambiguity.
• A test that takes too long or is too short.
3. Validity
Validity is "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226).
E.g., to measure writing ability:
• Writing as many words as possible in 15 minutes would not constitute a valid test.
• A valid test requires some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
How is the validity of a test established?
1. Content-Related Evidence
2. Criterion-Related Evidence
3. Construct-Related Evidence
4. Consequential Validity
5. Face Validity
1. Content-Related Evidence
You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring (Mousavi, 2002; Hughes, 2003).
E.g., assessing a person's ability to speak a second language in a conversational setting:
• Answering multiple-choice questions requiring grammatical judgments does not achieve content validity.
• A test that requires the student to speak within some sort of authentic context does.
2. Criterion-Related Evidence
The extent to which the "criterion" of the test has actually been reached.

Concurrent validity
A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself.
E.g., the validity of a high score on the final exam of a foreign language course will be substantiated/strengthened by actual proficiency in the language.
2. Criterion-Related Evidence
Predictive validity
The assessment criterion in such cases is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success.
3. Construct-Related Evidence
A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions.
E.g., conducting an oral interview: several factors in the final score are pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness.
4. Consequential Validity
Consequential validity refers to the positive or negative social consequences of a particular test.
E.g., the consequential validity of standardized tests includes many positive attributes, such as improved student learning and motivation and ensuring that all students have access to equal classroom content. Standardized tests also have several negative consequences, including inappropriate use of the tests to re-allocate state funds and teaching students to pass tests (instead of actually understanding the material).
5. Face Validity
"Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it" (Mousavi, 2002, p. 244).
4. Authenticity
Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task."

In a test, authenticity may be present in the following ways:
• The language is as natural as possible.
• Items are contextualized rather than isolated.
• Some thematic organization to items is provided, such as through a story line or episode.
• Tasks represent, or closely approximate, real-world tasks.
5. Washback
The influence of testing on teaching and learning (especially high-stakes tests).

Positive impact: a match between what is taught and what is tested.
Negative impact: a discrepancy between the goals of the curriculum and the focus of the testing.
1. Positive Washback
Students use language authentically and communicatively.
(Authentic, interactive)
2. Negative Washback
Teaching for the test.
Focusing on language accuracy rather than actual language use.
Applying Principles to the Evaluation

• Are the test procedures practical?
• Is the test reliable?
• Does the procedure demonstrate content validity?
• Is the procedure face valid and "biased for best"?
• Are the test tasks as authentic as possible?
• Does the test offer beneficial washback to the learner?
Are the test procedures practical?
Is the test reliable?
1. Use consistent sets of criteria for a correct response.
2. Give attention to those sets throughout the evaluation time.
3. Read the tests twice to check consistency.
4. Apply the same standards to all responses after any "mid-stream" modification.
5. Avoid fatigue.
Does the procedure demonstrate content validity?
• Are classroom objectives identified and appropriately framed?
• Are lesson objectives represented in the form of test specifications?
Is the procedure face valid and "biased for best"?
- Directions are clear
- The structure of the test is organized logically
- Its difficulty level is appropriately pitched
- The test has no "surprises", and
- Timing is appropriate
Are the test tasks as authentic as possible?
• Is the language of the test natural?
• Are items contextualized as much as possible rather than isolated?
• Are the topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided?
• Do tasks represent "real-world tasks"?
Does the test offer beneficial washback to the learner?
Demonstrate relevance to the curriculum and set the stage for washback.
References
Cheng, L., Watanabe, Y., & Curtis, A. Washback in Language Testing: Research Contexts and Methods.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices.
Thank you!
