
Language Testing

Dr. Muftah S. Lataiwish


Review of Chapter One
Purposes and Methods of
Language Testing
Main points
• The chapter illustrates the following:
• Teacher made versus standardized tests
• The principal educational uses of
language tests.
• Language testing techniques.
• The language skills and their components.
• Contrastive analysis and language testing.
• Chapter One
• Activity 1
1- Discuss the objectives of both the aptitude
test and the proficiency test.
2- Illustrate the shortcomings of translation
as a language testing technique.
3- Composition and the scored interview are
two language testing techniques. Explain.
4- Language exists in two forms, the spoken
and the written. Discuss in detail.
Chapter Two
Characteristics of a Good Test

• A good test possesses three qualities:

• 1- Validity
• 2- Reliability
• 3- Practicality
• Any test that we use must be appropriate
and applicable to our objectives.
• Dependable in the evidence it provides.
• Applicable to our particular situation.
Without any one of these three qualities,
a test would be a poor test.
Reliability
• 1- The meaning of reliability.
• 2- Types of estimates of reliability.
• 3- Estimating the reliability of speeded
tests.
• 4- The question of satisfactory reliability.
• 5- The standard error of measurement.
• 1- The meaning of reliability.
• Reliability is the stability of test scores. A
test cannot measure anything well unless
it measures consistently.
• To have confidence in a measuring
instrument, we would need to be assured
that approximately the same results would
be obtained:
• If we tested a group on Tuesday instead of
Monday.
• If we gave two parallel forms of the test to
the same group on Monday and on
Tuesday.
• If we scored a particular test on Tuesday
instead of Monday.
• If two or more competent scorers scored
the test independently.
• Two types of consistency or reliability

• Reliability of the test itself.


• Reliability of the scoring of the test.
• Test Reliability:
• It is affected by a number of factors:
• A- The adequacy of the sampling of tasks:
the more of the students’ performance we
sample, the more reliable our assessment
of their knowledge and ability will be.
• B- The conditions under which the test is
administered tend to fluctuate from
administration to administration.
• C- Poor student motivation will also lower
the reliability of a test.
• Sometimes the lack of proper motivation
can be attributed to weaknesses in the
test or the testing procedure.

• Scorer or Rater Reliability:


• It concerns the consistency with which
test performances are evaluated.
• Would one scorer give the same score
repeatedly for the same test performance?

• Would two or more scorers assign
equivalent scores for the same
performance?
• It is nearly perfect in multiple-choice tests.

• Low in the case of free-response tests like
compositions, where individual judgments
must be involved.
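
As a rough, hypothetical illustration of checking scorer reliability: have two raters mark the same compositions independently, then compare their scores. All ratings below are invented, and the sketch assumes Python 3.10+ for statistics.correlation.

# Sketch: scorer (rater) reliability on the same set of compositions,
# marked independently by two raters (all ratings invented).
from statistics import correlation

rater_a = [14, 11, 17, 9, 15, 12, 18, 10]   # scores out of 20
rater_b = [13, 12, 16, 9, 14, 13, 17, 11]

r = correlation(rater_a, rater_b)           # consistency of ranking
close = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))
print(f"inter-rater correlation: {r:.2f}")
print(f"papers scored within 1 point: {close}/{len(rater_a)}")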
• Types of Estimates of Reliability
• 1- Test-retest. If the results of the two
administrations are highly correlated, the
test has temporal stability. This method is
limited:
• A- If the time interval is short, the memory
factor leads to an overestimation of the
test’s reliability.
• B- If the time interval is long, the examinees’
proficiency may have undergone genuine
change, and the test’s reliability could be
underestimated.
• 2- Alternative or parallel forms. Different
versions of the same test, equivalent in
length, difficulty, time limit, and format (a
sketch of both estimates follows this list).

• 3- Split-half. Giving a single administration
of one form of the test, dividing the items
into two halves, and obtaining two scores
for each individual (see the split-half
sketch below).
• 4- Rational equivalence. Reliability here is
estimated from a single administration of
one form. We are concerned with inter-
item consistency as determined by the
proportion of persons who pass and who
don’t pass each item.
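
Both the test-retest and the parallel-forms estimates reduce to correlating two sets of scores for the same examinees. A minimal sketch with invented scores (Python 3.10+):

# Sketch: reliability as the correlation between two administrations
# of the same test, or between two parallel forms (scores invented).
from statistics import correlation

first_sitting  = [62, 75, 58, 81, 69, 73, 55, 66]
second_sitting = [60, 78, 55, 83, 70, 71, 57, 64]

r = correlation(first_sitting, second_sitting)
print(f"reliability estimate: {r:.2f}")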
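Next, a minimal sketch of the split-half method, assuming invented 0/1 item scores and an odd/even split; the half-test correlation is stepped up to full length with the Spearman-Brown correction:

# Sketch: split-half reliability from a single administration.
# Each row holds one examinee's item scores (1 = pass, 0 = fail);
# the data are invented.
from statistics import correlation

responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
]
odd_half  = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown step-up
print(f"split-half estimate for the full test: {r_full:.2f}")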
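The rational-equivalence estimate is commonly computed with the Kuder-Richardson formula 20 (KR-20), which works from exactly these pass proportions. A sketch with invented item data:

# Sketch: KR-20 from a single administration. p is the proportion
# passing an item, q = 1 - p; the variance is that of the examinees'
# total scores. All item data are invented.
from statistics import pvariance

responses = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
]
k = len(responses[0])                       # number of items
n = len(responses)                          # number of examinees
totals = [sum(row) for row in responses]    # total score per examinee

pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in responses) / n   # proportion passing item i
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(f"KR-20 reliability estimate: {kr20:.2f}")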
• Speeded Tests.
• The items of the test are easy, but the time
limit is short. Neither the split-half nor the
rational-equivalence method should be used
with speeded tests.
• Test-retest or parallel forms are the
methods best adapted to measuring
speeded-test reliability.
• Satisfactory Reliability
• A quotient of 1.00 indicates a perfectly
reliable test.
• A standard test used to make individual
diagnoses should have a reliability of at
least 0.90.
• Homemade tests would run somewhat
lower, in the 0.70s or 0.80s.
• Reliability can be increased by lengthening
the test; the additional material must be
similar in quality and difficulty to the
original (see the sketch below).
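
The effect of lengthening a test is usually projected with the Spearman-Brown formula; a small sketch (the 0.70 starting value and the doubling factor are only illustrative):

# Sketch: Spearman-Brown projection of reliability after lengthening
# a test by a factor n with similar material (figures illustrative).
def lengthened_reliability(r: float, n: float) -> float:
    return n * r / (1 + (n - 1) * r)

print(round(lengthened_reliability(0.70, 2), 2))   # doubling 0.70 -> 0.82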
• The Standard Error of Measurement
• An obtained score on any test consists of
the “True” score plus a certain amount of
test error.
• A student may score 60 on an English
entrance test and 55 when retested with
an equivalent form of the test. A five-point
decrease is probably not statistically
important (see the SEM sketch below).
• In short, reliability refers simply to the
precision with which the test measures. No
matter how high the reliability quotient, it is
by no means a guarantee that the test
measures what the test user wants to
measure.
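
The SEM itself can be estimated as SEM = SD * sqrt(1 - reliability). A hedged sketch of the 60-versus-55 example above, with invented figures for the test's standard deviation and reliability:

import math

# Sketch: SEM = SD * sqrt(1 - reliability). The SD (10) and the
# reliability (0.84) are invented figures, not the slide's test data.
sd, reliability = 10.0, 0.84
sem = sd * math.sqrt(1 - reliability)    # = 4.0 score points
print(f"SEM = {sem:.1f}")
# The drop from 60 to 55 is 5 points, about 1.25 SEMs, so it falls
# within the range of ordinary test error.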
Validity
• What precisely does the test measure?
• How well does the test measure?
• A test must be based on a sound analysis
of the skill or skills we wish to measure.
• There must be sufficient evidence that test
scores correlate fairly highly with actual
ability in the skill area being tested. Only
then can we assume that the test is valid
for our purposes.
• Types of Validity:

• 1- Content Validity
• 2- Empirical Validity
• 3- Face Validity
• Content Validity:
• If a test is designed to measure mastery of
a specific skill or the content of a particular
course of study, we should expect the test
to be based upon a careful analysis of the
skill or an outline of the course.
• In choosing a test, we cannot simply
accept the title which the authors have
given it, for titles are very often misleading.
• We should expect the test makers to be
able to provide us with information about
the specific materials or skills being tested,
and the basis for their selection.

• Empirical validity:
• The best way to check on the actual
effectiveness of a test is to determine how
test scores are related to some
independent, outside criterion, such as
marks given at the end of a course or
teachers’ ratings.
• If there is a high correlation between test
scores and a trustworthy external criterion,
we are justified in putting our confidence in
the empirical validity of the test.
• Two kinds of empirical validity:
• Predictive
• Concurrent.
If we use a test of English as a second
language to screen university
applicants and then correlate test
scores with grades made at the end of
the first semester, we are attempting to
determine the predictive validity of the
test.
If we follow up the test immediately by
having an English teacher rate each
student’s English proficiency on the
basis of his class performance during
the first week, and correlate the two
measures, we are seeking to establish
the concurrent validity of the test.
Empirical validity depends on the
reliability of both the test and the
criterion measure.
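
A sketch of the predictive-validity check described above, correlating entrance-test scores with end-of-semester grades; all numbers are invented (Python 3.10+):

# Sketch: predictive validity as the correlation between entrance-test
# scores and first-semester grades (invented data).
from statistics import correlation

entrance_scores = [55, 72, 63, 81, 47, 69, 58, 76]
semester_grades = [60, 78, 65, 85, 52, 70, 62, 74]

r = correlation(entrance_scores, semester_grades)
print(f"predictive validity coefficient: {r:.2f}")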
• Face Validity
• By face validity we simply mean the way
the test looks to the examinees.
• Its importance should not be
underestimated.
• Content must be relevant and appropriate.
• Test makers must always keep face
validity in mind.
• Practicality

• Refers to
• 1- Economy.
• 2- Ease of administration and scoring.
• 3- Ease of interpretation.
• Economy
• It refers to the cost per copy, and whether
the test booklets are reusable.
• Number of scorers and administrators
needed.
• Time allowed for administration and
scoring.
• Ease of administration and scoring.
• Full test directions provided.
• Test requirements (mechanical devices,
rooms available, number of examinees).
• Scoring the test subjectively or objectively.
• Ease of Interpretation
• If a standard test is being adopted:
• Examine the date of publication; check
whether there is an up-to-date test manual
with information about both reliability and
validity; and check whether the test items
are appropriate.
