
CHARACTERISTICS OF A GOOD TEST: RELIABILITY

Submitted to fulfill the assignment of Language Learning Assessment

PAPER
Arranged
by Group 3:

Reza Mantofani 1930104055


Helmalia Putri 2130104028
Nur’aini Fadhma 2130104047

Lecturer:
Dr. Nina Suzanne, M.Pd.

ENGLISH TEACHING DEPARTMENT


TARBIYAH AND TEACHER TRAINING FACULTY
STATE ISLAMIC UNIVERSITY
MAHMUD YUNUS BATUSANGKAR
2024
FOREWORD

Bismillahirrahmanirrahim. Alhamdulillah, all praise and gratitude to Allah
SWT, for by His grace, taufik, and guidance this Language Learning Assessment
paper could be completed well. May prayers and greetings continue to be poured
out on the Prophet Muhammad SAW, his family, his companions, and all those who
always follow his sunnah. On this occasion we would like to thank the lecturer in
the Language Learning Assessment course, Dr. Nina Suzanne, M.Pd., who has
provided guidance and direction to us. We truly hope that this paper can be useful
in broadening our insight and knowledge, and we thank all parties who have
shared their knowledge so that we could complete this paper. Finally, as a work
on Language Learning Assessment, this paper certainly still leaves room for
improvement, so we humbly accept input from readers for the sake of improving
and perfecting this paper and our study.

Batusangkar, 21 March 2024

Authors

CONTENTS

FOREWORD ..................................................................................................................... ii
CONTENTS ....................................................................................................... iii
CHAPTER I ....................................................................................................................... 4
INTRODUCTION.............................................................................................................. 4
A. Background of the Paper ........................................................................................ 4
B. Problem Formulation ............................................................................................. 5
C. Purpose of the Paper .............................................................................................. 5
CHAPTER II...................................................................................................................... 6
BODY ................................................................................................................................. 6
A. The Concept of Reliability ...................................................................................... 6
B. Methods of Checking Test Reliability .................................................................... 8
C. Factors Affecting Reliability ................................................................................ 10
D. Practicality ............................................................................................................ 11
CHAPTER III .................................................................................................... 13
CONCLUSION .................................................................................................... 13
BIBLIOGRAPHY ............................................................................................ 14

CHAPTER I
INTRODUCTION
A. Background of the Paper
In the realm of education, assessment plays a pivotal role in
monitoring students' progress and the effectiveness of learning
programs. Within the domain of language learning, assessment holds
undeniable significance in evaluating students' abilities in speaking,
listening, reading, and writing. However, to ensure accurate and
reliable assessment outcomes, serious consideration must be given to
the reliability of the assessment instruments employed.
Reliability is the measure of the extent to which an assessment
instrument is consistent and dependable in measuring what it is
intended to measure. In the context of language learning, reliability is
crucial as language skills are inherently complex and subject to
various influences. For instance, the reliability of a language test can
be affected by inter-rater consistency, testing conditions, and students'
psychological factors such as test anxiety.
Despite its critical importance, a comprehensive understanding
of reliability is often lacking in the language education literature.
Therefore, there is an urgent need to provide a solid foundation for
comprehending reliability in the context of language learning
assessment.
Against this backdrop, this paper aims to provide a
comprehensive overview of the concept of reliability in language
learning assessment. Through a review of relevant literature, this
paper will explore various methods of assessing reliability commonly
used, factors influencing the reliability of assessment outcomes, and
recent challenges and developments in enhancing reliability in
language learning assessment. Thus, it is hoped that this paper will
make a valuable contribution to improving practices in language
assessment for more effective and accurate language learning
evaluation.

B. Problem Formulation
1. What is the concept of reliability?
2. What are the methods of checking test reliability?
3. What are the factors affecting reliability?
4. What is the practicality of a test?

C. Purpose of the Paper


1. To know the concept of reliability
2. To know the methods of checking test reliability
3. To know the factors affecting reliability
4. To know the practicality of a test

CHAPTER II
BODY
A. The Concept of Reliability
Reliability is a necessary characteristic of any good test. It is of primary
importance in the use of public achievement and proficiency tests as well as
classroom tests. Reliability refers to the accuracy, consistency, and stability of
measurement; demonstrating such consistency or stability calls for correlating
at least two measurements. Reliability is the next important characteristic of
assessment results after validity: testers should be certain that an assessment
has a high degree of validity and that it is also reliable.
The reliability of a test refers to its consistency. A reliable instrument is one
which is consistent enough that subsequent measurements give
approximately the same numerical status to the thing or person being
measured. If a reliable test is given two or three times to the same group,
each person in the group should get approximately the same score on every
administration.
The Standards for Educational and Psychological Testing, as cited in
Athanasou and Lamprianou (2002:175), defines reliability as “the degree
to which test scores for a group of test takers are consistent over
repeated applications of a measurement procedure and hence are
inferred to be dependable, and repeatable for an individual test taker.”
Reliability relates to questions about the stability and consistency of
results for a group. The questions that need to be answered are:
a. Would the ranking of students’ results on an assessment be
similar when it is repeated?
b. Do two versions of an assessment produce the same results?
c. Can I increase the reliability of results by lengthening a test?
(see the note after this list)
d. Are all the responses to the items or tasks homogeneous and
consistent with each other?
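
Question (c) is usually answered with the Spearman-Brown prophecy formula, a
standard result added here for illustration (it is not quoted from this paper's
sources); it predicts the reliability of results when a test is lengthened by a
factor k with comparable items:

```latex
% Spearman-Brown prophecy formula: r is the current reliability of the
% results, k is the factor by which the test is lengthened (k = 2 means
% doubling the number of comparable items), and r_k is the predicted
% reliability of the lengthened test.
r_k = \frac{k\,r}{1 + (k - 1)\,r}
```

For example, results with a reliability of 0.60 from a 20-item test would be
predicted to reach 2(0.60)/(1 + 0.60) = 0.75 if the test were doubled to 40
comparable items, which is why lengthening a test generally increases the
reliability of its results.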
In a sense, reliability is part of validity because a test with high
validity should measure quality with consistency and accuracy. It would
nevertheless be possible to have a very reliable test with very little validity
for the purposes for which it was designed. For example, an algebra test
might be reliable but not valid if it were used to measure achievement in
English grammar. Reliability or stability of results is necessary but not
sufficient to ensure validity. Like validity, reliability refers to results,
not necessarily to a specific assessment. In other words, we cannot say
that a particular test is reliable, but we can say that a set of results is
reliable. Unlike validity, which is based on evidence or inference,
reliability is largely a statistical approach and is reported primarily as a
correlation coefficient. Unfortunately, there is always a margin of error in
educational evaluation. The biggest source of error or unreliability is
humans (testers), because people are naturally inconsistent in their
responses and reactions to situations. The accuracy or consistency of
measurement depends on three types of errors:
1. Instrument-related errors. For example, if someone wanted to
measure the dimensions of a room and could choose between a
cloth measuring tape and a 50-foot steel tape, he might make a
mistake in choosing the measuring device.
2. Errors in the use of the instrument. For example, in the case
above, even if he chooses the steel tape, he may still make a
slight error in measuring the space if the tape runs out.
3. Errors due to the test taker’s responses. For example, such errors
have many causes, such as poor motivation, lack of interest, an
inappropriate testing environment, poor knowledge, and illness.

B. Methods of Checking Test Reliability
There are a number of ways of estimating the reliability of a test. Lado
mentions three widely used practices for estimating the reliability of a test:1
from the simplest, the test-retest method, to the alternate or equivalent forms
method and the split-halves method.
1. Test-retest or Stability
Test-retest is a method of estimating reliability that requires the same
test to be given twice to the same students after an interval of time. The
results of the two administrations of the same test are then compared by
means of the correlation coefficient. On the other hand, this method has
certain disadvantages. Giving the same test to the same students with too
short an interval of time may let the students’ memories of the previous
test influence their answers. If the time interval between the two tests is
made long enough to overcome this ‘memory factor’, the students’
proficiency may have changed because of intervening learning and growth,
which lowers the self-correlation of the test. Thus, the reliability score
could be underestimated.
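
A minimal sketch of how the test-retest coefficient described above might be
computed, assuming the two administrations are available as parallel lists of
scores; the scores and names below are hypothetical, not taken from the sources
cited here.

```python
# Hypothetical scores of the same ten students on two administrations
# of the same test, given some weeks apart.
first_administration = [78, 85, 62, 90, 71, 66, 88, 74, 95, 59]
second_administration = [75, 88, 65, 87, 70, 69, 85, 78, 93, 61]


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


reliability = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability coefficient: {reliability:.2f}")
```

The alternate-forms method in the next subsection uses exactly the same
computation; the only difference is that the two score lists come from two
equivalent forms rather than from two administrations of one form.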
2. Alternate forms or Equivalent Method
According to Robert Lado, the alternate-forms method uses two different
forms of a test which are equivalent, administers both to the same students,
and correlates the scores. The equivalent forms must be the same in length,
difficulty, time limits, and content; they must be similar but not identical.
As with any other method, the alternate-forms method has a disadvantage:
the main problem is that teachers must take a great deal of effort to develop
even one good test, let alone two. The two forms must be homogeneous and
measure the same indicators.

1
Robert Lado, Language Testing: The Construction and Use of Foreign Language Tests (New York:
McGraw-Hill Book Company, 1961), p. 332.

3. Split-halves Method
This technique came about because of the difficulties of the first two
techniques, test-retest and alternate forms. The procedure is to give a single
test to the individuals, separate the scores for each half, and then compute
the correlation coefficient between the two halves of the scores obtained.
Robert Linn notes that reliability can be estimated from a single test form
taken by students on one occasion, known as the split-halves method. The
test is split into two equivalent halves, usually the odd-numbered and the
even-numbered items. Through this procedure every student receives a score
on each half of the test, and the reliability coefficient of the halves is found
by correlating the two sets of scores. Since this reliability holds only for half
of the test, the reliability of the whole test must then be obtained; the
Spearman-Brown formula is used to estimate the reliability of the whole test
from the correlation between its two halves. Lado points out in his book,
“From a theoretical point of view we are more interested in knowing the
‘real’ reliability of the test.”2 The advantages of the split-halves technique
are that it eliminates variables that could affect a repeated administration,
such as some students’ reluctance to take a test again, differences in the
conditions of a second administration, and changes in the students’
performance. For this purpose, the split-halves technique is used to obtain a
reliability coefficient that depends more exactly on the test itself.
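
A short sketch of the split-halves procedure just described, assuming item-level
scores are available (1 = correct, 0 = incorrect); the data are hypothetical. The
half-test correlation is stepped up to full length with the Spearman-Brown
formula, r_full = 2 * r_half / (1 + r_half).

```python
from statistics import correlation  # Pearson's r; requires Python 3.10+

# Hypothetical item-level results: each row is one student,
# each column is one item scored 1 (correct) or 0 (incorrect).
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 0, 1],
]

# Split each student's test into odd-numbered and even-numbered items
# and total the score on each half.
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

r_half = correlation(odd_half, even_half)  # reliability of half the test
r_full = (2 * r_half) / (1 + r_half)       # Spearman-Brown step-up to full length
print(f"Half-test correlation: {r_half:.2f}; whole-test reliability: {r_full:.2f}")
```

Splitting by odd- and even-numbered items is simply the most common way of
forming two roughly equivalent halves; any split that balances content and
difficulty between the halves would serve the same purpose.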
The reliability level of a test is influenced by several factors:
1. Number of items in the test: More items generally result in higher
reliability.
2. Variation in item difficulty: Reliability tends to increase when item
difficulty is consistent.
3. Spread of scores: Limited score distribution can decrease reliability.

2
Robert Lado, op. cit., p. 333.

4. Difficulty level of items: Items with a difficulty level around 0.5 can
maximize item variance and therefore increase test reliability.
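
A brief note, added for illustration, on why a difficulty level of about 0.5
maximizes item variance: for an item scored right or wrong, the variance follows
directly from the proportion of test takers who answer correctly.

```latex
% Variance of a dichotomously scored item, where p is the proportion of
% test takers answering it correctly (the item difficulty index):
\sigma^{2}_{\text{item}} = p(1 - p)
% This is largest at p = 0.5, where it equals 0.25, and shrinks toward 0
% as items become very easy (p close to 1) or very hard (p close to 0).
```

Larger item variances spread out the total scores, and a wider spread of scores,
as noted in point 3 above, tends to raise the reliability coefficient.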
Teacher-made assessments typically have reliability levels ranging
between 0.60 and 0.85, which is generally considered satisfactory for
instructional decisions. However, lower reliability levels may be acceptable in
certain situations, such as when decisions are minor, reversible, or can be
confirmed by other information, or when decisions apply to a class rather than
individual students, or when the effects of decisions are temporary.

C. Factors Affecting Reliability


Factors impacting reliability can be controlled and fall into several
categories:
1. Characteristics of the Test Itself: This involves considerations
such as the length of the test. Tests need to strike a balance
between being concise enough to be practical and
comprehensive enough to yield reliable results. For instance,
excessively short tests like daily 10-item true/false quizzes are
deemed unreliable for measurement purposes.
2. Characteristics of the Testee/Test-taker: This encompasses
factors like the variability within the group being tested.
Greater reliability is observed in tests administered to diverse
groups. Conversely, administering tests to highly homogeneous
groups may yield minimal score variations, potentially
affecting reliability. Additionally, the difficulty level of the test
can influence score variability and, consequently, reliability.
3. Tester/Test Maker (Scoring Process): This pertains to the
objectivity of the scoring process. Even if a test is inherently
reliable, scoring errors or subjectivity can diminish its
reliability. Tests with more objective scoring mechanisms tend
to exhibit greater reliability.

4. Test Administration: This includes factors related to the actual
administration of the test. Testing conditions, such as lighting,
ventilation, and temperature, can impact reliability by
introducing human error, particularly in suboptimal
environments.

D. Practicality
Practicality refers to a test’s usability; it is the third desirable quality of
a test. A test should be applicable to our particular situation. For a test to
have a high degree of usability, it should:
1. be easy to administer
2. be easy to score
3. be economical to use, both in terms of teacher time and of
materials required
4. have a good format
5. have meaningful norms.
Ensure the usability of a teacher-made test by:
1. Typing and duplicating the test for each student.
2. Providing clear directions for each test section.
3. Designing the test to fit the class period's time limits.
4. Creating a scoring system that is easy to use.
5. Planning the test efficiently to save time on construction,
duplication, and scoring.
6. Establishing performance norms based on test results.
In the preparation of a new test or the adoption of an existing one,
we must keep in mind a number of very practical considerations.
1. Economy
A test should be economical in cost and time. In writing or
selecting a test, we should certainly pay some attention to how
long administering and scoring it will take. This point is of
particular importance when the test must be administered in the
classroom and scored by the classroom teacher.
2. Ease of administration and scoring
Other considerations of test usability involve the ease with
which the test can be administered. Scoring procedures, too, can
have a significant effect on the practicality of a given instrument.
3. Ease of interpretation
If a standard test is being adopted, it is important that we
examine and take into account the data which the publisher
provides. If we plan to use the test over a long period of time, we
shall almost certainly wish to develop local norms of our own.

CHAPTER III
CONCLUSION

In the realm of education, particularly in language learning assessment,
reliability stands as a cornerstone for ensuring the accuracy and consistency
of assessment outcomes. This paper has delved into the intricate concept of
reliability, exploring its significance, methods of assessment, factors
influencing it, and practical considerations.
Reliability, as defined in educational and psychological testing
standards, emphasizes the consistency and stability of measurement over
repeated applications. It ensures that assessment results maintain
dependability and repeatability, crucial for making informed decisions about
students' abilities and progress. Various methods, such as test-retest, alternate
forms, and split-halves, have been discussed for checking test reliability. Each
method offers its advantages and challenges, with considerations for factors
like the number of items, variation in item difficulty, and the scoring process.
Factors affecting reliability have been categorized into characteristics of the
test itself, characteristics of the testee, the tester, and test administration.
Understanding and controlling these factors are essential for maintaining
reliability and ensuring the validity of assessment results.
Moreover, the paper has underscored the importance of practicality
alongside reliability. A reliable test must also be practical, easy to administer,
score, and interpret, while being economical in terms of time and resources.
In conclusion, this paper has provided a comprehensive overview of
reliability in language learning assessment, shedding light on its critical role
in ensuring the effectiveness and accuracy of assessment practices. By
understanding the concept, methods, influencing factors, and practical
considerations of reliability, educators can enhance the quality of language
assessment, ultimately contributing to more informed decision-making and
improved learning outcomes.

BIBLIOGRAPHY

Athanasou, James A., and Iasonas Lamprianou. 2002. A Teacher’s Guide to
Assessment. Sydney: Social Science Press.
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom
Practices. Longman/Pearson Education.
Hendriani, S., and Suzanne, N. 2013. Language Testing.
https://doi.org/10.4324/9781315784717-2
Lado, Robert. 1961. Language Testing: The Construction and Use of Foreign
Language Tests. New York: McGraw-Hill Book Company.
