An Analysis of English Summative Test by English Teachers

A. Background of The Study

In the teaching and learning process, evaluation plays an important role. Every teacher needs evaluation to measure and assess their teaching and learning activities. Through evaluation, the teacher can find out whether or not the students have understood the material that has been taught. According to Gronlund, evaluation is a systematic process of determining the extent to which instructional objectives are achieved by pupils.1 The results of the evaluation are used to judge the students’ progress and achievement. One way of collecting data for evaluation is by using a test. The test that teachers usually use to find out how far students have mastered the lessons is the achievement test.2

Through a test, the teacher can find out how well the students understand the lessons in the teaching and learning activity. A test is defined as a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or a category system.3 This means the teacher can measure the students’ ability and knowledge through a number of tasks and questions.

Gronlund, Norman E. Measurement and Evaluation in Teaching. USA: Mc. Page. 22.
Hughes, Arthur. Testing for Language Teachers. Second Edition. Cambridge: Cambridge University Press. 1989. Page. 13.
Nitko, Anthony. Educational Tests and Measurement: An Introduction. New York: Harcourt Brace Jovanovich, Inc. 1983. Page. 6.

There are four types of tests based on their function in educational implementation: the formative test, the summative test, the pretest, and the posttest.4 Of these, only the formative and summative tests are achievement tests. According to Brown, there are two types of achievement tests, namely final achievement tests and progress achievement tests.5 Progress achievement tests are known as formative tests, and final achievement tests are usually known as summative tests.

A formative test is any test given while the learning process is still ongoing, so that students and teachers get information (feedback) about the progress that has been achieved. A summative test, by contrast, is a test that is usually administered at the end of the course.6 Based on the explanations above, the researcher focuses only on the summative test. Moreover, a summative test is given periodically to determine, at a particular point in time, what students know and do not know about the material given by the teachers. Summative tests typically come at the end of a course or unit of instruction.

By conducting summative tests, the teacher can see what the students have achieved during the teaching and learning process that has been carried out. Summative tests are very important for teachers and students because almost every school counts the summative test results for 40% to 60% of the final score, and these results are the main reference for the teacher in determining which students can go up to the next class and which cannot. For the students’ parents, the results of summative tests are very useful for knowing how their child’s learning is developing. Therefore, it is important to make summative test questions that are of good quality and in accordance with the applicable curriculum and syllabus.

Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 90.
Hughes, Arthur. Testing for Language Teachers. Second Edition. Cambridge: Cambridge University Press. 1989. Page. 15.
Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 93.

Constructing a good and fair test is not easy; a teacher needs to work hard. Several stages of test construction have been set out, consisting of determining the test objectives, drawing up test specifications, devising test tasks, scoring, grading, and giving feedback.7 To produce a better test, a teacher must follow the available syllabus and consult many references on the rules for how test items should be made. As a consequence, the teacher is not allowed to make a test based on his or her own wishes without referring to the syllabus. The summative test studied here is taken from Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City for the first semester of the academic year 2018/2019. Based on the results of interviews with SMPIT Khairunnas teachers, the summative test questions there were made by the teachers themselves, taking into account the syllabus and curriculum that their school currently uses.

Based on preliminary observation, Integrated Islamic Junior High School (SMPIT) Khairunnas is a school under the auspices of the Khairunnas Foundation Bengkulu, which was established on February 17, 2012. The school was designed as a model school that combines intellectual, spiritual, and emotional development and life skills, based on the KEMENDIKNAS and KEMENAG curricula and the Khairunnas Foundation curriculum. It is expected to produce strong generations who are ready to face the challenges of globalization and to reach happiness in this world and the hereafter. SMPIT Khairunnas Bengkulu City has a good reputation and is popular as a new private school because it has several advantages and quality assurances, including Al-Qur’an literacy and memorization. SMPIT Khairunnas is even the only school in Bengkulu Province that offers Japanese as a subject, and with this quality assurance SMPIT Khairunnas Bengkulu dares to compete. Besides that, SMPIT Khairunnas students are very good at English; one of the competitions they have won is an English debate. However, most of the teachers there are still young graduates who do not yet have much experience in teaching professionally.

Hughes, Arthur. Testing for Language Teachers. Second Edition. Cambridge: Cambridge University Press. 1989. Page. 16.

Based on that information, the researcher was interested in finding out how these young teachers made summative test questions, and whether or not they follow a quality standard for making good questions. Based on the researcher’s preliminary observations, no analysis of the summative test questions at SMPIT Khairunnas has ever been done, either for administrative purposes or to find out the quality of the summative test. The items have not yet been analyzed to select those that are valid and reliable and that have appropriate discriminating power and level of difficulty, so that they can be used as a standardized test.


In this research, the researcher chooses to analyze the English summative test for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu. The researcher chooses the eighth grade because the researcher wants to get accurate data from Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu, so the researcher chooses the middle grade, that is, the eighth grade. Moreover, since the researcher analyzes the English summative test, it is more convenient to choose the eighth grade. Seventh grade students are still influenced by the atmosphere of elementary school, and in the ninth grade the students and teachers focus on the graduation exam. For these reasons, the researcher chooses the eighth grade as the subject of this research.


Based on the above description, the researcher is interested in examining the issue in a thesis titled “AN ANALYSIS OF ENGLISH SUMMATIVE TEST BY ENGLISH TEACHERS” (A Content Analysis At Eighth Grade Students At Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City In The Academic Year 2019/2020).

B. Identification of the Problem

Based on the above background, several problems can be identified as follows:


1. The teacher has not done a thorough item analysis.

2. The quality of the English test questions for the end of the odd semester in the 2019/2020 school year is unknown.

3. The teacher has not conducted an overall item analysis of the multiple choice questions, in terms of validity, reliability, level of difficulty, discriminating power, and the distractors, by using an application or program designed to analyze items.

C. Objective of The Research

To describe the quality of the summative test items for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City in terms of validity, reliability, level of difficulty, discriminating power, and the distractors.

D. Limitation of the Study

In order for the research to be more focused and not to stray from the discussion in question, the researcher limits the scope of this thesis as follows:

1). The research focuses only on the English summative test for eighth grade students of Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City.

2). The research focuses only on analyzing the quality of the summative test items in terms of validity, reliability, level of difficulty, discriminating power, and the distractors.

E. Research Question

Based on the previous background, the following question needs to be answered by this research:


1. What is the quality of the summative test items for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City in terms of validity, reliability, level of difficulty, discriminating power, and the distractors?

F. Benefits of the research

The results of this study are expected to provide the following benefits:

1. Theoretical benefits: The results of the research are expected to add to the body of knowledge in the field of English learning evaluation.

2. Practical benefits: The research is expected to determine the ability of the students of Junior High School No.07 Seluma in understanding and answering the final test questions, so that it can be used as a reference for improvement.

3. For the teachers: As input for teachers on the procedures and criteria of a good test, as well as on the most appropriate alternative methods of analyzing test items in a simple manner using classical test theory.


4. For schools: The results of this study can be used as information for follow-up decision making in the preparation and development of tests and in the evaluation of learning.

5. For the readers: As a reference for similar research and for improving the quality of test instruments in education.


G. Definition of Key Terms

1. Test

A test is a measurement tool organized in the form of questions, commands, and directions, to which the test taker gives the responses or answers that the directions call for.8

2. Summative Test

A summative test is a test that is usually administered at the end of a course, with test items based on the syllabus. Summative tests typically come at the end of a course or unit of instruction.9

3. Validity

Validity is the appropriateness of the test results as an evaluation tool; more simply, validity is the appropriateness of a test as a measurement tool for the target it is meant to measure.10

4. Reliability

According to Sudijono, reliability means that a series of measurements or measuring instruments is consistent when the measurement is repeated with the same instrument.11 So the researcher concludes that a reliable test is a trustworthy test: if the test is used to measure many times, the result is the same.

Thoha, Chabib. Teknik Evaluasi Pendidikan. Jakarta: Raja Grafindo Persada. 1991. Page.
Tinambunan, Wilmar. Evaluation of Student Achievement. Jakarta: Departemen Pendidikan dan Kebudayaan. 1988. Page. 8.
Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 164.
Sudijono, Anas. Pengantar Evaluasi Pendidikan. Jakarta: PT Raja Grafindo Persada. 2005. Page. 65.

5. Difficulty Level

Item difficulty is the percentage of the total group that got the item correct. It is important because it shows whether an item is too easy or too hard. The optimal item difficulty is that of an item that is neither too difficult nor too easy, and it depends on the question type and on the number of possible answers.
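As a minimal sketch of this definition, the difficulty index of one item can be computed as the proportion of test takers who answered it correctly. The sketch assumes each answer has already been scored 1 (correct) or 0 (wrong). The cut-offs it uses to label an item as difficult, moderate, or easy (0.30 and 0.70) are an assumed classroom convention, not values taken from this thesis.

```python
from typing import Sequence


def difficulty_index(item_scores: Sequence[int]) -> float:
    """Proportion of test takers who answered the item correctly (1 = correct, 0 = wrong)."""
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return sum(item_scores) / len(item_scores)


def classify_difficulty(p: float) -> str:
    """Label an item using assumed cut-offs of 0.30 and 0.70."""
    if p < 0.30:
        return "difficult"
    if p <= 0.70:
        return "moderate"
    return "easy"


# Hypothetical scores of ten students on a single item.
item = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p = difficulty_index(item)
print(p, classify_difficulty(p))  # 0.7 moderate
```

Multiplying the index by 100 gives the percentage form used in the definition above.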


6. Discrimination Power

The discrimination power is a useful measure of item quality that tells how well an item differentiates between test takers. It is the difference between the proportion of the upper group who got the item right and the proportion of the lower group who got the item right.
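The difference described above can be sketched as follows. Students are ranked by their total test score, equal-sized upper and lower groups are taken, and the lower group's proportion correct is subtracted from the upper group's. The default group size of 27% of the class is an assumption based on a common convention; the example uses halves, which suits a small hypothetical class. A negative result means the lower group did better on the item than the upper group, which usually signals that the item needs revision.

```python
from typing import Sequence


def discrimination_index(item_scores: Sequence[int],
                         total_scores: Sequence[float],
                         group_fraction: float = 0.27) -> float:
    """D = proportion correct in the upper group minus proportion correct in the lower group."""
    if not item_scores or len(item_scores) != len(total_scores):
        raise ValueError("item_scores and total_scores must be non-empty and equally long")
    n = len(total_scores)
    k = max(1, int(n * group_fraction))  # students per group (rounded down, at least one)
    if 2 * k > n:
        raise ValueError("upper and lower groups would overlap")
    # Rank students from the highest to the lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for ten students: score on one item and total test score.
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
totals = [95, 90, 85, 80, 75, 60, 55, 50, 45, 40]
d = discrimination_index(item, totals, group_fraction=0.5)
print(round(d, 2))  # 0.6
```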


7. Distractor

Distractor efficiency is important in evaluating the multiple choice items in a test. The efficiency of distractors is the extent to which (a) the distractors ‘lure’ a sufficient number of test takers, especially lower-ability ones, and (b) those responses are somewhat evenly distributed across all distractors.
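Part (a) of this definition can be checked with a short sketch. It computes the share of test takers who chose each wrong option and flags distractors that almost nobody chose. The 5% threshold for a "functioning" distractor is an assumed rule of thumb, the option letters and answers are hypothetical, and the sketch does not separately look at the lower-ability group. Part (b) can be judged by comparing the shares it returns.

```python
from collections import Counter
from typing import Dict, List, Sequence


def distractor_shares(choices: Sequence[str], key: str, options: str = "abcd") -> Dict[str, float]:
    """Share of test takers who chose each wrong option (distractor)."""
    if not choices:
        raise ValueError("choices must not be empty")
    counts = Counter(choices)
    n = len(choices)
    return {opt: counts[opt] / n for opt in options if opt != key}


def non_functioning(shares: Dict[str, float], min_share: float = 0.05) -> List[str]:
    """Distractors chosen by fewer than min_share of test takers (assumed 5% rule of thumb)."""
    return [opt for opt, share in shares.items() if share < min_share]


# Hypothetical answers of 20 students to one item whose key is "c".
choices = list("c" * 12 + "a" * 5 + "b" * 3)
shares = distractor_shares(choices, key="c")
print(shares)                   # {'a': 0.25, 'b': 0.15, 'd': 0.0}
print(non_functioning(shares))  # ['d']
```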


8. Teacher-Made Test

A teacher-made test is a test developed by a teacher without the help of other teachers and applied in his or her own class. Such a test usually has moderate or low reliability and is too narrow to cover all aspects.




A. Theoretical Description

1. Evaluation

Evaluation is a step that cannot be separated from the teaching and learning process. According to Djiwandono, evaluation is a process of collecting information about the teaching and learning process as a basis for making decisions.12 According to Bloom, evaluation is an important activity for both teachers and students.13 Although the focus in the evaluation phase is on the students’ self-evaluation, teachers are also engaged in evaluation activities. In order to know how good the result of the teaching and learning process is, teachers must evaluate it. Through evaluation, teachers can collect information or get a picture of how well the teaching and learning activity has succeeded. Mardapi defines evaluation as an activity to increase the quality, performance, and productivity of an institution in carrying out its programs.14 Griffin and Nix state that evaluation is a judgment of the score or of the implications of the measurement results. According to Tyler, evaluation is the process of determining how far educational goals have been reached. It can be used to improve the teaching and learning activities carried out by the teachers and the students. Through evaluation, one can see whether or not the learning process has succeeded.

Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 10.
Bloom, Benjamin. Handbook on Formative and Summative Evaluation of Student Learning. London: Longman. 1971. Page. 15.
Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 8.

Based on the definitions above, the researcher concludes that evaluation is the step in the teaching and learning process that is held at the end of the learning process. The teacher can use evaluation as a means of collecting information about the learning process and the students’ achievement in order to make decisions or to determine whether or not the learning process has succeeded.


2. Test

One of the evaluation instruments is a test. There are several definitions of a test. According to Thoha, a test is a measurement tool organized in the form of questions, commands, and directions, to which the testee gives the responses or answers that the directions call for.15 Furthermore, Brown states that a test is a method of measuring a person’s ability, knowledge, or performance in a given domain.16 Mardapi defines a test as a set of questions that have correct or incorrect answers.17 A test can also mean a set of questions that require answers or responses, with the purpose of measuring a person’s level of ability or collecting information about the test taker.

Thoha, Chabib. Teknik Evaluasi Pendidikan. Jakarta: Raja Grafindo Persada. 1991.
Brown, Douglas. Language Assessment: Principles and Classroom Practice. USA: Longman. 2003. Page. 22.
Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 67.

Based on the definitions above, the researcher concludes that a test is a method of measuring a person’s ability, organized in the form of questions, commands, and directions, and that it is a systematic and objective procedure for collecting data.

3. The Purpose of Test

Mardapi classifies the purposes of a test into the following aspects:

a. Understanding the level of students’ knowledge.

b. Measuring the development and growth of the students.

c. Diagnosing students’ learning difficulties.

d. Understanding the output of the teaching process.

e. Understanding the output of the learning process.

f. Understanding how far the curriculum has been achieved.

g. Encouraging the students to study.

h. Encouraging the teachers to teach better.18

4. The Kinds of Tests

There are many kinds of tests that can be used to evaluate or measure students’ achievement.

In his book Tes Bahasa Pegangan bagi Pengajar Bahasa, Djiwandono classifies tests into four groups: 1. tests based on the way they are scored, 2. tests based on how they are arranged, 3. tests based on their function in the organization, and 4. tests based on their function in educational implementation.19

Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 67.
Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 95.

a. Tests based on the way they are scored

1) Objective Test

An objective test is a test that is scored with a high level of objectivity. Heaton states that an objective test is a form of questioning that has a single correct answer.20 According to Louis and Marilyn, there are three kinds of objective tests: (a) the true-false test, (b) the matching test, and (c) the multiple choice test.21

a) True-false test

A true-false item is simply a declarative statement that the students must judge as true or false.22 According to Mardapi, a true-false test is a form of test that consists of a number of statements, some true and some false.23

Example:

T F People have ten fingers.

b) Matching test

Mardapi says that a matching test consists of a list of premises, a list of possible answers, and directions to match each premise with one of the possible answers.24

Heaton, J. Writing English Language Tests. Singapore: Longman. 1977. Page. 25.
Louis, Marilyn. Psychology Testing and Assessment, An Introduction to Testing and Measurements. London: Mountain View. 1978. Page. 425.
Ibid. Page. 425.
Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 71.
Ibid. Page. 71.


Example:

1. President of the Republic of Indonesia     - Anis Baswedan

2. Governor of Central Java                   - Joko Widodo

                                              - Ganjar Pranowo

c) Multiple choice test

Multiple choice is one of the most popular and effective of all objective tests. It consists of two parts: (1) the stem, which states the problem, and (2) a list of options, one of which is to be selected as the correct answer.25


Example:

Sony : Can I use your computer, please?

Dina : …………. I’m using it.

a. Of course b. No problem c. I’m sorry d. Sure
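Because each objective item has a single correct answer, scoring reduces to comparing every response with an answer key. Any scorer who uses the same key therefore gets the same result, which is what makes the test objective. The sketch below illustrates this under stated assumptions: the answer key and the student's responses are hypothetical, and answers are recorded as option letters.

```python
from typing import List, Sequence


def score_objective(responses: Sequence[str], key: Sequence[str]) -> List[int]:
    """Score each response 1 if it matches the key and 0 otherwise (case and spaces ignored)."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [int(r.strip().lower() == k.strip().lower()) for r, k in zip(responses, key)]


# Hypothetical three-item answer key and one student's answers.
key = ["c", "b", "a"]
responses = ["C", "d", "a "]
scores = score_objective(responses, key)
print(scores, sum(scores))  # [1, 0, 1] 2
```

The resulting lists of 1s and 0s are the kind of input that item statistics such as difficulty and discrimination are computed from.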

2) Subjective Test

A subjective test is a test whose scoring is subjective, because it cannot be scored objectively. If the answers are corrected by two or more different people, the resulting scores will also differ. According to Djiwandono, there are four kinds of subjective tests:

a) Essay test.

An essay test requires the test taker to write an answer in one of various writing styles, such as descriptive or argumentative, based on the problem in the question. Suwandi (2009:47) states that an essay test is a form of question that requires students to answer by giving a description in their own words.

Louis, Marilyn. Psychology Testing and Assessment, An Introduction to Testing and Measurements. London: Mountain View. 1978. Page. 425.


Example:

How do you make coffee? Explain!

b) Test using question words.

This test consists of test items designed in the form of question sentences that begin with a question word.


Example:

Where is Manahan Stadium?

a. In Solo.

b. In Yogyakarta.

c. In Madiun.

d. In Sragen.

c) Short answer test.

This test consists of test items designed using question words, generally Wh-question words.

Example:

Who is the head of Indonesia’s Ministry of Education?


d) Completion test.

This test consists of short sentences that the test taker must complete by filling in the empty part at the beginning, in the middle, or at the end of the sentence.26


Example:

Today is my birthday. I usually have a big (1) ……. on my birthday. My (2) …. friends always come to my party. Today’s party is different from last year’s. I am having a (3) … than last year, with only my parents, my aunt, and my sister.

1. a. party b. anniversary c. ceremony d. celebration

2. a. fine b. good c. well d. better

3. a. large b. larger c. small d. smaller

b. Tests based on how they are arranged

1) Standardized test

A standardized test is a test that is arranged and developed according to set rules, requirements, and procedures, and that is tried out according to a plan.

2) Teacher-made test

A teacher-made test is simpler than a standardized test. All of its procedures are carried out by the teacher himself or herself.

Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 97.

c. Tests based on their function in the organization

1) Selection Test

A selection test, often called an entrance test, is a test held to determine whether or not pupils are accepted.

2) Placement Test

A placement test is a test arranged to measure students’ language ability and to assign students to an appropriate level in an educational organization. According to Brown, the purpose of a placement test is to place a student into an appropriate level or section of a language curriculum or school.27

3) Achievement Test

An achievement test is a test used to find out the pupils’ achievement in an educational organization. According to Brown, an achievement test is related directly to classroom lessons, units, or even a total curriculum.28 Basuki and Hariyanto say that an achievement test is a type of test designed to measure the level of knowledge of a subject of study.29


4) Proficiency Test

A proficiency test is a test concerned with evaluating the level of a student’s skills in a certain subject without relating it to other subjects. According to Brown in his book Teaching by Principles, a proficiency test is not intended to be limited to any one course, curriculum, or single skill in the language.30 Proficiency tests have traditionally consisted of standardized multiple choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes a sample of writing.

Brown, Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. USA: Longman. 2003. Page. 390.
Ibid. Page. 390.
Basuki, Ismet & Hariyanto, M.S. Assesmen Pemelajaran. Bandung: Remaja Rosdakarya. 2014. Page. 35.


5) Aptitude Test

An aptitude test is a test that measures a person’s ability to learn and to succeed in an undertaking. According to Brown, a language aptitude test is designed to measure a person’s capacity or general ability to learn a foreign language and to be successful in that undertaking.31 According to Basuki and Hariyanto, an aptitude test aims to measure a person’s potential to develop skills and knowledge.32

d. Tests based on their function in educational implementation

1) Formative Test

The teacher administers a formative test during the learning process with the aim of using the results to improve instruction and to provide continuous feedback to both students and teacher. Suwandi states that formative tests are carried out while the learning process is taking place or at the end of a unit of study.33 Tinambunan says that a formative test is intended to monitor learning progress during instruction and to provide continuous feedback to both pupil and teacher concerning learning successes or failures.34

Brown, Douglas. Op. Cit. Page. 390.
Brown, Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. USA: Longman. 2003. Page. 390.
Basuki, Ismet & Hariyanto, M.S. Assesmen Pemelajaran. Bandung: Remaja Rosdakarya. 2014. Page. 35.

2) Summative Test

A summative test is a test that is usually administered at the end of the course. Moreover, a summative test is given periodically to determine, at a particular point in time, what students know and do not know about the material given by the teachers. Tinambunan states that a summative test is a test that is usually administered at the end of a course, with test items based on the syllabus.35 Summative tests typically come at the end of a course or unit of instruction.

According to Suwandi, there are four types of summative tests that can be used in the classroom:

a) Performance task: students are asked to complete a task that shows what they know and are able to do. A rubric, checklist, or other form of scoring guide should accompany this type of test.

b) Written product: students are asked to write an original piece. There are many written forms that teachers can use to get the students to write. Students may be asked to write about a previous activity, such as a field trip or a guest speaker, or to write about their own experience.

c) Oral product: students are asked to prepare an oral piece of work.

d) Test: students are asked to take a written test at the end to demonstrate what they know.

Suwandi, Sarwiji. Model Asesmen dalam Pembelajaran. Surakarta: Yuma Pustaka. 2009. Page. 46.
Tinambunan, Wilmar. Evaluation of Student Achievement. Jakarta: Departemen Pendidikan dan Kebudayaan. 1988. Page. 8.

3) Pretest

In their book Assesmen Pemelajaran, Basuki and Hariyanto define the pretest as a preliminary test held to find out the students’ basic knowledge and to know whether or not the students are ready for the learning experience.36

4) Posttest

According to Basuki and Hariyanto, this test is held after the teaching and learning process to score the students’ understanding of the learning material.37

5. Characteristics of a Good Test

In his book Language Assessment, Brown classifies the characteristics of a good test into three parts: 1. validity, 2. reliability, and 3. practicality.38

Basuki, Ismet & Hariyanto, M.S. Assesmen Pemelajaran. Bandung: Remaja Rosdakarya. 2014. Page. 35.
Ibid. Page. 20.
Brown, Douglas. Language Assessment: Principles and Classroom Practice. USA: Longman. 2003. Page. 22.

a. Validity

Validity is the most complex criterion of a good test. It is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.39 Hammersley says: “By validity, I mean truth: interpreted as the extent to which an account accurately represents the social phenomena to which it refers.” According to Djiwandono, validity is the appropriateness of the test results as an evaluation tool; more simply, it is the appropriateness of a test as a measurement tool for the target it is meant to measure.40 Then, according to Mardapi, validity is the support that evidence and theory give to the interpretation of test scores in line with the purpose of using the test.41 That is why validity is the most basic foundation in developing and evaluating a test. The process of validation includes collecting evidence to show that the planned interpretation of the test scores is scientifically sound. Based on the definitions above, the researcher concludes that validity is the truthfulness, supported by evidence and theory, of the results or scores of a test as an evaluation tool.
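For individual items, one common way of collecting this kind of evidence is the point-biserial correlation between the score on one item (1 or 0) and the total test score. This is an assumed technique here, not one prescribed by the sources above. The sketch uses r_pbi = ((Mp - Mt) / St) * sqrt(p / q). In this formula, Mp is the mean total score of those who answered the item correctly, Mt and St are the mean and population standard deviation of all total scores, p is the proportion who answered correctly, and q = 1 - p. The data are hypothetical. An item is usually treated as valid when its coefficient exceeds the critical r value for the number of test takers.

```python
import math
from typing import Sequence


def point_biserial(item_scores: Sequence[int], total_scores: Sequence[float]) -> float:
    """r_pbi = (Mp - Mt) / St * sqrt(p / q), with St the population standard deviation."""
    n = len(item_scores)
    if n == 0 or n != len(total_scores):
        raise ValueError("item_scores and total_scores must be non-empty and equally long")
    correct = sum(item_scores)
    if correct in (0, n):
        raise ValueError("the item has no variance (everyone right or everyone wrong)")
    p = correct / n  # proportion who answered the item correctly
    q = 1 - p        # proportion who answered it wrongly
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    if sd_t == 0:
        raise ValueError("total scores do not vary")
    # Mean total score of the students who answered this item correctly.
    mean_p = sum(t for x, t in zip(item_scores, total_scores) if x == 1) / correct
    return (mean_p - mean_t) / sd_t * math.sqrt(p / q)


# Hypothetical scores of five students on one item and on the whole test.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 4, 2]
print(round(point_biserial(item, totals), 2))  # 0.94
```

In this sketch the total score includes the item itself. Some analysts subtract the item from the total first to avoid inflating the coefficient.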

Brown divides validity into three types that are important in the teaching and learning process: 1. content validity, 2. face validity, and 3. construct validity.42

Gronlund, Norman E. Measurement and Evaluation in Teaching. New York: Macmillan Publishing. 1981. Page. 226.
Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 164.
Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 16.
Brown, Douglas. Language Assessment: Principles and Classroom Practice. USA: Longman. 2003. Page. 22.

1) Content Validity

If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test taker to perform the behavior that is being measured, it can claim content validity. Content validity is often used in learning assessment. Its main purpose is to find out how far the students understand the material delivered by the teacher and the psychological changes shown after taking part in learning.43 According to Thoha, content validity deals with whether or not the content of the test items reflects the curriculum.44
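Content validity is a matter of judgment, but a simple aid to that judgment is a test blueprint. A blueprint maps each item to the syllabus competency it is meant to test and then shows which competencies no item covers. The sketch below assumes such a mapping already exists. The competency codes ("3.1", "3.2", "3.3") and the item mapping are hypothetical, and the tally does not replace an expert's review of whether each item really tests its competency.

```python
from typing import Dict, List


def coverage(item_map: Dict[int, str], syllabus: List[str]) -> Dict[str, List[int]]:
    """Group item numbers by the syllabus competency each item was mapped to."""
    result: Dict[str, List[int]] = {code: [] for code in syllabus}
    for item, code in sorted(item_map.items()):
        if code not in result:
            raise ValueError(f"item {item} is mapped to {code!r}, which is not in the syllabus")
        result[code].append(item)
    return result


# Hypothetical competency codes and a teacher's mapping of five items to them.
syllabus = ["3.1", "3.2", "3.3"]
item_map = {1: "3.1", 2: "3.1", 3: "3.2", 4: "3.2", 5: "3.1"}
cov = coverage(item_map, syllabus)
uncovered = [code for code, items in cov.items() if not items]
print(cov)        # {'3.1': [1, 2, 5], '3.2': [3, 4], '3.3': []}
print(uncovered)  # ['3.3']
```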

2) Face Validity

A concept very closely related to content validity is face validity, which asks the question: “Does the test, on the ‘face’ of it, appear from the learner’s perspective to test what it is designed to test?” To achieve “peak” performance on a test, learners need to be convinced that the test is indeed testing what it claims to test. Face validity is almost always perceived in terms of content: if the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived. Mardapi says that evidence of face validity is obtained by examining the test items to conclude that they are relevant to what is measured.45 Arifin states that this validation uses very simple criteria, because it only looks at the face of the instrument itself. This means that if the test appears to be a good test, it is considered to have face validity and no more detailed judgment is needed.46 Basuki and Hariyanto state that this is the weakest kind of validity, and some scientists even consider this form of validation invalid.47

Arifin, Zainal. Evaluasi Pembelajaran: Prinsip, Teknik, Prosedur. Bandung: Remaja Rosdakarya. 2012. Page. 248.
Thoha, Chabib. Teknik Evaluasi Pendidikan. Jakarta: Raja Grafindo Persada. 1991. Page. 48.

3) Construct Validity

A third category of validity that teachers must be aware of when considering language tests is construct validity. One way to look at construct validity is to ask the question “Does this test actually tap into the theoretical construct as it has been defined?” “Proficiency” is a construct. “Communicative competence” is a construct. “Self-esteem” is a construct. Arifin says that a construct is a concept that is observable and measurable.48 Construct validity is often called logical validity. Construct validity concerns the question of how far the test really observes and measures the psychological functions that describe the behavior of the test taker. Thoha states that construct validity means the test items are built on a way of thinking that matches the instructional purpose.49 In other words, a test has construct validity when its items measure aspects of thinking based on the concept, or on the approach used to explain that concept.

Mardapi, Djemari. Teknik Penyusunan Instrumen Test dan Nontes. Yogyakarta: Mitra Cendikia. 2008. Page. 16.
Arifin, Zainal. Evaluasi Pembelajaran: Prinsip, Teknik, Prosedur. Bandung: Remaja Rosdakarya. 2012. Page. 248.
Basuki, Ismet & Hariyanto, M.S. Assesmen Pemelajaran. Bandung: Remaja Rosdakarya. 2014. Page. 121.
Arifin, Zainal. Op. Cit. Page. 230.

b. Reliability

A reliable test is consistent and dependable. Sources of unreliability may lie in the test itself or in the scoring of the test, known respectively as test reliability and rater (scoring) reliability. Scorer reliability is the consistency of scoring by two or more scorers. Hammersley states that reliability refers to the degree of consistency with which instances are assigned to the same category by different observers or by the same observer on different occasions. According to Arifin, reliability is the level or degree of consistency of an instrument.50
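Scorer reliability, the consistency of scoring by two or more scorers, can be checked most simply by counting how often two scorers give the same score to the same answer. The sketch below computes this exact-agreement proportion for hypothetical rubric scores (1 to 4) that two teachers gave to the same eight essays. Percent agreement does not correct for agreement that happens by chance. Cohen's kappa is the usual chance-corrected alternative and is not shown here.

```python
from typing import Sequence


def exact_agreement(scores_a: Sequence[int], scores_b: Sequence[int]) -> float:
    """Proportion of answers to which two scorers gave exactly the same score."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("both scorers must score the same, non-empty set of answers")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical rubric scores (1-4) given by two teachers to the same eight essays.
teacher_a = [4, 3, 3, 2, 4, 1, 3, 2]
teacher_b = [4, 3, 2, 2, 4, 1, 3, 3]
print(exact_agreement(teacher_a, teacher_b))  # 0.75
```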

In his book Tes Bahasa, Djiwandono says that a test can be said to be reliable if its scores are real and believable because they are consistent and do not change. Thoha says that a reliable test is a trustworthy test: if the test is used to measure many times, the result is the same.51

According the definition above the researcher can conclude Reliability

is reliable or believable and consistency. A test can be categories as

reliability if the tests are consistent and if the test used to measure in many

time, the result is same and not changes.

, Chabib. Teknik Evaluasi Pendidikan. Jakarta: Raja Grafindo Persada. 1991. Page. 110.
Arifin, Zainal. Op.Cit. Page. 230.
Thoha, Chabib. Teknik Evaluasi Pendidikan. Jakarta: Raja Grafindo Persada. 1991. Page. 118.

c. Practicality

A good test is practical. It stays within financial limitations and time constraints, and it offers ease of administration, scoring and interpretation. Djiwandono states that practicality means the procedure, administration and execution of the test are simple and easy.52

6. Syllabus and Curriculum

The syllabus cannot be separated from the curriculum, because the instructions or content in the curriculum are developed into the syllabus based on the competency standards and basic competences. Posner says that curriculum is the set of instructional strategies teachers plan to use.53

a. Syllabus

A syllabus is designed based on the school and its levels. When teachers make a syllabus, however, it must suit the students' ability and the system or situation. Brown says that a syllabus is predominantly concerned with the choices necessary to organize the language content of a course or program.54 This means that when teachers teach materials in the classroom, they can choose among many courses and organize the materials needed. Everything must refer to the syllabus so that the course materials do not stray from it.

Djiwandono, Soenardji. Tes Bahasa Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks. 2011. Page. 190.
Richards, Jack. Curriculum Development in Language Teaching. Cambridge: Cambridge University Press. 2001. Page. 2.
Brown, Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. USA: Longman. 2003. Page. 141.

b. Curriculum

Every course or school needs a curriculum so that it is equal with others. The curriculum is made by the government. Posner says that curriculum is the set of instructional strategies teachers plan to use.55 This statement means that the curriculum has an important role in a school or course, because it serves as the instructional strategy, or basic reference, for the teaching and learning process.

B. Previous Studies

Research on test analysis has already been done by several researchers. The following is a summary of the previous research that the researcher has read.

The first previous study is a journal article by Bernasela from Tanjung Pura University, Pontianak, titled “An Analysis On English Summative Test Items”. It has the same subject as this research, item analysis, because its aim is to measure the appropriateness of the difficulty level, the discrimination index, and the distractors of English summative test items. She used descriptive research. Based on the analysis of the English summative test items for the fourth semester of eleventh grade students in the academic year 2012/2013, she concluded that 33 test items were good and could still be used in the next summative test, 6 test items should be discarded or replaced with other items, and 11 test items should be revised. The ineffective distractors in the items that need revision should be replaced since, as stated in her discussion, ineffective distractors affected both the difficulty level and the discriminating power of the test items.

Posner, George. Analyzing the Curriculum. New York: McGraw-Hill Companies, Inc. 2004. Page. 5.

The second previous study is a journal article by Rusma Setiyana from the University of Syiah Kuala, Banda Aceh, titled “Analysis Of Summative Tests For English”. It also deals with item analysis, since its aim is to measure the appropriateness of the difficulty level, the discrimination index, and the distractors of English summative test items. She used a quantitative method in her research. Her results show that the difficulty level of the summative test items is moderate, the discrimination index is good, and the distractors are acceptable.

The third previous study is a journal article by Hanik Huzaimatul Husna and Fachrurrazy from the State University of Malang, titled “An Analysis Of An English Summative Test For 6th Grade Students In Three Public Elementary Schools In Udanawu District, Blitar Regency”. Like the first two studies, it deals with item analysis and aims to measure the appropriateness of the difficulty level, the discrimination index, and the distractors of English summative test items. The authors used a qualitative method, with documents collected from the three schools. They found that the English summative test has fifty items and, based on the content validity analysis, that the teachers used three standards of competence: speaking, reading, and writing.

This research is similar to the three previous studies above in that all of them deal with evaluation and summative tests. The subject of this research is the English summative test items for the first semester at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City in the academic year 2018/2019.

C. Theoretical framework

[Diagram: English Summative Test; Content Analysis; Validity; Reliability; Level of Difficulty; The Distractors]

Figure 2.1 Theoretical framework of the research

The diagram above shows the framework of concepts constructed in this research. A summative test is one kind of language assessment. It aims to measure, or summarize, what a student has grasped, and it typically occurs at the end of a course or unit of instruction. Item analysis covers several statistical analyses of the characteristics and features of a test, namely validity, reliability, level of difficulty, discriminating power and the distractors.




A. Research Design

To know the quality of a test, its items should be analyzed. The items should also be prepared well, because the test result is influenced by the quality of the test, and the quality of the test is in turn influenced by the quality of each item. Teachers should therefore focus on the quality of the test items, which makes content analysis very important for them. By analyzing the content, the teacher can identify the quality of each item and find out which items meet the criteria, which items must be removed, and which items should be revised.

Content analysis is a significant and necessary step in the preparation of a good test. It provides information about how well each item in the test functioned. It is important for each school to conduct item analysis when administering a test, especially a summative test. Therefore, the researcher uses content analysis to analyze the summative test for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City.

B. Research Subject

The subject of this research was the English summative test items for eighth grade students of Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City in the academic year 2018/2019.


C. Time and Place of the Study

This study took place at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City, Hibrida XV Street No. 51, Sido Mulyo, Gading Cempaka, Kota Bengkulu, Bengkulu 38211. The research began by asking the headmaster and the English teacher for the English summative test sheets, answer sheets, answer keys and the syllabus. The research started in November 2019.

D. Technique of Collecting Data

The term data refers to the kinds of information researchers obtain about the subjects of their research.56 A study can solve its problem completely only if it obtains valid data, and valid data require good data collection techniques. The researcher uses several tools to make her work easier, more systematic, effective and intensive during the research. To collect the data, the researcher uses data collection methods and research instruments. In collecting the data, the researcher used a documentation study. A documentation study refers to the technique of collecting data by gathering and analyzing documents, where a document is any communicable material (such as text) used to explain some attributes of an object, system or procedure. The researcher carried out a document analysis. Data in the form of documents like these can be used to uncover information about events that occurred in the past. The researcher uses documentation to collect the summative test sheets, answer sheets, answer keys and the syllabus from Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City.

Hornby. Oxford Advanced Learner's Dictionary of Current English. Oxford: Oxford University Press. 2000. Page. 127.

E. Research Instrument

The research instrument used by the researcher in collecting the data is documentation. The documents are:

1. Test paper/test booklet

The researcher asked the school for the English summative test paper. The test analyzed is the English summative test for the eighth grade, which consists of 50 items: 50 multiple-choice items and no essay items.

2. Answer sheets

The answer sheets are used to find out the distribution of answers. They are analyzed to determine the validity, reliability, difficulty level, discriminating power and distractors, in order to answer the research problem.

3. Answer keys

The answer keys are used as a valid guide for scoring each item.
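As a sketch of how the answer sheets and answer keys described above can feed the later calculations, the snippet below turns raw multiple-choice responses into a 0/1 score matrix (1 when the chosen option matches the key). The function name `score_answer_sheets` is hypothetical, and the assumptions that each response is recorded as an option letter and that blank responses score 0 are illustrative; they are not taken from the school's actual scoring procedure.

```python
from typing import List, Sequence


def score_answer_sheets(responses: Sequence[Sequence[str]],
                        key: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: score each answer sheet against the answer key.

    Returns one row per student and one column per item, where 1 means the
    chosen option matches the key and 0 means it does not (blanks score 0).
    """
    matrix: List[List[int]] = []
    for sheet in responses:
        if len(sheet) != len(key):
            raise ValueError("each answer sheet needs one response per item")
        # Compare option letters case-insensitively, ignoring stray spaces.
        row = [1 if ans.strip().upper() == k.strip().upper() else 0
               for ans, k in zip(sheet, key)]
        matrix.append(row)
    return matrix


key = ["A", "C", "B"]                      # hypothetical three-item key
sheets = [["A", "C", "D"], ["b", "C", ""]]  # two hypothetical answer sheets
scores = score_answer_sheets(sheets, key)
totals = [sum(row) for row in scores]       # each student's total score
print(scores, totals)
```

The resulting matrix and totals are the raw material for the validity, reliability, difficulty and discrimination calculations.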

F. Technique of Analyzing Data

To give a clear explanation, the researcher explains the data analysis technique separately for each problem statement:

1. To answer problem statement number 1, “How is the quality of the summative test items for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City related to validity, reliability, level of difficulty, discriminating power and the distractors?”, the researcher used the following formulas:

a. Measuring the validity of the test items:

After the correlation coefficient has been found, the result is compared with the critical value of the product moment table adopted from Arikunto.57 Arikunto states that if the value of r for a test item is higher than the value in the product moment table, the item is considered valid.58 In addition, the validity level can be determined from the following classification of the validity index:

Arikunto, Suharsimi. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. 2013. Page. 402.
Ibid. Page. 46.

Table 3.1
The validity classification
Validity index    Classification
0.80 - 1.00       Excellent
0.60 - 0.80       Good
0.40 - 0.60       Satisfactory
0.20 - 0.40       Poor
0.00 - 0.20       Very Poor
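The correlation formula itself did not survive in this section, so the following is only a sketch, assuming the standard Pearson product-moment correlation between each item's 0/1 scores (X) and the students' total scores (Y). The function names and the example data are hypothetical, the critical value `r_table` must be read from the product moment table for the actual number of students, and shared boundaries in the validity classification (for example 0.80) are assigned to the higher category as an assumption. Some analysts remove the item from the total before correlating; this sketch does not.

```python
import math
from typing import Sequence


def product_moment(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation r_xy computed from raw sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # e.g. every student answered the item correctly, so r is undefined
        raise ValueError("correlation is undefined when one variable is constant")
    return numerator / denominator


def classify_validity(r: float) -> str:
    """Validity bands (shared boundaries go to the higher band; r < 0 is Very Poor)."""
    if r >= 0.80:
        return "Excellent"
    if r >= 0.60:
        return "Good"
    if r >= 0.40:
        return "Satisfactory"
    if r >= 0.20:
        return "Poor"
    return "Very Poor"


item = [1, 0, 1, 1, 0]    # hypothetical 0/1 scores on one item, five students
total = [8, 3, 7, 9, 4]   # the same students' total test scores
r_xy = product_moment(item, total)
r_table = 0.878           # illustrative critical value for N = 5; check the real table
print(round(r_xy, 3), r_xy > r_table, classify_validity(r_xy))
```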

b. Measuring the reliability of the test items:

After the correlation coefficient has been found, the result is compared with the critical value of the product moment table from Arikunto.59 Arikunto also states that if the value of r is higher than the value in the product moment table, the test is considered reliable.60 In addition, the reliability level can be determined from the following classification of the reliability index:

Arikunto, Suharsimi. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. 2013. Page. 402.

Table 3.2
The reliability classification
Reliability index (r11)    Classification
0.00 < r11 ≤ 0.20          Very Low
0.20 < r11 ≤ 0.40          Low
0.40 < r11 ≤ 0.60          Medium
0.60 < r11 ≤ 0.70          High
0.70 < r11 ≤ 1.00          Very High
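The reliability formula is missing from this section. Because the text speaks of a correlation coefficient compared with the product moment table and labels the result r11, one plausible reading, offered here only as an assumption, is the split-half method with the Spearman-Brown correction: correlate each student's odd-item and even-item half scores (r_half), then compute r11 = 2 × r_half / (1 + r_half). The sketch below implements that reading; the function names are hypothetical, and values of r11 at or below 0, which the reliability bands do not cover, are treated as "Very Low".

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if den == 0:
        raise ValueError("correlation is undefined when one half score is constant")
    return (n * sxy - sx * sy) / den


def split_half_reliability(matrix: Sequence[Sequence[int]]) -> float:
    """Odd-even split half corrected with Spearman-Brown (assumed method)."""
    odd = [sum(row[0::2]) for row in matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    if r_half == -1:
        raise ValueError("Spearman-Brown is undefined for r_half = -1")
    return 2 * r_half / (1 + r_half)


def classify_reliability(r11: float) -> str:
    """Reliability bands; r11 <= 0 is treated as Very Low (assumption)."""
    if r11 <= 0.20:
        return "Very Low"
    if r11 <= 0.40:
        return "Low"
    if r11 <= 0.60:
        return "Medium"
    if r11 <= 0.70:
        return "High"
    return "Very High"


# Hypothetical 0/1 score matrix: four students, four items
scores = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
r11 = split_half_reliability(scores)
print(round(r11, 3), classify_reliability(r11))
```

If the author's missing formula was a different one (for example KR-20), only `split_half_reliability` would change; the classification step stays the same.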

c. Measuring the difficulty level of the test items:

$$P = \frac{NP}{N}$$

Where:

P = index of difficulty level
NP = number of test takers answering correctly
N = number of test takers responding to that item

The difficulty level can be determined from the following classification of the difficulty index:

Arikunto, Suharsimi. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara. 2013. Page. 47.

Table 3.3
The difficulty level classification
P                   Classification
P = 0.00            Too Difficult
0.00 < P ≤ 0.30     Difficult
0.30 < P ≤ 0.70     Medium
0.70 < P < 1.00     Easy
P = 1.00            Too Easy
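A minimal sketch of the difficulty index P = NP / N and its classification bands. The function names and the counts in the example are hypothetical; an item with P = 1.00 is placed in "Too Easy" rather than "Easy", following the exact-value rows of the difficulty table.

```python
def difficulty_index(np_correct: int, n_responding: int) -> float:
    """P = NP / N, the share of responding students who answered correctly."""
    if n_responding <= 0:
        raise ValueError("N must be positive")
    if not 0 <= np_correct <= n_responding:
        raise ValueError("NP must lie between 0 and N")
    return np_correct / n_responding


def classify_difficulty(p: float) -> str:
    """Difficulty bands; the exact values 0 and 1 get their own labels."""
    if p == 0:
        return "Too Difficult"
    if p <= 0.30:
        return "Difficult"
    if p <= 0.70:
        return "Medium"
    if p < 1:
        return "Easy"
    return "Too Easy"


p = difficulty_index(18, 30)  # hypothetical: 18 of 30 students answered correctly
print(p, classify_difficulty(p))
```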

d. Measuring the discriminating power of the test items:

$$DP = \frac{WL - WH}{n}$$

Where:

DP = discriminating power
WL = number of students in the lower group who answered incorrectly
WH = number of students in the upper group who answered incorrectly
n = sample size of one group

The discriminating power can be determined from the following classification of the discriminating power index:

Table 3.4
The discriminating power classification
Discriminating power    Interpretation
Negative                Very Poor
0.00 - 0.20             Poor
0.20 - 0.40             Satisfactory
0.40 - 0.70             Good
0.70 - 1.00             Excellent
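The text does not say how the upper and lower groups are formed, so this sketch assumes students are ranked by total score and split into equal halves; the `fraction` parameter (a hypothetical name) can be lowered to, say, 0.27 for the commonly used 27% convention. It then applies the formula built from WL, WH and n, DP = (WL - WH) / n, and the discriminating power bands, with shared boundaries assigned to the higher band as an assumption. All function names and data are illustrative.

```python
from typing import Sequence, Tuple


def wrong_counts(item: Sequence[int], totals: Sequence[float],
                 fraction: float = 0.5) -> Tuple[int, int, int]:
    """Return (WL, WH, n) for one item.

    Students are ranked by total score; the top `fraction` forms the upper
    group and the bottom `fraction` the lower group (ties keep input order).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups do not overlap")
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    n = int(len(totals) * fraction)
    if n == 0:
        raise ValueError("groups are empty; use more students or a larger fraction")
    upper, lower = order[:n], order[-n:]
    wh = sum(1 - item[i] for i in upper)  # wrong answers in the upper group
    wl = sum(1 - item[i] for i in lower)  # wrong answers in the lower group
    return wl, wh, n


def discriminating_power(wl: int, wh: int, n: int) -> float:
    """DP = (WL - WH) / n."""
    return (wl - wh) / n


def classify_dp(dp: float) -> str:
    """Discriminating power bands; shared boundaries go to the higher band."""
    if dp < 0:
        return "Very Poor"
    if dp < 0.20:
        return "Poor"
    if dp < 0.40:
        return "Satisfactory"
    if dp < 0.70:
        return "Good"
    return "Excellent"


item = [1, 1, 0, 1, 0, 0]     # hypothetical 0/1 scores on one item
totals = [10, 9, 8, 4, 3, 2]  # the same students' total scores
wl, wh, n = wrong_counts(item, totals)
dp = discriminating_power(wl, wh, n)
print(wl, wh, n, round(dp, 2), classify_dp(dp))
```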

e. Measuring the distractors of the test items:

The effectiveness of the distractors in each test item can be determined by using the following formula:

$$DE = \frac{P}{(N - B)/(n - 1)} \times 100\%$$

Where:

DE = distractor index
P = number of students who chose the distractor
N = number of students who took the exam
B = number of students who answered the item correctly
n = number of alternative answers
1 = fixed number

The quality of each distractor can be determined from the following classification of the distractor index:

Table 3.5
The distractor classification
DE                            Classification
76% - 125%                    (4) Excellent
51% - 75% or 126% - 150%      (3) Good
26% - 50% or 151% - 175%      (2) Satisfactory
0% - 25% or 176% - 200%       (1) Poor
More than 200%                (0) Very Poor
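The distractor index compares how many students chose a distractor (P) with the even share (N - B) / (n - 1) that each distractor would receive if all wrong answers were spread equally. A minimal sketch follows; the function names and the example item (30 students, 18 correct, 4 options) are hypothetical, percentages are rounded to a whole percent before the band lookup as an assumption, and the index is treated as undefined when every student answered correctly.

```python
import math
from typing import Tuple


def distractor_index(p: int, n_students: int, b_correct: int, n_options: int) -> float:
    """DE = P / ((N - B) / (n - 1)) x 100%, returned as a percentage."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    if b_correct >= n_students:
        raise ValueError("undefined: every student answered the item correctly")
    expected = (n_students - b_correct) / (n_options - 1)  # even share per distractor
    return p / expected * 100


def classify_distractor(de: float) -> Tuple[int, str]:
    """Score and label from the distractor bands, after rounding to a whole percent."""
    pct = math.floor(de + 0.5)
    if 76 <= pct <= 125:
        return 4, "Excellent"
    if 51 <= pct <= 75 or 126 <= pct <= 150:
        return 3, "Good"
    if 26 <= pct <= 50 or 151 <= pct <= 175:
        return 2, "Satisfactory"
    if 0 <= pct <= 25 or 176 <= pct <= 200:
        return 1, "Poor"
    return 0, "Very Poor"


# Hypothetical item: 30 students, 18 correct, 4 options (1 key + 3 distractors)
for chosen in (4, 2, 0, 9):
    de = distractor_index(chosen, 30, 18, 4)
    print(chosen, de, classify_distractor(de))
```

A distractor chosen by nobody scores 0% and is rated Poor, which matches the table's intent that a distractor should attract some of the students who do not know the answer.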

f. Measuring the quality of the test items:

Table 3.6
The quality level classification
Criteria met    Quality of the test item    Revision     Saved in the question bank
4               Excellent                   No need      Yes
3               Good                        Revision     Not yet
2               Satisfactory                Revision     Not yet
1               Poor                        Discarded    No
0               Very Poor                   Discarded    No

Criteria met = the number of item-level criteria (validity, difficulty level, discriminating power and the distractors) that the item satisfies; reliability applies to the test as a whole.

The following is an explanation of the item quality criteria in Table 3.6:

a. If an item meets all four criteria of a good item, namely validity, difficulty level, discriminating power, and distractor effectiveness, it is classified as an excellent item and can be used and saved in the question bank.

b. If an item meets three of the four criteria, it is classified as a good item. It cannot yet be saved in the question bank and needs to be revised until it meets all the criteria.

c. If an item meets two of the four criteria, it is classified as a satisfactory (medium) item and cannot be stored in the question bank. It needs to be revised until it meets all four criteria.

d. If an item meets only one of the four criteria, it is classified as a poor item and cannot be stored in the question bank. It would need substantial revision, so it is better to discard it than to keep it in the question bank.

e. If an item meets none of the four criteria, it is classified as a very poor item and cannot be stored in the question bank. It would need substantial revision, so it is better to discard it.

f. In addition to these item-level requirements, the test as a whole must be reliable, according to the conditions explained previously.
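The decision rules in points a to e amount to a lookup from the number of criteria met to a quality label, a revision decision and a question-bank decision. What counts as "meeting" each criterion (for instance, whether only a Medium difficulty level qualifies) is not spelled out in the text, so this sketch leaves those four judgments to the caller as booleans; the names `QUALITY_TABLE` and `item_quality` are hypothetical. Test-level reliability (point f) is checked separately and is not part of the count.

```python
from typing import Dict, Tuple

# Criteria met -> (quality, revision, saved in the question bank)
QUALITY_TABLE: Dict[int, Tuple[str, str, str]] = {
    4: ("Excellent", "No need", "Yes"),
    3: ("Good", "Revision", "Not yet"),
    2: ("Satisfactory", "Revision", "Not yet"),
    1: ("Poor", "Discarded", "No"),
    0: ("Very Poor", "Discarded", "No"),
}


def item_quality(valid: bool, difficulty_ok: bool, discrimination_ok: bool,
                 distractors_ok: bool) -> Tuple[int, Tuple[str, str, str]]:
    """Count the item-level criteria met and look up the decision."""
    met = sum([valid, difficulty_ok, discrimination_ok, distractors_ok])
    return met, QUALITY_TABLE[met]


# Hypothetical item: valid, acceptable difficulty, weak discrimination, good distractors
print(item_quality(True, True, False, True))
```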

G. Procedure of Collecting Data

To collect the data, the researcher visited the school to ask for the documents to be analyzed. These include the English summative test items, answer sheets, answer keys and syllabus of the English summative test for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City.

In writing this research, the researcher followed these steps:

1. Collecting the English summative test papers made by the teachers for eighth grade students at Integrated Islamic Junior High School (SMPIT) Khairunnas Bengkulu City.

2. Analyzing the validity, reliability, difficulty level, discriminating power and distractors of each test item.

