
Tadris: Jurnal Keguruan dan Ilmu Tarbiyah 8 (2): 283-301 (2023)
DOI: 10.24042/tadris.v8i2.13733
Development and Implementation of a Four-Tier Close-Ended Test to Analyze Students' Misconceptions of Optical Instruments
Itsna Rona Wahyu Astuti1*, Achmad Samsudin1, Ida Kaniawati1, Endi Suhendi1, Bayram Çoştu2
1 Department of Physics Education, Universitas Pendidikan Indonesia, Indonesia
2 Program of Science Education, Yildiz Technical University, Turkey
Abstract: The research aims to develop a four-tier test for optical instrument materials. The method used in this study is a 4D design that includes defining, designing, developing, and disseminating. The instrument used consisted of fifteen items in the form of a four-tier closed-ended test. The research participants were 60 female and 15 male students from West Java in grade 11 high school who were randomly selected. The analysis is divided into four parts. The first analysis is a CVR and multi-rater Rasch measurement of the original validation results. The second analysis involves calculating the percentage of students' scores based on their conception scores. The third is a Rasch Model analysis of the instrument's validity and reliability. The Rasch Model is used in the fourth analysis to examine conceptions and misconceptions. Following the analysis, all items met the CVR value criteria. I2, I7, I9, I10, I12, I15, and I3 have logit values less than zero and are corrected based on expert feedback. The second analysis reveals that students continue to have misconceptions about each item. According to the third analysis, all items were valid and reliable, with a Cronbach Alpha value 0.78 in either category. According to the fourth analysis, conception is inversely related to misconception. The fewer misconceptions, the better the students' conceptions, and vice versa. However, confidence can also be a dissonant influence. Students who experience misconceptions need to be given appropriate treatment to reduce misconceptions about optical instrument materials. Hopefully, the four-tier closed-ended test that has been developed can be used and developed into a better five-level test to investigate the causes of each student's misconceptions.

Article History:
Received: September 8th, 2022
Revised: August 4th, 2023
Accepted: November 14th, 2023
Published: December 30th, 2023

Keywords: Four-tier test, Misconception, Optical instruments

*Correspondence Address: itsnarona@upi.edu
INTRODUCTION

Misconceptions are differences between students' concepts and scientific concepts in understanding a phenomenon, and they are caused by competing beliefs that appear to be supported by logical arguments (Hammer, 1996; Heyd-Metzuyanim & Schwarz, 2017; Husnah et al., 2020; Jauhariyah et al., 2018; Kaltakci-Gurel et al., 2017; Kocakulah & Kural, 2010; Oberoi, 2017; Suprapto, 2020). Drawing on observations and practical experience, the concepts acquired over time tend to become ingrained in daily routines. Effecting a shift in students' misconceptions, moving them from incorrect to accurate, proves to be a challenging endeavor (Coştu et al., 2012; Greca et al., 2014; Kaltakci-Gurel et al., 2017; Liu & Fang, 2016; Ohlsson & Cosejo, 2014; Shen et al., 2017; Stein et al., 2008).
© 2023 URPI Faculty of Education and Teacher Training

Universitas Islam Negeri Raden Intan Lampung
Misconceptions can be influenced by factors such as students' pre-existing knowledge, teachers' prior understanding, textbooks, the learning environment, and inaccuracies in transcribing terminology (Jauhariyah et al., 2018; Kocakulah & Kural, 2010). According to Oberoi (2017), misconceptions can arise due to insufficient knowledge about concepts, textbook confusion, linguistic ambiguities, or overgeneralization. Misconceptions may also stem from learning strategies, students' initial information, the challenge of connecting one concept to another, the content presented in textbooks, and the influence of language and media. According to Oberoi (2017) and Kaltakçi & Didiç (2007), misconceptions among students can be caused by a variety of factors, including both the students themselves and their educational environment.

Misconceptions can be identified through the use of diagnostic tests, which are assessment tools designed to pinpoint challenges or unresolved issues that learners may encounter in the learning process (Fariyani et al., 2017; Gurel et al., 2015; Pertiwi & Setyarsih, 2015; Rosita et al., 2020). Diagnostic tests are also characterized as assessments designed to identify students' weaknesses, facilitating the implementation of appropriate measures to address those areas of concern (Ismail et al., 2015; Rosita et al., 2020). Interviews, concept maps, open tests, multiple-choice tests, two-tier multiple-choice tests, three-tier tests, and four-tier tests can all uncover students' misconceptions. Interviews, open-ended tests, and multiple-choice tests are commonly employed in physics education research. Nevertheless, each diagnostic test instrument comes with its own set of advantages and disadvantages when compared to others (Gurel et al., 2015; Kaltakci-Gurel et al., 2017).

Open-ended tests or descriptive assessments can prompt students to contemplate a concept for an extended period, articulate their thoughts in writing, uncover misconceptions in problem-solving, and assist students in overcoming learning difficulties. The open-ended test format enables respondents to express their answers in their own words and allows administration to a broader sample than interview tests (Gurel et al., 2015; Kaltakci-Gurel et al., 2017; Zhou et al., 2016). Nevertheless, tests in the form of descriptive responses come with practical limitations related to language use. Students often exhibit a lack of enthusiasm in providing answers as complete sentences, and the responses require more time for analysis and assessment (Bautista & Boone, 2015; Kaltakci-Gurel et al., 2017; Kaltakçi & Didiç, 2007).

The four-tier test is structured with four levels. The first level involves answer choices (multiple-choice), the second level involves indicating the level of confidence for the answers selected in the first level, the third level entails choosing reasons (multiple-choice) for the answers selected in the first level, and the fourth level involves indicating the confidence level for the reasons selected in the third level (Kaltakci-Gurel et al., 2017). In the second and fourth levels, the expressions of confidence typically involve categorizations such as "sure" and "not sure." The four-tier test represents an advancement over a comparable diagnostic test with a three-tier format comprising only three components. The three-level diagnostic test, in turn, enhances a two-level diagnostic test. Incorporating reasons for selecting answers is a notable improvement (Anggrayni & Ermawati, 2019; Hermita et al., 2017). Therefore, the four-level multiple-choice diagnostic test is the most accurate for detecting misconceptions (Afif et al., 2017; Anggrayni & Ermawati, 2019; Hermita et al., 2017). The comprehension levels of the four-tier test reveal that students' conceptions can be categorized into six distinct conceptual levels according to the level of understanding conveyed by Coştu (2008), the assessment by Kaltakci-Gurel et al. (2017), and the concept category of Amalia et al. (2019). The six categories are Sound Understanding (SU), Partial Positive (PP), Partial Negative (PN), Misconception (MC), No Understanding (NU), and No Coding (NC).

Based on literature studies, students still have misconceptions about some physics concepts (Coetzee & Imenda, 2012; Kocakulah & Kural, 2010; Kucukozer, 2010; Rohmanasari & Ermawati, 2020; Salamah et al., 2017; Salmadhia et al., 2021; Umar et al., 2021). One of the physics materials in which misconceptions persist is optical instrument material (Kaniawati et al., 2020; Rohmanasari & Ermawati, 2020; Salmadhia et al., 2021). Consequently, researchers need a suitable instrument to evaluate students' misconceptions regarding optical instruments. Accordingly, a four-tier closed-ended test focusing on optical devices has been devised, which deviates from the open-ended format in the third tier.

METHOD
Research Design
The research method followed the 4D model, encompassing the stages of defining, designing, developing, and disseminating (Thiagarajan et al., 1974). The define stage consisted of literature studies on optical instrument misconceptions. The subsequent design stage involved establishing a construction distribution for each item, designing content for each item, and implementing a four-tier test. The first, second, and fourth tiers utilize closed-ended questions in this test, while the third tier adopts an open-ended format.
Figure 1. Research Chart of the Four-Tier Closed-Ended Questions on Optical Instruments.
In the development stage, the four-tier open-ended format is expanded to involve students explaining a concept in the third tier. Subsequently, all the reasons provided by the students in the third tier are compiled and used as answer choices. This restructuring transforms the four-tier open-ended test into a four-tier closed-ended test. The instrument underwent validation by five experts who assessed each item based on nine validation indicators. The validation results were then analyzed using the Content Validity Ratio (CVR) and multifaceted Rasch measurement. In the dissemination stage, the finalized four-tier closed-ended instrument, analyzed using the Rasch Model, was tested.

Participants
The participants involved in this study were 75 high school students in grade 11 in West Java. The students consisted of 60 female students and 15 male students. Random sampling is used to select students.

Instruments
The research employed a four-tier closed-ended instrument comprising fifteen items related to optical instruments, including topics such as cameras, eyes and eyeglasses, magnifying glasses, microscopes, and binoculars. The first tier presents a concept-based question, while the second tier gauges students' confidence in responding to the initial question. The third tier requires students to provide reasons or explanations for their answers to the first tier. Lastly, the fourth tier assesses the confidence level associated with the explanations provided in the third tier. All questions in the test are closed-ended and take the form of multiple-choice queries.

Data Analysis
The analysis conducted in this study comprises four stages. The expert validation results were initially scrutinized, employing CVR and multi-rater Rasch measurement for analysis. The experts evaluated the validity of the subject under consideration, determining whether it is valid without revision, with revision, or invalid. Equation 1 was applied to calculate CVR.

\[ \mathrm{CVR} = \frac{n_e - \frac{N}{2}}{\frac{N}{2}} \tag{1} \]

Description:
$n_e$ = The number of validators who provide valid assessments
$N$ = The total number of validators

The instrument is deemed valid if the calculated CVR result surpasses the minimum CVR value, as determined by the Schipper Table (Wilson et al., 2012). Table 1 shows the minimum CVR values for the various validator counts.
Table 1. The Minimum CVR Values for the Various Validator Numbers.
Number of Experts Minimum CVR Value Number of Experts Minimum CVR Value
5 0.736 15 0.425
6 0.672 20 0.368
7 0.622 25 0.329
8 0.582 30 0.300
9 0.548 35 0.287
10 0.520 40 0.260
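As a rough illustration of Equation 1, the sketch below computes the CVR for one assessment aspect and the average CVR for one item, then compares the average with the five-validator minimum from Table 1. The function names are hypothetical, and the example counts simply mirror the pattern Table 5 reports for item 3 (three aspects rated valid by four of the five validators); this is not the authors' own analysis script.

```python
from statistics import mean

MIN_CVR_FIVE_VALIDATORS = 0.736  # minimum CVR for five experts, from Table 1


def cvr(n_e: int, n_total: int) -> float:
    """Content Validity Ratio: (n_e - N/2) / (N/2)."""
    half = n_total / 2
    return (n_e - half) / half


def average_cvr(valid_counts: list[int], n_total: int) -> float:
    """Mean CVR over all assessment aspects of one item."""
    return mean(cvr(n_e, n_total) for n_e in valid_counts)


# Nine assessment aspects; aspects 2, 3, and 8 were rated valid by 4 of 5 validators.
example_counts = [5, 4, 4, 5, 5, 5, 5, 4, 5]
example_average = average_cvr(example_counts, 5)

print(round(cvr(4, 5), 3))        # CVR of a single aspect with 4 of 5 valid ratings
print(round(example_average, 3))  # average CVR of the item
print(example_average >= MIN_CVR_FIVE_VALIDATORS)
```

With five validators, a single dissenting rating drops the aspect-level CVR to 0.6, which is below the 0.736 minimum even though the item-level average can remain above it.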
The multi-rater Rasch measurement model is used to analyze the expert validation. The multi-rater Rasch measurement model created by Linacre is as follows (Eckes, 2019):

\[ \ln\left[\frac{p_{nijk}}{p_{nij(k-1)}}\right] = \theta_n - \beta_i - \alpha_j - \tau_k \]

Description:
$p_{nijk}$ = Probability of examinee n receiving a rating of k on criterion i from rater j
$p_{nij(k-1)}$ = Probability of examinee n receiving a rating of k-1 on criterion i from rater j
$\theta_n$ = Proficiency of examinee n
$\beta_i$ = Difficulty of criterion i
$\alpha_j$ = Severity of rater j
$\tau_k$ = Difficulty of receiving a rating of k relative to a rating of k-1

The second step is categorizing students in each concept understanding category based on their responses. The results of the classification of concept categories for students are expressed in percent form. The rubric of the conception category, conception score, and misconception score are shown in Table 2 (Amalia et al., 2019; Aminudin et al., 2019; Coştu, 2008; Kaltakci-Gurel et al., 2017).
Table 2. The Score of Conceptions and Misconception.
Categories                  Tier 1   Tier 2   Tier 3   Tier 4   Score   MC
Sound Understanding (SU)    True     Sure     True     Sure     4       0
Partial Positive (PP)       True     Sure     True     Unsure   3       0
                            True     Unsure   True     Sure     3       0
                            True     Unsure   True     Unsure   3       0
Partial Negative (PN)       True     Sure     False    Sure     1       1
                            True     Sure     False    Unsure   1       1
                            True     Unsure   False    Sure     1       1
                            True     Unsure   False    Unsure   1       1
                            False    Sure     True     Sure     1       1
                            False    Sure     True     Unsure   1       1
                            False    Unsure   True     Sure     1       1
                            False    Unsure   True     Unsure   1       1
No Understanding (NU)       False    Unsure   False    Sure     0       3
                            False    Sure     False    Unsure   0       3
                            False    Unsure   False    Unsure   0       3
Misconception (MC)          False    Sure     False    Sure     0       4
No Coding (NC)              (Incomplete answer)                 -       -
Score: conception score; MC: misconception score.
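The rubric in Table 2 can be read as a small decision rule over the four tiers. The sketch below is one possible encoding of it, written for illustration; the function name `classify` and the use of `None` for a missing tier are assumptions, not part of the published instrument.

```python
from typing import Optional

# (category, conception score, misconception score), following Table 2
SU = ("SU", 4, 0)
PP = ("PP", 3, 0)
PN = ("PN", 1, 1)
NU = ("NU", 0, 3)
MC = ("MC", 0, 4)
NC = ("NC", None, None)


def classify(answer_ok: Optional[bool], answer_sure: Optional[bool],
             reason_ok: Optional[bool], reason_sure: Optional[bool]):
    """Map one four-tier response to its Table 2 category and scores.

    Tiers 1 and 3 are given as correct/incorrect, tiers 2 and 4 as
    sure/unsure. A missing tier (None) is treated as an incomplete answer.
    """
    tiers = (answer_ok, answer_sure, reason_ok, reason_sure)
    if any(t is None for t in tiers):
        return NC
    both_sure = answer_sure and reason_sure
    if answer_ok and reason_ok:
        return SU if both_sure else PP
    if answer_ok != reason_ok:
        return PN  # exactly one of answer and reason is correct
    return MC if both_sure else NU  # answer and reason both incorrect


print(classify(True, True, True, True))
print(classify(False, True, False, True))
```

Under this reading, confidence only changes the category when the answer and the reason agree (both correct or both incorrect); any mixed answer/reason pair is Partial Negative regardless of confidence.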
The third analysis stage evaluates the four-tier optical instrument questions developed using the Rasch Model:

\[ P_{ni}(x_{ni} = 1 \mid \beta_n, \delta_i) = \frac{e^{(\beta_n - \delta_i)}}{1 + e^{(\beta_n - \delta_i)}} \]

$P_{ni}(x_{ni} = 1 \mid \beta_n, \delta_i)$ is the probability of respondent n producing the correct answer on item i ($x_{ni} = 1$), given the respondent's ability ($\beta_n$) and the item difficulty level ($\delta_i$) (Sumintono & Widhiarsho, 2015).

The instruments were subjected to data analysis, focusing on students' conception scores for each item. Instrument analysis was employed to assess the items' validity, reliability, and difficulty level. Rasch analysis served as the methodology for instrument analysis. The instrument's validity is gauged by evaluating the appropriateness of each item. Item validity is determined through the output of the Item Fit Order, considering outfit mean square (MNSQ), outfit Z-Standard (ZSTD), and point measure correlation (PT MEASURE CORR). Additionally, the unidimensionality output, indicating the raw variance explained by measures, is utilized to ascertain instrument validity. Rasch analysis was also employed to assess the reliability of the instruments, yielding results such as person reliability, item reliability, and Cronbach's Alpha. Person reliability gauges the consistency of students' responses, while item reliability reflects the quality of the instrument's items. Cronbach's Alpha provides an overview of the overall interaction between individuals and items.

In the fourth analysis, students' misconception scores on each item were scrutinized using Rasch analysis. Output tables, such as output variable maps (Wright maps), were utilized in this analysis to interpret the findings.

RESULT AND DISCUSSION
The focus is exploring alternative conceptions derived from students' responses in transitioning from the four-tier open-ended to the four-tier closed-ended tests on optical instruments. The subsequent sections will provide a detailed discussion of the stages of development (define, design, develop, and disseminate) and the associated analysis within the framework of the 4D model.

Define
The define stage is a review of the literature on optical instrument misconceptions. This stage is utilized to locate research sources. A literature review of misconceptions about optical instruments is performed on each sub-material. Eyes, cameras, eyeglasses, magnifying glasses, microscopes, and binoculars are the optical instrument materials investigated. Based on the literature review and the prediction of optical instrument misconceptions that students may encounter, a four-tier test can be developed. Based on the literature studies, the misconceptions that occur among students regarding optical instruments are detailed in Table 3 (Munawaroh et al., 2016; Kaniawati et al., 2020; Salmadhia, Rusnayati, & Liliawati, 2021).
Table 3. Students' Misconceptions about Each Sub-material.
Sub Material: Eyes, eyeglasses, and camera
  1. The near point (PP) of hyperopia is farther than the normal eye, so objects must be placed closer than 25 cm.
  2. The larger the field diameter of a lens, the more light comes in, so the image gets bigger.
  3. The older you get, the better your eyes' accommodation power becomes.
  4. The pupil in the human eye has the same function as the diaphragm, regulating the intensity of incoming light.
  5. The near point of the eye of a myopia sufferer is closer than the near point of a normal eye (PP < 25 cm).
  6. The camera's distance to the object is closer when the camera is in a landscape position than in a portrait position.
Sub Material: Magnifying glass
  1. If the loupe lens is partially closed, the image formed is half of the object.
  2. A convex lens is used as a loupe because it spreads light so that the image of an object is enlarged from its original size.
  3. The strength of the loupe is not affected by the medium in which the loupe is used, meaning that the strength of the loupe in air and water is the same.
Sub Material: Microscope
  1. The function of a microscope is to see small objects so that they appear large and clear.
  2. Observations using a microscope with maximum accommodation occur when the image formed by the objective lens is exactly in the focus of the eyepiece lens.
  3. Concave mirrors and convex lenses have the property of scattering light.
Sub Material: Binoculars
  1. All lenses on stage binoculars are convex lenses that can collect light.
  2. In unaccommodated observations, the magnification of the image from a star telescope is influenced by the length of the telescope tube.
  3. The more lenses a binocular has, the greater the angular magnification.
Design
The instrument design stage requires distributing the construction of questions for each sub-material of optical instruments, designing content for each item, and designing questions in the form of a four-tier open-ended test. The first tier is a regular multiple choice, while the second tier is about confidence, with two options: "sure" and "unsure." However, students have no choice but to fill in at the third tier of the four-tier open-ended as in Figure 2a. At the same time, the fourth tier of confidence is comparable to the second tier. After obtaining alternative concepts from students' answers at the third tier, design a four-tier close-ended test as in Figure 2b.
(a) Open-ended test

Question 1.1
(Multiple choice questions with five answer choices and according to item construction).

Are you sure about your answer to question 1.1?
A. Sure
B. Unsure

Reasons for the answer to question 1.1:
..............................................................................

Are you sure about your answer to question 1.3?
A. Sure
B. Unsure

(b) Close-ended test

Question 1.1
(Multiple choice questions with five answer choices and according to item construction).

Are you sure about your answer to question 1.1?
A. Sure
B. Unsure

Reasons for the answer to question 1.1:
(There are five answer choices)

Are you sure about your answer to question 1.3?
A. Sure
B. Unsure

Figure 2. The Design of Four-tier: (a) Open-ended Test, and (b) Close-ended Test
The distribution of the question construction contains the form of questions and choices in the first tier. Each set of questions and choices in the first tier can be in statements, pictures, or tables, as in the item construction distribution in Table 4. In the construction distribution, the eye, camera, and glasses are included in one sub-material because they work similarly. This sub-material also has the most questions compared to the sub-materials of magnifying glasses, microscopes, and binoculars. This aligns with Kaniawati et al. (2020) and Munawaroh et al. (2016), who found that the most common misconceptions are in the eye, camera, and glasses sub-materials.
Table 4. Distribution of Item Construction.
Sub Material                     Item Number   Item Construction
Eyes, eyeglasses, and camera     1             Statement, Figure – Table
Eyes, eyeglasses, and camera     2             Statement – Statement
Eyes, eyeglasses, and camera     3             Table – Statement
Eyes, eyeglasses, and camera     4             Figure – Table
Eyes, eyeglasses, and camera     5             Statement – Table
Eyes, eyeglasses, and camera     6             Statement, Figure – Statement
Magnifying glass                 7             Statement, Figure – Figure
Magnifying glass                 8             Figure, Statement – Statement
Magnifying glass                 9             Table – Statement
Microscope                       10            Figure – Statement
Microscope                       11            Statement – Figure, Table
Microscope                       12            Statement – Figure
Binoculars                       13            Figure, Statement – Figure
Binoculars                       14            Table – Statement
Binoculars                       15            Statement, Figure – Statement
Develop
During the development stage, open-ended four-tier instruments will be converted to four-tier close-ended instruments. The four-tier open-ended instruments consist of 15 items. This stage was carried out to investigate alternative conceptions of students in the third tier with open answers. Furthermore, the alternative concept obtained will be an option on the third tier of the four-tier closed-ended instrument. Figure 3 shows an example of a third-tier change from open to closed.
Figure 3. The Example of the Four-tier Close-ended Test on Optical Instruments.
In Question 6.3 of the four-tier closed-ended test, the answer choices represent reasons given by students. Each reason is categorized, modified, and used as a response option. This approach is highly beneficial for probing student responses. According to Caleon & Subramaniam (2010), Gurel et al. (2015), and Kaltakci-Gurel et al. (2017), reasoning can differentiate between a correct answer due to the right rationale (scientific concept) and a correct answer based on flawed reasoning (false positive).
Table 5. Analysis Utilizing CVR.
Each item was rated on nine assessment aspects (AAN 1 to 9) by N = 5 validators. For every item and aspect, Ne = 5 and CVRi = 1, except for the aspects listed below, where Ne = 4 and CVRi = 0.6.
Item   AAN with Ne = 4 (CVRi = 0.6)   CVRa
1      none                           1.000
2      none                           1.000
3      2, 3, 8                        0.867
4      none                           1.000
5      none                           1.000
6      none                           1.000
7      none                           1.000
8      none                           1.000
9      none                           1.000
10     8                              0.956
11     8                              0.956
12     none                           1.000
13     none                           1.000
14     none                           1.000
15     3                              0.956
*AAN: Assessment Aspect Number; N: the total number of validators; Ne: the number of validators who provide valid; CVRi: CVR index; CVRa: CVR average.
Figure 4. Multi-rater Validation Test Results.
Figure 5. Expert Measurement Report.
After converting all items to a four-tier closed-ended format, expert validation is conducted with input from five validators. The validators assessing the four-tier closed-ended test include three physics education lecturers, a teacher, and a researcher in the same field. The evaluation conducted by the validators encompasses nine aspects: 1) Items are crafted based on misconceptions; 2) Consistency of the concepts in the questions with those advanced by the experts; 3) Items are designed to assess the understanding of students' concepts; 4) Utilization of language that adheres to the rules of the Indonesian language; 5) Language used is accessible and understandable for students; 6) Answer choices and reasons exhibit homogeneity and logical alignment with the material; 7) There is only one correct answer key; 8) Questions do not provide hints or clues to the correct answer; 9) Answer choices do not include statements like "all answers are correct" or "answers are wrong." The results of the validation analysis utilizing CVR are presented in Table 5.

As per Table 5, all items exhibit an average CVR value greater than or equal to 0.736. Given that the smallest CVR value with five validators is 0.736, it can be concluded that the questions are considered valid and can be utilized (Wilson et al., 2012). The average CVR calculation results for each item can be interpreted to mean that all items have valid expert validation results. However, when each item on each aspect is reviewed, the assessment reveals that the CVR values on items 3, 10, 11, and 15 need to be improved. Item 3 needs to be improved with respect to assessment aspects 2, 3, and 8. Meanwhile, items 10 and 11 only require refinement regarding the aspect that questions do not provide hints of the correct answer. Item 15 was revised in response to the comments on assessment aspect number 3.

The multi-rater Rasch measurement was utilized to analyze the results of expert validation. Figure 4 illustrates the outcomes of the multi-rater analysis, featuring five columns. The first column, known as the size column (logit transformation), displays measurement results with values ranging from +2 (top) to -5 (bottom), representing logit values. The second column delineates the distribution of logit values, spanning from less than logit -1 (I11) to greater than logit +2 (I1). The logit value of 0 serves as the minimum criterion for item quality, with experts considering values above this threshold as indicative of good-quality items and values below as representing items of lesser quality.

Figure 4 shows eight items considered unfavorable by experts: item numbers I2, I7, I9, I10, I12, I15, I11, and I3. Meanwhile, the experts deemed seven items qualified, including I14, I13, I8, I4, I5, I6, and I1. The third column in Figure 4 illustrates information about the difficulty level of the assessment aspects. This column displays the distribution of the assessment aspects. According to the experts, a lower logit value for an assessment aspect indicates that fulfilling an item in that aspect is easier. Conversely, a higher logit value suggests greater difficulty for the assessment aspect to be fulfilled in an item, as evaluated by the validator. Assessment aspects with similar logit values share the same level of difficulty.

According to Figure 4, assessment aspects 4 (using language that follows the rules of the Indonesian language) and 9 (answer choices do not use statements such as "all answers are correct" or "answers are wrong") are the easiest aspects of judging because all items satisfy these assessment aspects according to the validators. Assessment aspect 6 (the answer choices and reasons are homogeneous and logical in terms of material) is the most difficult aspect of the assessment; most question items do not meet it in the experts' opinion. According to all expert opinions, I1 is the only item that satisfies all aspects of the assessment.

Figure 5 depicts the quality of an expert panel's assessment, sorted by item severity. Expert D is the most consistent when considering statistical fit criteria (Boone et al., 2014). Expert D has Outfit MNSQ and Outfit ZSTD values in the statistical suitability ranges of 0.5 to 1.5 (Outfit MNSQ) and -2 to +2 (Outfit ZSTD), respectively. Experts A and B are the worst because they have the lowest infit value. The reliability between raters is sufficient (0.67), indicating that the experts give quite different scores, but some are the same (Koçak, 2020). The rater's tendency influences the reliability value (Bond & Fox, 2013). The obtained data aligns with the measurement model, and this alignment is corroborated by the Chi-square test value (p < 0.01). The agreement in assessment by the five experts (inter-rater agreement) stands at 86.3%, signifying minimal divergence in evaluating all items among the five experts. Existing studies consistently indicate variations in raters' judgment tendencies, with rater behaviors such as leniency and severity influencing rater reliability (Brookhart et al., 2006; Darmana et al., 2021; Güler, 2014).

Disseminate
The disseminate stage is the stage at which the instrument is put to use. The results are then examined in three stages of analysis. The first analysis determines the percentage of each conception category derived from student scores. Thus, students' conceptions can be distributed based on the categories created. The second analysis analyzes the closed-ended four-tier instrument based on conception scores. The third analysis includes a detailed description of conception and misconception and a comparison using Rasch analysis.

Based on the results, the conception score of all students on each item can be determined. The conception score per maximum conception value shows the percentage value of each conception. Figure 6 depicts the proportion of conception categories for each item.
[Bar chart: percentage of students (vertical axis, 0% to 70%) in each conception category (SU, PP, PN, MC, NU, NC) for items I1 to I15 (horizontal axis, Item Number).]
Figure 6. The Conception Categories for Each Item.
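The percentages in Figure 6 are shares of the 75 respondents falling into each category for a given item. A minimal sketch of that tally follows; the function name and the per-student label list are hypothetical, and the counts are chosen only so that the resulting shares match the SU, PN, MC, and NU percentages the article reports for item 3 (the raw counts themselves are not given in the source).

```python
from collections import Counter

CATEGORIES = ("SU", "PP", "PN", "MC", "NU", "NC")


def category_percentages(labels: list[str]) -> dict[str, float]:
    """Share of respondents in each conception category, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {c: round(100 * counts[c] / total, 2) for c in CATEGORIES}


# Hypothetical category labels for one item answered by 75 students.
labels = ["SU"] * 45 + ["PP"] * 1 + ["PN"] * 26 + ["MC"] * 3
shares = category_percentages(labels)
print(shares)
```

With 75 respondents, each student contributes about 1.33 percentage points, which is why the reported values fall on multiples of that step (for example 4.00%, 9.33%, 34.67%).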
The results of the conception category on each item indicate that the highest sound understanding category is item number 3 (60.00%), and the lowest is item numbers 2, 5, 10, and 15 (0.00%). The highest partial positive category is item 9 (9.33%), and the lowest is items 1, 4, 5, and 8 (0.00%). The highest partial negative category is number 2 (65.33%), and the lowest is item 3 (34.67%). The highest misconception category is item number 15 (54.67%), and the lowest is number 3 (4.00%). The category with the highest incidence of "no understanding" is item number 6, accounting for 9.33%, while the lowest is item number 3, registering at 0.00%. Notably, all items exhibit a "no coding" category of 0.00%, indicating that all students responded to all tiers for each item.

The analysis of the four-tier closed-ended format involved the application of the Rasch Model. This analysis aimed to ascertain the validity, reliability, and difficulty level of four-tier closed-ended optical instruments. The outcomes related to instrument validity are presented in Table 6.
Table 6. OUTFIT MNSQ, OUTFIT ZSTD, and PT MEASURE CORR for Each Item of the Four-tier Close-ended Test.
Item Number   Outfit MNSQ   Outfit ZSTD   PT Measure Corr
I1 1.46 1.57 0.29
I2 0.62 -1.42 0.40
I3 1.15 0.58 0.51
I4 0.89 -0.32 0.49
I5 0.94 -0.20 0.21
I6 1.39 1.37 0.49
I7 1.14 0.62 0.50
I8 1.42 1.51 0.62
I9 1.31 1.27 0.49
I10 0.66 -1.35 0.57
I11 1.03 0.20 0.48
I12 0.96 -0.07 0.61
I13 1.20 0.89 0.56
I14 1.34 1.30 0.53
I15 0.96 -0.08 0.51
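The item-fit screening applied to Table 6 can be sketched as a simple range check. The Outfit MNSQ (0.5 to 1.5) and Outfit ZSTD (-2 to +2) ranges are the ones quoted in the article for statistical fit; the PT Measure Corr range of 0.4 to 0.85 is an assumption, since the article does not state its cut-off. The data are copied from Table 6, and the helper name is hypothetical.

```python
# (Outfit MNSQ, Outfit ZSTD, PT Measure Corr) per item, copied from Table 6
TABLE_6 = {
    "I1": (1.46, 1.57, 0.29), "I2": (0.62, -1.42, 0.40), "I3": (1.15, 0.58, 0.51),
    "I4": (0.89, -0.32, 0.49), "I5": (0.94, -0.20, 0.21), "I6": (1.39, 1.37, 0.49),
    "I7": (1.14, 0.62, 0.50), "I8": (1.42, 1.51, 0.62), "I9": (1.31, 1.27, 0.49),
    "I10": (0.66, -1.35, 0.57), "I11": (1.03, 0.20, 0.48), "I12": (0.96, -0.07, 0.61),
    "I13": (1.20, 0.89, 0.56), "I14": (1.34, 1.30, 0.53), "I15": (0.96, -0.08, 0.51),
}

MNSQ_RANGE = (0.5, 1.5)      # quoted in the article
ZSTD_RANGE = (-2.0, 2.0)     # quoted in the article
PT_CORR_RANGE = (0.4, 0.85)  # assumption: cut-off not stated in the article


def failed_criteria(mnsq: float, zstd: float, pt_corr: float) -> list[str]:
    """Return the names of the fit criteria that an item does not meet."""
    checks = [
        ("Outfit MNSQ", mnsq, MNSQ_RANGE),
        ("Outfit ZSTD", zstd, ZSTD_RANGE),
        ("PT Measure Corr", pt_corr, PT_CORR_RANGE),
    ]
    return [name for name, value, (low, high) in checks if not low <= value <= high]


flagged = {}
for item, stats in TABLE_6.items():
    fails = failed_criteria(*stats)
    if fails:
        flagged[item] = fails

print(flagged)
```

Under these ranges only I1 and I5 are flagged, and only on PT Measure Corr, which is consistent with the article's decision to retain both items because their Outfit MNSQ and Outfit ZSTD values are acceptable.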
Table 7. The Unidimensionality of Four-tier Closed-ended Test.

Raw variance explained by measures                     43.0%
Unexplained variance in 1st contrast (Eigenvalue)      2.1271
Unexplained variance in 1st contrast (Observed)        8.1%
The results of the four-tier closed-ended test analysis presented in Table 6 reveal that items I1 and I5 do not meet the criteria for PT MEASURE CORR. However, they are retained because they satisfy the criteria for OUTFIT MNSQ and OUTFIT ZSTD values. The remaining four-tier test items meet all the criteria for item suitability.

The examination of the instrument's unidimensionality evaluates the validity of the Rasch model on each item individually and as a whole. Unidimensionality is a criterion to determine if the developed instrument can effectively measure its intended content. Table 7 reports the unidimensionality results, showing that the raw variance explained by measures is 43.0%, surpassing the 40% threshold. This result indicates that the overall validity of the four-tier closed-ended test falls into the "good" category (Sumintono & Widhiarsho, 2015), which means the test has good validity in measuring students' misconceptions. Furthermore, each unexplained variance value is less than 15%. As a result, all four-tier closed-ended test items are valid and can be used in full without revision.
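
The two unidimensionality thresholds named in the text (raw variance explained above 40%, unexplained variance in a contrast below 15%) can be expressed as a small check. This is only a sketch: the function name `check_unidimensionality` is hypothetical, and the 8.1% figure is read here as the observed unexplained variance in the first contrast.

```python
def check_unidimensionality(raw_variance_pct, contrast_variance_pct,
                            min_explained=40.0, max_unexplained=15.0):
    """Compare the variance figures against the thresholds quoted in the text."""
    return {
        "explained_ok": raw_variance_pct > min_explained,
        "contrast_ok": contrast_variance_pct < max_unexplained,
    }


# Figures reported in Table 7 (percentages).
outcome = check_unidimensionality(raw_variance_pct=43.0, contrast_variance_pct=8.1)
is_unidimensional = all(outcome.values())

print(outcome)
print("Unidimensional under the stated thresholds:", is_unidimensional)
```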
Table 8. The Value of Item Reliability, Person Reliability, and Cronbach Alpha.

No   Item                 Value
1    Person Reliability   0.65
2    Item Reliability     0.95
3    Cronbach Alpha       0.78
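
The reliability values in Table 8 can be mapped to qualitative categories with a short lookup. The cut-off values below are assumptions taken from commonly used Rasch interpretation guidelines; this section does not list them, so treat them as illustrative. The function names are hypothetical, and the item reliability of 0.95 is the value given in the running text.

```python
def categorize_reliability(value):
    """Category for person or item reliability (assumed cut-offs)."""
    if value > 0.94:
        return "special"
    if value >= 0.91:
        return "very good"
    if value >= 0.81:
        return "good"
    if value >= 0.67:
        return "fair"
    return "weak"


def categorize_alpha(value):
    """Category for Cronbach Alpha (assumed cut-offs)."""
    if value > 0.8:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "fair"
    if value >= 0.5:
        return "poor"
    return "bad"


categories = {
    "person_reliability": categorize_reliability(0.65),
    "item_reliability": categorize_reliability(0.95),
    "cronbach_alpha": categorize_alpha(0.78),
}
print(categories)
```

With these assumed cut-offs the three values fall into the weak, special, and good categories respectively.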
The Rasch Model reliability test consists of Cronbach Alpha values, person reliability, and item reliability. The results of the reliability test utilizing Rasch are shown in Table 8. Based on the results of the four-tier test analysis in Table 8, item reliability is 0.95, which is categorized as special. Still, person reliability shows a value of 0.65 and is categorized as weak. Cronbach Alpha has a value of 0.78, which can be categorized as good (Sumintono & Widhiarsho, 2015). The conclusion from the analysis of the four-tier test shows that the items have very good quality, but the consistency of the answers given by students is still weak. In addition, the interaction between students and the questions is good.

Based on conception scores, students 35F and 39F have the highest ability, while student 64F has the lowest ability. Although students 35F and 39F appear to have the best abilities, they still fall short of items I5 and I15. Students 35F and 39F can only answer items I6 and I10 and below. In contrast, the ability of student 64F falls short of all question items. I3 is the item with the lowest level of difficulty. Students 04F, 20F, 63M, 67F, 72F, 14F, 03F, 61M, and 64F have abilities that fall below item I3. As a result, these students struggle to answer item I3 correctly. In contrast, I5 and I15 have the highest difficulty levels. No student has an ability high enough to answer items I5 and I15.
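
The Wright map reading used above (a student located below an item is unlikely to answer it correctly) follows from the Rasch response function. The sketch below uses the dichotomous Rasch model as a simplifying assumption; the scoring model actually applied to the conception scores may differ, and the logit values are hypothetical, not read from Figure 7.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical logits for illustration only.
item_difficulty = 1.0
p_below = rasch_probability(ability=-0.5, difficulty=item_difficulty)
p_equal = rasch_probability(ability=1.0, difficulty=item_difficulty)
p_above = rasch_probability(ability=2.5, difficulty=item_difficulty)

print(f"student below item: {p_below:.2f}")
print(f"student level with item: {p_equal:.2f}")
print(f"student above item: {p_above:.2f}")
```

A student level with an item has a 50% chance of success; the chance drops below 50% when the student sits lower on the map than the item and rises above 50% when the student sits higher.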
Figure 7. Wright Map Conceptions Score and Wright Map Misconceptions Score.
Based on student misconception scores, students 20F and 64F have the highest misconception scores, while student 45F has the lowest misconception score. According to Figure 7, all students under I3 indicate that they did not answer with misconceptions. In comparison, item I15 is the item most commonly answered with misconceptions.

The analysis revealed that item I3 had the lowest difficulty in conception, and none of the participants responded to it with misconceptions. Item I5, by contrast, presents the most difficult and most frequently answered question about conception. Student 64F had the lowest conception score and the highest misconception score. Meanwhile, the highest conception value
for students 35F and 39F did not coincide with the lowest misconception score. The student with the lowest misconception score was 45F, who ranked below 35F and 39F on conception scores. The disparity among students is due to their self-confidence level. This shows that student self-confidence influences student misconceptions. The more students believe in mistaken concepts, the more they will experience misconceptions.

CONCLUSION
There are four conclusions based on the data analysis and discussion results. First, all items met the CVR scoring criteria, and items I2, I7, I9, I10, I12, I15, and I3 were corrected based on expert advice. Second, students have misconceptions about each item. Items I5 (49.33%) and I15 (49.33%) have the most misconceptions (54.67%). Third, all items are valid and reliable, with a Cronbach Alpha value of 0.78 in the good category. The fourth conclusion is that conception and misconception are inversely related. The fewer misconceptions, the better the student's understanding, and vice versa. However, misconceptions can also occur due to each student's confidence level. Students with misconceptions about optical instrument materials should be given appropriate treatment, such as appropriate learning. The developed four-tier closed-ended test is expected to be used and improved into a better five-tier test to investigate the causes of each student's misconceptions.

REFERENCES
Afif, N. F., Nugraha, M. G., & Samsudin, A. (2017). Developing energy and momentum conceptual survey (EMCS) with four-tier diagnostic test items. AIP Conference Proceedings, 1848. https://doi.org/10.1063/1.4983966
Amalia, S. A., Suhendi, E., Kaniawati, I., Samsudin, A., Fratiwi, N. J., Hidayat, S. R., Zulfikar, A., Sholihat, F. N., Jubaedah, D. S., Setyadin, A. H., Purwanto, M. G., Muhaimin, M. H., Bhakti, S. S., & Afif, N. F. (2019). Diagnosis of student’s misconception on momentum and impulse trough inquiry learning with computer simulation (ILCS). Journal of Physics: Conference Series, 1204(1). https://doi.org/10.1088/1742-6596/1204/1/012073
Aminudin, A. H., Adimayuda, R., Kaniawati, I., Suhendi, E., Samsudin, A., & Coştu, B. (2019). Rasch analysis of Multitier Open-ended Light-Wave Instrument (MOLWI): Developing and assessing second-years sundanese-scholars alternative conceptions. Journal for the Education of Gifted Young Scientists, 7(3), 557–579. https://doi.org/10.17478/jegys.574524
Anggrayni, S., & Ermawati, F. U. (2019). The validity of Four-Tier’s misconception diagnostic test for work and energy concepts. Journal of Physics: Conference Series, 1171(1). https://doi.org/10.1088/1742-6596/1171/1/012037
Bautista, N. U., & Boone, W. J. (2015). Exploring the impact of TeachME™ Lab Virtual Classroom Teaching Simulation on early childhood education majors’ self-efficacy beliefs. Journal of Science Teacher Education, 26(3), 237–262. https://doi.org/10.1007/s10972-014-9418-8
Bond, T. G., & Fox, C. M. (2013). Applying the Rasch model fundamental measurement in the human science. In Applying the Rasch Model. Routledge. https://doi.org/10.4324/9781410614575
Boone, W. J., Yale, M. S., & Staver, J. R. (2014). Rasch analysis in the human sciences. In Rasch Analysis in the
Human Sciences.
Brookhart, S. M., Walsh, J. M., & Zientarski, W. A. (2006). The dynamics of motivation and effort for classroom assessments in middle school science and social studies. Applied Measurement in Education, 19(2), 151–184. https://doi.org/10.1207/s15324818ame1902_5
Caleon, I., & Subramaniam, R. (2010). Development and application of a three-tier diagnostic test to assess secondary students’ understanding of waves. International Journal of Science Education, 32(7), 939–961. https://doi.org/10.1080/09500690902890130
Coetzee, A., & Imenda, S. N. (2012). Alternative conceptions held by first year physics students at a South African university of technology concerning interference and diffraction of waves. Research in Higher Education Journal, 16(July), 13. https://www.aabri.com/manuscripts/121097.pdf
Coştu, B. (2008). Learning science through the PDEODE teaching strategy: Helping students make sense of everyday situations. Eurasia Journal of Mathematics, Science and Technology Education, 4(1), 3–9. https://doi.org/10.12973/ejmste/75300
Coştu, B., Ayas, A., & Niaz, M. (2012). Investigating the effectiveness of a POE-based teaching activity on students’ understanding of condensation. Instructional Science, 40, 47–67.
Darmana, A., Sutiani, A., Nasution, H. A., Sylvia, N. S. A., Aminah, N., & Utami, T. (2021). Analysis of multi rater with facets on instruments HOTS of solution chemistry based on Tawheed. Journal of Physics: Conference Series, 1819(1). https://doi.org/10.1088/1742-6596/1819/1/012038
Eckes, T. (2019). Quantitative data analysis for language assessment volume i chapter Many-facet Rasch measurement Implications for rater-mediated language assessment (1st Edition). Routledge.
Fariyani, Q., Rusilowati, A., & Sugianto, S. (2017). Four-tier diagnostic test to identify misconceptions in geometrical optics. Unnes Science Education Journal, 6(3), 1724–1729.
Greca, I. M., Seoane, E., & Arriassecq, I. (2014). Epistemological issues concerning computer simulations in science and their implications for science education. Science and Education, 23(4), 897–921. https://doi.org/10.1007/s11191-013-9673-7
Güler, N. (2014). Analysis of open-ended statistics questions with Many Facet Rasch Model. Eurasian Journal of Educational Research, 55, 73–90. https://doi.org/10.14689/ejer.2014.55.5
Gurel, D. K., Eryilmaz, A., & McDermott, L. C. (2015). A review and comparison of diagnostic instruments to identify students’ misconceptions in science. Eurasia Journal of Mathematics, Science and Technology Education, 11(5), 989–1008. https://doi.org/10.12973/eurasia.2015.1369a
Hammer, D. (1996). More than misconceptions: Multiple perspectives on student knowledge and reasoning, and an appropriate role for education research. American Journal of Physics, 64(10), 1316–1325. https://doi.org/10.1119/1.18376
Hermita, N., Suhandi, A., Syaodih, E., Samsudin, A., Isjoni, Johan, H., Rosa, F., Setyaningsih, R., Sapriadil, & Safitri, D. (2017). Constructing and
implementing a four tier test about static electricity to diagnose pre-service elementary school teacher’ misconceptions. Journal of Physics: Conference Series, 895(1). https://doi.org/10.1088/1742-6596/895/1/012167
Heyd-Metzuyanim, E., & Schwarz, B. B. (2017). Conceptual change within dyadic interactions: The dance of conceptual and material agency. Instructional Science, 45(5), 645–677. https://doi.org/10.1007/s11251-017-9419-z
Husnah, I., Suhandi, A., & Samsudin, A. (2020). Analyzing K-11 students’ boiling conceptions with BFT-test using Rasch Model: A case study in the COVID-19 pandemic. Tadris: Jurnal Keguruan Dan Ilmu Tarbiyah, 5(2), 225–239. https://doi.org/10.24042/tadris.v5i2.6871
Ismail, I. I., Samsudin, A., Suhendi, E., & Kaniawati, I. (2015). Diagnostik miskonsepsi melalui listrik dinamis four tier test view project conceptual change and mental model in the physics conceptions view project. Prosiding Simposium Nasional Inovasi dan Pembelajaran Sains 2015, June. https://www.researchgate.net/publication/301523361
Jauhariyah, M. N. R., Suprapto, N., Suliyanah, Admoko, S., Setyarsih, W., Harizah, Z., & Zulfa, I. (2018). The students’ misconceptions profile on chapter gas kinetic theory. Journal of Physics: Conference Series, 997(1). https://doi.org/10.1088/1742-6596/997/1/012031
Kaltakci-Gurel, D., Eryilmaz, A., & McDermott, L. C. (2017). Development and application of a four-tier test to assess pre-service physics teachers’ misconceptions about geometrical optics. Research in Science and Technological Education, 35(2), 238–260. https://doi.org/10.1080/02635143.2017.1310094
Kaltakçi, D., & Didiç, N. (2007). Identification of pre-service physics teachers’ misconceptions on gravity concept: A study with a 3-tier misconception test. AIP Conference Proceedings, 899(April 2007), 499–500. https://doi.org/10.1063/1.2733255
Kaniawati, I., Rahmadani, S., Fratiwi, N. J., Suyana, I., Danawan, A., Samsudin, A., & Suhendi, E. (2020). An analysis of students’ misconceptions about the implementation of active learning of optics and photonics approach assisted by computer simulation. International Journal of Emerging Technologies in Learning, 15(9), 76–93. https://doi.org/10.3991/ijet.v15i09.12217
Koçak, D. (2020). Investigation of rater tendencies and reliability in different assessment methods with many facet Rasch model. International Electronic Journal of Elementary Education, 12(4), 349–358.
Kocakulah, M. S., & Kural, M. (2010). Investigation of conceptual change about double-slit interference in secondary school physics. International Journal of Environmental and Science Education, 5(4), 435–460.
Kucukozer, M. (2010). Peasant rebellions in the age of globalization: The EZLN in Mexico and the PKK in Turkey. City University of New York.
Liu, G., & Fang, N. (2016). Student misconceptions about force and acceleration in physics and engineering mechanics education. International Journal of Engineering Education, 32(1), 19–29.
Munawaroh, R., Setyarsih, W., Fisika, J., Matematika, F., Ilmu, D., & Alam, P.
(2016). Identifikasi miskonsepsi siswa dan penyebabnya pada materi alat optik menggunakan three-tier multiple choice diagnostic test. Jurnal Inovasi Pendidikan Fisika (JIPF), 05(02), 79–81. http://etd.lib.metu.edu.tr/upload/12606
Oberoi, M. (2017). Review of literature on student’s misconceptions in science. International Journal of Scientific Research and Education, 5(3), 6274–6280.
Ohlsson, S., & Cosejo, D. G. (2014). What can be learned from a laboratory model of conceptual change? Descriptive findings and methodological issues. Science and Education, 23(7), 1485–1504. https://doi.org/10.1007/s11191-013-9658-6
Pertiwi, C. A., & Setyarsih, W. (2015). Konsepsi siswa tentang pengaruh gaya pada gerak benda menggunakan instrumen force concept inventory (FCI) Termodifikasi. Jurnal Inovasi Pendidikan Fisika (JIPF), 04(02), 162–168.
Rohmanasari, F., & Ermawati, F. U. (2020). Using four-tier multiple choice diagnostic test to identify misconception profile of 12th grade students in optical instrument concepts. Journal of Physics: Conference Series, 1491(1). https://doi.org/10.1088/1742-6596/1491/1/012011
Rosita, I., Liliawati, W., & Samsudin, A. (2020). Pengembangan instrumen five-tier newton's laws test (5TNLT) untuk mengidentifikasi miskonsepsi dan penyebab miskonsepsi siswa. Jurnal Pendidikan Fisika Dan Teknologi, 6(2), 297–306. https://doi.org/10.29303/jpft.v6i2.2018
Salamah, L. M., Yulianti, E., & Arif, H. (2017). Pengembangan instrumen diagnostik three-tier untuk mengidentifikasi miskonsepsi siswa SMP pada konsep cahaya. In Prosiding Seminar Nasional Pembelajaran Ipa Ke-2. http://ipa.fmipa.um.ac.id/
Salmadhia, F., Rusnayati, H., & Liliawati, W. (2021). Five-tier geometrical optics test feasibility to identify misconception and the causes in high school students. Berkala Ilmiah Pendidikan Fisika, 9(2), 141. https://doi.org/10.20527/bipf.v9i2.8874
Shen, J., Liu, O. L., & Chang, H. Y. (2017). Assessing students’ deep conceptual understanding in physical sciences: an example on sinking and floating. International Journal of Science and Mathematics Education, 15(1), 57–70. https://doi.org/10.1007/s10763-015-9680-z
Stein, M., Larrabee, T. G., & Barman, C. R. (2008). A study of common beliefs and misconceptions in physical science. Journal of Elementary Science Education, 20(2), 1–11. https://doi.org/10.1007/bf03173666
Sumintono, B., & Widhiarsho, W. (2015). Aplikasi pemodelan rasch pada assesment pendidikan. Trim Komunikata.
Suprapto, N. (2020). Do we experience misconceptions?: An ontological review of misconceptions in science. Studies in Philosophy of Science and Education, 1(2), 50–55. https://doi.org/10.46627/sipose.v1i2.24
Thiagarajan, S. A. O. (1974). Instructional development for training teachers of exceptional children: a sourcebook. Indiana Univ., Bloomington. Center for Innovation in Teaching the Handicapped.
Umar, F. A., Samsudin, A., Ramalis, T. R., Sa’diyah, L. H., Dalila, A. A., & Komalasari, K. (2021). Uyo and nanu misconception investigation
(UNAMI) on sound-light waves materials in North Sulawesi. Journal of Physics: Conference Series, 2098(1). https://doi.org/10.1088/1742-6596/2098/1/012027
Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197–210. https://doi.org/10.1177/0748175612440286
Zhou, S., Wang, Y., & Zhang, C. (2016). Pre-service science teachers’ PCK: Inconsistency of pre-service teachers’ predictions and student learning difficulties in Newton’s third law. Eurasia Journal of Mathematics, Science and Technology Education, 12(3), 373–385. https://doi.org/10.12973/eurasia.2016.1203a


© 2023 URPI Faculty of Education and Teacher Training
