
OBJECTIVES

1. Be able to explain what validity and reliability mean for a research measurement instrument.
2. Be able to describe the types of validity and reliability of measurement instruments used in research.
VALIDITY

RELIABILITY
Validity refers to the degree to
which our test or other measuring
device is truly measuring what
we intended it to measure.

The extent to which an instrument
measures what it is
supposed to measure.
Validity means the ability of a test to measure what
it is supposed to measure (Youngman & Eggleston,
1982; Sax & Newton, 1997).
The validity of a measurement instrument refers to the
extent to which the instrument measures the data
needed to achieve the objectives of the study
(Mohd Majid Konting, 1990).
• Construct validity: determination of the significance, meaning,
purpose, and use of the scores. Evidence is based on internal structure.
• Criterion validity (criterion-referenced): scores are a predictor of an
outcome or criterion they are expected to predict. Evidence is based on
relations to other variables and may be concurrent or predictive.
• Content validity: the items are representative of all possible questions
that could be asked. Evidence is based on content, and content validation
is usually carried out by experts.
Content Validity

• The extent to which an instrument covers the content of a field.
• The main aim is to ensure that all the content being measured is
representative of that field.
• Based on the scope, objectives and content of the field being studied.
• The opinion of experts or external reviewers is needed to judge how
well the items suit the chosen domain.
Content validity is concerned with a test’s ability to include or represent
all of the content of a particular construct. The
question “1 + 1 = ___” may be a valid basic addition
question. Would it represent all of the content that
makes up the study of mathematics? It may be
included on a scale of intelligence, but does it
represent all of intelligence? The answer to these
questions is obviously no. To develop a valid test of
intelligence, not only must there be questions on math,
but also questions on verbal reasoning, analytical
ability, and every other aspect of the construct we call
intelligence. There is no easy way to determine
content validity aside from expert opinion.
1. Do the items appear to represent the thing
you are trying to measure?
2. Does the set of items underrepresent
the construct’s content (i.e., have you
excluded any important content areas or
topics?)
3. Do any of the items represent something
other than what you are trying to measure
(i.e., have you included any irrelevant
items?)
Before an instrument can be said to have content validity,
the following conditions must be met:

1. The content domain must be stated in terms of behaviours
whose meaning is generally accepted.
2. The domain must be clearly described.
3. The domain must be relevant to the purpose for which the
test is used.
4. Qualified judges must agree that the domain has been
adequately sampled.
Evidence Based on Internal
Structure

• Relevant when a test is meant to measure several components or
dimensions of a construct.
• Use factor analysis to analyze the correlations among test items.
It tells you the number of factors present and whether the test is
unidimensional or multidimensional.
• Unidimensional: all the items measure a single construct.
• Multidimensional: different sets of items tap different constructs
or different components of a broader construct.
Evidence Based on Internal Structure (continued)

• Factor analysis tells you how many dimensions or factors your
test items represent.
• You can also obtain a measure of test homogeneity
(i.e., the degree to which the different items
measure the same construct or trait).
• Use coefficient alpha (Cronbach's alpha) as the measure of
homogeneity.
• If alpha is low (e.g., < .70) for the test, then
some items might be measuring different
constructs or some items might be bad.
• Examine the items that are contributing to the
low coefficient alpha and consider eliminating or
revising them.
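The item review described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistics package: the function names `cronbach_alpha` and `alpha_if_item_deleted` and the score matrix are hypothetical, and alpha is computed from sample variances using the usual formula (number of items, sum of item variances, variance of total scores).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # one tuple of scores per item
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item left out in turn."""
    k = len(scores[0])
    result = []
    for drop in range(k):
        reduced = [[v for j, v in enumerate(row) if j != drop] for row in scores]
        result.append(cronbach_alpha(reduced))
    return result


# Hypothetical data: 6 respondents x 4 items on a 1-5 scale.
# Item 4 (last column) was made up to disagree with the other three.
scores = [
    [4, 5, 4, 2],
    [3, 3, 3, 5],
    [5, 5, 4, 1],
    [2, 2, 3, 4],
    [4, 4, 5, 3],
    [1, 2, 1, 3],
]

alpha = cronbach_alpha(scores)
deleted = alpha_if_item_deleted(scores)
print(f"alpha = {alpha:.2f}")
for i, a in enumerate(deleted, start=1):
    print(f"alpha without item {i} = {a:.2f}")
```

With this made-up data the overall alpha falls below .70, and leaving out item 4 raises it the most, which is the pattern that would flag an item for revision or removal.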
Criterion Validity

• Obtained by relating your test scores to a relevant criterion.
• A criterion is the standard or benchmark that you want to
predict accurately on the basis of scores from your test.
• The extent to which the instrument is related to an independent
external criterion (whether the items measure the criterion they are
meant to measure).
• Determined by correlation analysis between two sets of scores.
• The correlation coefficients calculated in a validity study are
called validity coefficients.
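A validity coefficient is an ordinary correlation between test scores and criterion scores. The sketch below assumes the Pearson product-moment correlation is the coefficient wanted; the helper name `pearson_r` and both score lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: six participants' scores on the new test and
# their scores on the criterion measure (for example, a later grade average).
test_scores = [52, 61, 70, 45, 80, 66]
criterion = [2.4, 2.9, 3.3, 2.1, 3.8, 3.0]

validity_coefficient = pearson_r(test_scores, criterion)
print(f"validity coefficient = {validity_coefficient:.2f}")
```

The same calculation serves concurrent and predictive evidence; only the timing of the criterion measurement differs.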
Concurrent validity refers to a measurement device’s ability to vary
directly with a measure of the same construct or inversely with a
measure of an opposite construct. It allows you to show that your
test is valid by comparing it with an already valid test. To obtain
concurrent evidence, administer the focal test and the criterion test at
approximately the same point in time (i.e., concurrently) and then
correlate the two sets of scores. If the two sets of scores are highly
correlated, you have concurrent evidence.

e.g.

A new test of adult intelligence, for example, would have


concurrent validity if it had a high positive correlation with the
Wechsler Adult Intelligence Scale since the Wechsler is an
accepted measure of the construct we call intelligence. An
obvious concern relates to the validity of the test against
which you are comparing your test. Some assumptions must
be made because there are many who argue the Wechsler
scales, for example, are not good measures of intelligence.
• Obtain predictive evidence of validity by measuring your
participants at one point in time on your test and then, at a
future time, measuring them on the criterion measure.
• This takes more time and effort than concurrent evidence, but it
can provide superior evidence that your test does what
you want it to do.

In order for a test to be a valid screening device for some


future behavior, it must have predictive validity. The SAT is
used by college screening committees as one way to predict
college grades. The GMAT is used to predict success in
business school. And the LSAT is used as a means to predict
law school performance. The main concern with these, and
many other predictive measures, is predictive validity,
because without it they would be worthless.
Reliability is synonymous with the consistency of a test, survey,
observation, or other measuring device. Imagine stepping on your
bathroom scale and weighing 140 pounds only to find that your weight on
the same scale changes to 180 pounds an hour later and 100 pounds an
hour after that. Based on the inconsistency of this scale, any research
relying on it would certainly be unreliable. Consider an important study on
a new diet program that relies on your inconsistent or unreliable bathroom
scale as the main way to collect information regarding weight change.
Would you consider their results accurate?
• The extent to which an instrument consistently measures
what it is intended to measure.
• Scores from the instrument are stable and
consistent.
Types of reliability:
• Test-retest reliability
• Internal consistency reliability
• Equivalent forms reliability
Test-retest reliability refers to the consistency or stability of test
scores when the test is administered at different times.

Example:
A test is given to 100 individuals at one time and repeated at a
different time. The two sets of scores are then correlated. If the
individuals who scored highest on test 1 also score highest on test 2,
and the individuals who scored lowest on test 1 also score lowest on
test 2, the correlation is high. The test therefore has high
reliability.
• Equivalent forms reliability refers to the consistency of a group of
individuals’ scores on two equivalent forms of a test designed to
measure the same characteristic.
• Uses one instrument that has been newly constructed and another that
is standardized.
• Both are administered to the same subjects, either at the same time
or at a different time.
• Equivalent forms means that two tests are constructed so that
they are identical in every way except for the specific items
asked on the test.
• This means that they have the same number of items, the items
are of the same difficulty level, the items measure the same
construct, and the tests are administered, scored, and interpreted
in the same way.
• The two sets of scores are then correlated. The reliability
coefficient should be very high and positive; that is, individuals
who do well on the first form of the test should also do well on
the second form, and individuals who perform poorly on the
first form should perform poorly on the second form.
• Internal consistency refers to how consistently the items on a
test measure a single construct or concept.
• The test-retest and equivalent forms methods of assessing
reliability are general methods that can be used with just about
any test.
• Internal consistency measures are convenient and are very
popular with researchers because they require only one group of
individuals to take the test once.
• Two indexes of internal consistency:
o Split-half reliability
o Coefficient alpha
Split-half reliability

• Splitting a test into two equivalent halves and then


assessing the consistency of the scores across the two
halves of the test.
• Divide the test into halves and correlate the scores
from the two halves.
• Adjust this correlation to the full test length using the
Spearman-Brown formula.
• A low correlation indicates that the test is
unreliable; a high correlation indicates that the test is
reliable.
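The split-half steps can be sketched as follows. This is a minimal illustration under stated assumptions: the test is split into odd-numbered and even-numbered items (one common way to form two halves, not something the slides specify), the function names are hypothetical, and the item scores are invented. The Spearman-Brown step-up for a test of doubled length is 2r / (1 + r), where r is the correlation between the halves.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """Return (half-test correlation, Spearman-Brown estimate) for an odd-even split."""
    first_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson_r(first_half, second_half)
    return r_half, spearman_brown(r_half)


# Hypothetical data: 5 respondents x 6 items scored 1 (correct) or 0 (wrong).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(scores)
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```

The adjustment is needed because each half is only half as long as the real test, and shorter tests tend to be less reliable; the stepped-up value estimates the reliability of the full-length test.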
Coefficient alpha

• Lee Cronbach (1951) developed coefficient alpha, also known as
Cronbach's alpha.
• Coefficient alpha tells you the degree to which the items
are interrelated.
Rule of thumb:
• At a minimum, greater than or equal to .70 for research
purposes and somewhat greater than that value (e.g., ≥
.90) for clinical testing purposes.
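For reference, coefficient alpha is usually written in the standard form below. The symbols are not taken from the slides and are introduced here for illustration: k is the number of items, s_i^2 is the variance of scores on item i, and s_X^2 is the variance of respondents' total scores.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_X^{2}}\right)
```

When the items are strongly interrelated, the total-score variance is large relative to the sum of the item variances and alpha approaches 1; when the items are unrelated, alpha is close to 0.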
• Item statements must be clear and precise.
• Instructions must be clear and concise.
• Items should all be of the same format.
• The measurement situation and timing should be standardized,
uniform and controlled.
• Avoid disturbing the subjects.
• Reduce subjects' anxiety by guaranteeing the safety and
confidentiality of the information they give.
The pilot test is the final phase of a survey
before data collection begins.

Its aim is to find problems in the questionnaire,
including weak questions, incomplete instructions
and items that are difficult to answer.
• Do not use the actual target group of the study.
• The number of respondents is not fixed precisely; at least
25 people is suggested, and between 50 and 75 is better.
• For a new study, run the pilot test twice.
• Develop standard written procedures for administering an instrument.
• Train researchers to collect observational data.
• Obtain permission to collect and use public documents.
• Respect individuals and sites during data gathering (ethics).
• Institutional or organizational (e.g., school district)
• Parents of participants who are not considered adults
• Campus approval (e.g., university or college) and
Institutional Review Board (IRB)
