Topic 8F: Validity, Reliability, and Sources of Error


Validity, Reliability, and Sources of Error
Finals
Validity
Validity means the degree to which a test or measuring instrument measures what it intends to measure. The responses should have veracity, or truthfulness of results.
As an example, the test item is “Who is the goddess of beauty?” Of the 100 students in Methodology, 90, or 90 percent, answered “Venus” as the goddess of beauty. Hence, 90% of the responses are valid because the correct answer is “Venus.” In another example, the test item is “How many grams are there in 5 kilograms?” Of the 50 students who took the test in mathematics, 50, or 100%, answered “5,000 grams”; hence, the test is valid.
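The percentage of valid responses in the examples above can be computed directly. The sketch below is illustrative only; the response counts follow the "goddess of beauty" example in the text, and the function name is an assumption, not an established term.

```python
# Sketch: share of responses that match the keyed (correct) answer,
# following the "goddess of beauty" example: 90 of 100 answered "Venus".

def percent_valid(responses, key):
    """Percentage of responses equal to the keyed answer."""
    return 100 * sum(r == key for r in responses) / len(responses)

responses = ["Venus"] * 90 + ["Diana"] * 10   # 100 Methodology students
print(percent_valid(responses, "Venus"))      # → 90.0
```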
Types of Validity
• Content Validity
• refers to the extent to which the content or topic of the test is truly representative of the content of the course. It involves, essentially, the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain. Content validity depends on the relevance of the individual’s responses to the behavior area under consideration rather than on the apparent relevance of the content.
• This type of validity is commonly used in evaluating achievement tests.
Example
A researcher wishes to validate a questionnaire in Science. He requests experts in Science to judge if the items measure the knowledge, skills, and values supposed to be measured.

Below is an example of an open-ended questionnaire on the teaching strategies used in teaching Science. For content validity, the experts are directed to check the items to be retained as (3) retain; (2) needs improvement; and (1) delete.

Directions: Below are teaching strategies used in the teaching of Science. Indicate the extent to which each strategy is used in teaching Science in your school by encircling one of the options on the right column. The options 4, 3, 2, and 1 represent the extent of use, thus: 4 – very often; 3 – often; 2 – sometimes; 1 – never.
• The researcher requires a selected group of experts to validate the content of the questionnaire on the basis of the foregoing questions. If the weighted mean is 2.5 and above, the item is retained; 1.5 to 2.4, needs improvement; and 1.4 and below, delete.
• For example, there are five experts to validate the above questionnaire. In Item 1, “Curriculum enrichment,” 4 experts rated 3 and 1 rated 2. The weighted mean is computed as (4 × 3 + 1 × 2) / 5 = 14 / 5 = 2.8, which is 2.5 and above, so the decision is to retain.
• Thus, Item 1, “Curriculum enrichment,” should be retained.
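The rating-and-decision procedure above can be sketched in a few lines. This is a minimal illustration, assuming the cutoffs stated in the text (2.5 and above retain; 1.5 to 2.4 needs improvement; 1.4 and below delete) and the Item 1 ratings from the example.

```python
# Sketch: weighted mean of expert ratings for one questionnaire item,
# with the decision rule from the text.

def weighted_mean(ratings):
    return sum(ratings) / len(ratings)

def decision(mean):
    if mean >= 2.5:
        return "retain"
    if mean >= 1.5:
        return "needs improvement"
    return "delete"

ratings = [3, 3, 3, 3, 2]      # 4 experts rated 3, 1 expert rated 2
mean = weighted_mean(ratings)
print(mean, decision(mean))    # → 2.8 retain
```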
• Concurrent validity
• refers to the degree to which the test agrees or correlates with a criterion set up as an acceptable measure. The criterion is always available at the time of testing, and it is applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.
• Example: A researcher wishes to validate a Mathematics achievement test he has constructed. He administers this test to a group of Mathematics students. The result of this test is correlated with an acceptable Mathematics test which has previously been proven valid. If the correlation is “high,” the Mathematics test he has constructed is valid.
• Predictive validity
• According to Aquino et al. (1974), this validity is determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time. The criterion measured against this type of validation is important because the outcome of the subjects is predicted.
• Example: Suppose the researcher wants to estimate how well a high school student may be able to do in a college course on the basis of how well he has done on tests he took in high school subjects. The criterion measures against which the test scores are validated become available only after a long interval of time.
• Construct validity
• The construct validity of a test is the extent to which the test measures a theoretical construct or trait. This involves such traits as understanding, appreciation, and interpretation of data.
• Example: Suppose a researcher wishes to establish the validity of an IQ (Intelligence Quotient) test using SCRIT (Safran Culture-Reduced Intelligence Test). He hypothesizes that pupils with high IQ also have high achievement, and those with low IQ, low achievement. He therefore administers both SCRIT and an achievement test to two groups of pupils with high and low IQ, respectively. If the results show that those with high IQ have high scores in the achievement test, the test is valid.
Reliability
• The word reliability means the extent to which a research instrument is dependable, consistent, and stable (Meriam, 1975). It means that the test agrees with itself. If a person takes the same test twice, the test yields the same results. However, a reliable test may not always be valid.

• Example: A test item is “Who is the goddess of love?” Of the 70 students in Mythology, 70, or 100%, answered “Venus” as the goddess of love. In the statistical sense, the responses are reliable because they are consistent, but not valid, for there is no veracity or truthfulness in the answer because, according to the test key, the goddess of love is “Diana.” Hence, the test is reliable but not valid. Likewise, a reliable test or research instrument is not always valid.
Methods in Testing the Reliability of a Good
Research Instrument
• Test-retest method
• The same instrument is administered twice to the same group of subjects, and the correlation coefficient is determined. The disadvantages of this method are: (a) when the time interval is short, the respondents may recall their previous responses, and this tends to make the correlation coefficient high; (b) when the time interval is long, such factors as forgetting and unlearning, among others, may occur and may result in a low correlation of the test; and (c) regardless of the time interval separating the two administrations, other varying environmental conditions such as temperature, lighting, noise, and other factors may affect the correlation coefficient of the research instrument.
The Spearman rank correlation coefficient, or Spearman rho, is the statistical tool used to measure the relationship between paired ranks assigned to individual scores on the two administrations in the test-retest method.
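Spearman rho for a test-retest pair can be computed with the simplified rank-difference formula, rho = 1 − 6Σd² / (n(n² − 1)). The sketch below assumes no tied scores (ties require averaged ranks, which this simple ranking does not handle), and the score lists are hypothetical.

```python
# Sketch: Spearman rank correlation (rho) for test-retest reliability,
# assuming no tied scores so the simplified formula applies.

def rank(scores):
    """Rank of each score within its list (1 = highest)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def spearman_rho(test, retest):
    n = len(test)
    d_squared = sum((r1 - r2) ** 2
                    for r1, r2 in zip(rank(test), rank(retest)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

test   = [88, 75, 92, 60, 81]   # first administration (hypothetical)
retest = [70, 85, 95, 62, 78]   # second administration (hypothetical)
print(round(spearman_rho(test, retest), 2))
```

A high rho suggests the instrument yields stable rankings across the two administrations.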
• Parallel-forms method
• Parallel or equivalent forms of a test may be administered to the group of subjects, and the paired observations correlated. “In estimating reliability by the administration of parallel or equivalent forms of a test, criteria of parallelism are required” (Ferguson et al., 1989). The two forms of the test must be constructed so that the content, type of item, difficulty, instructions for administration, and many others are similar but not identical.
Example: In the conversion process, “convert 5 kilometers to meters” in Form A is parallel or equal to “convert 5,000 meters to kilometers” in Form B. Moreover, these two forms should have approximately the same mean and variability of scores.
The correlation between the scores obtained on paired observations of these two forms represents the reliability coefficient of the test. If the correlation coefficient (r) obtained is high, the research instrument is reliable.
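The reliability coefficient for parallel forms is the Pearson correlation between the paired Form A and Form B scores. The sketch below computes it from the definitional formula; the two score lists are purely illustrative.

```python
# Sketch: Pearson correlation (r) between scores on two parallel forms.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

form_a = [12, 15, 9, 18, 14]   # hypothetical Form A scores
form_b = [13, 16, 10, 17, 15]  # hypothetical Form B scores
print(round(pearson_r(form_a, form_b), 2))
```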
• Split-half method
• The test in this method is administered once, but the test items are divided into two halves. The common procedure is to divide the test into odd and even items. The two halves of the test must be similar but not identical in content, number of items, difficulty, means, and standard deviations. Each student obtains two scores, one on the odd and another on the even items of the same test. The scores obtained on the two halves are correlated. The result is the reliability coefficient of a half test. The reliability coefficient of the whole test is estimated by using the Spearman-Brown formula.
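For doubling a half test, the Spearman-Brown formula reduces to r_whole = 2·r_half / (1 + r_half). The sketch below applies it to a hypothetical half-test correlation of 0.6.

```python
# Sketch: stepping up a half-test correlation to whole-test reliability
# with the Spearman-Brown formula (two-halves case).

def spearman_brown(r_half):
    """Estimated whole-test reliability from the half-test correlation."""
    return (2 * r_half) / (1 + r_half)

print(round(spearman_brown(0.6), 2))  # → 0.75
```

Note that the stepped-up value is always at least as large as the half-test correlation, reflecting the greater reliability of a longer test.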
• Internal-consistency method
• This method is used with psychological tests which consist of dichotomously scored items. The examinee either passes or fails an item. A score of 1 (one) is assigned for a pass and 0 (zero) for a failure. The reliability coefficient in this method is determined by the Kuder-Richardson Formula 20 (KR-20). This formula is a measure of the internal consistency or homogeneity of the research instrument.
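KR-20 can be written as r = (k / (k − 1)) · (1 − Σpq / σ²), where k is the number of items, p and q are the proportions passing and failing each item, and σ² is the variance of the total scores. The sketch below computes it on a small matrix of hypothetical pass/fail responses; the data are invented for illustration.

```python
# Sketch: Kuder-Richardson Formula 20 (KR-20) on hypothetical
# dichotomous responses (rows = examinees, columns = items, 1 = pass).

def kr20(responses):
    k = len(responses[0])                  # number of items
    n = len(responses)                     # number of examinees
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n  # proportion passing
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance)

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))
```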
Usability
• Usability means the degree to which the research instrument can be satisfactorily used by teachers, researchers, and school managers without undue expenditure of time, money, and effort. In other words, usability means practicability.
Factors to determine usability
• Ease of administration
• To facilitate the administration of a research instrument, the instructions should be complete and precise. As a rule, group tests are easier to administer than individual tests. The former are easier to administer because directions are given only once and the instrument is simultaneously administered to a group of students, thus saving time and energy on the part of the examiner or researcher.
• Ease of scoring
• Ease of scoring a research instrument depends upon the following aspects:
• Construction of the test in the objective type;
• Answer keys are adequately prepared; and
• Scoring directions are fully understood.
• It can be said that scoring is easier when all subjects are instructed to write their responses in one column, in numerical or letter form, and with separate answer sheets for their responses.
• Ease of interpretation and application
• The results of tests are easy to interpret and apply if tables of norms are provided. All scores must be given meaning from the tables of norms without the necessity of computation. As a rule, norms should be based both on age and year level, as in the case of school achievement tests. It is also desirable that achievement tests be provided with separate norms for rural and urban subjects as well as for learners of various degrees of mental ability.
• Low cost
• It is more practical if the test is low-cost material-wise. It is also more economical if the research instrument is of low cost and can be reused by future researchers.

• Proper mechanical make-up


• A good research instrument should be printed clearly in an appropriate size for the grade or year level for which the instrument is intended. For example, if the research instrument is a test, the font size for Grades one to three is biggest, i.e., font size 18; Grades four to six, font size 16; secondary students, font size 14; and college students, font size 12.
Sources of Error
• Participant Error
• Error associated with the participant includes many factors, such as mood, motivation, fatigue, health, fluctuations in memory and performance, previous practice, specific knowledge, and familiarity with the test items.

• Testing Error
• The testing error is related to how clear and complete the directions are, how rigidly the instructions are followed, and whether supplementary directions or motivation are applied.
• Scoring Error
• Errors in scoring relate to the competence, experience, and dedication of the scorers and to the nature of the scoring itself. The extent to which the scorer is familiar with the behavior being tested and with the test items can greatly affect scoring accuracy. Carelessness and inattention to detail also produce scoring errors.

• Measurement Error
• Measurement error is a source of error arising from instrumentation, which includes such obvious causes as inaccuracy and lack of calibration of mechanical and electronic equipment. It also refers to the inadequacy of a test to discriminate between abilities and to the difficulty of scoring some tests.
End of Slide.
