
CRITERIA OF GOOD INSTRUMENTS:

VALIDITY & RELIABILITY

This paper is made to fulfil assignment of:

“Quantitative Research Methodology”

The Lecturer:

Dr. Sri Wahyuni, M.Pd.

Arranged by:

1. Zasqia Wina Wulansari (932213918)


2. Zahra Zahtira Priyananda (932217018)
3. M. Novan Dianata (932217318)

ENGLISH DEPARTMENT

FACULTY OF EDUCATION AND TEACHERS TRAINING

STATE ISLAMIC INSTITUTE (IAIN) OF KEDIRI

2020
I. INTRODUCTION
A. Background of The Paper
In research methodology, two features are the most important and fundamental in the evaluation of any measurement instrument: validity and reliability. Together, these concepts determine how much confidence can be placed in research findings. Reliability refers to the stability of findings, whereas validity represents the truthfulness of findings. Without assessing the reliability and validity of a study, it is difficult to account for the effects of measurement error on the theoretical relationships being measured. The purpose of this paper is to discuss the validity and reliability of measurement instruments. By using various methods to collect data, a researcher can enhance the validity and reliability of the information obtained.

B. Purpose of the Paper


1. To define validity and reliability.
2. To explain the concepts of internal and external validity.
3. To describe the types of validity and reliability.
4. To identify threats to validity and ways to minimize them.
II. DISCUSSION
1. Validity
The validity of research is the extent to which the requirements of the scientific research method have been followed in the process of generating the research findings. Validity explains how well the collected data cover the actual area of investigation. In quantitative research, validity is the extent to which a measuring instrument measures what it is intended to measure; in qualitative research, it refers to the procedures a researcher uses to check the accuracy of the findings. Validity has two essential aspects: internal validity (credibility) and external validity (transferability).

Internal validity indicates whether the results of a study are legitimate, given the way the groups were selected, the data were recorded, and the analyses were performed. External validity shows whether the results of the study are transferable to other groups of interest. A researcher can increase external validity by:

- achieving representation of the population through strategies such as random selection (see the sketch after this list),
- using heterogeneous groups,
- using non-reactive measures, and
- using precise descriptions to allow the study to be replicated across different populations, settings, etc.
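As an illustration of the first strategy, a simple random sample can be drawn with any standard random number facility. The sketch below is a minimal, hypothetical Python example; the sampling frame and sample size are invented for illustration only.

import random

# Hypothetical sampling frame: identifiers for every member of the population.
population = ["student_{}".format(i) for i in range(1, 501)]

# Fixed seed only so the illustration is repeatable.
random.seed(42)

# Simple random sample of 50 members drawn without replacement,
# so every member has the same chance of being selected.
sample = random.sample(population, k=50)
print(sample[:5])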
1.1 Types of Validity
a. Content Validity
It is the extent to which the questions on the instrument and
the scores from these questions represent all possible
questions that could be asked about the content or skill.
Content validity usually depends on the judgment of experts
in the field. For example, if we want to test knowledge of Bangladeshi geography, it is not fair to have most questions limited to the geography of Dhaka, the capital city of Bangladesh.
b. Face Validity
Face validity refers to researchers’ subjective assessments
of the presentation and relevance of the measuring
instrument as to whether the items in the instrument appear
to be relevant, reasonable, unambiguous and clear.
c. Construct Validity
Construct validity refers to how well a concept, idea, or behaviour (a construct) is translated or transformed into a functioning and operating reality, i.e., its operationalization. Construct validity has two components, convergent and discriminant validity:
- Discriminant Validity
It is established when, based on theory, two variables
are predicted to be uncorrelated. For example, surveys
that are used to identify potential high school drop-outs
would have discriminant validity if the students who
graduate score higher on the test than students who
leave before graduation.
- Convergent Validity
Convergent validity, a parameter often used in
sociology, psychology, and other behavioural sciences,
refers to the degree to which two measures of constructs that theoretically should be related are in fact related.
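A minimal sketch of how these two components might be examined in practice, assuming Python with NumPy and SciPy and invented scores: a new scale should correlate strongly with an established measure of the same construct (convergent validity) and only weakly with a theoretically unrelated measure (discriminant validity).

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for ten respondents (invented for illustration).
new_scale = np.array([12, 18, 9, 22, 15, 30, 7, 19, 25, 11])
established_scale = np.array([14, 20, 10, 24, 13, 32, 8, 21, 27, 12])
unrelated_measure = np.array([40, 38, 42, 39, 41, 37, 43, 40, 38, 42])

# Convergent validity: expect a high correlation with the related construct.
r_convergent, _ = pearsonr(new_scale, established_scale)
# Discriminant validity: expect a correlation near zero with the unrelated one.
r_discriminant, _ = pearsonr(new_scale, unrelated_measure)

print("convergent r = {:.2f}, discriminant r = {:.2f}".format(r_convergent, r_discriminant))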
d. Criterion Validity
Criterion validity is an alternative perspective that de-emphasizes the conceptual meaning or interpretation of test scores. Test users might simply wish to use a test to differentiate between groups of people or to make predictions about future outcomes. For example, a hands-on driving test has been shown to be an accurate test of driving skill; a written test can then be validated by comparing its results with those of the hands-on test. Criterion validity can be established by:
i) Concurrent validity
Concurrent validity is a type of evidence that can be gathered to defend the use of a test for predicting other outcomes. For example, when a new, simpler test is to be used in place of an old, cumbersome one that is considered useful, measurements are obtained on both at the same time and compared.
ii) Predictive validity
Predictive validity is often used in program evaluation studies and is very suitable for applied research. It indicates the ability of the measuring instrument to differentiate among individuals with reference to a future criterion. For example, employment tests may be administered to job applicants and the scores later correlated with the job performance of the employees who were hired.
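Predictive validity evidence usually comes down to correlating the instrument's scores with the future criterion once it becomes available. The sketch below follows the employment-test example under that assumption, using Python with NumPy and invented data: selection-test scores at hiring are correlated with supervisor performance ratings collected later.

import numpy as np

# Hypothetical data: selection-test scores at hiring and performance
# ratings for the same employees one year later (invented for illustration).
test_scores = np.array([55, 72, 64, 80, 48, 90, 67, 75])
performance = np.array([3.1, 3.8, 3.5, 4.2, 2.9, 4.6, 3.6, 4.0])

# Pearson correlation between the test and the future criterion;
# a high positive value supports the test's predictive validity.
r = np.corrcoef(test_scores, performance)[0, 1]
print("predictive validity coefficient r = {:.2f}".format(r))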
1.2 Threats to Validity
a. Threats to internal validity
Internal validity threats are experimental procedures,
treatments, or experiences of the participants that threaten the
researcher’s ability to draw correct inferences from the data
about the population in an experiment. Types of threats to
internal validity:
- History or Intervening Events
The threat of “history” or “intervening events” applies to
all time-series studies. That is, many events (included in
Z) besides changes in the program that is being studied
(X) could have happened during the period of the study. It
is always possible that the observed change in Y may be
due to the intervening event (Z) and not to change in the
program being studied (X).
- Maturation or Secular Change or Trends
In time-series designs, besides the intervention due to the
program under study (X), outputs or outcomes (Y) may be
systematically affected by long-term, underlying trends
(Z), which can also be characterized as maturation or
secular change. We may observe a change in Y before and
after a program change, but the change may be due to a
long-term underlying trend, not to the program.
- Testing
“Testing” refers to instances in which the method of measuring the outcome Y can itself affect what is observed. For example, collecting administrative data about my age, years of employment, place of work, rank, and salary will not affect my behavior, especially if I do not know about it. Testing is a threat to internal validity in both cross-sectional (CS) and time-series (TS) designs.
- Instrumentation
In TS or CS designs, a change in the calibration of the measurement procedure or instrument (Z), rather than the treatment (X), may partly or entirely cause the observed change in the outcome (Y).
- Regression
Participants with extreme scores are selected for the
experiment. Naturally, their scores will probably change
during the experiment. Scores, over time, regress toward
the mean.
- Selection
Participants can be selected who have certain
characteristics that predispose them to have certain
outcomes.
- Mortality
Participants drop out during an experiment due to many
possible reasons. The outcomes are thus unknown for
these individuals.
- Diffusion of treatment
Participants in the control and experimental groups
communicate with each other. This communication can
influence how both groups score on the outcomes.
- Compensatory/Resentful Demoralization
The benefits of an experiment may be unequal or resented
when only the experimental group receives the treatment.
- Compensatory Rivalry
Participants in the control group feel that they are being
devalued, as compared to the experimental group, because
they do not experience the treatment.
b. Threats to external validity
External validity threats arise when experimenters draw
incorrect inferences from the sample data to other persons,
other settings, and past or future situations. For example,
threats to external validity arise when the researcher
generalizes beyond the groups in the experiment to other
racial or social groups not under study, to settings not
examined, or to past or future situations. Types of threats to external validity:
- Interaction of selection and treatment
Because of the narrow characteristics of participants in the
experiment, the researcher cannot generalize to individuals
who do not have the characteristics of the participants.
- Interaction of setting and treatment
Because of the characteristics of the setting of participants
in an experiment, a researcher cannot generalize to
individuals in other settings.
- Interaction of history and treatment
Because results of an experiment are time-bound, a
researcher cannot generalize the results to past or future
situations.
1.3 Ways to Minimize Threats to Validity
Threats to validity can be minimized by ensuring that the statistical assumptions underlying the analysis are met (see the sketch after this list):
a. Normality: scores in each population are normally distributed.
b. Homogeneity of variance: the variances of the distributions in each population are equal.
c. Independence of observations: observations are all independent of one another (Hinkle, Wiersma, & Jurs, 2003; Howell, 2004).
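A minimal sketch of how the first two assumptions might be checked before analysis, assuming Python with SciPy and invented scores for two groups: the Shapiro-Wilk test screens for non-normality and Levene's test for unequal variances, while independence is normally guaranteed by the sampling design rather than tested.

import numpy as np
from scipy.stats import shapiro, levene

# Hypothetical scores for two groups (invented for illustration).
group_a = np.array([78, 82, 75, 90, 85, 79, 88, 73, 81, 86])
group_b = np.array([70, 74, 68, 80, 77, 72, 79, 66, 75, 78])

# a) Normality: Shapiro-Wilk test per group (p > 0.05 suggests the scores
#    are consistent with a normal distribution).
_, p_norm_a = shapiro(group_a)
_, p_norm_b = shapiro(group_b)

# b) Homogeneity of variance: Levene's test across the groups
#    (p > 0.05 suggests the variances do not differ significantly).
_, p_var = levene(group_a, group_b)

print(p_norm_a, p_norm_b, p_var)

# c) Independence of observations is ensured by the sampling design
#    (e.g., random selection), not by a statistical test.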
2. Reliability
2.1 Definition of Reliability
In quantitative research, reliability refers to the consistency, stability, and repeatability of results; a result is considered reliable if consistent results are obtained under identical conditions on different occasions. In qualitative research, reliability refers to a researcher's approach being consistent across different researchers and different projects.
2.2 Types of Reliability
Reliability is mainly divided into two types: stability and internal consistency reliability. Stability is the ability of a measure to remain the same over time despite uncontrolled testing conditions or the state of the respondents themselves. A perfectly stable measure will produce exactly the same scores time after time. Two methods to test stability are: i) test-retest reliability, and ii) parallel-forms reliability. Internal consistency reliability, in turn, evaluates the degree to which different test items that probe the same construct produce similar results. It can be established in a single testing situation and thus avoids many of the problems associated with the repeated testing found in other reliability estimates. Types of reliability:
a. Test-retest reliability
Test-retest reliability indicates the score variation that occurs from one testing session to another as a result of errors of measurement. For example, employees of a company may be asked to complete the same questionnaire about job satisfaction twice, with an interval of three months, so that the results can be compared to assess the stability of the scores. The correlation coefficient is calculated between the two sets of scores; the higher it is, the better the test-retest reliability.
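Following the job-satisfaction example, the sketch below (Python with NumPy and SciPy, invented questionnaire totals) computes the test-retest coefficient as the Pearson correlation between the two administrations taken three months apart.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical total satisfaction scores for eight employees,
# measured twice with a three-month interval (invented data).
time_1 = np.array([34, 28, 45, 39, 30, 42, 36, 25])
time_2 = np.array([33, 30, 44, 38, 31, 40, 37, 27])

# The test-retest reliability coefficient is the correlation between the
# two sets of scores; values close to 1 indicate stable measurement.
r_test_retest, _ = pearsonr(time_1, time_2)
print("test-retest reliability = {:.2f}".format(r_test_retest))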
b. Parallel-forms reliability
It is a measure of reliability obtained by administering
different versions of an assessment tool to the same group
of individuals. For example, the level of employee satisfaction in a company may be assessed with questionnaires, in-depth interviews, and focus groups; if the results are highly correlated, we may be confident that the measures are reasonably reliable.
c. Inter-rater reliability
It is the extent to which information is collected in a consistent manner by different observers. Inter-rater reliability is
determined by the correlation of the scores from two or
more independent raters, or the coefficient of agreement of
the judgments of the raters. It is useful because human
observers will not necessarily interpret answers the same
way; raters may disagree as to how well certain responses
or material demonstrate knowledge of the construct or skill
being assessed.
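One common coefficient of agreement for two raters is Cohen's kappa, which corrects the raw percentage of agreement for agreement expected by chance. The sketch below is a minimal example assuming Python with scikit-learn and invented ratings; the correlation of raters' numeric scores mentioned above could be computed in the same way as the earlier correlation examples.

from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical ratings given by two independent raters
# to the same ten responses (invented for illustration).
rater_1 = ["good", "fair", "poor", "good", "fair", "good", "poor", "fair", "good", "good"]
rater_2 = ["good", "fair", "fair", "good", "fair", "good", "poor", "poor", "good", "good"]

# Cohen's kappa: 1.0 means perfect agreement, 0 means agreement
# no better than chance.
kappa = cohen_kappa_score(rater_1, rater_2)
print("Cohen's kappa = {:.2f}".format(kappa))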
d. Split-half reliability
It measures the degree of internal consistency by checking one half of the results on a set of scaled items against the other half. It requires only one administration and is especially appropriate when the test is very long. It is done by comparing the results of one half of the test with the results from the other half.
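A minimal sketch of the split-half procedure, assuming Python with NumPy and SciPy and an invented item-score matrix: the items are split into odd- and even-numbered halves, the two half-test totals are correlated, and the Spearman-Brown formula then estimates the reliability of the full-length test.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical item scores: 6 respondents (rows) x 8 items (columns),
# invented for illustration.
items = np.array([
    [4, 5, 4, 4, 5, 4, 5, 4],
    [2, 1, 2, 2, 1, 2, 1, 2],
    [3, 3, 4, 3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5, 5, 4, 5],
    [1, 2, 1, 1, 2, 1, 2, 1],
    [4, 4, 3, 4, 4, 3, 4, 4],
])

# Split the test into two halves (odd- vs. even-numbered items)
# and correlate the half-test totals.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(odd_half, even_half)

# Spearman-Brown correction estimates the reliability of the full-length
# test from the correlation between the two halves.
split_half_reliability = 2 * r_half / (1 + r_half)
print("split-half reliability = {:.2f}".format(split_half_reliability))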

III. CONCLUSION
In this paper, validity and reliability, the two key criteria of a good research instrument, have been reviewed. To perform good research, validity and reliability tests need to be carried out very carefully. As discussed, there are four main types of validity for a questionnaire, namely face validity, content validity, construct validity, and criterion validity. We have also described the threats to validity and reliability that a researcher faces when trying to do good research.
REFERENCES

Creswell, John W. 2014. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Singapore: Sage.

Langbein, Laura, and Claire L. Felbinger. 2006. Public Program Evaluation: A Statistical Guide. London: M.E. Sharpe.

Mohajan, Haradhan. 2017. Two Criteria for Good Measurements in Research: Validity and Reliability. MPRA. Premier University, Chittagong, Bangladesh.

Taherdoost, Hamed. 2016. Validity and Reliability of the Research Instrument; How to Test the Validation of a Questionnaire/Survey in a Research. International Journal of Academic Research in Management (IJARM), Vol. 5, No. 3.
