Questionnaire Reliability & Validity


Questionnaire

Reliability & Validity


Questionnaire

A questionnaire is a formalized set of questions for obtaining information from respondents.
A questionnaire must uplift, motivate, and encourage the
respondent to become involved in the interview, to
cooperate, and to complete the interview.
A questionnaire should minimize response error.
Random and Systematic Error
Random Error
1) fluctuations in the person’s current mood.
2) misreading or misunderstanding the questions
3) measurement of the individuals on different days or in different
places.

These errors tend to cancel out as you collect many samples.

Systematic Error
Sources of systematic error include the style of measurement, a tendency toward self-promotion, cooperative reporting, and other conceptual variables that are measured along with the intended one.
Because these errors do not cancel out, we have to reduce them to support scientific findings.

How well do our measured variables “capture” the conceptual variables?

Reliability
The extent to which the measured variables are free from random error, usually determined by measuring the variables more than once.

Construct Validity
The extent to which a measured variable actually measures the conceptual variable (CV) that it is designed to assess, often evaluated by how strongly it relates to other measured variables known to reflect the same conceptual variable.
Reliability
Definition
The degree of stability exhibited when a measurement is
repeated under identical conditions.
Lack of reliability may arise from divergences between
observers or instruments of measurement or instability of
the attribute being measured.
The degree to which measures obtained with an
instrument are consistent measures of what the
instrument is intended to measure
Assessment of reliability

Reliability is assessed in 3 forms:


Test-retest reliability
Alternate-form reliability
Internal consistency reliability
Test-retest reliability

Most common form in surveys


Measured by having the same respondents complete a
survey at two different points in time to see how stable the
responses are.
Usually quantified with a correlation coefficient (r value).
In general, r values are considered good if r ≥ 0.70.
Test-Retest Reliability

The extent to which scores on the same measured variable correlate with each other on two different measurements given at two different times.

Example: scores on the same items from the 9/20 and 9/27 administrations of the questionnaire.

Item                                           9/20   9/27
I feel I do not have much to be proud of.        4      4
On the whole, I am satisfied with myself.        3      4
I certainly feel useless at times.               2      1
At times I think I am no good at all.            1      1
I have a number of good qualities.               4      4
I am able to do things as well as others.        3      4
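A minimal sketch of how test-retest reliability can be quantified, assuming Python with NumPy; the item scores are the reconstructed example values from the table above, correlated across the two administrations:

import numpy as np

# Example scores from the table above: same items, two administrations.
time1 = np.array([4, 3, 2, 1, 4, 3])   # Questionnaire 9/20
time2 = np.array([4, 4, 1, 1, 4, 4])   # Questionnaire 9/27

# Test-retest reliability as the Pearson correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")       # about 0.88; r >= 0.70 is usually considered good

(In practice the correlation is usually computed across many respondents' scores rather than across one respondent's items; this sketch only illustrates the mechanics.)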
Test-retest reliability

If data are recorded by an observer, you can have the same
observer make two separate measurements.
The comparison between the two measurements is intra-observer reliability.
What does a difference mean?
Test-retest reliability

You can test-retest specific questions or the entire survey


instrument.
Be careful about test-retest with items or scales that
measure variables likely to change over a short period of
time, such as energy, pain, happiness, anxiety.
If you do it, make sure that you test-retest over very short
periods of time.
Alternate-form reliability
Use differently worded forms to
measure the same attribute.

Questions or responses are


reworded or their order is changed
to produce two items that are similar
but not identical.
Equivalent-Forms Reliability
The extent to which two equivalent forms of a measure given at different times correlate with each other.

Example: GRE, SAT, GMAT, TOEFL

Form A                          Form B
22 × 45 =                       32 × 45 =
85 × (23 − 11) =                85 × (41 − 11) =
72 − 14 × 12 × (7 − 1) =        72 − 14 × 25 × (6 − 1) =

Alternate-form reliability
You can measure alternate-form reliability at the same
time point or separate time points.
Another method is to split the test in two, with the
scores for each half of the test being compared with the
other.
- This is called a split-halves method
- You could also split into thirds and administer
three forms of the item, etc.
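A minimal sketch of the split-halves idea, assuming Python with NumPy and simulated data (not from the slides): items are split into odd and even halves, the half scores are correlated, and the Spearman-Brown correction (not named in the slides) steps that correlation up to an estimate for the full-length test:

import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(size=(100, 1))                       # 100 respondents
items = true_score + rng.normal(scale=1.0, size=(100, 8))    # 8 items measuring the same attribute

# Split the test in two and compare the half scores.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction to estimate reliability of the full-length test.
split_half_reliability = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, split-half reliability = {split_half_reliability:.2f}")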
Interrater Reliability

The extent to which the scores assigned by different coders correlate with each other.

How Do You Measure Interrater Reliability?

One common index is Cohen’s Kappa.

Aggression Code    Coder 1    Coder 2
Hit boy A             1          3
Hit boy B             3          3
Hit girl A            3          2
Hit girl B            1          1
How to calculate
• Step 1: Calculate po (the observed proportional agreement):
  20 images were rated Yes by both.
  15 images were rated No by both.
  So, po = number in agreement / total = (20 + 15) / 50 = 0.70.
• Step 2: Find the probability that the raters would randomly both say Yes.
  Rater A said Yes to 25/50 images, or 50% (0.5).
  Rater B said Yes to 30/50 images, or 60% (0.6).
  The probability of both raters saying Yes by chance is 0.5 × 0.6 = 0.30.
• Step 3: Find the probability that the raters would randomly both say No.
  Rater A said No to 25/50 images, or 50% (0.5).
  Rater B said No to 20/50 images, or 40% (0.4).
  The probability of both raters saying No by chance is 0.5 × 0.4 = 0.20.
• Step 4: Calculate pe, the overall probability of chance agreement, by adding the answers from Steps 2 and 3:
  pe = 0.30 + 0.20 = 0.50.
• Step 5: Insert the values into the formula and solve:
  κ = (po − pe) / (1 − pe) = (0.70 − 0.50) / (1 − 0.50) = 0.40.
• κ = 0.40 indicates fair agreement; κ should generally be greater than 0.6.
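A minimal sketch of the same worked example, assuming Python; the counts (20 both-Yes, 15 both-No, 25 Yes for Rater A, 30 Yes for Rater B, 50 images) are taken from the steps above:

def cohens_kappa(both_yes, both_no, a_yes, b_yes, total):
    po = (both_yes + both_no) / total                             # Step 1: observed agreement
    p_yes = (a_yes / total) * (b_yes / total)                     # Step 2: chance of both saying Yes
    p_no = ((total - a_yes) / total) * ((total - b_yes) / total)  # Step 3: chance of both saying No
    pe = p_yes + p_no                                             # Step 4: overall chance agreement
    return (po - pe) / (1 - pe)                                   # Step 5: kappa

k = cohens_kappa(both_yes=20, both_no=15, a_yes=25, b_yes=30, total=50)
print(f"kappa = {k:.2f}")   # 0.40, fair agreement; above 0.6 is usually desired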
Internal consistency reliability
Applied not to one item, but to groups of items that are thought to measure different aspects of the same concept.
Cronbach’s alpha (α), not to be confused with the Type I error rate α, measures internal consistency reliability among a group of items combined to form a single scale.
It is a reflection of how well the different items complement each other in their measurement of different aspects of the same variable or quality.
Interpret it like a correlation coefficient: α ≥ 0.70 is good.
Cronbach’s alpha (α)
Let
  s_i²    = sample variance of question i
  s_test² = sample variance of the test total
then
  α = (k / (k − 1)) × (1 − Σ_{i=1}^{k} s_i² / s_test²)

Cronbach’s alpha (α)

  α = (k / (k − 1)) × (1 − Σ_{i=1}^{k} s_i² / s_test²)

High alpha is good, and high alpha is caused by high “test” (total-score) variance relative to the sum of the item variances.
But why is high test variance good?
High variance means you have a wide spread of scores,
which means subjects are easier to differentiate.
If a test has a low variance, the scores for the subjects are
close together. Unless the subjects truly are close in their
“ability”, the test is not useful.
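A minimal sketch of the formula above, assuming Python with NumPy and made-up item scores (rows are respondents, columns are items; the data are illustrative, not from the slides):

import numpy as np

def cronbach_alpha(item_scores):
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                                   # number of items
    item_vars = scores.var(axis=0, ddof=1)                # s_i^2 for each question
    total_var = scores.sum(axis=1).var(ddof=1)            # s_test^2 of the test total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = [[4, 4, 3, 4],
          [2, 3, 2, 3],
          [5, 4, 5, 5],
          [1, 2, 1, 2],
          [3, 3, 4, 3]]
print(f"alpha = {cronbach_alpha(scores):.2f}")            # alpha >= 0.70 is usually considered good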
(Summary diagram: test-retest reliability correlates the items of the same questionnaire given at two times; internal consistency correlates the items within a single questionnaire; equivalent-forms reliability correlates Questionnaire 1 with an equivalent Questionnaire 2; interrater reliability correlates the scores of different raters.)

Validity
Construct Validity
The extent to which a measured variable
actually measures the conceptual variable
(that is, the construct) that it is designed
to assess.

Criterion Validity
The extent to which a self-report measure correlates with a behaviorally measured variable.
Construct Validity

Face Validity
The extent to which the measured variable appears to be an adequate measure of the conceptual variable.

Example item: “I don’t like Japanese people.”
Strongly Disagree 1 2 3 4 5 6 7 8 Strongly Agree
(Measured Variable X → Conceptual Variable: discrimination toward Japanese people)
Construct Validity
Content Validity

The degree to which the measured variable appears to have adequately sampled from the potential domain of questions that might relate to the conceptual variable of interest.

(Example: a measure of Intelligence should sample domains such as Verbal Aptitude and Math Aptitude; Sympathy lies outside that domain.)
Construct Validity
Convergent Validity
The extent to which a measured variable is found to be related to other measured variables designed to measure the same conceptual variable (e.g., an Interdependence Scale correlating with a Collectivism Scale).

Discriminant Validity
The extent to which a measured variable is found to be unrelated to other measured variables designed to measure different conceptual variables (e.g., an Interdependence Scale showing little correlation with an Independence Scale).
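A minimal sketch of how convergent and discriminant validity might show up in data, assuming Python with NumPy and simulated scale scores (the scale names follow the example above and the data are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 200
interdependence = rng.normal(size=n)
collectivism = interdependence + rng.normal(scale=0.5, size=n)   # taps the same conceptual variable
independence = rng.normal(size=n)                                # taps a different conceptual variable

# Convergent validity: high correlation with a measure of the same construct.
print(np.corrcoef(interdependence, collectivism)[0, 1])          # high (about 0.9)
# Discriminant validity: low correlation with a measure of a different construct.
print(np.corrcoef(interdependence, independence)[0, 1])          # near 0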
Criterion Validity
Predictive Validity

The extent to which the scores can predict the participants’ future performance.
Example: GRE, SAT.

Concurrent Validity

The extent to which the self-report measure correlates with the behavioral measure that is assessed at the same time.
How Do You Improve the Reliability and Validity of
Your Measured Variables?

1. Conduct a pilot test, trying out a questionnaire or other


research instruments on a small group.
2. Use multiple measures.
3. Ensure that there is variability in your measures.
4. Write good items.
5. Get your respondents to take your questions
seriously.
6. Make your items nonreactive.
7. Be certain to consider face and content validity by choosing reasonable items that cover a broad range of issues reflecting the conceptual variables.
8. Use existing measures.
(Summary diagram: face validity links the conceptual variable to the self-report measured variable; content validity links the domain of the conceptual variable to its items and scales; convergent and discriminant validity compare those items and scales with similar and with other items and scales; concurrent validity relates the self-report measured variables to behavioral measured variables assessed at the same time; predictive validity relates them to future behaviors.)
