Questionnaire Reliability & Validity


Questionnaire

Reliability & Validity


Questionnaire

A questionnaire is a formalized set of questions for obtaining information from respondents.
A questionnaire must uplift, motivate, and encourage the
respondent to become involved in the interview, to
cooperate, and to complete the interview.
A questionnaire should minimize response error.
Random and Systematic Error
Random Error
1) fluctuations in the person’s current mood.
2) misreading or misunderstanding the questions
3) measurement of the individuals on different days or in different
places.

These errors tend to cancel out as you collect many samples.

Systematic Error
Sources of systematic error include the style of measurement, a tendency toward self-promotion, cooperative reporting, and other conceptual variables that are measured along with the intended one.
Because these errors do not cancel out, we have to reduce them to support scientific findings.

How well do our measured variables “capture” the conceptual variables?

Reliability
The extent to which the measured variables are free from random error, usually determined by measuring the variables more than once.

Construct Validity
The extent to which a measured variable actually measures the conceptual variable (CV) that it is designed to assess, often evaluated by how strongly it relates to other measured variables known to reflect the same conceptual variable.
Reliability
Definition
The degree of stability exhibited when a measurement is
repeated under identical conditions.
Lack of reliability may arise from divergences between
observers or instruments of measurement or instability of
the attribute being measured.
The degree to which measures obtained with an
instrument are consistent measures of what the
instrument is intended to measure
Assessment of reliability

Reliability is assessed in 3 forms:


Test-retest reliability
Alternate-form reliability
Internal consistency reliability
Test-retest reliability

Most common form in surveys


Measured by having the same respondents complete a
survey at two different points in time to see how stable the
responses are.
Usually quantified with a correlation coefficient (r value).
In general, r values are considered good if r ≥ 0.70.
Test-Retest Reliability

The extent to which scores on the same measured variable correlate with each other on two different measurements given at two different times.

Example: scores on the same items from the 9/20 and 9/27 administrations of the questionnaire.

Item                                           9/20   9/27
I feel I do not have much to be proud of.        4      4
On the whole, I am satisfied with myself.        3      4
I certainly feel useless at times.               2      1
At times I think I am no good at all.            1      1
I have a number of good qualities.               4      4
I am able to do things as well as others.        3      4
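A minimal sketch of how test-retest reliability can be quantified, assuming Python with NumPy; the item scores are the reconstructed example values from the table above, correlated across the two administrations:

import numpy as np

# Example scores from the table above: same items, two administrations.
time1 = np.array([4, 3, 2, 1, 4, 3])   # Questionnaire 9/20
time2 = np.array([4, 4, 1, 1, 4, 4])   # Questionnaire 9/27

# Test-retest reliability as the Pearson correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")       # about 0.88; r >= 0.70 is usually considered good

(In practice the correlation is usually computed across many respondents' scores rather than across one respondent's items; this sketch only illustrates the mechanics.)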
Test-retest reliability

If data are recorded by an observer, you can have the same
observer make two separate measurements.
The comparison between the two measurements is intra-observer reliability.
What does a difference mean?
Test-retest reliability

You can test-retest specific questions or the entire survey


instrument.
Be careful about test-retest with items or scales that
measure variables likely to change over a short period of
time, such as energy, pain, happiness, anxiety.
If you do it, make sure that you test-retest over very short
periods of time.
Alternate-form reliability
Use differently worded forms to
measure the same attribute.

Questions or responses are


reworded or their order is changed
to produce two items that are similar
but not identical.
Equivalent-Forms Reliability
The extent to which two equivalent forms of a measure given at different times correlate with each other.

Example: GRE, SAT, GMAT, TOEFL

Form A                          Form B
22 × 45 =                       32 × 45 =
85 × (23 − 11) =                85 × (41 − 11) =
72 − 14 × 12 × (7 − 1) =        72 − 14 × 25 × (6 − 1) =

Alternate-form reliability
You can measure alternate-form reliability at the same
time point or separate time points.
Another method is to split the test in two, with the
scores for each half of the test being compared with the
other.
- This is called a split-halves method
- You could also split into thirds and administer
three forms of the item, etc.
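A minimal sketch of the split-halves idea, assuming Python with NumPy and simulated data (not from the slides): items are split into odd and even halves, the half scores are correlated, and the Spearman-Brown correction (not named in the slides) steps that correlation up to an estimate for the full-length test:

import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(size=(100, 1))                       # 100 respondents
items = true_score + rng.normal(scale=1.0, size=(100, 8))    # 8 items measuring the same attribute

# Split the test in two and compare the half scores.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction to estimate reliability of the full-length test.
split_half_reliability = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, split-half reliability = {split_half_reliability:.2f}")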
Interrater Reliability

The extent to which the scores assigned by different coders correlate with each other.

How Do You Measure Interrater Reliability?

One common index is Cohen’s Kappa.

Aggression Code    Coder 1    Coder 2
Hit boy A             1          3
Hit boy B             3          3
Hit girl A            3          2
Hit girl B            1          1
How to calculate
• Step 1: Calculate po (the observed proportional agreement):
  20 images were rated Yes by both.
  15 images were rated No by both.
  So, po = number in agreement / total = (20 + 15) / 50 = 0.70.
• Step 2: Find the probability that the raters would randomly both say Yes.
  Rater A said Yes to 25/50 images, or 50% (0.5).
  Rater B said Yes to 30/50 images, or 60% (0.6).
  The probability of both raters saying Yes by chance is 0.5 × 0.6 = 0.30.
• Step 3: Find the probability that the raters would randomly both say No.
  Rater A said No to 25/50 images, or 50% (0.5).
  Rater B said No to 20/50 images, or 40% (0.4).
  The probability of both raters saying No by chance is 0.5 × 0.4 = 0.20.
• Step 4: Calculate pe, the overall probability of chance agreement, by adding the answers from Steps 2 and 3:
  pe = 0.30 + 0.20 = 0.50.
• Step 5: Insert the values into the formula and solve:
  κ = (po − pe) / (1 − pe) = (0.70 − 0.50) / (1 − 0.50) = 0.40.
• κ = 0.40 indicates fair agreement; κ should generally be greater than 0.6.
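A minimal sketch of the same worked example, assuming Python; the counts (20 both-Yes, 15 both-No, 25 Yes for Rater A, 30 Yes for Rater B, 50 images) are taken from the steps above:

def cohens_kappa(both_yes, both_no, a_yes, b_yes, total):
    po = (both_yes + both_no) / total                             # Step 1: observed agreement
    p_yes = (a_yes / total) * (b_yes / total)                     # Step 2: chance of both saying Yes
    p_no = ((total - a_yes) / total) * ((total - b_yes) / total)  # Step 3: chance of both saying No
    pe = p_yes + p_no                                             # Step 4: overall chance agreement
    return (po - pe) / (1 - pe)                                   # Step 5: kappa

k = cohens_kappa(both_yes=20, both_no=15, a_yes=25, b_yes=30, total=50)
print(f"kappa = {k:.2f}")   # 0.40, fair agreement; above 0.6 is usually desired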
Internal consistency reliability
Applied not to one item, but to groups of items that are thought to measure different aspects of the same concept.
Cronbach’s alpha (α), not to be confused with the Type I error rate α, measures internal consistency reliability among a group of items combined to form a single scale.
It is a reflection of how well the different items complement each other in their measurement of different aspects of the same variable or quality.
Interpret it like a correlation coefficient: α ≥ 0.70 is good.
Cronbach’s alpha (α)
Let
  s_i²    = sample variance of question i
  s_test² = sample variance of the test total
then
  α = (k / (k − 1)) × (1 − Σ_{i=1}^{k} s_i² / s_test²)

Cronbach’s alpha (α)

  α = (k / (k − 1)) × (1 − Σ_{i=1}^{k} s_i² / s_test²)

High alpha is good, and high alpha is caused by high “test” (total-score) variance relative to the sum of the item variances.
But why is high test variance good?
High variance means you have a wide spread of scores,
which means subjects are easier to differentiate.
If a test has a low variance, the scores for the subjects are
close together. Unless the subjects truly are close in their
“ability”, the test is not useful.
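A minimal sketch of the formula above, assuming Python with NumPy and made-up item scores (rows are respondents, columns are items; the data are illustrative, not from the slides):

import numpy as np

def cronbach_alpha(item_scores):
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                                   # number of items
    item_vars = scores.var(axis=0, ddof=1)                # s_i^2 for each question
    total_var = scores.sum(axis=1).var(ddof=1)            # s_test^2 of the test total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = [[4, 4, 3, 4],
          [2, 3, 2, 3],
          [5, 4, 5, 5],
          [1, 2, 1, 2],
          [3, 3, 4, 3]]
print(f"alpha = {cronbach_alpha(scores):.2f}")            # alpha >= 0.70 is usually considered good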
(Summary diagram: test-retest reliability correlates the items of the same questionnaire given at two times; internal consistency correlates the items within a single questionnaire; equivalent-forms reliability correlates Questionnaire 1 with an equivalent Questionnaire 2; interrater reliability correlates the scores of different raters.)

Validity
Construct Validity
The extent to which a measured variable
actually measures the conceptual variable
(that is, the construct) that it is designed
to assess.

Criterion Validity
The extent to which a self-report measure correlates with a behaviorally measured variable.
Construct Validity

Face Validity
The extent to which the measured variable appears to be an adequate measure of the conceptual variable.

Example item: “I don’t like Japanese people.”
Strongly Disagree 1 2 3 4 5 6 7 8 Strongly Agree
(Measured Variable X → Conceptual Variable: discrimination toward Japanese people)
Construct Validity
Content Validity

The degree to which the measured variable appears to have adequately sampled from the potential domain of questions that might relate to the conceptual variable of interest.

(Example: a measure of Intelligence should sample domains such as Verbal Aptitude and Math Aptitude; Sympathy lies outside that domain.)
Construct Validity
Convergent Validity
The extent to which a measured variable is found to be related to other measured variables designed to measure the same conceptual variable (e.g., an Interdependence Scale correlating with a Collectivism Scale).

Discriminant Validity
The extent to which a measured variable is found to be unrelated to other measured variables designed to measure different conceptual variables (e.g., an Interdependence Scale showing little correlation with an Independence Scale).
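A minimal sketch of how convergent and discriminant validity might show up in data, assuming Python with NumPy and simulated scale scores (the scale names follow the example above and the data are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 200
interdependence = rng.normal(size=n)
collectivism = interdependence + rng.normal(scale=0.5, size=n)   # taps the same conceptual variable
independence = rng.normal(size=n)                                # taps a different conceptual variable

# Convergent validity: high correlation with a measure of the same construct.
print(np.corrcoef(interdependence, collectivism)[0, 1])          # high (about 0.9)
# Discriminant validity: low correlation with a measure of a different construct.
print(np.corrcoef(interdependence, independence)[0, 1])          # near 0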
Criterion Validity
Predictive Validity

The extent to which the scores can predict the participants’ future performance.
Example: GRE, SAT.

Concurrent Validity

The extent to which the self-report measure correlates with the behavioral measure that is assessed at the same time.
How Do You Improve the Reliability and Validity of
Your Measured Variables?

1. Conduct a pilot test, trying out a questionnaire or other


research instruments on a small group.
2. Use multiple measures.
3. Ensure that there is variability in your measures.
4. Write good items.
5. Get your respondents to take your questions
seriously.
6. Make your items nonreactive.
7. Be certain to consider face and content validity by choosing reasonable items that cover a broad range of issues reflecting the conceptual variables.
8. Use existing measures.
(Summary diagram: face validity links the conceptual variable to the self-report measured variable; content validity links the domain of the conceptual variable to its items and scales; convergent and discriminant validity compare those items and scales with similar and with other items and scales; concurrent validity relates the self-report measured variables to behavioral measured variables assessed at the same time; predictive validity relates them to future behaviors.)
