Unit Three-Measurement Instruments V4 - 2 - 3
Introduction
• Field work in epidemiological studies consists of collecting data in natural and experimental settings to answer research questions using measurement instruments
• Instruments need to be reliable
• Reliability goes by many related names: reproducibility, dependability, repeatability, precision, stability, agreement, consistency, variability and concordance
• A reliable instrument:
• Yields the same results every time it is used to measure the same object, assuming the object itself has not changed
• Extended definition: the extent to which scores for patients who have not changed are the same for repeated measurements under several conditions
Reliability
• Test–retest
• Equivalence/Alternate (multiple) form
• Internal consistency
• Intra-rater
• Inter-rater
Reliability: Test–retest
• Same instrument (test) administered to the same respondents at different times
• Results (scores) from one time to the next are compared to obtain a correlation value
• r ≥ 0.70 is usually considered acceptable
• The time interval between measurements needs to be:
• Sufficiently long
• To prevent recall bias
• Not too long
• So that no real changes occur in the characteristics of the respondents related to the construct being measured
• The attributes of the construct being measured should be temporally stable
• Test–retest reliability is not appropriate for a state construct that is expected to change over time, such as mood
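The correlation step above can be sketched in a few lines; the scores below are invented for illustration, not data from any real study.

```python
import numpy as np

# Invented test-retest data: one anxiety scale administered twice,
# two weeks apart, to the same eight respondents.
time1 = np.array([12, 18, 9, 22, 15, 11, 20, 14], dtype=float)
time2 = np.array([13, 17, 10, 21, 16, 10, 19, 15], dtype=float)

# Test-retest reliability as the Pearson correlation between occasions
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")  # r >= 0.70 is usually considered acceptable
```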
Reliability: Equivalence/Alternate form
• If two different forms of an instrument are supposed to measure the same outcome
• Where measures have alternative formats
• E.g.: a version for interviewer administration and a version for self-administration, or a long and
shorter version – they should be highly correlated.
• Can be assessed by giving different forms of the instrument to two or more groups that have
been randomly selected.
• The forms are created either by using differently worded questions to measure the same
attributes or by reordering the questions
• To test for equivalence, administer the different forms at separate time points to the same
population, or if the sample is large enough, you can divide it in half and administer each of
the two alternate forms to half of the group.
• In either case, you would first compute mean scores and standard deviations on each of the forms
and then correlate the two sets of scores to obtain estimates of equivalence.
• Equivalence reliability coefficients should be at least 0.70.
Reliability: Internal consistency
• How internally consistent the items are in measuring the characteristics that they are supposed
to measure.
• The degree of interrelatedness among items
• Measure of the extent to which items assess the same construct
• Assessed via coefficient alpha (α)
• The best known parameter for assessing the internal consistency of a scale
• Describes how well different items complement each other in their measurement of the same
quality or dimension
• The basic principle of examining the internal consistency of a scale is to split the items in half
and see whether the scores of two half-scales correlate
• A scale can be split in half in many different ways
• The correlation is calculated for each half-split.
• Cronbach’s alpha represents a kind of mean value of these correlations, adjusted for test
length.
• A well-accepted guideline for the value of Cronbach’s alpha is between 0.70 and 0.90.
• Decide whether your instrument needs to consider internal consistency
• For which statement(s) do you think internal consistency is not important?
Interpretation of Cronbach’s alpha
• Assessing internal consistency requires only one measurement in a study population
• It is easy to calculate, but often interpreted incorrectly
• Cronbach’s alpha does not measure validity
- An adequate α suggests only that, on average, items in the scale are highly correlated
- The items apparently measure the same construct, but this provides no evidence of whether or not they measure the construct that they claim to measure
- The items measure something consistently, but what that is remains unknown
- So, internal consistency is not a parameter of validity
- In the COSMIN taxonomy the measurement property ‘internal consistency’ is an aspect of reliability
- The value of α is highly dependent on the number of items in the scale
- This principle is applied for item reduction:
- When α is high, we can afford to delete items to make the instrument more efficient
- When the value of α is too low, we can increase the value by formulating new items that are manifestations of the same construct.
- This principle also implies that with a large number of items in a scale, α may have a high value, despite rather low inter-item correlations.
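As a hedged illustration of the split-half/mean-correlation idea, Cronbach's alpha can be computed directly from an item-score matrix with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total score); all data below are invented.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative data: 6 respondents answering a 4-item scale (scores 1-5)
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 5],
], dtype=float)

print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Note that adding more items tracking the same pattern would push α up even with modest inter-item correlations, which is exactly the dependence on scale length described above.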
Reliability: Intra-rater
• Reliability of the same rater’s scores, of the same subjects, on different occasions.
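Intra-rater (and inter-rater) reliability of continuous scores is commonly quantified with an intraclass correlation coefficient. A minimal sketch of ICC(2,1) (two-way random effects, absolute agreement, single rating, following the Shrout–Fleiss formulation), using invented ratings:

```python
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    x is an n-subjects by k-occasions (or raters) score matrix.
    """
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    occ_means = x.mean(axis=0)
    msr = k * ((subj_means - grand) ** 2).sum() / (n - 1)  # between subjects
    msc = n * ((occ_means - grand) ** 2).sum() / (k - 1)   # between occasions
    sse = ((x - subj_means[:, None] - occ_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented example: one rater scoring five subjects on two occasions
ratings = np.array([[4, 5], [2, 2], [5, 5], [3, 2], [1, 1]], dtype=float)
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")  # 0.93 for these invented data
```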
Sources of variation
• Repeated measurements may display variation arising from several sources:
• Observer variability
• Due to the observer, and includes choice of words in an interview and skill in using a
mechanical instrument
• Instrument variability
• Due to the instrument, and changing environmental factors (e.g., temperature), aging
mechanical components, different reagent lots, and so on.
• Subject variability
• Due to intrinsic biologic variability in the study subjects unrelated to variables under study,
such as variability due to time of day of measurements or time since last food or medication
Improving the reliability of measurements
• Improving reliability concerns the anticipation, assessment and control of sources of variation; the ultimate aim of reliability studies is to improve the reliability of measurements.
• A number of strategies for this purpose:
• Restriction
• Avoid a specific source of variation
• E.g.: when we know that the amount of fatigue that patients experience increases during the day,
we can exclude this variation by measuring every patient at the same hour of the day.
2.1. Content validity
• The degree to which the content of an instrument adequately reflects the construct to be measured
• Covers the following aspects:
• Face validity
• Relevance
• Comprehensiveness
• Comprehensibility
• Content validation studies assess whether the measurement
instrument adequately represents the construct under study.
• A clear description of the construct should be emphasized
• For multi-item questionnaires,
• Items should be both relevant and comprehensive for the construct to be
measured.
• Relevance
• Can be assessed with the following three questions:
• Do all items refer to relevant aspects of the construct to be
measured?
• Are all items relevant for the study population?
• E.g.: with respect to age, gender, disease characteristics,
languages, countries, settings?
• Are all items relevant for the purpose of the application of the
measurement instrument?
• Comprehensiveness
• Is the construct completely covered by the items?
Consider the following in content validation:
Information about content of the measurement
instrument
- In order to be able to assess whether a specific measurement instrument
covers the content of the construct, developers should have provided full
details about the measurement instrument, including procedures.
Assessing whether content of the measurement instrument
corresponds with the construct (is relevant and
comprehensive)
2.2.1. Concurrent validity
• An instrument being validated and a selected criterion
measured at the same time
• Both the score for the measurement instrument and the score for the gold standard are considered at the same time in its assessment
• Usually assessed for instruments to be used for evaluative and
diagnostic purposes
• In concurrent validity and predictive validity, there is usually only one hypothesis that is not
clearly stated but rather implicit.
• The measurement instrument under study is as good as the gold standard.
• In practice, the essential question is whether the instrument under study is sufficiently valid for
its clinical purpose.
• For predictive validity, the instrument being assessed is administered first, and then a criterion instrument is administered after an appropriate interval.
General design of criterion-related validation consists of the following steps:
• PROs, which often focus on subjective perceptions and opinions, almost always lack a gold
standard.
• An exception is a situation in which a shorter questionnaire for a construct is developed,
when a long version already exists.
• Gold standard: long version
• To be able to assess the adequateness of the gold standard, it is important that researchers provide information about the validity and reliability of the measurement instrument that is used as the gold standard.
Identify appropriate sample of the target population in which the measurement
instrument will ultimately be used
- For all types of validation, the instrument should be validated for the target population
and situation in which it will be used.
-Independent application of the measurement instrument and the gold standard is:
- A requirement for the validation of measurement instruments
- Measurement instrument should not be part of the gold standard, or influence it in any way.
- This could happen if the gold standard is based on expert opinion
- When a short version of a questionnaire is validated against the original long version, the scores
for each instrument should be collected independently from each other.
Determine the strength of the relationship between the instrument scores and
criterion scores
-To assess criterion validity, the scores from the measurement instrument to be validated are
compared with the scores obtained from the gold standard.
-Various statistical parameters are used, depending on the measurement levels of the gold standard and the measurement instrument:
-If both gold standard and measurement instrument have a dichotomous outcome: sensitivity and specificity
-If the measurement instrument has an ordinal or continuous scale: ROC curves
-If measurement instrument and gold standard are expressed in the same units: Bland–Altman plots and ICCs can be used
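For the dichotomous case, the comparison against the gold standard reduces to a 2×2 table; a small sketch on invented outcomes:

```python
import numpy as np

# Invented dichotomous outcomes for 10 subjects (1 = positive):
# 'truth' from the gold standard, 'test' from the instrument under study.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
test  = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

tp = np.sum((test == 1) & (truth == 1))  # true positives
fn = np.sum((test == 0) & (truth == 1))  # false negatives
tn = np.sum((test == 0) & (truth == 0))  # true negatives
fp = np.sum((test == 1) & (truth == 0))  # false positives

sensitivity = tp / (tp + fn)  # share of gold-standard positives detected
specificity = tn / (tn + fp)  # share of gold-standard negatives detected
print(sensitivity, specificity)  # 0.8 and 0.8 for these invented data
```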
2.3. Construct validity
• Three aspects: structural validity, hypotheses-testing construct validity, and cross-cultural validity/measurement invariance
2.3.1. Structural validity
• The degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured
• Assessed by:
1. IRT/Rasch analysis
- Provides rich information about individual items that is not available using classical test theory
2. Factor analysis (FA)
EFA
• Used to reduce the number of items or to explore the number of factors of a new instrument in the absence of a prior hypothesis
• Applied if there are no clear ideas about the number and types of dimensions
• (Often) performed when CFA (i.e. confirmation of the existence of predefined dimensions) is inadequate
• If there is no (or little) information available on the structure of the construct to be assessed, it is recommended to conduct EFA to identify the structure, and then CFA can be used to confirm whether or not the structure provides a good fit

CFA
• Used to test a hypothesized factorial structure based on a theory or previous empirical evidence
• Fit parameters are used to test whether the data fit the hypothesized factor structure
• It is possible to test whether the proposed model is better than alternative models
• More appropriate than EFA for assessing the structural validity of an instrument if there is already information available on the dimensionality of the instrument
• More appropriate for validation purposes
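A numpy-only sketch of the exploratory step, deciding how many factors the item correlations suggest via eigenvalues of the inter-item correlation matrix (dedicated EFA software would add extraction and rotation); all data are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 respondents answering 6 items driven by 2 latent factors
# (items 0-2 load on factor 1, items 3-5 on factor 2); illustrative only.
f = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 0], [0.9, 0], [0.8, 0],
                     [0, 1.0], [0, 0.9], [0, 0.8]])
items = f @ loadings.T + 0.3 * rng.normal(size=(200, 6))

# Eigendecomposition of the inter-item correlation matrix
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)       # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Kaiser criterion (eigenvalue > 1) suggests how many factors to retain
n_factors = int(np.sum(eigvals > 1))
print("eigenvalues:", np.round(eigvals, 2))
print("factors retained:", n_factors)         # 2 for these simulated data
```

A subsequent CFA would then test whether this two-factor structure fits new data.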
2.3.2. Hypotheses-testing construct validity
• The relationships of scores on the instrument of interest with the scores on other instruments measuring similar constructs (convergent validity) or dissimilar constructs (discriminant validity), or the difference in the instrument scores between subgroups of people (known-groups validity)
• When designing a psychometric study, it is recommended to formulate hypotheses about the expected direction and magnitude of the correlations or differences for the validation.
• Then, the validation can be performed by analyzing the data regarding whether or not the
formulated hypotheses were satisfied.
• However, many researchers determine the validity only based on the statistical significance of
the employed statistics, without considering the expected direction and magnitude.
• Although evidence for construct validity is typically assembled through a series of studies, the
process generally consists of the following steps:
• Hypotheses can be formulated with regard to expected relationships with instruments assessing:
• Related constructs
• Unrelated constructs, or
• With respect to expected differences between subgroups of patients
• E.g.: if the researchers want a measurement instrument to measure physical functioning, and not pain,
then the hypothesis could be formulated that the measurement instrument should have no correlation, or
only a slight correlation with measurement instruments that measure pain (discriminant validity).
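Turning such hypotheses into a pre-specified, checkable analysis can be sketched as follows (all data simulated, thresholds illustrative, not prescriptive):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated scores (illustrative only): a new physical-functioning
# instrument, an established physical-functioning instrument, and a
# pain instrument measuring a dissimilar construct.
physical = rng.normal(size=n)
new_score = 0.8 * physical + 0.4 * rng.normal(size=n)
pain = rng.normal(size=n)

r_convergent = np.corrcoef(new_score, physical)[0, 1]
r_discriminant = np.corrcoef(new_score, pain)[0, 1]

# Hypotheses pre-specified before looking at the data
hypotheses = [
    ("convergent r > 0.50", r_convergent > 0.50),
    ("discriminant |r| < 0.30", abs(r_discriminant) < 0.30),
]
for name, confirmed in hypotheses:
    print(name, "->", "confirmed" if confirmed else "rejected")
```

Counting confirmed versus rejected hypotheses, rather than testing significance alone, is exactly the evaluation step described above.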
Describe the comparable measurement instruments or subgroups to be
discriminated
-Describe measurement instruments with which the measurement instrument under study is
compared, in terms of the constructs they measure, and present data about their measurement
properties
-Present details about the other measurement instruments, to which the new measurement
instrument is related, in terms of the construct(s) they measure and their measurement
properties.
- To assess their similarity or dissimilarity, one must have insight into the content of these
comparable measurement instruments.
-There should be a description of what is known about the validity and other measurement properties of these instruments in the specific situation under study.
- This part of the validation study is often treated too lightly.
-There should be references describing the content and measurement properties of these instruments in the same target population.
-When hypotheses are formulated about differences between known groups, details about the demographic, clinical and other relevant characteristics of these groups should be presented.
Gather empirical data
• Attention must be paid to the population and situation in which these data are collected.
• Validation is dependent on these issues, so the study sample and situation should be
representative of the target population and conditions in which the measurement
instrument will be used.
Assess whether the results are consistent with the hypotheses
• This step should also be straightforward if the previous steps have been performed correctly:
• It is just a matter of counting how many hypotheses were confirmed and how many were rejected.
• However, if the hypotheses were vaguely formulated, this step becomes problematic.
• Can one say that a correlation coefficient of 0.35 is moderate, and can one conclude that
subgroups have a different mean value, if this difference did not reach statistical significance in a
small study?
• So, defining explicitly beforehand the correlations and magnitude of differences one considers
acceptable, will prevent the need for these post-hoc, data-dependent decisions.
Explain the observed findings
• Discuss the extent to which observed findings could be explained by rival theories or
alternative explanations (and eliminate these if possible)
• Only validation studies with explicitly defined constructs, and hypotheses based on
theory or literature findings, make it possible to draw firm conclusions about the (lack of)
construct validity of the scores of a measurement instrument.
2.3.3. Cross-cultural validity/measurement invariance
• The degree to which the performance of items on a translated or culturally adapted instrument is an adequate reflection of the performance of items in the original version of the instrument
• Assessed after translation of a questionnaire
• Usually using:
• Multiple-group CFA or
• Differential item functioning (DIF)
• Starts with an accurate translation process
• Validity of the new, cross-culturally adapted instrument, should be
checked by assessing its construct validity.
• There may be differences in cultural issues apart from differences
induced by the translations,
• Some items in a questionnaire may be irrelevant in other cultures.
• E.g.: the ability to ride a bicycle is very important in the Netherlands, where almost everybody cycles for short-distance transportation, while in the USA cycling is considered a sport, and only a minority of the population possesses a bicycle.
The translation process
• Consists of six steps
Step 1: Forward translation
• Two bilingual translators independently translate the questionnaire from original language
into target language
• They make a written report of the translation containing challenging phrases and uncertainties,
and considerations for their decisions.
• One translator should have expertise on the construct under study, the second one being a
language expert, but naive about the topic.
• These types of expertise are required to obtain equivalence from both a topic-specific and
language-specific perspective.
Step 2: Synthesis of the forward translation
• Two translators and a recording observer combine results of both translations (T1 and T2 into
T12)
• Results in one synthesized version of the translation
• Written report carefully documenting how they have resolved discrepancies is presented
Step 3: Back translation
• The synthesized version is translated back into the original language
• The back translators are language experts and are not experts on the constructs to be measured.
• Using construct experts is sometimes recommended, because they may detect unexpected meanings of items.
• They have background information about what aspects are relevant, thereby increasing the likelihood of detecting imperfect translations.
Step 4: Expert committee composes the pre-final version
• Consists of four translators together with
• Researchers
• Methodologists and
• Health and language professionals
• Contact with developers of the original questionnaire (if possible) to check whether the items
have maintained their intended meaning.
Step 5: Test of the pre-final version
• Completed by a small sample of the target population (15–30) for pilot-testing
• It is then tested for comprehensibility
• Special attention should be paid to whether respondents interpret the items and
responses as intended by the developers.
Assessment of measurement invariance
• Measurement invariance means that a measurement instrument, scale or item functions in exactly the same way in different populations.
• Several methods can be used to assess measurement invariance:
• Factor analysis
• Logistic regression analysis
• IRT techniques
• In assessing measurement invariance, the factor structure of data gathered in both the original
and new populations are compared on three points.
• Are the same factors identified in both populations, and
• Are these factors associated with the same items across the two populations?
• Do the factors have the same meanings across the two populations
• i.e. Do the items show the same factor loadings in both populations?
• Do the items have the same mean values (intercepts) in both populations?
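One of the methods listed above, the logistic regression approach to differential item functioning (DIF), can be sketched on simulated data. Everything below is invented for illustration; a real analysis would match on the observed total score and use a dedicated solver rather than hand-rolled gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Simulated uniform-DIF screen for one dichotomous item: does group
# membership predict the item response beyond the (here, latent) ability?
group = rng.integers(0, 2, n).astype(float)  # 0 = original, 1 = new population
ability = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(ability + 1.0 * group)))  # item easier in group 1
item = (rng.random(n) < p_true).astype(float)

# Logistic regression item ~ intercept + ability + group, fitted by plain
# gradient ascent on the log-likelihood.
X = np.column_stack([np.ones(n), ability, group])
beta = np.zeros(3)
for _ in range(2000):
    pred = 1 / (1 + np.exp(-X @ beta))
    beta += 1.0 * X.T @ (item - pred) / n

print(f"group coefficient = {beta[2]:.2f}")  # clearly > 0: flag uniform DIF
```

A non-uniform DIF check would add a group × ability interaction term to the model.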
3. Responsiveness
• The most important objective of measurements in clinical practice and in clinical and health research is to assess whether the disease status of patients has changed over time
• The ultimate goal of medicine is to cure patients

Interpretability
• The degree to which one can assign qualitative meaning – i.e., clinical or commonly understood connotations – to an instrument’s quantitative scores or change in scores
• This particularly applies to the scores for multi-item measurement instruments, the meaning of which is not immediately clear
• What does a mean value of 9.0 points on a 0–24 scale mean?
• In addition, is an improvement of 2.2 points meaningful for the patients?
Development of a measurement instrument
Why development of new measurement instrument needed?
• Technical developments and advances in medical knowledge mean that new measurement
instruments are still appearing in all fields.
• Existing instruments are continuously being refined and existing technologies are being applied
beyond their original domains.
• Measurement instruments used in various medical disciplines differ greatly from each other.
Details of development of measurement instruments must be specific to each discipline.
• However:
• Basic steps in the development of all measurement instruments are same (methodological viewpoint)
• Basic requirements with regard to measurement properties are similar for all measurement instruments.
• Before deciding to develop a new measurement instrument:
• A systematic literature review (SLR) of existing instruments intended to measure the specific concept is indispensable
• A literature review is important for three reasons when searching for existing instruments; among them:
2. It helps to obtain ideas about what a new instrument should or should not look like
- Instruments that are not applicable, or of insufficient quality, can still provide a lot of information, if only about failures that you want to avoid.
• Developing a measurement instrument is not something to be done on a rainy Sunday afternoon.
• If it is done properly, it may take years
• It takes time because the process is iterative.
• During the development process, we have to check regularly whether it is going well.
• Steps in the development of a measurement instrument
• In practice, these steps are intertwined, and one goes back and forth between these steps, in a
continuous process of evaluation and adaptation.
• The last steps in the development process consist of pilot-testing and field-testing.
• These are essential parts of the development phase, as in this phase the final selection of items takes place.
• If the measurement instrument does not perform well, it has to be adapted and evaluated again.
Development & evaluation of instrument: steps
1. Definition and elaboration of the
construct to be measured
• The most essential questions to be addressed:
o What do we want to measure?
o In which target population? AND
o For which purpose?
• Both the target population and the purpose of measurement must be considered
What do we want to measure? Construct definition
• Definition of a construct starts with a decision concerning its level in the conceptual model and considerations about potential aspects of the construct
• Which level in the conceptual model are we interested in?
• By answering these questions we are specifying in more detail what we want to measure.
• If a construct has different aspects, and we want to measure all these aspects, the measurement instrument
should anticipate this multidimensionality.
• Thinking about multidimensionality in this phase is primarily conceptual, and not yet statistical.
• E.g.: in the development of the Multidimensional Fatigue Inventory (MFI), which is a multi-item questionnaire to
assess fatigue, the developers:
• Postulated beforehand that they wanted to cover five aspects of fatigue:
• General fatigue
• Physical fatigue
• Mental fatigue
• Reduced motivation and
• Reduced activity
• Developed the questionnaire in such a way that all of these aspects were covered.
• It is of utmost importance to decide which aspects to include before actually constructing a measurement instrument.
• This has to be done in the conceptual phase, preferably based on a conceptual model, rather than by finding out post hoc (e.g. by factor analysis) which aspects turn out to be covered by the instrument.
In which target population? Target population
• The measurement instrument should be tailored to the target population, and so this must be defined.
• How should a measurement instrument be tailored to its target population?
• Age, gender and severity of disease determine to a large extent the content and type of instrument that can be used.
• Severity of a disease is also important
• Because pathophysiological findings and symptoms will
differ with severity, as will functioning and perceived
health status.
• Getting input for the items of a questionnaire: literature and experts
o Literature
- Examining similar instruments in the literature might help:
- To clarify the constructs we want to measure
- To provide a set of potentially relevant items
• At the level of symptoms, functioning and perceived health, the patients themselves are the key
experts.
Patients should be involved in the development of measurement instruments when their sensations,
experiences and perceptions are at stake.
• For the development of performance tests to assess physical functioning, patients can also indicate
which activities cause them the most problems.
• The best way to obtain information from clinicians or patients about relevant items is through focus groups (FGs) or in-depth interviews (IDIs).
• Developers need to have an exact picture in mind of the construct to be measured; otherwise, it is impossible to instruct the FGs adequately and to extract the relevant data from the enormous yield of information.
Formulating items: first draft
• Some new formulations or reformulations should always occur,
• Because the information obtained from experts and from the literature must be transformed into
adequate items.
The formulation of adequate items is a challenging task, but there are a number of basic rules.
Items should be comprehensible to the total target population, independent of their level of
education
- This means that difficult words and complex sentences should be avoided
- Items should be written in simple language so that anyone over 12 years of age can understand them.
Terms that have multiple meanings should be avoided
- E.g.: the word ‘fair’ can mean ‘pretty good, not bad’, ‘honest’, ‘according to the rules’ and ‘plain’, and the
word ‘just’ can mean ‘precisely’, ‘closely’ and ‘barely’.
- Respondents may interpret these questions using these words differently, but they will not indicate that
the words are difficult.
Items should be specific
- E.g.: a question about ‘severity of pain’ should:
- Specify whether the patient has to fill in the average pain or the worst pain
- Be clear about which period of time the question refers to
- Should the patient rate current pain, pain during the previous 24 hours, or pain during the previous week?
Each item should contain only one question instead of two or more
Negative wording in questions should be avoided
Points to keep in mind
o Consider the conceptual framework
• i.e. the direction of the arrows between the potential items and the construct
• Type of model has important consequences for selection of items for multi-item
measurement instrument
o In a reflective model:
• Items are manifestations (indicators) of the construct
→ items correlate with each other, and they may
replace each other
• i.e. they are interchangeable
• It is not disastrous to miss some items that are also good indicators of the construct.
• In the developmental phase, the challenge is to come up with as
many items as possible.
• Even items that are almost the same are allowed.
o In a formative model
- Each item contributes a part of the construct
- Together the items form the whole construct
• Life stress could be measured based on a
formative model.
• Items in that measurement instrument comprised
events that all cause stress.
o Difficulty of the items
• The difficulty of items in relation to the target population is another point that must be kept
in mind while selecting items
o Response options
• Statements or questions contained in items must correspond exactly with
response options.
Things to keep in mind in the selection and formulation of items
Scores for items
Scoring options
• Every measurement leads to a result, either a classification or a quantification of a response.
• The response to a single item can be expressed at nominal, ordinal, interval or ratio level
3. Pilot-testing
• Development of measurement instrument progresses through a
number of phases: iterative process
Pilot-testing vs field-testing
• Pilot-testing
• Entails an intensive qualitative analysis of the items
• Relatively small number of representatives of the target population
• Field-testing
• Entails a quantitative analysis
• Uses quantitative techniques, such as
• Factor Analysis (FA)
• Item Response Theory (IRT)
• Requires data from a large number of representatives of the target population
• For adequate field-testing a few hundred patients are required
Questionnaire Design
What is a questionnaire?
• A measurement tool consisting of a list of questions accompanied with:
• Instructions
• Response options, and/or
• Answering spaces
• Designed to elicit and record, or guide the elicitation and recording of, exposures from
subjects
• Errors at this point tend to considerably, and sometimes irreversibly, affect the validity of the evidence generated in the study.
• Learning how to ask questions in written and spoken form is essential when
collecting field data.
What is questionnaire design?
• Questionnaire design is a big part of the whole process of data collection
and should be implemented completely before the fieldwork begins.
• Questionnaire design usually begins with selection of the items of data that must be
translated into questions.
• Just as the objectives of the study determine the variables to be measured as a whole, they also
determine the specific items to be covered in the questionnaire.
• If a question does not contribute to the achievement of the objectives, it has no place in the
questionnaire.
• Adequately detailed data should be sought for each essential exposure variable.
• E.g.: for exposure that happens frequently, it is usual to ask about the:
• Time exposure began
• Time it ended, and
• Frequency and intensity of the exposure and their variation over time.
• A comprehensive list of potential confounders and effect modifiers should also be
developed.
• A well thought out plan for data analysis, and a description of the algorithms that
will be used to create exposure dose variables and covariate variables, is essential
in determining the items and detail required.
• Developing the exposure and covariate algorithms that will be used at the end of the study before questionnaire development at the beginning of a study is important to avoid a surprisingly common problem: at the time of data analysis, the researcher realizes that an item needed to compute an exposure dose variable was never collected!
The response process
• The response process during questionnaire administration involves several psychological stages.
Stage-1: Understanding the question
• The respondent attempts to understand the requested information
• Understanding is influenced by:
• Culture
• Language
• Individual interpretations
• ‘Context effects’ by:
• information that appears on the questionnaire
(e.g. previous questions), or
• Any suggestion that the researcher or the
research is interested in particular types of
behaviors or other characteristics.
• Comprehension errors arise if the respondent misunderstands the question at this stage

Stage-2: Retrieving information from memory
• Forgetting is the major process leading to recall errors and hence to recall bias.
• To be remembered for a long time, experiences must be very stressful or highly impactful and infrequent
• Implication
• Asking respondents to count and report the frequency of a common behavior in some defined calendar period in the past is among the most difficult tasks
• Human memories tend to relate to typical episodes in personal history (event dates), rather than to the defined calendar time episodes the researcher would like to know about.
• Telescoping is a problem with event dating
• Forward telescoping
• Concerns stressful events remembered as more recent than they actually were
• May be the most common problem
• Backward telescoping
• Happens when recent events are remembered as more distant than they actually were.
[Figure: timelines contrasting when the event happened with when it is recalled, relative to the question on event time]
• Satisficing can occur at this stage
• The respondent settles for making little mental effort in retrieving information
• Implication:
Questionnaire designers should verify the available evidence in the literature about what is a
reasonable recall period for the specific type of event of interest.
For events that are highly memorable, recall accuracy tends to increase when the recall
period is decomposed into sub-periods about which separate questions are asked.
One should work back from more recent periods to earlier periods rather than the other way
around.
• Recall accuracy tends to increase when the participant is given more time to think
• Accuracy of retrieved information depends on how much effort the respondent is able and willing to make
to remember and/or look up information.
Researchers should be aware that recalling relevant behaviors from memory can be time-
consuming and that satisficing may be induced by any form of pressure to speed up the response
process.
92
Stage-3: inference and estimation
• Implication: except for short options lists, response options should rather be
presented as separate questions.
94
Stage-5: Final editing and communication
95
Behaviors likely to be over-reported:
• Being a good citizen:
• Interacting with government officials
• Taking a role in community activities
• Knowing the issues
• Being a well-informed and cultured person:
• Reading newspapers and books, using libraries
• Going to cultural events such as concerts
• Participating in educational activities
• Fulfilling moral and social responsibilities:
• Giving to charity
• Participating in family affairs and child rearing
• Being employed
Behaviors that may be under-reported:
• Illness and disabilities
• Mental illness
• Illegal or contra-normative behavior:
• Committing a crime, traffic violations
• Tax evasion
• Drug use
• Sexual practices
• Financial status:
• Savings and other assets
• Income:
• Lower-income groups may under-report income in anticipation of financial assistance
or over-report to avoid stigma
• Wealthier participants may under-report income to avoid social or tax repercussions
96
• Possible consequences of SDB (social desirability bias) in epidemiological studies:
• Under- estimation of the frequency and/or magnitude of socially undesirable
attributes
• Overestimation of the frequency and/or magnitude of socially desirable
attributes
o People may take various reference points as a basis for making their judgment.
o E.g.:
1. Would you say that your own health in general is excellent, good, fair, or poor?
2. When you answered question-1 about your health, what were you thinking about?
• Others of the same age?
• Myself at a younger age?
• Myself now as compared to 1 year ago?
• Other
98
• Implication:
o Split the question into several questions each with a specific reference point.
Personal reference points for judgments may shift considerably over time.
• Important for the validity of assessing changes in subjective attributes
99
End aversion
• Reluctance to use extreme options in an options list of answers
• Results in under-estimation of frequencies of extreme categories
• Possible solutions:
• Broaden extreme categories to minimize the effects of this phenomenon
• Conceal the true extreme categories by adding extremes of a nearly impossible
magnitude that nobody is expected to choose.
• Note: remember that age, illness, sickness, and treatments can affect all stages
of the response process.
100
What is the objective of questionnaire design?
• To create an instrument that is easy for both the interviewer and the subject to use, and
easy to process and analyze.
101
Practical considerations in questionnaire design
102
Types of items in questionnaires
• Fully structured item
- Responses are preselected for the respondent
→ Response choices must be known in advance
- Preferred by respondents because some are either
- Unwilling or
- Unable to express themselves
- More difficult to write than open ones
• Advantages
• Results lend themselves more readily to statistical analysis and interpretation
103
• Semi-structured item
- A clear range of options, but one or more of the options trigger a sub-question, the
response to which is to be recorded as free text.
- Item is only structured to a certain level
• E.g.:
• “If ‘other,’ please specify: _____” or
• “If yes, please explain reasons: ___________.”
104
• Fully unstructured item
- Respondent or interviewer can freely write a textual answer to the question
- Used when respondents need to use their own words
- Useful to:
• Explore unknown issues of a topic
• Get unanticipated answers
• Describe the world as the respondent sees it
Disadvantage
- Responses are often difficult to interpret and compare
- Subjects cannot be influenced by response options, unlike with closed-ended questions
- Best suited to simple factual data
Make sure that experts and a sample of potential respondents review all
questions, even if you are using an already existing and validated instrument.
107
• Evaluate questions obtained from other sources for their:
Design adequacy
Appropriateness to objectives of the study, AND
Suitability for use in the population
108
Question wording
• Questions should always be stated as complete sentences.
o Complete sentences express one entire thought
- E. g:
- Question: ‘Place of birth?’ Why is this poor? How could it be improved?
o ‘Place of birth’ means different things to different people.
- I might give the city in which I was born, but you might give the name of
the country or the hospital.
109
Words in a questionnaire: general principles
Should be the usual ‘working tools’ of the respondents
110
• What is wrong with the question: Have you ever had an ECG?
• ‘ECG’ is technical jargon that many respondents will not understand; a solution is to
use familiar words or briefly explain the term.
• How old were you when you first began to smoke regularly?
112
• Vague!
• How old were you when you first began to smoke regularly?
• It would be better to ask subjects about their ‘usual’ action over a specific
period of time (e.g. the past 12 months) than simply to ask about their ‘usual’
intake.
113
• Questions containing words that vary substantially in their meaning
among different people:
• Usually
• Normally
• Regularly
These are three commonly used vague descriptors of frequency.
114
Too precise questions
• Precision is desirable when estimating amount or duration of exposure
• Respondent burden may be increased unduly if too much precision is requested.
• Too precise!
• It might be tempting to ask smokers to estimate their daily cigarette consumption
for each year of their smoking life.
• Unreasonably burdensome
• Prone to substantial error in recall
• Better approach
• Ask subjects about major changes in daily cigarette intake (e.g. an increase or
decrease of 10 or more cigarettes a day), and document the time of each of these
changes.
115
Biased questions
• Questions that suggest to a respondent that a particular answer is preferred from
among all possible answers.
• NB: beware of items that contain words such as ‘and’, ‘or’, or ‘because’!
• Pre-testing with a group similar to the intended audience can reveal
whether such a problem exists.
117
Sensitive/threatening questions ?
• Questions that ask respondents about behaviours that are:
• Illegal
• Contra-normative (deviant)
• Not discussed in public without tension
• Relate to issues of self-preservation
118
• Fall into distinct classes: those that ask about:
119
• Techniques to maximize reporting of socially undesirable behaviors:
Use of words familiar to the respondent, and
Open-ended questions
Explicit assurances of confidentiality before interview or before sensitive
question
Use of a long introduction to the question
• A question on walking : Many people find it difficult to find time to get regular exercise, like
walking. In the past year did you walk for exercise at least once a week?
120
• The significance of a behavior may also be minimized by use of the phrase: ‘did you
happen to …’
• In the last month, did you ever happen to forget to use a condom?
o Sometimes people drink a little too much beer, wine, or whisky so that they act
differently from usual. What word do you think we should use to describe people when
they get that way, so that you will know what we mean and feel comfortable talking
about it? And
o In the past year, how often did you become (respondent’s word) while drinking any
kind of alcoholic beverage?
121
Double negatives ?
• Can arise whenever a question that is phrased negatively can have a
negative answer.
• E.g.: Should the hospital manager not be responsible for the failures of
services in the hospital?
Yes □
No □
• It is unnatural to say ‘yes’ when the answer really means ‘no’ (that the
hospital manager should not be responsible for the failures), and so answers to
this question would be ambiguous.
122
Mutually exclusive answers
• A subject could reasonably select more than one answer to a question:
• The subject becomes uncertain about which alternative to choose → non-response
Solution 1:
o How important is your daily physical exercise to your health?
Very important □
Somewhat important □
Not important □
I do not exercise □
125
Solution 2
• Ascertain first whether or not the respondent exercises daily, and skip to a
succeeding question if not
1. Do you have physical exercise on a daily basis?
• Yes □
• No □ (Go to question 2)
126
Unambiguous time reference
o “In the past year, did you walk for exercise at least once a week?”
• Improved: In 2023, did you walk for exercise at least once per week?
o Think about your diet over the past year. How often did you eat a serving of
fruits or vegetables? Do not include juices, salads, potatoes, or beans.
• Potential issues? Solution?
• Although fruit and vegetable intake may form a single exposure in a study, most
subjects would consider fruits and vegetables as separate categories
• Subjects would typically think through the answer by adding the number of
times they eat vegetables to the number of times they eat fruit.
129
Questions that require calculations
• Certain questions that appear to involve a simple concept may actually require
subjects to perform a calculation to derive an answer.
o In the past year, about how many hours per week did you walk for exercise?
• To answer this question, most subjects would need to break it down into
• The number of times they walked each week
• The minutes they walked per session, and
• Then multiply these together (and then convert to hours!)
• E.g.: A researcher may want to define walking for exercise as walking at least
once a week for at least 20 minutes per session and at least at a moderate
pace.
• The question might be phrased as: Over the past year, did you walk for
exercise at least once a week for 20 minutes or more per session? Do not
include casual walking.
• If the question were phrased this way, it would certainly annoy some
respondents with its complexity.
131
• Instead, part of the concept in the question can be incorporated into the answers
to sub-questions.
132
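The decomposition above can be illustrated with a toy calculation: the analyst, not the respondent, performs the multiplication and the minutes-to-hours conversion from two simpler sub-question answers. This is a hypothetical sketch; the function name and values are illustrative, not from the source.

```python
def weekly_walking_hours(times_per_week: float, minutes_per_session: float) -> float:
    """Derive hours walked per week from two simple sub-question answers.

    Asking 'how many times per week?' and 'how many minutes per session?'
    separately spares the respondent the mental arithmetic described above.
    """
    return times_per_week * minutes_per_session / 60.0

# A respondent who walks 3 times a week, 40 minutes per session:
print(weekly_walking_hours(3, 40))  # 2.0 hours per week
```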
Question order
133
Placing demographic questions at the beginning is common practice
• Not a good idea! Why?
• These questions are:
• Comparatively low interest to the respondents, and
• Some of them are threatening
• Can be answered quickly (when respondent gets tired)
Questionnaire should begin with questions related directly to the topic of interest
• Command the subject’s interest
If, for some reason (e.g. to select particular respondents), it is necessary to place
demographic data at the beginning, some explanation for their position should be
given to the respondent.
134
Place sensitive questions towards the end of a questionnaire, in order
of increasing threat. Why?
• Minimizes:
• Early termination of the interview
• Failure to complete the questionnaire
135
Place relatively easy-to-answer questions at end
• In long or difficult questionnaires, respondents get tired and answer the last
questions carelessly or not at all.
• Place demographic questions (age, income, gender, and other background
characteristics) at the end because these can be answered quickly.
136
Questions should appear to reasonable people to be in a logical order
• Do not switch from one topic to another unless you provide a transitional
statement to help the respondent make sense of the order.
138
Questionnaire structure
• Every questionnaire should contain:
Introduction
Conclusion
139
Introduction
• Interview takes the form of a standard statement read by the interviewer
140
General instructions
• Should be short and simple
• Skip instructions
142
• The following are some uses and examples of linking statement.
143
Skipping (branching)
• Necessary where some succeeding questions are not applicable to all
respondents.
• Paper and pencil questionnaires must rely on good instructions to follow skip
patterns
• Skip instruction:
• Should be placed immediately after the answer that leads to the branch point in the
questionnaire
• The most important requirement of skip instructions:
• They should always be worded positively rather than negatively
• ‘Go to question 3’ rather than ‘Skip question 2’
144
• Skip patterns may be confusing to people
• Should be avoided in self-administered printed questionnaires
• E.g.: in a computer-assisted questionnaire that instructs ‘If no, go to question 6’, the
respondent who answers ‘no’ is automatically routed to question 6.
145
• It is more important to make the path clear for those who are to complete the sub-questions
than for those who should skip them.
• Have you ever smoked?
□ No
□ Yes
• Complex branching designs are usually only possible in interviewer administered questionnaires.
• Including “Inapplicable” category may be used to avoid skips in a self-administered questionnaire
• How often do you cut the fat off meat before you cook or eat it?
Never □
Less than half the time □
More than half the time □
Always □
I never eat meat □
• Provides an alternative answer for everyone and eliminates the need for a skip
146
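In a computer-assisted questionnaire, skip instructions become routing logic, and the respondent never sees inapplicable sub-questions. A minimal sketch, with a hypothetical skip table and question IDs (not from the source):

```python
def next_question(current: str, answer: str) -> str:
    """Route to the next question, applying skip (branching) rules.

    Mirrors a printed instruction such as 'If no, go to question 6'.
    The skip table below is illustrative, not from the source.
    """
    skips = {
        ("Q1", "No"): "Q6",  # e.g. never smoked -> skip the smoking sub-questions
    }
    if (current, answer) in skips:
        return skips[(current, answer)]
    # Default: proceed to the next question in sequence.
    return f"Q{int(current.lstrip('Q')) + 1}"

print(next_question("Q1", "No"))   # Q6
print(next_question("Q1", "Yes"))  # Q2
```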
Questionnaire length
• Topics to be covered and details covered in a questionnaire are limited by the
length of time that subjects are willing to spend on the questioning process.
147
Questionnaire formats: Standard components
Items:
• Main building blocks of a questionnaire
• Items about a common theme are arranged in clearly delineated sections and
linked through alphanumerical sequencing, combined with skip instructions
when appropriate.
148
Spaces:
• Serve administrative or quality-control purposes
• Each single page of a questionnaire has a header section that identifies, as a
minimum, the:
• Study
• Questionnaire within the study (if several exist)
• Page number
• Participant identification number, and
• Date of completion
• A small footer indicates the version of the questionnaire and the printing date.
149
150
Questionnaire Format: Principles
Different typefaces for questions, responses, and instructions
• Lead interviewer or respondent to the correct parts of the question
151
Principles…
Record responses to closed-ended questions by check boxes
• Easy to understand
Use vertical response formats (except for scales)
• Makes it clear which box goes with which response
(Example vertical options list: 20-30 □ / 31-40 □ / 41-50 □ / 51+ □)
• Increase space between questions
• Avoids questionnaire congestion
• Enables the path of questions to be seen easily
153
Principles…
Pages and questions should be numbered consecutively
Sub-sections of questions should be indented and identified with
letters rather than numbers.
Certain visual changes, such as underlining words or using capital letters, can
be used to emphasize a change in concept
• Not be over-used
154
Principles of formatting individual questions
• Interviewer-administered questionnaires:
• CAPITAL LETTERS for questions
• Bold face for alternative responses that are not to be read to the respondent
• Bold CAPITAL LETTERS for alternative responses to be read
• Self-administered questionnaires:
• Bold face for questions
• Regular typeface for alternative responses
• Italics for instructions
• Questionnaire should be printed in booklet form so that it will open flat on a table.
• First page should have the title of the project and instructions for completing the
questionnaire.
156
Formatting self-administered…
• Make skip patterns clear, e.g. through use of arrows after response and
instructions
157
Aids to recall
• Disease may be influenced by exposures which occurred many years before
diagnosis.
• Aids may be used to assist subjects in recalling information
• Recall may be aided by:
Allowing subjects some time to think about the question: simple aid to recall
159
• Use of a life events calendar is an example of the use of autobiographical
sequences to aid recall.
• Autobiographical sequences
• Groups of events clustered in time, often organized within some wider
framework (e.g. a job or illness), within which memory appears to be
organized.
• E.g.: Asking first about illnesses which may have been indications for the
use of particular medications may assist in recall of those medications.
160
• Telephone interviews are at a disadvantage with regard to recall because:
• Silent pauses are more awkward on the telephone, and
• Subjects might respond too quickly to retrieve the memory fully
161
Pre-testing
• Essential part of all questionnaire development
• Regardless of whether it is substantially based on previous questionnaires
163
• Pretesting typically involves several of these methods applied to different versions of the
questionnaire as it is revised and gets closer to its final form.
• Some of the techniques are more appropriate for earlier stages of pre-testing:
• Expert reviews
• Cognitive interviews
• Experiments
• Some authors use the word pre-test for the early testing of the questionnaire and pilot test for
later testing of the study field methods, including:
• Selecting subjects
• Recruitment, and
• Data collection
• Methods, such as intensive interviews and interactive coding, might be too costly in
terms of time and budget.
164
• When time and budget are restricted, one should at a minimum:
• Seek expert review of the questionnaire (e.g. from colleagues)
• Conduct a pre-test on at least 20 test subjects followed by debriefings, and
• Perform an item distribution analysis
• For a first pre-test, one could use a small sample of convenience: co-workers, friends
• Can help determine whether all necessary items are included to meet study
aim, and provide advice on wording, format, etc.
• The data analyst also needs to review content and format to identify problems
before data collection begins, which might otherwise arise at the data
processing and analysis phase.
166
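The item distribution analysis recommended above can be as simple as tabulating, for each item, the frequency of every response and the percentage of missing answers. A sketch under stated assumptions: pre-test responses are stored one dict per subject, with None marking a missing answer (names and data are illustrative, not from the source).

```python
from collections import Counter

def item_distribution(responses, items):
    """Per-item response frequencies and percentage missing from a pre-test.

    Items with many missing answers, or where every subject gives the same
    answer, are candidates for revision.
    """
    n = len(responses)
    summary = {}
    for item in items:
        answers = [r.get(item) for r in responses]
        missing = answers.count(None)
        summary[item] = {
            "counts": Counter(a for a in answers if a is not None),
            "pct_missing": 100.0 * missing / n,
        }
    return summary

# Two hypothetical pre-test subjects; the second left q2 unanswered.
pretest = [{"q1": "Yes", "q2": None}, {"q1": "Yes", "q2": "No"}]
print(item_distribution(pretest, ["q1", "q2"]))
```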
o Cognitive/intensive interviews with respondents
• To observe how items were understood and answered
• Cognitive interviews can also use item-specific probes to understand how specific
concepts are understood.
• E.g.: the following questions would be appropriate for the question, ‘How many
people are there in your household?’,
• Did the respondent include him/herself in the count?
• To what period did the respondent think the question related?
• i.e. if a household member had been temporarily away, would he/she have been included?
• How did the respondent interpret the term household?
168
o Interviewer and respondent debriefings
• To identify items with problems
• Can be:
• Interviews or questionnaires, conducted immediately or soon after the
questionnaire is completed, or
• Group meetings
170
o Interviewer–respondent interaction coding scheme
• Interviewer behaviors in question-asking
• Substantive change: makes a substantive change in reading the question
• Incorrect prompt: repeats the question not as written or suggests an answer
• Skips question: skips applicable questions
• Reads wrong question: reads a question that was not supposed to be read
• Respondent behaviors
• Interrupt: Interrupts question with an answer
• Uncertain: expresses uncertainty about question, requests clarification
• Uncodeable: response does not meet question objectives, uncodeable
• Don’t know: Offers a ‘don’t know’ response
• Refusal: refuses to answer
171
o Interaction or behaviour coding
• A monitor listens to the interview (usually a tape recording) and codes specific
behaviors of the interviewer and the respondent for each question.
• Length of pause between end of question and answer (reaction time or response
latency) is sometimes coded as an indicator of difficulty in answering the
question.
• By analyzing frequencies of behaviors for each question, one can determine problem
questions
• E.g.: which questions interviewers do not read as worded (often to attempt to
improve the meaning of the question).
174
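The frequency analysis described above can be sketched as a tally over coded (question, behavior) events. The behavior codes echo the scheme listed earlier; the 15% flagging threshold and the set of "unproblematic" codes are illustrative assumptions, not from the source.

```python
from collections import Counter

def problem_questions(coded_events, threshold=0.15):
    """Flag questions whose rate of problem behaviors exceeds a threshold.

    coded_events: (question_id, behavior_code) pairs recorded by a monitor
    listening to taped interviews. Any code outside the two 'ok' codes below
    (an assumed convention) counts as a problem behavior.
    """
    ok_codes = {"exact_reading", "adequate_answer"}
    total = Counter(q for q, _ in coded_events)
    problems = Counter(q for q, code in coded_events if code not in ok_codes)
    return {q: problems[q] / total[q]
            for q in total if problems[q] / total[q] > threshold}

events = [("Q1", "exact_reading"), ("Q1", "adequate_answer"),
          ("Q2", "substantive_change"), ("Q2", "exact_reading")]
print(problem_questions(events))  # {'Q2': 0.5}
```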
o Percentage missing experiments
• As part of pre-testing, two (or more) versions of each question can be tested
• usually with each questionnaire version given to a different group of subjects.
• From this information, the researcher can select the best alternative of each
question for the final questionnaire.
176
o Revising the questionnaire
• Typically a questionnaire is tested and revised several times before being used in
the field.
• Once problems are identified during a pre-test, they can be resolved through
changes in:
• Questionnaire wording
• Questionnaire format, or
• Interviewer training
• However, often a revised question which solves problems identified by some
respondents will lead to problems for other respondents.
• Often adding explanations to make the question clearer to some respondents will
make the question longer, more burdensome, or even confusing to others.
• Thus it seems reasonable to modify only questions that are problematic for a
moderate proportion of subjects (e.g. ≥10%) and/or have a simple solution.
177
o Think-aloud techniques
o Paraphrasing the question by respondent
o Specific probes about how question was answered
o Debriefing questionnaires
o Group sessions
o Respondent
o Focus groups
o Observation of interviews and possible interaction coding
• To identify interviewer and respondent behaviors for each question, such
as rewording of question by interviewer or uncertainty about the answer
expressed by respondent
Item equivalence
• If conceptual equivalence exists, assess the relevance and acceptability of specific
items in the target population.
E.g.:
o It does not make sense to ask about a person’s ability to climb stairs if the
questionnaire will be used in a setting consisting solely of single-story dwellings
o It may be taboo in some cultures to inquire about certain topics
• These questions may have to be reworded or replaced before the translation begins
181
Semantic equivalence
Refers to the meaning attached to each item.
E.g.:
• In China, white is the color of mourning; in other cultures it connotes purity.
• Such problem exists even within the same language as spoken in different countries.
182
• One problem with translation and back-translation process has to do with the
translators themselves.
183
Operational equivalence
• Goes beyond the items themselves
• Looks at whether the same format of:
• The scale
• The instructions, and
• The mode of administration can be used in the target population
185
Measurement equivalence
• Translators should translate into their native tongue and should be aware of the
intent of each item and the scale as a whole.
186
Respondent burden
• Concerns the level of demand placed on the respondent in answering the questions.
187
• In general, the following will all add to respondent burden
188
• Consequences of increased burden on respondent:
189
• There is often a conflict between:
• Collecting the information necessary to the objectives of a study
• Keeping questionnaire to an acceptable length, and
• Minimizing respondent burden
190
General approach to questionnaire development
Avoid anything that could confuse, bore, embarrass, or otherwise burden either the interviewer or
respondent.
• This element encompasses:
• Making questionnaire as clear, short, simple, friendly, and attractive as possible, and
• Making all possible efforts to keep motivation high
Account for what is known about psychological response stages and influences of personal
characteristics.
Draw from what is known already about the validity of specific questions.
• It is unwise to produce a questionnaire item de novo if a suitable version of the item is:
• Known to exist
• Has been used in other studies, and
• Has produced reliable and accurate information, except when there are reasons to believe
that a translation, update, or cultural adaptation is necessary.
Make maximal use of possibilities to promote data integrity after the questionnaire is completed.
191
Scaling responses
• A method must be chosen by which responses will be obtained, once a set of
questions has been devised
192
1. Visual analog scale (VAS)
• Graphic rating method
o While the lower limit is often easy to describe (none of the attributes being
measured), the upper end is more problematic.
194
• The reliability of a scale is directly related to the number of items in the scale
• So that the one-item VAS test is likely to demonstrate low reliability in
comparison to longer scales.
• Strengths of VAS
• Its simplicity contributes to its popularity
195
4. Ordinalized scales
• Sometimes the measured attribute is continuous but the scale for
measurement is ordinalized
• Optimal number of levels is usually in the range of 5–7
• Ordinalized scales include the following types:
4.1. Horizontal options lists with circles
• Self-reported health
• Excellent/very good/good/fair/poor
197
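Ordinalizing a continuous score can be sketched as binning into a fixed number of ordered categories. A hypothetical illustration only: the labels match the self-reported health example above, and the evenly spaced cut-points are an assumption, since real instruments choose cut-points empirically.

```python
def ordinalize(score, labels=("poor", "fair", "good", "very good", "excellent")):
    """Bin a continuous 0-100 score into one of five ordered categories."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Evenly spaced 20-point bins; the top bin absorbs the endpoint 100.
    return labels[min(int(score // 20), len(labels) - 1)]

print(ordinalize(35))   # fair
print(ordinalize(100))  # excellent
```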
3. Juster scale
• Variant of adjectival scale
• Combines descriptors of probabilities with numerical ones
• Used mostly for subjectively estimating the probability of an event
• Ordinal levels described by a numerical probability combined with a worded
interpretation of that same probability
198
4.2. Likert scales (‘agree–disagree’ scale)
202
One pole or two
• Number of ‘poles’ of the factor being assessed: major difference between VAS,
adjectival scales and Likert scales
- Interviewer’s attitude toward the study and the respondent will influence the
results.
204
Interviewer: Training
• Training is key
• Overall goal of training :
o To produce interviewers who know what is expected of them and how to
answer questions
o Know where to turn if problems arise unexpectedly in the field
• Prepare a manual
- Most efficient way to make sure trainees have all information they need to perform their job
- Can explain what they are to do and when, where, why, and how they are to do it.
205
Conducting interviews
Guidelines for conducting interviews:
• Make a brief introductory statement that will:
Describe who is conducting the interview
e.g. “Dr/Mr/Mrs/Prof. XX YY from the University of Gondar”
207
Monitoring interview quality
• To assure getting the most accurate data possible
• Go with an interviewer or spend time with interviewers to make sure what they
are doing is appropriate for the study’s purposes.
208
• Provide extra copies of all supplementary materials.
- If data collectors are to mail completed interviews back, make sure to give them extra forms and
envelopes.
• Provide an easy-to-read handout describing the purpose of the interview and the
content of the questions.
• Provide a schedule and calendar so that interviewers can keep track of their
progress
• Consider providing the interviewer with visual aids.
- Extremely important when interviewing people in-person whose ability to speak or read may be
limited.
• Consider the possibility that some interviewers may need to be retrained and
make plans to do so.
209
Questionnaire Administration
• For questionnaire administration it is important to keep in mind that anything
that can:
• Confuse
• Distract
• Bore
• Embarrass
• Burden the respondent or the interviewer tends to adversely affect accuracy
and completeness of the recorded responses.
210
• A self-administered questionnaire could also include a request for the
respondent to check that all questions have been answered.
• The conclusion should also contain the address for the return of a mailed
questionnaire; while an addressed return envelope will usually be included, it
may have become separated from the questionnaire.
211
• The important choices to make include:
212
• One should make sure to always record the type of respondent used
• E.g.:
• Self-about-self
• Mother-about-child
• Other-caregiver-about-child
• Etc.
213
Main styles of interviewing
• The style of interviewing tends to influence the accuracy of the responses
• Standardized interviewing
• All interactions with respondent are:
• Prescribed, and
• Written in the interviewer’s guide as a step-by-step process
• Rules out most interviewer influences on responses.
• Conversational interviewing
• Allows interviewers to interact freely with respondents
o Minimizes errors due to poor understanding of a question by the respondent
o Introduces some interviewer variance
214
• Conversationally flexible interviewing
• There can be a standardized approach for one question and a conversational one for
another question.
• Leads to:
• The same accuracy as standardized interviewing when the question is easy to answer, and
• Better accuracy than standardized interviewing when the question is difficult
215
Training of questionnaire administration
• Provide detailed instructions in a user’s manual
• Train each interviewer, with the manual at hand during each interview, including:
• Moving through the questionnaire at an appropriate pace
• Writing legibly
• Using permanent ink, etc.
• Sufficient training should ensure that the interviewer establishes rapport
• Each interviewer should be trained extensively on how and when to use the instruction sheets.
• It should be a formal obligation for the interviewers to have the instruction sheets available for
consultation during each interview.
• Prepare a Standard Operating Procedure based on the User Manual and field logistics
• Prevent deviation from the study protocol
217
Questionnaire administration: Ethical considerations
218
Questionnaire administration: Methods
• Having developed a questionnaire, the next consideration is how to administer it.
219
• Face-to-face interviews
• A trained interviewer administering the questionnaire on a one-to-one basis
• Either in the office or, more usually, the subject’s home
• Interview at home serves to:
• Put the respondents at ease (familiar surroundings), and
• Increase compliance (subjects do not have to travel)
• Involves greater cost to the investigator
• Possibility of interruptions (telephones, family members, etc.)
220
o Advantages of face-to-face interview
• Interviewer is sure who is responding
• Not the case with telephone or mail administration
• Anyone in the household can answer or provide a second opinion for the respondent
• Allows non-verbal communication: can motivate respondent to reply
• Interviewer can determine if the subject is having any difficulty understanding items
• Whether due to:
• Poor grasp of the language
• Limited intelligence
• Problems in concentration
• Boredom
• Allows interviewer to rephrase question in terms the person may better understand, or to probe for
a more complete response.
• Flexibility afforded in presenting items, as questions in interview can range from ‘closed’ to ‘open’
221
• Since many immigrants and people with limited education understand the spoken language
better than they can read it, and read it better than they can write it, fewer people will be
eliminated because of these problems.
• Closed questions, which require only a number as a response, such as the person’s age,
number of children, or years of residence, can be read to the subject.
• If it is necessary for the respondent to choose among three or more alternatives, or to give a
Likert-type response, a card with the possible answers could (and most likely should) be
given to the person so that memory will not be a factor.
• Open questions can be used to gather additional information, since respondents will
generally give longer answers to open-ended questions verbally rather than in writing.
• This can sometimes be a disadvantage with verbose respondents.
222
o Disadvantages of face-to-face interviews
• More expensive to administer than any other method
• Number of possible interviews that can be done in one day may be limited.
• Many people work during the day, and only evening interviews are convenient
223
• If the target language is not the native language of a sizable proportion of the
respondents:
• The questionnaire must be translated into one or more other languages, and bilingual
(or multilingual) interviewers must be found.
• This may not be unduly difficult if there are only a few major linguistic cultures, but can
be more of a problem in cities that attract many immigrants from different countries.
224
Telephone questionnaires
• Interviewing subjects over the phone rather than meeting them in person
225
• Advantages
• Reduction in the number of omitted items
227
• Disadvantages
• No assurance who the person is at the other end of the line
• Primacy effect
• Subjects tend to endorse categories that are read towards the beginning of
the list rather than towards the end
228
Mailed questionnaires
• Advantages
• By far the cheapest of the three methods
• Can be coordinated from one central office, even for national or international
studies
• In contrast, personal interviews usually require an office in each major city, greatly
increasing the expense.
229
• Disadvantages
• Subjects may omit some of the items
230
How to increase the return rate?
• Many techniques exist, although not all have proven effective
1. A covering letter
o Most important part of a mailed questionnaire
o Determines whether the form will be looked at or thrown away
o Its wording and contents therefore deserve careful attention
o Should begin with a statement emphasizing:
• Why the study is important, and
• Why that person’s responses are necessary to make the results interpretable, in that order
o Include a:
- Promise of confidentiality
- Description of how the results will be used, and
- Mention of any incentive
o Should be signed by hand
o With the name block under the signature indicating the person’s title and affiliation.
o Should be on letterhead
o Subjects are more likely to respond if the research is being carried out by a respected organization
o Should fit onto one page
231
2. Advance warning that the questionnaire will be coming.
232
3. Giving a token of appreciation
• Use of an incentive is predicated on ‘social exchange theory’,
• Which states that even small incentives are effective because they inculcate a sense of social
obligation on the respondent.
• Most often, this is a sum of money, which significantly increases the return rate.
• Relationship between amount of incentive and return rate flattens out quite quickly;
• Amounts as low as $0.50 or $1.00 double it, but $15.00 increases the return rate only 2.5
times
• Sharp increase in the odds of return up to $1.00, then a smaller increase until $5.00, and no
further increase after that.
• It doesn’t make sense for financial incentives to exceed $5.00, and even $1.00 is
sufficient in many cases.
• The explanation for this somewhat paradoxical result is that when the value of the
incentive starts approaching the actual value of the task, then ‘social exchange’
becomes more like an ‘economic exchange’, and the person feels less of a social
obligation to reciprocate.
233
• Other incentives that have been used with varying degrees of success have included
• Lottery tickets
• A chance to win a savings bond or prize
• Pens or pencils
• Tie clips
• Unused stamps
• Diaries
• Donations to charity
• Key rings
• Golf balls
• Letter openers
• But these seem to be much less powerful than cold, hard cash
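The dose–response pattern described above can be sketched as a piecewise function. The anchor points (odds roughly doubling by $1.00, about 2.5 times by $5.00, and flat thereafter) follow the figures in the text, but the linear interpolation between them is an illustrative assumption, not a fitted model.

```python
def relative_return_odds(incentive: float) -> float:
    """Rough piecewise sketch of the incentive/return-rate pattern:
    sharp increase in the odds of return up to $1.00, a smaller
    increase until $5.00, and no further increase after that.
    (Illustrative numbers only, not fitted to study data.)"""
    if incentive <= 0:
        return 1.0                                # no incentive: baseline odds
    if incentive <= 1.0:
        return 1.0 + incentive                    # doubles by $1.00
    if incentive <= 5.0:
        return 2.0 + 0.125 * (incentive - 1.0)    # slower rise to ~2.5x at $5.00
    return 2.5                                    # plateau: 'economic exchange'

for amount in (0.5, 1.0, 5.0, 15.0):
    print(f"${amount:.2f} -> {relative_return_odds(amount):.2f}x")
```

Note how the $15.00 incentive yields the same 2.5x as $5.00, which is the text’s argument for capping financial incentives at $5.00.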
234
4. Anonymity
• Evidence on effect of anonymity on response rate is contradictory
235
5. Personalization
• Some people see a personalized greeting using their name as an invasion of privacy and a threat to
anonymity.
• First, the letter can be addressed to a group, such as ‘Dear colleague’, ‘Dear resident of . . .
neighborhood’, or ‘Dear member of . . . ’
• Adding a handwritten ‘thank you’ note at the bottom of a covering letter increases response
rate by 41%.
• Another method to balance anonymity and personalization is to have the covering letter
personalized, and to stress the fact that the questionnaire itself has no identifying information on
it.
• Be aware that personalization may have some detrimental effects on questionnaires and surveys
sent by e-mail or over the Internet
• Other aspects of personalization include typed addresses rather than labels, stamps rather than
metered envelopes, and regular envelopes rather than business reply ones.
236
6. Enclosing a stamped, self-addressed envelope
• Asking the respondents to complete a questionnaire is an imposition on their time;
asking them to also find and address a return envelope and pay for the postage is a
further imposition, guaranteed to lead to a high rate of non-compliance.
237
7. Length of the questionnaire
• It seems logical that shorter questionnaires should lead to higher rates of return
than longer ones.
• However, the research is mixed and contradictory in this regard.
• When the questionnaire is long (over roughly 100 items or 10 pages), each
additional page reduces the response rate by about 0.4 per cent.
• Up to that point, the content of the questionnaire is a far more potent factor
affecting whether or not the person will complete it.
• Thus, it seems that once a person has been persuaded to fill out the form, its
length is of secondary importance.
238
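The page-count penalty above can be turned into a back-of-envelope estimate. Treating the 0.4 per cent figure as percentage points per page beyond the roughly 10-page threshold is an assumption made here for illustration.

```python
def expected_response_rate(base_rate: float, n_pages: int,
                           threshold: int = 10,
                           penalty_per_page: float = 0.004) -> float:
    """Illustrative estimate: beyond ~10 pages, each additional page
    is assumed to cost about 0.4 percentage points of response rate.
    Content, not length, dominates below the threshold."""
    extra_pages = max(0, n_pages - threshold)
    return max(0.0, base_rate - penalty_per_page * extra_pages)

# A 25-page questionnaire with a 60% baseline: 15 extra pages
# cost about 6 percentage points.
print(round(expected_response_rate(0.60, 25), 3))  # 0.54
```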
8. Pre-coding the questions
• Pre-coding does not appear to appreciably increase compliance
• On the other hand, subjects may feel that they want to explain their answers, or
indicate why none of the alternatives apply (a sign of a poorly designed question).
• The questionnaire can make provision for this by having optional sections after each
section, or at the end, for the respondent to add comments.
239
9. Follow-ups
• To maximize returns
• Four-step process:
1. 7–10 days after the first mailing:
• A postcard should be sent, thanking those who have returned the questionnaire
• Reminding others of the study’s importance
• Indicating to those who have mislaid the original where they can get another copy of the
questionnaire
3. Send another letter, questionnaire, and envelope via registered or special delivery mail.
4. Call those who have not responded to the previous three reminders
• May be impractical for studies that span an entire country
• May be feasible for more local ones
240
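The reminder timetable can be sketched as a small scheduling helper. Only the 7–10 day postcard interval comes from the text; the later intervals, and the replacement-questionnaire step, are illustrative assumptions.

```python
from datetime import date, timedelta

def follow_up_schedule(first_mailing: date) -> dict:
    """Sketch of a reminder timetable for a mailed questionnaire.
    The 7-day postcard follows the text's 7-10 day window; the
    remaining offsets are assumed for illustration."""
    return {
        "thank-you/reminder postcard": first_mailing + timedelta(days=7),
        "replacement questionnaire":   first_mailing + timedelta(days=21),
        "registered-mail package":     first_mailing + timedelta(days=49),
        "telephone call":              first_mailing + timedelta(days=63),
    }

for step, when in follow_up_schedule(date(2024, 3, 1)).items():
    print(f"{step}: {when.isoformat()}")
```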
Use of diaries
241
• Diaries refer to detailed prospective records of exposure kept by the subject.
• Used:
• To measure
• Physical activity
• Sexual activity
• Alcohol consumption
• Dietary intake(food records)
• Symptoms
• Minor illnesses
• Medication use, and
• Medical care (health diaries)
242
Forms of diaries
• Open-ended (generally)
• Allow more accurate specification of the type of exposure
• Monthly calendar
• Diaries in which few or no entries are expected for most days
• E.g.: diary of doctor’s visits
• Ledger format
• When disparate behaviors are being recorded (e.g. both symptoms and doctor’s visits),
diaries can have a ledger format with separate sections for the different types of entries.
244
• Electronic diaries
• Can be used instead of paper-and-pencil diaries in many situations
• Can be used
o For frequent simple behaviors
• Drinking
• Use of medication
o For subjective measures
• Symptoms
• Some electronic or mechanical devices can automate data capture with little
subject involvement, such as:
• Pill containers that monitor medication compliance or
• Motion sensors that record physical activity
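A minimal sketch of the automatic-timestamp idea behind electronic diaries follows; the class and method names are invented for illustration. Capturing the time of entry automatically is what lets electronic diaries detect retrospective “backfilling”, a point returned to below.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class DiaryEntry:
    """One electronic diary record; the timestamp is captured
    automatically at entry time rather than reported by the subject."""
    category: str            # e.g. 'symptom', 'medication'
    value: str
    recorded_at: datetime = field(default_factory=datetime.now)

class ElectronicDiary:
    def __init__(self) -> None:
        self.entries: List[DiaryEntry] = []

    def record(self, category: str, value: str) -> DiaryEntry:
        entry = DiaryEntry(category, value)
        self.entries.append(entry)
        return entry

    def completed_days(self) -> int:
        # One completeness measure: number of distinct days
        # with at least one entry.
        return len({e.recorded_at.date() for e in self.entries})

diary = ElectronicDiary()
diary.record("symptom", "headache, mild")
diary.record("medication", "ibuprofen 200 mg")
print(diary.completed_days())  # 1 (both entries made today)
```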
246
Example of an exposure diary: food records sample page
247
Example of diary recording instructions for subjects: food records
248
Advantages and limitations of diaries
Advantages:
• Highly accurate in measuring current behavior
• Do not rely on memory
Limitations:
o A measure of past exposure only if current and past behavior are highly correlated
o Accurate measures of average current behavior only if sufficient number of days or weeks are
captured
o Demand more time and skills from subjects than do other methods
o Training of subjects in the skills needed to keep an accurate diary can be time consuming for
both subjects and study staff
o Subjects need the motivation to maintain the diary over the required time period.
• These limitations may make it difficult to recruit a representative sample of the population
of interest and to obtain a high response rate.
250
• Response rates across studies of health diaries have ranged from 50 to 96%.
• Participation rates and rates of full completion of diaries among participants have been found to
be lower for those with:
• Less than a high-school education
• Those of lower social class
• Those who are over age 65, and
• Those who have experienced recent stressful life events
• Training and monitoring of subjects and the lengthy coding procedures tend to make the use of
diaries expensive.
251
• These disadvantages have led to limited use of diaries in epidemiology.
• Diaries have been used primarily as a comparison method for validation studies of
questionnaires or other methods.
• Diaries are generally more accurate than questionnaires if they are collected over a sufficient time
period.
• The validity of nutrients computed from a one-week diet diary is somewhat greater than that of a
retrospective food frequency questionnaire, using as the standard three other one-week diaries
completed over a one-year period.
• The correlation of calorie-adjusted fat intake from the one-week diary with the standard was 0.64,
while the correlation of fat intake from the questionnaire with the standard was 0.52.
• A seven-day diet diary was much more accurate than a food frequency questionnaire when each was
validated against urinary nitrogen and potassium (r = 0.55–0.67 for the diary versus r = 0.29–0.32
for the questionnaire).
253
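The validity comparisons above rest on Pearson correlations between an instrument and a reference standard. A minimal sketch, using hypothetical intake values (not data from the cited studies):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient, the statistic used to compare
    diary- and questionnaire-based intake against a reference standard."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical fat intakes (g/day) for six subjects, for illustration only:
diary_intake     = [55, 62, 70, 58, 80, 65]
reference_intake = [50, 60, 75, 55, 85, 60]
print(round(pearson_r(diary_intake, reference_intake), 2))
```

In a validation study, a higher r for the diary than for the questionnaire (e.g. 0.64 versus 0.52 above) is the evidence that the diary tracks the standard more closely.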
• Automated diary methods are generally thought to be more accurate than paper diaries.
• Electronic diaries are more accurate for pain or symptoms because they allow the time of data
entry to be captured, whereas paper diaries can be filled in retrospectively hours (or days) after
the instructed times of diary entries.
• Completeness of record keeping (e.g. as measured by number of completed diary days) has also
been found to be higher with electronic diaries.
• However, objective ‘diary’ measures via devices are not always more accurate than self-reported
diaries.
• E.g.: in studies measuring physical activity in children, there have been conflicting results
as to whether activity monitors (accelerometers) are more accurate than diaries or recalls
when each was compared with heart rate monitors.
254
Sources of error and quality control procedures for diaries
• Although diaries are generally more accurate than questionnaires, they are still subject to a range
of errors.
• Important in any data collection effort:
• A detailed study procedure manual
• Pre-testing of the procedures, and
• Monitoring of data collection
255
Selection of diary recording period
• Although diaries can only directly measure a few days or weeks of exposure, they
are usually intended to reflect the subject’s exposure over some longer period of
time.
• The diary should include a sufficient number of days and a sufficient spread of
days over time to account for day-to-day, weekday-to-weekend, month-to-month,
or season-to-season variation in exposure.
• E.g.: nutrient intake has been shown to differ on weekends compared with
weekdays and to vary by season.
256
Reactivity
• Diaries are intended to measure the subject’s usual behavior over some time period.
• One concern with diaries is that the act of keeping a diary may lead to a change in behavior.
• This is an example of reactivity
• i.e. that ‘the process of measuring may change that which is being measured’.
• Record keeping may sensitize subjects to their actions or feelings, or may lead them to change
towards more socially desirable or health-conscious behaviors.
• E.g.: recording recreational exercise might lead to an increase in physical activity during
the diary period, or recording symptoms may sensitize a subject to recognize minor
symptoms which might otherwise go unnoticed.
257
• Subjects may also change behaviors in order to reduce record keeping.
• There is also empirical evidence that the number of items recorded in diaries drops over time,
although this may be due to under-reporting of behaviors due to fatigue with the study rather
than subjects actually modifying their behavior.
• Discussion of the problem of reactivity during training of subjects might reduce this source of
error.
• Subjects could be told that for scientific reasons it is important to assess their usual behaviour.
• For example, part of the written instructions for a diet diary could include the following:
• Don’t change what you usually eat
• Eat as you normally do
• Give a complete, true record
• No one is judging what you eat.
258
Inaccuracies due to study subjects as data collectors
259
• Many of the quality control procedures previously outlined to minimize errors by data collectors can
and should be adapted to the situation where the subjects are the data collectors.
• In particular, study subjects should be trained, preferably in person, in the diary recording techniques.
• An overview of the diary recording methods is presented, followed by detailed specific examples.
• The trainers and those who develop the written instructions must be familiar with the coding scheme,
so that the level of detail needed for accurate coding of items is recorded by the subjects.
• For example, if the database for coding food records has different codes for fresh, canned, and frozen
vegetables, then subjects must record this information.
• Subjects should also be taught how to handle any unusual situations, such as illness, or food eaten
outside the home which cannot be weighed.
• The training session would also include review of examples of completed forms, and practice in
completing the diary. To reduce the staff time needed to train each subject, part of the instruction
could be presented on video.
260
• Detailed written instructions should also be given to each subject.
• In addition to the material covered in the training session, the instructions should
include the date to start the diary, the date to end the diary, and the name and
telephone number of the person to call with any questions.
• Subjects should also be instructed in how frequently to record information. Frequent
recording reduces errors due to poor memory, but increases the subject burden.
• Diaries requiring periodic entries during the day should take the form of a small
booklet or device that can be carried in a handbag or pocket.
• Beyond understanding the data recording procedures, subjects need to be motivated
to spend the time and effort to record information accurately.
• Enthusiastic trainers who can explain the importance of each subject’s involvement
can help to motivate subjects.
261
• After initial training, the data collection needs to be monitored.
• The quality control principles of continued training and motivation of the data collectors
also apply to diaries.
• Phone calls often need to be made to remind subjects of the day to begin, and then again
to ask about any problems within the first few days of keeping the diary.
• Subjects should be asked to review their records for accuracy and completeness each day.
• After the diaries are returned, they should be reviewed immediately by a study editor for
unclear entries or missing data.
• Long diaries should be reviewed periodically.
• The editor should review any specific problems with the subject, and should also discuss
any general recording problems the subject appears to be having if future diaries are to
be collected.
262
Example of topics covered in training subjects to complete a
diary: food records
263
• Errors in coding
• Open-ended diaries can yield a large amount of information which requires detailed coding.
• For example, open-ended food diaries require code numbers for thousands of types of foods,
which relate to a database of nutrients in foods.
• Because coding of diaries can be a complex task, quality control procedures specific to this
step need to be developed.
• Coders should be selected who are familiar with the exposure (e.g. nutritionists for food
diaries) and are meticulous in dealing with details.
• As in any type of study, there is a need for training of coders and practice sessions, and for
monitoring of coders’ work by periodic re-coding by another coder.
• In large-scale studies, coders may be given an examination to become ‘certified’ in the
coding procedures; this improves accuracy and standardization across coders.
• Coders should refer uncertain situations to the lead coder or editor for resolution, and staff
meetings should include exercises and discussion of items that have led to coding problems.
• The codebook containing the codes and detailed coding procedures should be updated by
the lead coder whenever changes are made.
264
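The fresh/canned/frozen example can be illustrated with a toy coding lookup; all food codes and nutrient values below are hypothetical, and the rule of referring unknown items to the lead coder follows the procedure described above.

```python
# Minimal sketch of coding a diary entry against a nutrient database.
# The food codes and kcal values are hypothetical, for illustration only.
FOOD_CODES = {
    "carrot, fresh":  {"code": 1101, "kcal_per_100g": 41},
    "carrot, canned": {"code": 1102, "kcal_per_100g": 35},
    "carrot, frozen": {"code": 1103, "kcal_per_100g": 37},
}

def code_entry(description: str, grams: float):
    """Look up a diary entry and compute its energy contribution.
    Unknown items are referred to the lead coder rather than guessed at."""
    item = FOOD_CODES.get(description.lower())
    if item is None:
        raise KeyError(f"'{description}': refer to lead coder for resolution")
    return item["code"], item["kcal_per_100g"] * grams / 100

print(code_entry("Carrot, fresh", 150))  # (1101, 61.5)
```

Note that the lookup distinguishes fresh, canned, and frozen forms, which is why subjects must be trained to record that level of detail in the first place.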
• An alternative to manual numeric coding of all diary entries is direct key
entry of the text (or appropriate key words).
267
References
• Ellenberg S, et al. Measurement in Medicine: Practical Guides to Biostatistics and Epidemiology.
• Terwee CB, et al. Quality criteria were proposed for measurement properties of health status
questionnaires. Journal of Clinical Epidemiology 60 (2007) 34–42.
• Cordier S, et al. Handbook of Epidemiology: Exposure Assessment. Second Edition.
• Lee E-H, et al. Evaluation of Studies on the Measurement Properties of Self-Reported Instruments.
Asian Nursing Research 14 (2020) 267–276.
• White E, et al. Principles of Exposure Measurement in Epidemiology: Collecting, Evaluating, and
Improving Measures of Disease Risk Factors. Chapter 1, pages 1–50.
• Mokkink LB, et al. The COSMIN study reached international consensus on taxonomy, terminology,
and definitions of measurement properties for health-related patient-reported outcomes. Journal of
Clinical Epidemiology 63 (2010) 737–745.
• COSMIN Taxonomy of Measurement Properties: COSMIN. https://
www.cosmin.nl/tools/cosmin-taxonomy-measurement-properties/
• Di Malta G, et al. An Application of the Three-Step Test-Interview (TSTI) in the Validation of the
Relational Depth Frequency Scale. Journal of Humanistic Psychology 00(0).
268