
Unit Three: Measurement Instruments

Destaye Shiferaw Alemu


September 2023
Objectives
At the end of the lecture, you are expected to:
• Define basic properties of a measurement instrument

• Identify sources of variation in measurement

• Describe steps in the development of a measurement instrument

• Describe the psychological stages of the response process in questionnaire

• Provide practical recommendations for questionnaire design and administration

• Identify appropriate questionnaire administration method for a given research question

• Describe the practical applications of diaries as data collection instrument

• Explain how to assure validity and reliability of questionnaires

• Develop a questionnaire for a specific research question

• Identify research questions appropriate for data collection using a diary

• Develop a diary for a research question
Content
• Properties of measurement instrument
• Development of a measurement instrument
• Questionnaire design
• Stages of response
• Question organization in a questionnaire
• Pre-testing questionnaires
• Translation of questionnaires
• Questionnaire administration
• Format and structure of questionnaires
• Response scaling
• Diaries
Introduction
• Field work in epidemiological studies consists of collecting data in natural
and experimental settings to answer research questions using instruments

• Researchers frequently measure both objective and subjective parameters

• Unlike objective parameters, subjective parameters cannot be directly
measured,
• so they are usually measured using an instrument comprising questions about the
attributes of the subjective parameter being measured.
• Instruments need to:

• Be suitable for the purpose of the study

• Reflect quality aspects of measurement properties
Terms used (often interchangeably) for aspects of reliability: reproducibility,
dependability, repeatability, precision, stability, agreement, variability,
consistency, concordance
• A reliable instrument:
• Yields the same results every time it is used to measure the same
object, assuming the object itself has not changed
• Provides a consistent measure of important characteristics despite
background fluctuations

• Reliability is:
• The degree to which a measurement is free from measurement error
• The extent to which scores for patients who have not changed are the
same for repeated measurements under several conditions (extended
definition)

• Reliability is an essential requirement of all measurements in clinical
practice and research.
Reliability

Types: test-retest, equivalence/alternate (multiple) form, internal consistency,
intra-rater, inter-rater
Reliability: test-retest
• Same instrument (test) administered to the same respondents at different times
• Results (scores) from one administration are compared with the next to obtain a correlation value
• r ≥ 0.70 is usually considered acceptable
• The time interval between measurements needs to be:
• Sufficiently long
• To prevent recall bias
• Not too long
• So that no real change occurs in the characteristics of the respondents related to the
construct being measured
• The attributes of the construct being measured should be temporally stable
• Test-retest is not appropriate for a state construct that is expected to change over
time, such as mood
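The test-retest procedure above can be sketched in a few lines of Python. The scores are hypothetical illustration data, not from any real study; the 0.70 threshold is the guideline from this slide.

```python
# Test-retest reliability sketch: the same 8 respondents answer the same
# instrument at two time points; the scores are correlated and compared
# against the conventional r >= 0.70 threshold.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 14, 11, 18, 16]   # scores at first administration
time2 = [13, 14, 10, 19, 15, 10, 18, 17]  # scores two weeks later

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}, acceptable: {r >= 0.70}")
```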

Reliability: equivalence/alternate form

• Two different forms of an instrument are supposed to measure the same outcome
• Relevant where measures have alternative formats
• E.g.: a version for interviewer administration and a version for self-administration, or a long and a
shorter version; the forms should be highly correlated.
• Can be assessed by giving the different forms of the instrument to two or more groups that have
been randomly selected.
• The forms are created either by using differently worded questions to measure the same
attributes or by reordering the questions
• To test for equivalence, administer the different forms at separate time points to the same
population, or, if the sample is large enough, divide it in half and administer each of the two
alternate forms to half of the group.
• In either case, first compute mean scores and standard deviations on each of the forms,
and then correlate the two sets of scores to obtain estimates of equivalence.
• Equivalence reliability coefficients should be at least 0.70.
Reliability: internal consistency
• How internally consistent the items are in measuring the characteristics that they are supposed
to measure.
• The degree of interrelatedness among items
• Measure of the extent to which items assess the same construct
• Assessed via coefficient alpha (α)
• The best known parameter for assessing the internal consistency of a scale
• Describes how well different items complement each other in their measurement of the same
quality or dimension
• The basic principle of examining the internal consistency of a scale is to split the items in half
and see whether the scores of two half-scales correlate
• A scale can be split in half in many different ways
• The correlation is calculated for each half-split.
• Cronbach’s alpha represents a kind of mean value of these correlations, adjusted for test
length.
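The split-half idea above generalizes to Cronbach's alpha, which can be computed directly as α = k/(k−1) · (1 − Σ item variances / variance of total score). A minimal sketch with hypothetical Likert data:

```python
# Cronbach's alpha for a hypothetical 4-item scale answered by 6 respondents.
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
# a length-adjusted summary of how well the items covary.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

# rows = respondents, columns = items (hypothetical scores 1-5)
scores = [
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
]

k = len(scores[0])
items = list(zip(*scores))              # item-wise columns
totals = [sum(row) for row in scores]   # total score per respondent
alpha = k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))
# Guideline: 0.70-0.90; values above 0.90 may indicate redundant items.
print(f"Cronbach's alpha = {alpha:.2f}")
```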
• A well-accepted guideline for the value of Cronbach’s alpha is between 0.70 and 0.90.
• Decide whether your instrument needs to consider internal consistency
• In which statement(s) do you think internal consistency is not important?

• Internal consistency is important:


• A ten-item interview is conducted to find out patients’ satisfaction with medical care in
hospitals. High scores mean much satisfaction; low scores mean little satisfaction.
• To what extent do the ten items each measure the same dimension of satisfaction with
hospital care?

• Internal consistency is not important:


• A ten-item interview is conducted with patients as part of a study to find out how hospitals
can improve. Eight items ask about potential changes in different services such as the type of
food that might be served, the availability of doctors, nurses, or other health professionals,
and so on. One item asks patients for their age, and one asks about education. Since this
interview is concerned with views on improving eight very different services and with
providing data on age and education of respondents, each item is independent of the others.

Interpretation of Cronbach’s alpha
• Assessing internal consistency requires only one measurement in a study population
• It is easy to calculate, but often interpreted incorrectly
• What Cronbach's alpha does not measure:

1. It is not a measure of the unidimensionality of a scale

- When a construct consists of two or three different dimensions, a reasonably high value for
Cronbach's alpha can still be obtained for all items.
- Unidimensionality cannot be assessed with Cronbach's alpha.

2. It does not assess whether the model is reflective or formative

- Quite often, only when a low Cronbach's alpha is observed does one start to question whether
the items in a measurement instrument should be expected to correlate (i.e. whether the
measurement instrument is really based on a reflective model).
- But it is not as simple as stating that when α is low, it probably is a formative model.
- An alternative explanation for a low Cronbach's alpha is that the construct may be based on a
reflective model, but the items are poorly chosen.
- So, Cronbach's alpha should not be used as a diagnostic parameter to distinguish between
reflective and formative models.
3. It is sometimes argued that α is a parameter of validity

- However, an adequate α suggests only that, on average, items in the scale are highly correlated
- The items apparently measure the same construct, but this provides no evidence whether or not
they measure the construct that they claim to measure.
- The items measure something consistently, but what that is remains unknown.
- So, internal consistency is not a parameter of validity

- In the COSMIN taxonomy the measurement property 'internal consistency' is an aspect of
reliability

- The value of α is highly dependent on the number of items in the scale
- This principle is applied for item reduction:
- When α is high, we can afford to delete items to make the instrument more efficient
- When the value of α is too low, we can increase the value by formulating new items that are
manifestations of the same construct.
- This principle also implies that with a large number of items in a scale, α may have a high
value despite rather low inter-item correlations.

• Kuder-Richardson 20 (KR-20)

• Appropriate for assessing internal consistency only for an instrument with binary responses

Reliability: multiple form, intra-rater and inter-rater

• Multiple form: refers to the correlation between subdomains of the scale

• Intra-rater: reliability of the same rater's scores, of the same subjects, on different occasions

• Inter-rater: concordance of scores achieved by different raters on the same occasion

Sources of variation
• Repeated measurements may display variation arising from several sources:
• Observer variability
• Due to the observer, and includes choice of words in an interview and skill in using a
mechanical instrument

• Instrument variability
• Due to the instrument, and includes changing environmental factors (e.g., temperature), aging
mechanical components, different reagent lots, and so on.

• Subject variability
• Due to intrinsic biologic variability in the study subjects unrelated to variables under study,
such as variability due to time of day of measurements or time since last food or medication

Improving the reliability of measurements
• Reliability concerns the anticipation, assessment and control of sources of variation; the
ultimate aim of reliability studies is to improve the reliability of measurements.
• A number of strategies for this purpose:
• Restriction
• Avoid a specific source of variation
• E.g.: when we know that the amount of fatigue that patients experience increases during the day,
we can exclude this variation by measuring every patient at the same hour of the day.

• Training and standardization

• Training of raters, or standardization of the procedure
• E.g.: raters should use exactly the same text to instruct the patients, show a similar
amount of enthusiasm, and agree on whether, and to what extent, they should
encourage the patients during the performance of the tests.

• Averaging of repeated measurements

• Averaging repeated measurements reduces the measurement error.
• Only affects random error, not systematic error
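A small simulation (illustrative values only) makes the last point concrete: averaging k repeated measurements shrinks the random error roughly by 1/√k, while a systematic offset is untouched.

```python
# Simulation sketch: averaging repeated measurements reduces random error
# (SD of the mean ~ SD/sqrt(k)) but leaves systematic error (bias) unchanged.

import random
random.seed(1)

TRUE_VALUE = 100.0
BIAS = 2.0   # systematic error: unaffected by averaging
SD = 5.0     # random measurement error

def measure():
    return TRUE_VALUE + BIAS + random.gauss(0, SD)

def sd_of_mean(k, trials=2000):
    """Empirical mean and SD of the average of k repeated measurements."""
    means = [sum(measure() for _ in range(k)) / k for _ in range(trials)]
    m = sum(means) / trials
    return m, (sum((x - m) ** 2 for x in means) / trials) ** 0.5

for k in (1, 4, 16):
    m, s = sd_of_mean(k)
    print(f"k={k:2d}: mean of averages = {m:6.2f} (bias remains), SD = {s:.2f}")
```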
2. Validity
• The degree to which an instrument truly
measures the construct(s) it purports to
measure

• Different types of validity

2.1. Content validity
• The degree to which the content of an instrument
adequately reflects the construct being measured, with
regard to the following aspects:
• Face validity
• Relevance
• Comprehensiveness
• Comprehensibility

• The most important measurement property of an instrument
• It may affect the likelihood of the instrument fulfilling other
measurement properties

• Focuses on whether the content of the instrument
corresponds with the construct that one intends to measure,
with regard to relevance and comprehensiveness

• The availability of a definition of the construct being
measured (a conceptual definition) should be a prerequisite
for the content validity of a new instrument
2.1.1. Face validity
• The first aspect of content validity
• The degree to which a measurement instrument looks as though
it is an adequate reflection of the construct to be measured
• Concerns an overall view, which is often a first impression,
without going into too much detail
• Subjective assessment
• No standards on how it should be assessed
• Cannot be quantified
• As a result, the value of face validation is often
underestimated
• A lack of face validity is a very strong argument for not
using an instrument, or for ending further validation

• E.g.: when selecting a questionnaire to assess physical


activity in elderly, questionnaires containing a large number
of items about activities that are no longer performed by
elderly people are not considered to be suitable.
• Other questionnaires may be examined in more detail to
assess which ones contain items corresponding to the type
of activities that the elderly perform.
2. 2. Criterion validity
• The degree to which the scores of a measurement
instrument are an adequate reflection of a gold
standard

• Refers to how well scores of a measurement instrument


agree with the scores on the ‘gold standard’.
• Applicable in situations where there is:
• A 'gold standard', or
• Expected scores for the construct to be measured,
based on existing knowledge about the construct.

• Considered once an instrument has passed the test of face validation

• Content validation studies assess whether the measurement
instrument adequately represents the construct under study.
• A clear description of the construct should be emphasized

• Possible purposes of the instrument to consider in a content validation are:
• Discrimination
• To distinguish between persons at one point in time
• Evaluation
• To assess change over time
• Prediction
• To forecast future outcomes

o All these questions assess whether the items are relevant


for measuring the construct.

• For multi-item questionnaires,
• Items should be both relevant and comprehensive for the construct to be
measured.
• Relevance
• Can be assessed with the following three questions:
• Do all items refer to relevant aspects of the construct to be
measured?
• Are all items relevant for the study population,
• E.g.: with respect to age, gender, disease characteristics,
languages, countries, settings?
• Are all items relevant for the purpose of the application of the
measurement instrument?

• Comprehensiveness
• Is the construct completely covered by the items?

Consider the following in content validation:

 Information about construct and situation

- The construct to be measured should be clearly specified
- Elaboration of the theoretical background and/or conceptual model, and
- Description of the situation of use in terms of the target population
and purpose of the measurement

- Information about the construct should be considered by both the:

- Developer of a measurement instrument
- Who should provide this information, and
- User of a measurement instrument
- Who should collect this information about the construct

Information about content of the measurement
instrument
- In order to be able to assess whether a specific measurement instrument
covers the content of the construct, developers should have provided full
details about the measurement instrument, including procedures.

- If the new measurement instrument is, for example, a new laboratory

test, the materials, methods, procedures and scoring must be
described in such a way that researchers in that specific field can
repeat it.

- If the measurement instrument is a questionnaire, all items and


response options, including the instructions must be available,
either in the article, appendix, on a website or on request from the
authors.

- Furthermore, details of the development process may be relevant,


such as a list of the literature that was used or other instruments
that were used as a basis, and which experts were consulted.

- All this information should be taken into consideration in the content validation.


Selection of expert panel
- The content validity of a measurement instrument is often assessed by the
researchers who are going to use it.
- They are often biased with regard to their own instrument
- Content validity should preferably be assessed by an
independent panel
- Should be experts in the relevant field
-E.g.:
- Experts who are familiar with the field of radiology are required
to judge the adequacy of various MRI techniques
- For PROs, patients representative of the target population are
the experts
- They are the most appropriate assessors of the relevance of
the items in the questionnaire, and they can also indicate
whether important items or aspects are missing.

 Assessing whether content of the measurement instrument
corresponds with the construct (is relevant and
comprehensive)

-Like face validation, content validation is also only based on


judgement, and no statistical testing is involved.

-The researchers who developed the measurement instrument


should have considered relevance and comprehensiveness during
the development process.

-However, users of the instrument should always check whether


the instrument is sufficiently relevant and comprehensive for
what they want to measure.

-Assessment of content validity by the users is particularly important if


the measurement instrument is applied in other situations
- i.e. another population or purpose than that for which it was originally
developed
- E.g.:
You want to measure physical functioning in stroke patients,
and you found a questionnaire that was developed to assess
physical functioning in an elderly population.
- Would this instrument, in terms of content validity, be
applicable to the stroke population?

- To assess the content validity of this questionnaire, you have to


judge and ensure that:
- All activities mentioned in the questionnaire are relevant
for the stroke population, AND
- No important activities for stroke patients are missed (is
the instrument comprehensive?)

 An accelerometer attached to a belt around the hip to


measure physical activity may adequately detect activities
such as walking and running, but may poorly detect
activities such as cycling, and totally fail to detect activities
involving only the upper extremities.
An accelerometer therefore lacks comprehensiveness to measure total
physical activity.
Use a strategy or framework to assess the
correspondence between the instrument and
construct

2.2.1. Concurrent validity
• An instrument being validated and a selected criterion
measured at the same time
• Both the score for the measurement instrument and the score for the gold
standard are considered at the same time in its assessment
• Usually assessed for instruments to be used for evaluative and
diagnostic purposes

2.2.2. Predictive validity

• Considers whether the measurement instrument predicts
the gold standard in the future
• Often used for instruments to be used in predictive
applications

• In concurrent validity and predictive validity, there is usually only one hypothesis, which is not
clearly stated but rather implicit:
• The measurement instrument under study is as good as the gold standard.

• In practice, the essential question is whether the instrument under study is sufficiently valid for
its clinical purpose.

• It is not possible to provide uniform criteria to determine whether an instrument is sufficiently


valid for application in a given situation,
• Because this depends on the weighing of a number of consequences of applying the
measurement instrument instead of the gold standard.

• These consequences include:


• The costs and burden of the gold standard versus the measurement instrument,
• Consequences of FP and FN classifications resulting from the measurement instrument

• For predictive validity, the instrument being assessed is administered first, and then the
criterion instrument is administered after an appropriate interval.
General design of criterion-related validation consists of the following steps:

Identify a suitable criterion and method of measurement


• The gold standard is:
• Considered to represent the true state of the construct of interest
• Perfectly valid assessment (in theory): seldom exists in practice
• Usually regarded as ideal by experts in the field
• E.g.: Histological findings in tissues, extracted by biopsy to identify cancer

• PROs, which often focus on subjective perceptions and opinions, almost always lack a gold
standard.
• An exception is a situation in which a shorter questionnaire for a construct is developed,
when a long version already exists.
• Gold standard: long version
• To be able to assess the adequateness of the gold standard, it is important that researchers provide
information about the validity and reliability of the measurement instrument, that is used as gold standard.

 Identify appropriate sample of the target population in which the measurement
instrument will ultimately be used

- For all types of validation, the instrument should be validated for the target population
and situation in which it will be used.

- E.g.: if we are interested in the validity of the scores of a measurement instrument in


routine clinical care,
- Measurements should be performed in the same way as in routine clinical care
- i.e. without involvement of experts or any special attention being paid to the quality
of measurements, as is usually the case in a research setting.
Define a priori the required level of agreement between measurement instrument
and criterion
-It is better to decide a priori which level of agreement one considers acceptable
- Prevents one from drawing positive conclusions on the basis of non-convincing data
-When formulating hypotheses, the unreliability of measurements must be taken into account.
- It is difficult to provide criteria for the level of agreement between the scores of the measurement
instrument and the gold standard that is considered acceptable, because this totally depends on
the situation.
Obtain scores for the measurement instrument and the gold standard,
independently from each other

- Independent application of the measurement instrument and the gold standard is
a requirement for the validation of measurement instruments
- The measurement instrument should not be part of the gold standard, or influence it in any way.
- This could happen if the gold standard is based on expert opinion

- When a short version of a questionnaire is validated against the original long version, the scores
for each instrument should be collected independently from each other.

35
 Determine the strength of the relationship between the instrument scores and
criterion scores
-To assess criterion validity, the scores from the measurement instrument to be validated are
compared with the scores obtained from the gold standard.

-Various statistical parameters used at various measurement levels of gold standard and
measurement instruments.
-If both gold standard and measurement instrument have a dichotomous outcome: sensitivity
and specificity
-If measurement instrument has an ordinal or continuous scale: ROCs

-If gold standard is a continuous variable: correlation coefficients

-If measurement instrument and gold standard are expressed in the same units: Bland and
Altman plots and ICCs can be used
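Two of the parameters above can be sketched with hypothetical data: sensitivity/specificity for the dichotomous case, and Bland-Altman limits of agreement when instrument and gold standard share the same units (ROC curves and ICCs are omitted here for brevity).

```python
# Criterion-validity statistics sketch on hypothetical data.

# --- dichotomous instrument vs dichotomous gold standard ---
test = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
gold = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
tp = sum(t and g for t, g in zip(test, gold))            # true positives
tn = sum((not t) and (not g) for t, g in zip(test, gold))
fp = sum(t and not g for t, g in zip(test, gold))
fn = sum((not t) and g for t, g in zip(test, gold))
print(f"sensitivity = {tp / (tp + fn):.2f}, specificity = {tn / (tn + fp):.2f}")

# --- continuous instrument vs continuous gold standard, same units ---
instr = [10.2, 12.1, 9.8, 14.5, 11.0, 13.2]
ref =   [10.0, 12.5, 9.5, 14.0, 11.4, 13.0]
diffs = [a - b for a, b in zip(instr, ref)]
md = sum(diffs) / len(diffs)                             # mean difference (bias)
sd = (sum((d - md) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
print(f"Bland-Altman: mean diff = {md:.2f}, "
      f"limits of agreement = {md - 1.96 * sd:.2f} to {md + 1.96 * sd:.2f}")
```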

2.3. Construct validity

• Should be used to provide evidence of validity in situations in


which a gold standard is lacking

• The degree to which scores of a measurement instrument are


consistent with hypotheses, e.g. with regard to internal
relationships, relationships with scores of other instruments
or differences between relevant groups

• Three aspects:

2.3.1.Structural validity
• The degree to which the scores of a measurement
instrument are adequate reflections of the
dimensionality of the construct to be
measured

• Determine whether a construct exists of one or more


dimensions, as this has to be taken into account in
further hypothesis testing.

• Assessed by:
1. IRT/Rasch analysis
- Provides rich information about individual items that is
not available using classic test theory
2. Factor analysis (FA)

EFA
• Used to reduce the number of items or to explore the number of factors of a new
instrument in the absence of a prior hypothesis.
• Applied if there are no clear ideas about the number and types of dimensions
• (Often) performed when CFA (i.e. confirmation of the existence of predefined
dimensions) is inadequate
• If there is no (or little) information available on the structure of the construct to
be assessed, it is recommended to conduct EFA to identify the structure; CFA can
then be used to confirm whether or not the structure provides a good fit.

CFA
• Used to test a hypothesized factorial structure based on a theory or previous
empirical evidence
• Fit parameters are used to test whether the data fit the hypothesized factor structure
• It is possible to test whether the proposed model is better than alternative models
• More appropriate than EFA for assessing the structural validity of an instrument if
there is already information available on the dimensionality of the instrument.
• More appropriate for validation purposes
2.3.2. Hypotheses-testing construct validity
• The relationships of scores on the instrument of
interest with the scores on other instruments
measuring similar constructs (convergent
validity) or dissimilar constructs (discriminant
validity), or the difference in the instrument
scores between subgroups of people (known-
groups validity)

• Basic principle: hypotheses are formulated about


the relationships of scores on the instrument
under study with scores on other instruments
measuring similar or dissimilar constructs, or
differences in the instrument scores between
subgroups of patients.
• These hypotheses have then to be tested.

• When designing a psychometric study, it is recommended to formulate hypotheses about the
expected direction and magnitude of the correlations or differences for the validation.

• Then, the validation can be performed by analyzing the data regarding whether or not the
formulated hypotheses were satisfied.

• However, many researchers determine the validity only based on the statistical significance of
the employed statistics, without considering the expected direction and magnitude.

• Although evidence for construct validity is typically assembled through a series of studies, the
process generally consists of the following steps:

Describe the construct


• Detailed description of the construct to be measured is the starting point for construct
validation.
• Indispensable for assessing whether a chosen measurement instrument validly measures the
construct of interest
Formulate hypotheses about expected relationships with measurement instruments

• Hypotheses can be formulated with regard to expected relationships with instruments assessing:
• Related constructs
• Unrelated constructs, or
• With respect to expected differences between subgroups of patients

• E.g.: if the researchers want a measurement instrument to measure physical functioning, and not pain,
then the hypothesis could be formulated that the measurement instrument should have no correlation, or
only a slight correlation with measurement instruments that measure pain (discriminant validity).
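The hypothesis-testing logic can be sketched as code: the expected direction and magnitude of the correlations are fixed a priori, then checked against the data. All scores below are hypothetical, and the 0.50/0.30 cut-offs are illustrative choices, not fixed rules.

```python
# Construct-validation sketch with a priori hypotheses:
#   convergent:   r with a similar construct expected >= 0.50
#   discriminant: |r| with a dissimilar construct (pain) expected <= 0.30

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

physical_fn = [55, 70, 40, 85, 60, 75, 50, 65]   # new instrument
similar     = [50, 72, 45, 80, 58, 70, 48, 68]   # existing physical-function scale
pain        = [30, 60, 55, 40, 70, 35, 45, 50]   # pain scale (dissimilar construct)

r_conv = pearson_r(physical_fn, similar)
r_disc = pearson_r(physical_fn, pain)
print(f"convergent r = {r_conv:.2f} (hypothesis r >= 0.50: {r_conv >= 0.50})")
print(f"discriminant r = {r_disc:.2f} (hypothesis |r| <= 0.30: {abs(r_disc) <= 0.30})")
```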

Describe the comparable measurement instruments or subgroups to be
discriminated
- Describe the measurement instruments with which the instrument under study is
compared, in terms of the construct(s) they measure, and present data about their
measurement properties.
- To assess their similarity or dissimilarity, one must have insight into the content of these
comparator measurement instruments.

-There should be a description of what is known about the validity and other measurement
properties of these instruments in the specific situation under study.
- This part of the validation study is often taken too easily.

- There should be references describing the content and measurement properties of these
instruments in the same target population.
- When hypotheses are formulated about differences between known groups, details about the
demographic, clinical and other relevant characteristics of these groups should be presented.
 Gather empirical data

• Gather empirical data that will permit the hypotheses to be tested


• This is a straightforward step.

• Attention must be paid to the population and situation in which these data are collected.
• Validation is dependent on these issues, so the study sample and situation should be
representative of the target population and conditions in which the measurement
instrument will be used.

Assess whether the results are consistent with the hypotheses
• This step should also be straightforward if the previous steps have been performed correctly,
• It is just a matter of counting how many hypotheses were confirmed and how many were
rejected.

• However, if the hypotheses were vaguely formulated, this step becomes problematic.

• Can one say that a correlation coefficient of 0.35 is moderate, and can one conclude that
subgroups have a different mean value, if this difference did not reach statistical significance in a
small study?

• So, defining explicitly beforehand the correlations and magnitude of differences one considers
acceptable, will prevent the need for these post-hoc, data-dependent decisions.

Explain the observed findings

• Discuss the extent to which observed findings could be explained by rival theories or
alternative explanations (and eliminate these if possible)

• Only validation studies with explicitly defined constructs, and hypotheses based on
theory or literature findings, make it possible to draw firm conclusions about the (lack of)
construct validity of the scores of a measurement instrument.

2.3.3. Cross-cultural validity/measurement invariance
• The degree to which the performance of items on a translated
or culturally adapted instrument is an adequate reflection
of the performance of items in the original version of the
instrument
• Assessed after translation of a questionnaire
• Usually using:
• Multiple-group CFA or
• Differential item functioning (DIF)
• Starts with an accurate translation process
• Validity of the new, cross-culturally adapted instrument, should be
checked by assessing its construct validity.
• Apart from differences induced by the translation itself, there may be cultural differences:
• Some items in a questionnaire may be irrelevant in other cultures.
• E.g.: the ability to ride a bicycle is very important in the Netherlands,
where almost everybody cycles for short-distance transportation, while in
the USA cycling is considered a sport, and only a minority of
the population possesses a bicycle.
The translation process
• Consists of six steps
Step 1: Forward translation
• Two bilingual translators independently translate the questionnaire from original language
into target language

• Translators should have the target language as the mother tongue

• They make a written report of the translation containing challenging phrases and uncertainties,
and considerations for their decisions.

• One translator should have expertise on the construct under study, the second one being a
language expert, but naive about the topic.

• These types of expertise are required to obtain equivalence from both a topic-specific and a
language-specific perspective.
Step 2: Synthesis of the forward translation
• Two translators and a recording observer combine results of both translations (T1 and T2 into
T12)
• Results in one synthesized version of the translation
• Written report carefully documenting how they have resolved discrepancies is presented

Step 3: Back translation

• The common translated version (T12) is translated back into the original language by two other
translators whose mother tongue is the original language.
• They are blinded to the original version of the questionnaire.

• These translators are language experts and are not experts on the constructs to be measured.
• This is recommended because experts on the construct under study may know the intended
meanings of items.
• Their background knowledge of which aspects are relevant could lead them to compensate
for imperfect translations, thereby decreasing the likelihood of detecting them.

49
Step 4: Expert committee composes the pre-final version
• Consists of the four translators together with
  • Researchers
  • Methodologists, and
  • Health and language professionals
• Contact is made with the developers of the original questionnaire (if possible) to check
whether the items have maintained their intended meaning
• The expert committee:
  • Reviews all translations and all reports
  • Takes decisions on all discrepancies, and
  • Composes a pre-final version
  • Reports all considerations and decisions
50
Step 5: Test of the pre-final version
• Completed by a small sample of the target population (15–30) for pilot-testing
• The version is tested for comprehensibility
• Special attention should be paid to whether respondents interpret the items and
responses as intended by the developers

Step 6: Appraisal of the adaptation process by the developers
• In the end, it is recommended to send all translations and written reports to the original
developers of the instrument/questionnaire
• They perform a process audit, but they do not adapt the items
• After their approval, the translated questionnaire is ready for cross-cultural validation
51
Assessment of measurement invariance
• Measurement invariance means that a measurement instrument, scale or item functions in
exactly the same way in different populations
• Several methods can be used to assess measurement invariance:
  • Factor analysis
  • Logistic regression analysis
  • IRT techniques
• In assessing measurement invariance, the factor structures of the data gathered in the original
and the new population are compared on three points:
  • Are the same factors identified in both populations, and are these factors associated with
    the same items across the two populations?
  • Do the factors have the same meanings across the two populations,
    i.e. do the items show the same factor loadings in both populations?
  • Do the items have the same mean values (intercepts) in both populations?
52
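One of the item-level checks listed above, differential item functioning (DIF), can be illustrated with the classic Mantel–Haenszel procedure (a close relative of the logistic-regression approach): stratify respondents on the rest score and compare item endorsement between the two populations within each stratum. The sketch below is a minimal pure-Python illustration with invented data, not output from any real study:

```python
from collections import defaultdict

def mh_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    responses: list of (group, total_score, item_response) tuples,
    with group in {0, 1} and item_response in {0, 1}.
    Strata are defined by the rest score (total minus the item), so the
    item under scrutiny does not contaminate the matching variable.
    """
    strata = defaultdict(lambda: [[0, 0], [0, 0]])  # rest score -> 2x2 table
    for group, total, item in responses:
        strata[total - item][group][item] += 1
    num = den = 0.0
    for table in strata.values():
        n = sum(table[0]) + sum(table[1])
        a, b = table[0][1], table[0][0]  # group 0: item yes / item no
        c, d = table[1][1], table[1][0]  # group 1: item yes / item no
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")

# Invented example: two populations answering the same dichotomous item.
data = [
    (0, 3, 1), (0, 3, 1), (0, 3, 1), (0, 2, 0),
    (1, 3, 1), (1, 2, 0), (1, 2, 0), (1, 2, 0),
    (0, 4, 1), (0, 4, 1), (0, 4, 1), (0, 4, 1), (0, 3, 0),
    (1, 4, 1), (1, 4, 1), (1, 3, 0), (1, 3, 0), (1, 3, 0),
]
print(round(mh_odds_ratio(data), 2))  # → 7.15; an OR far from 1 suggests DIF
```

An odds ratio near 1 means the item behaves the same way in both populations at the same trait level; here the large value indicates the item favors group 0 even after matching on the rest score.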
3. Responsiveness
• The most important objective of measurement in clinical practice and in clinical and
health research is to assess whether the disease status of patients has changed over time
• The ultimate goal of medicine is to cure patients
• We need measurement instruments with an evaluative purpose or application to
detect changes in health status over time
• These instruments should be responsive
• Responsiveness: the ability of an instrument to detect change over time in the
construct to be measured
• Requires a longitudinal research design: participants have to respond to the instrument
at least twice, so that it is validated over at least one time interval
• Only relevant for measurement instruments used in evaluative applications
  • i.e. when the instrument is used in a longitudinal study to measure change over time
• If an instrument is only used to discriminate between patients at one point in
time, then responsiveness is not an issue. 53
4. Interpretability
• Pay attention to the interpretability of the scores when applying a measurement instrument
and evaluating its measurement properties
• For well-known instruments, such as BP measurement, interpretability causes no
problems, but for new or lesser-known instruments it may be challenging
• This particularly applies to scores on multi-item measurement instruments, whose
meaning is not immediately clear
  • What does a mean value of 9.0 points on a 0–24 scale mean?
  • And is an improvement of 2.2 points meaningful for the patients?
• Interpretability: the degree to which one can assign qualitative meaning – i.e. clinical or
commonly understood connotations – to an instrument's quantitative scores or change in scores 54
Development of a measurement instrument
55
Why is the development of new measurement instruments needed?
• Technical developments and advances in medical knowledge mean that new measurement
instruments keep appearing in all fields
• Existing instruments are continuously being refined, and existing technologies are being applied
beyond their original domains
• Current attention to patient-oriented outcomes has shifted interest from pathophysiological
measurements to impact on functioning, perceived health and QOL
• PROs have therefore gained importance in medical research
• Measurement instruments used in the various medical disciplines differ greatly from each other
   Details of the development of measurement instruments must therefore be specific to each discipline
• However:
  • The basic steps in the development of all measurement instruments are the same (from a methodological viewpoint)
  • The basic requirements regarding measurement properties are similar for all measurement instruments
56
• Before deciding to develop a new measurement instrument:
  • A systematic literature review of existing instruments intended to measure the specific construct is indispensable
  • Searching for existing instruments is important for three reasons:
1. It prevents the development of new instruments in fields where many already exist
  - An additional instrument would yield results incomparable with studies that used other
    instruments, and this would only add confusion
2. It helps to obtain ideas about what a new instrument should or should not look like
  - Instruments that are not applicable or of insufficient quality can still provide a lot of
    information, if only about failures you want to avoid
3. It saves time and effort
  - An existing instrument may be translated or adapted to specific needs
• Only if no suitable instrument is available should a new measurement instrument be developed
57
• Developing a measurement instrument is not something to be done on a rainy Sunday afternoon
  • If it is done properly, it may take years
  • It takes time because the process is iterative
  • During the development process, we have to check regularly whether it is going well
• Steps in the development of a measurement instrument
  • In practice, these steps are intertwined, and one goes back and forth between them in a
    continuous process of evaluation and adaptation
  • The last steps in the development process are pilot-testing and field-testing
    • These are essential parts of the development phase, as the final selection of items takes place here
    • If the measurement instrument does not perform well, it has to be adapted and evaluated again 58
Development & evaluation of an instrument: steps
1. Definition and elaboration of the construct to be measured
• The most essential questions to be addressed:
  o What do we want to measure?
  o In which target population? And
  o For which purpose?
• The construct should be defined in detail
• The target population and the purpose of measurement must be considered
59
 What do we want to measure? Construct definition
• The definition of a construct starts with a decision concerning its level in the conceptual model
and considerations about the potential aspects of the construct
  • Which level in the conceptual model are we interested in?
• By answering these questions, we specify in more detail what we want to measure
60
• If a construct has different aspects, and we want to measure all of them, the measurement instrument
should anticipate this multidimensionality
• Thinking about multidimensionality in this phase is primarily conceptual, not yet statistical
• E.g.: in developing the Multidimensional Fatigue Inventory (MFI), a multi-item questionnaire to
assess fatigue, the developers:
  • Postulated beforehand that they wanted to cover five aspects of fatigue:
    • General fatigue
    • Physical fatigue
    • Mental fatigue
    • Reduced motivation, and
    • Reduced activity
  • Developed the questionnaire in such a way that all of these aspects were covered
• It is of the utmost importance to decide which aspects to include before actually constructing
the measurement instrument
• This has to be done in the conceptual phase, preferably based on a conceptual model, rather than
by finding out post hoc (e.g. by factor analysis) which aspects turn out to be covered by the instrument
61
 In which target population?
• The measurement instrument should be tailored to the target
population, so this must be defined
• How should a measurement instrument be tailored to its
target population?
  • Age, gender and severity of disease determine to a large extent the
    content and type of instrument that can be used
  • Very young children are not able to answer questions about
    symptoms, so pain in newborns is measured by structured
    observation
  • Physical functioning is an important issue in many diseases, but
    different measurements may be required for different diseases
    • Instruments to measure physical functioning in patients with spinal
      cord lesions, cardiovascular disease, cerebrovascular disease or
      multiple sclerosis will all have substantially different content
62
• Severity of disease is also important
  • Pathophysiological findings and symptoms differ with
    severity, as do functioning and perceived health status
• A screening questionnaire used in general practice to identify
persons with mild depression will differ from a questionnaire
that aims to differentiate between the severe stages of
depression
• Other characteristics of the target population may also be
important
  • E.g.: whether or not there is much comorbidity, or other
    circumstances/conditions that influence the outcome of the
    measurements
• There is no universal answer to the question of which
characteristics of the target population should be considered 63
 For which purpose?
• Three important objectives of measurement in health care:
  • Diagnosis
    • A discriminative instrument is needed
  • Evaluation of therapy or treatment effect
    • An evaluative instrument is needed to evaluate the effects of
      treatment or longitudinal changes in health status
  • Prediction of the future course of disease (prognosis)
    • Predictive measurements classify individuals
      according to their prognosis
    • Prediction models are used to define the set of variables
      that best predicts this future course
• The purpose of the measurement clearly has a bearing on the
choice of construct to be measured, and it also has
consequences for the development of the instrument
64
2. Selecting items
• When talking about multi-item instruments, one immediately
thinks of questionnaires, but performance tests also contain
different tasks, and the assessment of an MRI requires the
scoring of different aspects that can be considered items
• For convenience, we focus on questionnaires
• The basic methodological principles can be applied to other
measurement instruments as well, such as imaging
techniques or physical tests
65
• Getting input for the items of a questionnaire: literature and experts
o Literature
  - Examining similar instruments in the literature can help:
    - To clarify the constructs we want to measure
    - To provide a set of potentially relevant items
  - Starting from scratch is seldom necessary, except for new diseases
  - Nowadays there are 'item banks' for specific topics
    - An item bank contains a large collection of questions about a particular construct, but it is
      more than just a collection
    - A collection is called an item bank when the item characteristic curves of the items measuring
      a specific construct have been determined by IRT analysis
    - Item banks form the basis for computerized adaptive testing
    - Item banks are an extremely rich source of items that can be used to develop new
      measurement instruments
      - e.g. a disease-specific instrument to measure physical functioning in
        patients with Parkinson's disease or rheumatoid arthritis
66
o Experts
  - Clinicians who have treated large numbers of patients with the target condition have extensive expertise on:
    o Characteristic signs
    o Typical characteristics, and
    o Consequences of the disease
• Instruments to measure these constructs should be developed in close cooperation with these experts
• At the level of symptoms, functioning and perceived health, the patients themselves are the key
experts
  • Patients should be involved in the development of measurement instruments when their sensations,
    experiences and perceptions are at stake
• For the development of performance tests to assess physical functioning, patients can also indicate
which activities cause them the most problems
• The best way to obtain information from clinicians or patients about relevant items is through focus
groups (FGs) or in-depth interviews (IDIs)
  • Developers need to have an exact picture in mind of the construct to be measured; otherwise, it is
    impossible to instruct the FGs adequately and to extract the relevant data from the enormous yield of
    information
67
Formulating items: first draft
• Some new formulations or reformulations will always be needed,
  • Because the information obtained from experts and from the literature must be transformed into
    adequate items
• A new measurement instrument is seldom based completely on existing items,
  • So brand-new items must also be formulated
• The first draft of a questionnaire should contain as many items as possible
  • In this phase, creativity should dominate rigor, because there will be ample opportunity for evaluation,
    item reduction and reconsideration in subsequent phases
 The formulation of adequate items is a challenging task, but there are a number of basic rules
68
Items should be comprehensible to the whole target population, independent of level of
education
  - This means that difficult words and complex sentences should be avoided
  - Items should be written in language so simple that anyone over 12 years of age can understand them
Terms that have multiple meanings should be avoided
  - E.g.: the word 'fair' can mean 'pretty good, not bad', 'honest', 'according to the rules' and 'plain', and the
    word 'just' can mean 'precisely', 'closely' and 'barely'
  - Respondents may interpret questions using these words differently, but they will not indicate that
    the words are difficult
Items should be specific
  - E.g.: a question about 'severity of pain' should:
    - Specify whether the patient has to fill in the average pain or the worst pain
    - Make clear which period of time the question refers to
      - Should the patient rate current pain, pain during the previous 24 hours, or pain during the previous week?
Each item should contain only one question, not two or more
Negative wording in questions should be avoided
69
Points to keep in mind
o Consider the conceptual framework
  • i.e. the direction of the arrows between the potential items and the construct
  • Formative or reflective model
  • The type of model has important consequences for the selection of items for a multi-item
    measurement instrument
70
o In a reflective model:
  • Items are manifestations (indicators) of the construct
    → items correlate with each other, and they may
    replace each other
    • i.e. they are interchangeable
  • It is not disastrous to miss some items that are also
    good indicators of the construct
  • In the developmental phase, the challenge is to come up with as
    many items as possible
    • Even items that are almost the same are allowed
  • In practice, a large number of items are selected, but these will
    later be reduced by item reduction techniques, such as
    factor analysis (FA) and examination of item characteristics
71
o In a formative model:
  - Each item contributes a part of the construct
  - Together the items form the whole construct
  - The challenge is to find all items that contribute substantially
    to the construct
  - Items do not necessarily correlate with each other
  - They are not interchangeable: one item cannot be replaced by another
  - Missing an important item inevitably means that the
    construct is not measured comprehensively
72
• Life stress could be measured on the basis of a
formative model
  • The items in such an instrument comprise
    events that all cause stress
• One can also think of a measurement instrument
consisting of items that are reflections of stress
  • It is known that stress results in a number of
    symptoms, such as 'troubling thoughts about the
    future' and 'sleep disturbances', some of which are
    presented on the right-hand side of the figure
• So, for the measurement of stress, a researcher can
choose between a formative and a reflective model

Figure: Conceptual framework for the measurement of stress – a formative and a
reflective model
73
o Difficulty of the items
  • The difficulty of items in relation to the target population is another point to keep
    in mind when selecting items
o Response options
  • The statements or questions contained in items must correspond exactly with the
    response options
74
Things to keep in mind in the selection and formulation of items
75
Scores for items
Scoring options
• Every measurement leads to a result: either a classification or a quantification of a response
• The response to a single item can be expressed at nominal, ordinal, interval or ratio level
Which option to choose?
• To what extent can researchers freely choose the level of measurement of the responses?
  • If a measurement is on an interval scale, it is always possible to choose a lower level of
    measurement
  • However, by choosing a lower level of measurement, information is lost: knowing the exact
    plasma glucose level is more informative than knowing only whether or not it is elevated
• The number of options may differ between research and clinical practice
  • If the doctor has only two options to choose from (e.g. treatment or no treatment), two categories might suffice
  • So it depends on the number of categories that are clinically relevant for the doctor
  • In research, we often want many options, in order to obtain more detailed distinctions or a
    more responsive measure
• Seven categories are about the maximum number of distinctions that people are able to make,
from a psycho-physiological perspective
76
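The loss of information from lowering the measurement level can be made concrete: collapsing an interval-level glucose value into a binary 'elevated' flag maps many distinct values onto the same category. A minimal sketch; the 7.0 mmol/L cut-off and the sample values are illustrative assumptions, not clinical guidance:

```python
# Interval-level measurements (plasma glucose, mmol/L) - illustrative values.
glucose = [4.8, 5.6, 7.2, 9.9, 11.4]

# Lowering the measurement level: interval -> dichotomous (elevated yes/no).
# The 7.0 mmol/L cut-off is an assumed threshold for this sketch.
elevated = [int(g >= 7.0) for g in glucose]

print(elevated)  # → [0, 0, 1, 1, 1]
# 7.2 and 11.4 are now indistinguishable: the dichotomy cannot tell a
# mildly elevated value from a strongly elevated one.
```

Going the other way (binary back to exact values) is impossible, which is why data should be collected at the highest feasible measurement level and collapsed later if needed.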
3. Pilot-testing
• The development of a measurement instrument progresses through a
number of phases: an iterative process
• The first draft of the measurement instrument is tested in a small
sample of patients (e.g. 15–30 people), after which adaptations
will follow
• Pilot-testing is intended to test the:
  • Comprehensibility
  • Relevance
  • Acceptability, and
  • Feasibility of the measurement instrument
• Pilot-testing is necessary not only for questionnaires, but also for
other newly developed measurement instruments
77
Field-testing: item reduction and data structure
• When a measurement instrument is considered satisfactory
after one or more rounds of pilot-testing, it has to
be applied to a large sample of the target population
• Aims:
  • Item reduction
  • Obtaining insight into the structure of the data
    • Examining dimensionality and then deciding on the definitive
      selection of items per dimension
• These issues are only relevant for multi-item instruments that
are used to measure unobservable constructs
• Newly developed measurement instruments and instruments
to measure observable constructs go straight from the phase of
pilot-testing to the assessment of validity, responsiveness and
reliability
78
Pilot-testing vs field-testing
• Pilot-testing
  • Entails an intensive qualitative analysis of the items
  • Uses a relatively small number of representatives of the target population
• Field-testing
  • Entails a quantitative analysis
  • Uses quantitative techniques, such as:
    • Factor analysis (FA)
    • Item response theory (IRT)
  • Requires data from a large number of representatives of the target population
    • For adequate field-testing, a few hundred patients are required
79
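A first quantitative look at field-test data often starts with corrected item–total correlations: items that correlate weakly with the rest of the scale are candidates for removal before, or alongside, a factor analysis. A minimal numpy sketch with invented response data (the 0.3 cut-off is a common rule of thumb, stated here as an assumption):

```python
import numpy as np

def corrected_item_total(scores):
    """Corrected item-total correlation per item.

    scores: (n_respondents, n_items) array of item scores.
    Each item is correlated with the sum of the OTHER items,
    so the item does not inflate its own correlation.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    out = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]
        out.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return np.array(out)

# Invented field-test data: 6 respondents x 4 items (0-4 Likert scores).
data = [[4, 3, 4, 0],
        [3, 3, 3, 4],
        [1, 2, 1, 2],
        [0, 1, 0, 3],
        [2, 2, 2, 0],
        [4, 4, 3, 1]]
r = corrected_item_total(data)
# Items with low correlations (e.g. < 0.3) are candidates for removal.
print(np.round(r, 2))
```

In this toy data the first three items hang together while the fourth does not, so the fourth would be flagged; in a real field test this check would be run on a few hundred respondents, as the slide notes.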
Questionnaire Design
80
What is a questionnaire?
• A measurement tool consisting of a list of questions accompanied by:
  • Instructions
  • Response options, and/or
  • Answering spaces
• Designed to elicit and record, or to guide the elicitation and recording of, exposures from
subjects
• Guides the respondent and the interviewer in finding and recording measurement
information
• A source document: it is very close to the source of the data, the respondent
  • Errors at this point tend to considerably, and sometimes irreversibly, affect the validity of the
    evidence generated in the study. 81
• Learning how to ask questions in written and spoken form is essential when
collecting field data
• A straightforward question asks for information in an unambiguous way and
extracts accurate and consistent data
• Straightforward questions:
  • Are purposeful
  • Use correct grammar and syntax, and
  • Call for one thought at a time, with mutually exclusive questions
82
What is questionnaire design?
• Questionnaire design is a big part of the whole process of data collection
and should be completed before the fieldwork begins
• At the same time, the potential respondents should be selected by
sampling, and a more or less complete list of them should be available
• Questionnaire design usually begins with the selection of the items of data that must be
translated into questions
• The ground to be covered is determined by two main factors:
  • The objectives of the study, and
  • The limitations imposed by the burden that can be placed on respondents
    • Including the feasible length of the questionnaire
83
Objectives of the study
• The content of a questionnaire is generally designed to investigate the minimum amount of an
individual's total experience that will provide sufficient information on the problem
under study
• Just as the objectives of the study determine the variables to be measured as a whole, they also
determine the specific items to be covered in the questionnaire
• If a question does not contribute to the achievement of the objectives, it has no place in the
questionnaire
• Adequately detailed data should be sought for each essential exposure variable
  • E.g.: for an exposure that occurs frequently, it is usual to ask about the:
    • Time the exposure began
    • Time it ended, and
    • Frequency and intensity of the exposure and their variation over time
84
• A comprehensive list of potential confounders and effect modifiers should also be
developed
• A well-thought-out plan for data analysis, and a description of the algorithms that
will be used to create exposure dose variables and covariate variables, is essential
in determining the items and detail required
• Developing the exposure and covariate algorithms at the beginning of the study,
before questionnaire development, avoids a surprisingly common problem: at the
time of data analysis, the researcher realizes that an item needed to compute an
exposure dose variable was never collected!
85
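As an illustration of such an algorithm, a cumulative dose variable can be computed from the start-time, stop-time and intensity items mentioned above. The pack-years-style formula and the field names below are assumptions made for this sketch, not items from any specific questionnaire:

```python
def cumulative_dose(episodes):
    """Cumulative exposure dose from questionnaire items.

    episodes: list of dicts with the (assumed) items
      'start_year', 'stop_year', 'intensity' (units per day).
    Dose is summed over episodes, so variation over time is captured
    by reporting each period of constant intensity separately.
    """
    return sum((e["stop_year"] - e["start_year"]) * e["intensity"]
               for e in episodes)

# Example: smoking reported as two periods of constant intensity,
# expressed in pack-years (1 pack = 20 cigarettes).
history = [
    {"start_year": 1990, "stop_year": 2000, "intensity": 20 / 20},  # 20/day
    {"start_year": 2000, "stop_year": 2005, "intensity": 10 / 20},  # 10/day
]
print(cumulative_dose(history))  # → 12.5 pack-years
```

Writing this function down before fieldwork makes it obvious which items the questionnaire must collect: if the stop year were missing from the instrument, the dose variable could not be computed at analysis time.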
The response process
• There are several psychological stages of the response
process during questionnaire administration
• A common view distinguishes five stages
• Knowledge of these stages is helpful in:
  • Evaluating the usefulness of potential
    questions, and
  • Minimizing recall errors and misreporting
86
Stage 1: Understanding the question
• The respondent attempts to understand the requested
information
• Understanding is influenced by:
  • Culture
  • Language
  • Individual interpretations
  • 'Context effects' from:
    • Information that appears on the questionnaire
      (e.g. previous questions), or
    • Any suggestion that the researcher or the
      research is interested in particular types of
      behaviors or other characteristics
87
• Comprehension errors arise if the respondent:
  • Does not understand the question, or
  • Understands it in a way unintended by the
    researcher
• Implication: questions should be phrased in culturally
appropriate terms and in the language of the respondent
• Comprehension errors may be related to personal
characteristics such as:
  • Education level, alertness, SES, …
• Any comprehension error can be a source of information
bias, missing data, and hence imprecision
• Comprehension errors can result in selection bias if the
questions are used to assess eligibility criteria 88
Stage 2: Retrieval of information
• Respondents try to retrieve the information considered
necessary, given that the question is understood
• Information retrieval refers to facts retrieved from:
  • Memory
  • External sources such as:
    • Family members' memories
    • Coworkers' memories
    • Databases
    • Diaries
    • Household files
• For an event or experience to be remembered or
retrieved, a record of it must be available, either in the
form of physical data or as a stored memory
  • Encoding errors: errors arising at this stage
  • Recall errors: deficiencies in retrieving from memory 89
• Recall errors may result in non-response or misreporting
• They can be related to participant attributes
• They lead to biased estimates and decreased precision
• Forgetting is the major process producing recall errors, and thereby recall bias
  • For experiences to be remembered for a long time, they must be very stressful or highly impactful and
    infrequent
• Implication
  • Asking respondents to count and report the frequency of a common behavior in some defined
    calendar period in the past is among the most difficult tasks
  • Human memories tend to relate to typical episodes in personal history (event dates), rather
    than to the defined calendar-time episodes the researcher would like to know about 90
• Telescoping is a problem with event dating
• Forward telescoping
  • Stressful events are remembered as more recent than they actually were
  • May be the most common problem
  [Timeline: the event is recalled later than it actually happened, closer to the time of the question]
• Backward telescoping
  • Occurs when recent events are remembered as more distant than they actually were
  [Timeline: the event is recalled earlier than it actually happened]
91
• Satisficing can occur at this stage
  • The respondent settles for making little mental effort in tracing the information
• Implications:
   Questionnaire designers should check the available evidence in the literature on what is a
    reasonable recall period for the specific type of event of interest
   For events that are highly memorable, recall accuracy tends to increase when the recall
    period is decomposed into sub-periods about which separate questions are asked
   One should work back from more recent periods to earlier periods, rather than the other way
    around
  • Recall accuracy tends to increase when the participant is given more time to think
  • The accuracy of retrieved information depends on how much effort the respondent is able and willing to make
    to remember and/or look up information
   Researchers should be aware that recalling relevant behaviors from memory can be time-
    consuming, and that satisficing may be induced by any form of pressure to speed up the response
    process
92
Stage 3: Inference and estimation
• Additional mental effort is often required to:
  • Use the remembered events for counting or
    estimating total numbers of events
  • Estimate average ('usual') frequencies or intensities
  • Compare various events to decide which was the most
    or least intense; and
  • Calculate durations or other abstractions
• For these tasks, the respondent decides how much
motivation and time to spend and what level of
accuracy to aim for
• Satisficing occurs when the task seems too daunting
  • Terminal digit preference in the reporting of numerical
    values can be a manifestation of satisficing
• When questions are asked about prolonged periods, one
naturally remembers the last few weeks or months best 93
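Terminal digit preference is easy to screen for in collected data: if reported counts cluster on 'round' last digits (0 and 5), satisficing is a plausible explanation. A minimal sketch with invented self-reported values:

```python
from collections import Counter

def last_digit_distribution(values):
    """Count how often each terminal digit occurs in reported values."""
    return Counter(v % 10 for v in values)

# Invented self-reported counts (e.g. cigarettes per day).
reported = [20, 10, 15, 20, 5, 30, 10, 20, 25, 12, 20, 15]
dist = last_digit_distribution(reported)

# Share of values ending in 0 or 5; roughly 20% would be expected
# if respondents reported exact counts with no digit preference.
round_share = (dist[0] + dist[5]) / len(reported)
print(dist, round(round_share, 2))
```

Here over 90% of the invented values end in 0 or 5, a pattern that in real data would suggest respondents were estimating rather than counting.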
Stage 4: Formatting the response
• The respondent prepares the response in the format expected by the researcher
• Instructions may precede the options:
  • How to choose
  • Which measurement units to use
  • Which measurement scale to use
• Long or difficult response-option lists → satisficing
• Implication: the length of the response-option list matters
  • 5–7 options are often seen as a maximum
• In self-administered questionnaires:
  • Options at the beginning of the list tend to be chosen more often
• In telephone or face-to-face interviews:
  • Options at the end of the list tend to be chosen more often
• Implication: except for short option lists, response options should rather be
presented as separate questions 94
Stage 5: Final editing and communication
• The respondent may edit the answer before communicating it
  • Social desirability or fear of disclosure may be a concern for the
    respondent
    • E.g.: ticking the box '4–6' instead of the box '7 and above'
• Social desirability bias (SDB)
  • Arises because respondents like to appear other than they are:
    • Healthier
    • More adherent
    • More normal, and
    • Wiser than they actually are
• Dealing with SDB
  • Identify question areas that are possible sources of SDB
  • Consider how best to minimize any possible bias
    • E.g. by reassuring confidentiality
95
Behaviors likely to be over-reported
• Being a good citizen:
  • Interacting with government officials
  • Taking a role in community activities
  • Knowing the issues
• Being a well-informed and cultured person:
  • Reading newspapers and books, using libraries
  • Going to cultural events such as concerts
  • Participating in educational activities
• Fulfilling moral and social responsibilities:
  • Giving to charity
  • Participating in family affairs and child rearing
  • Being employed

Behaviors likely to be under-reported
• Illness and disabilities
  • Mental illness
• Illegal or contranormative behavior:
  • Committing a crime, traffic violations
  • Tax evasion
  • Drug use
  • Sexual practices
• Financial status:
  • Savings and other assets
  • Income:
    • Lower-income groups may under-report income to obtain anticipated
      financial assistance, or over-report to avoid stigma
    • Wealthier participants may under-report income to avoid social or tax
      repercussions
96
• Possible consequences of SDB in epidemiological studies:
  • Under-estimation of the frequency and/or magnitude of socially undesirable
    attributes
  • Over-estimation of the frequency and/or magnitude of socially desirable
    attributes
  • Biased estimates of the strength of association with other attributes
• Hello–goodbye effect
  • A tendency to exaggerate one's condition before an intervention in the hope of getting
    the best possible care, and to appear healthier than one is afterwards as a form of gratitude to
    the health workers
  • Introduces a falsely strong observed effect of the intervention on self-perceived
    health, or on outcomes that rely on questions about symptoms
97
Personal characteristics of respondents affecting responses
• Personal reference points for judgments
  o Concern the way people rate their preferences and the intensities of their experiences
  o People may take various reference points as the basis for their judgment
  o E.g.:
    1. Would you say that your own health in general is excellent, good, fair, or poor?
    2. When you answered question 1 about your health, what were you thinking about?
      • Others of the same age?
      • Myself at a younger age?
      • Myself now compared to 1 year ago?
      • Other
98
• Implications:
   It is important to anticipate possible variations in reference points
    • Make this a pilot exercise
   When variation in reference points is important:
    o Provide respondents with one clear reference point, or
      • E.g.: 'When you compare your health now with your health 1 year ago, would you say
        that your health now is good, fair, or poor?'
    o Split the question into several questions, each with a specific reference point
   Personal reference points for judgments may shift considerably over time
    • This is important for the validity of assessing changes in subjective attributes
99
End aversion
• Reluctance to use the extreme options in a list of answer options
  • Results in under-estimation of the frequencies of the extreme categories
• Possible solutions:
  • Broaden the extreme categories to minimize the effects of this phenomenon
  • Conceal the true extreme categories by adding extremes of a nearly impossible
    magnitude that nobody is expected to choose
• Note: remember that age, illness, sickness, and treatments can affect all stages
of the response process
100
What is the objective of questionnaire design?

• To minimize error in exposure measures while creating an instrument that is


easy for the interviewer and subject to use

• To obtain measurements of exposure variables essential to the objectives of a


study with minimum error

• To create an instrument that is easy for both the interviewer and subject to use,
and easy to process and analyze.

These objectives are potentially in conflict, and any questionnaire usually


represents a compromise among them.

101
Practical considerations in questionnaire design

102
Types of items in questionnaires
• Fully structured item
- Responses are preselected for the respondent
→ Response choices must be known in advance
- Preferred by respondents because some are either
- Unwilling or
- Unable to express themselves
- More difficult to write than open ones

• Advantages
• Results lend themselves more readily to statistical analysis and interpretation

103
• Semi-structured item
- A clear range of options, but one or more of the options trigger a sub-question, the
response to which is to be recorded as free text.
- Item is only structured to a certain level

- Useful when an explanation or specification is desired of a chosen option

• E.g.:
• “If ‘other,’ please specify: _____” or
• “If yes, please explain reasons: ___________.”

104
• Fully unstructured item
- Respondent or interviewer can freely write a textual answer to the question
- Used when respondents are required to answer in their own words
- Useful to:
• Explore unknown issues of a topic
• Get unanticipated answers
• Describe the world as the respondent sees it

Disadvantages
- Responses are often difficult to interpret and compare
- Unlike with closed-ended questions, subjects cannot be influenced by response options
- Best suited to simple factual data

• Structured questionnaire: questionnaires mostly composed of structured and semi-structured


items
• Open-ended questionnaire: questionnaires mostly containing open-ended items
105
Question content
• Questions may be about:
• Knowledge?
• What people know
• Attitudes?
• What people say they want or think
• Beliefs?
• What people say is true
• Experiences?
• What has happened to people
• Behaviors?
• What people do, have done, or will do
• Attributes?
• What people are

• The objective of a study largely determines the content of a question

• Thorough familiarity with the topic of interest is important before developing questions
106
Using already existing and validated instrument

 Advisable to obtain copies of questionnaires previously used by experts to cover


the subject matter of interest and to make use of them.

• The use of standard questions has a number of advantages.


o Questions are used extensively and proved satisfactory in use
o Questions are assessed for reliability and/or validity
o Permit comparison
o Easy way to draw on expertise of others
o Facilitate questionnaire design

 Make sure that experts and a sample of potential respondents review all
questions even if you are using already existing and validated instrument.
107
• Evaluate questions obtained from other sources for their:
Design adequacy
Appropriateness to objectives of the study, AND
Suitability for use in the population

• Questions developed for use in a face-to-face (f2f) interview may require modification in


their wording or format if they are to be used in a telephone interview or a
mailed self-administered questionnaire.

108
Question wording
• Questions should always be stated as complete sentences.
o Complete sentences express one entire thought
- E.g.:
- Question: 'Place of birth?' Why is this poor? How could it be improved?
o Place of birth means different things to different people.
- I might give the city in which I was born, but you might tell the name of
the country or hospital.

- Better: Name the country in which you were born.

109
Words in a questionnaire: general principles
 Should be the usual ‘working tools’ of the respondents

 Should be neither too difficult nor too simple.


• Difficult words may not be understood
• Simple words may appear condescending
• May not convey the right meaning
• May needlessly lengthen the questionnaire

• Avoid abbreviations and jargon


• They present the same problems as difficult words
• They may not be understood or they may be misunderstood

110
• What is wrong with the question: Have you ever had an ECG?

• Abbreviation + technical jargon


• ECG abbreviated as EKG in some English-speaking
countries

Solution?

• Offer some alternative terms in the question


• E.g.: Have you ever had an ECG, i.e.: a ‘heart tracing’, EKG, or
electrocardiograph?

• Nonetheless, the abbreviation is likely to be familiar to many subjects,


and perhaps more familiar than any alternative terms.
111
Vague questions

• Questions containing words that vary substantially in their


meaning among different people

• Is there a problem with this item? Why? If so, what is the solution?

• How old were you when you first began to smoke regularly?

112
• Vague!
• How old were you when you first began to smoke regularly?

• Could be made more precise by asking:


• How old were you when you first smoked one or more cigarettes a day
for one month or longer?

• This wording eliminates uncertainty about the meaning of ‘regularly’ and


what was smoked

• It would be better to ask subjects about their ‘usual’ action over a specific
period of time (e.g. the past 12 months) than simply to ask about their ‘usual’
intake.

113
• Questions containing words that vary substantially in their meaning
among different people.
• Usually
• Normally
• Regularly Three commonly used vague descriptors of frequency

• Replaced by more precise quantifiers

114
Too precise questions
• Precision is desirable when estimating amount or duration of exposure
• Respondent burden may be increased unduly if too much precision is requested.

• Any issue in the question below?


• How many cigarettes did you smoke every day in your life?

• Too precise!
• It might be tempting to ask smokers to estimate their daily cigarette consumption
for each year of their smoking life.
• Unreasonably burdensome
• Prone to substantial error in recall

• Better approach
• Ask subjects about major changes in daily cigarette intake (e.g. an increase or
decrease of 10 or more cigarettes a day), and document the time of each of these
changes.
115
Biased questions
• Questions that suggest to the respondent that a particular answer is preferred from
among all possible answers.

• Leading questions are well known, and should be easily avoided.

• Is the question below problematic? Why? If so, what would be a potential solution?


• Do you think that smoking should be banned in planes?

• More likely to bias responses


• Due to the presence of the strong negative word
• Do you think that smoking should be banned in planes?

• Better approach: use neutral, and balanced wording:


• Do you think that smoking should be permitted or not permitted in planes?
116
Double-barrelled questions?
• It is one that asks two or more questions at the same time
• Each of which can be answered differently

• ‘My eyes are red and teary’


• How should one answer if one’s eyes are red but not teary, or teary but
not red?
• Since some people will say ‘yes’ only if both parts are true, while others
will respond this way if either symptom was present, the final result
may not reflect the actual state of affairs.

• NB: beware of items that contain words such as ‘and’, ‘or’, or ‘because’!
• Pre-testing with a group similar to the intended audience could reveal
that a problem exists.

117
Sensitive/threatening questions ?
• Questions that ask respondents about behaviours that are:
• Illegal
• Contra-normative (deviant)
• Not discussed in public without tension
• Relate to issues of self-preservation

• If subjects can possibly feel that there is a right or wrong answer to a


question

118
• Fall into distinct classes: those that ask about:

Socially desirable behaviors: tend to be over-reported

Socially undesirable behaviors : tend to be under-reported

Income, savings, and assets


 Not categorized as either socially desirable or undesirable

• Eliciting accurate answers to sensitive questions can be enhanced by:


 Selection of a more impersonal mode of administration
 Interviewer training (if a personal interview is to be used)
 Wording of questions

119
• Techniques to maximize reporting of socially undesirable behaviors:
 Use of words familiar to the respondent, and
 Open-ended questions
 Explicit assurances of confidentiality before interview or before sensitive
question
 Use of a long introduction to the question

• E.g.: Introduction added to the question on drunkenness: Occasionally, people drink on an


empty stomach or drink a little too much and become (intoxicated or respondent’s word). In
the past year, how often …

• The use of ‘occasionally’, ‘drink on an empty stomach’, and ‘a little too much’
tends to minimize the significance of the behavior, so that the respondent will be
more willing to report it

• A question on walking : Many people find it difficult to find time to get regular exercise, like
walking. In the past year did you walk for exercise at least once a week?
120
• The significance of a behavior may also be minimized by use of the phrase: ‘did you
happen to …’
• In the last month, did you ever happen to forget to use a condom?

• In quantifying sensitive behavior, asking respondents to supply the words they are
accustomed to using to describe the behavior may increase the accuracy of reporting.

• Two forms of question can be asked about drunkenness:


 In the past year, how often did you become intoxicated while drinking any kind of
alcoholic beverage?

o Sometimes people drink a little too much beer, wine, or whisky so that they act
differently from usual. What word do you think we should use to describe people when
they get that way, so that you will know what we mean and feel comfortable talking
about it? And

o In the past year, how often did you become (respondent’s word) while drinking any
kind of alcoholic beverage?
121
Double negatives ?
• Can arise whenever a question that is phrased negatively can have a
negative answer.

• E.g.: Should the hospital manager not be responsible for the failures of
services in the hospital?
Yes □
No □

• It is unnatural to say ‘yes’ when the answer really means ‘no’ (that the
hospital manager should not be responsible for the failures), and so answers to
this question would be ambiguous.

122
Mutually exclusive answers
• A subject could reasonably select more than one answer to a question:
• Subject becomes uncertain about which alternative to choose → non-response

• Which age group do you belong to?


o Below 30 □
o 30-40 □
o 40-50 □
o Above 50□
• What spread do you use on bread?
Butter □
Low-fat spread or diet margarine □
Regular margarine □
Other spread □
Please give details _______________________________
Solution:
• Solved by asking:
‘What spread do you usually eat on bread?’ (‘Mark all that apply’), or
asking about the frequency of use of all the spreads individually
123
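The "mutually exclusive and exhaustive" requirement for numeric categories can also be checked mechanically before printing. A minimal Python sketch (hypothetical helper, assuming integer-valued categories such as age in whole years):

```python
def check_categories(bounds):
    """Flag overlaps and gaps in a sorted list of (low, high) inclusive
    integer response categories."""
    problems = []
    for (lo1, hi1), (lo2, hi2) in zip(bounds, bounds[1:]):
        if lo2 <= hi1:
            problems.append(f"overlap: {lo1}-{hi1} and {lo2}-{hi2}")
        elif lo2 > hi1 + 1:
            problems.append(f"gap between {hi1} and {lo2}")
    return problems

# The age categories from the example above: 30-40 and 40-50 share 40.
print(check_categories([(0, 29), (30, 40), (40, 50), (51, 120)]))
# → ['overlap: 30-40 and 40-50']
```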
Response for all participants

• Response lists should be exhaustive!


• i.e. all respondents should be able to find an appropriate response

• Response categories are sometimes omitted because the researcher


has made an assumption that would not apply to all respondents.

o Any problem with the question below? If so, what is the solution?


• How do you evaluate the effect of your daily physical exercise on your health?
Very important □
Somewhat important □
Not important □
124
• How do you evaluate the effect of your daily physical exercise on your health?
• Assumes that all respondents do exercise daily
• If assumption is wrong:
• Respondent will not answer question
• May find the assumption to be offensive

Solution 1:
o How do you evaluate the effect of your daily physical exercise on your health?
Very important □
Somewhat important □
Not important □
I do not exercise □
125
Solution 2
• Ascertain first whether or not the respondent exercises daily and skip to a
succeeding question
1. Do you have physical exercise on a daily basis?
• Yes □
• No □ (Go to question 3)

2. How do you evaluate the effect of your daily physical exercise on your health?

Very important □
Somewhat important □
Not important □

126
Unambiguous time reference

• Any period referred to in a question should be clear and unambiguous.

o “In the past year, did you walk for exercise at least once a week?”

• Improved: In 2023, did you walk for exercise at least once per week?

• Different subjects would be likely to refer to different periods of time in the


past in answering it.

• The use of reference date is usually:


• Explained at the beginning of the interview or questionnaire, and
• Questions relating to it usually begin: ‘before [reference date], did you …’,
etc.
127
More than one concept
• A question that has more than one concept should not be posed.

o Think about your diet over the past year. How often did you eat a serving of
fruits or vegetables? Do not include juices, salads, potatoes, or beans.
• Potential issues? Solution?

• While fruit and vegetable intake may form a single exposure in a study, most
subjects would consider fruits and vegetables as separate categories

• Subjects would typically think through the answer by adding the number of
times they eat vegetables to the number of times they eat fruit.

• Decomposition: formulating an answer to a question by breaking it down


into more manageable parts.
128
• Solution:
• Ask as two questions, one on fruits and one on vegetables.

• Learn through “think aloud”


• Learn which questions subjects decompose and how they decompose the item
during pretest
• This information can be used to break down a complex question into
simpler components.

129
Questions that require calculations
• Certain questions which may appear to be a simple concept may actually require
subjects to perform a calculation to derive an answer.

o In the past year, about how many hours per week did you walk for exercise?

• To answer this question, most subjects would need to break it down into
• The number of times they walked each week
• The minutes they walked per session, and
• Then multiply these together (and then convert to hours!)

• Better to ask separate questions:


• In the past year, about how many times per week did you walk for
exercise? And
• How many minutes did you walk each session?
130
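Decomposing the question also moves the arithmetic from the respondent to the analysis stage. A minimal sketch of the derived variable (hypothetical function name):

```python
def weekly_walking_hours(times_per_week, minutes_per_session):
    """Derive hours walked per week from the two decomposed questions,
    instead of asking respondents to do the calculation themselves."""
    return times_per_week * minutes_per_session / 60

print(weekly_walking_hours(3, 40))  # 3 sessions of 40 minutes → 2.0 hours
```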
Complex concepts
• Some exposures have complex definitions- burdensome for subject to
comprehend

• E.g.: A researcher may want to define walking for exercise as walking at least
once a week for at least 20 minutes per session and at least at a moderate
pace.

• The question might be phrased as: Over the past year, did you walk for
exercise at least once a week for 20 minutes or more per session? Do not
include casual walking.

• If the question were phrased this way, it would certainly annoy some
respondents with its complexity.

131
• Instead, part of the concept in the question can be incorporated into the answers
to sub-questions.

• The question could be phrased as:


• ‘Over the past year, did you walk for exercise at least once a week?’

• Then, sub-questions would ask about:


• Sessions per week
• Minutes per session, and
• Pace

132
Question order

(Generally) questions about a particular topic should:


• Be grouped together
• Proceed from the general to the particular within a group, why?
• Assists and allows more time for recall of the specific details

Grouping questions using a particular response scale may tend to


promote a response set
• A tendency to give the same response to each question regardless of what the correct
response should be.

133
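A response set ("straight-lining") can be screened for in the collected data. A minimal sketch, assuming Likert-type answers coded as integers (hypothetical function name; the 0.8 threshold is illustrative, not a standard):

```python
def straight_lining_share(responses):
    """Share of a respondent's answers equal to their single most
    frequent answer; values near 1.0 suggest a response set."""
    most_common = max(responses.count(v) for v in set(responses))
    return most_common / len(responses)

print(straight_lining_share([4, 4, 4, 4, 3]))  # → 0.8, worth inspecting
```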
Placing demographic questions at the beginning is common practice
• Not a good idea! Why?
• These questions are:
• Comparatively low interest to the respondents, and
• Some of them are threatening
• Can be answered quickly (when respondent gets tired)
Questionnaire should begin with questions related directly to the topic of
interest
• Command the subject’s interest

• E.g.: in a study of sun exposure in relation to skin cancer, it is appropriate to


begin with questions on recreational pursuits involving sun exposure.

If, for some reason (e.g. to select particular respondents), it is necessary to place
demographic data at the beginning, some explanation for their position should be
given to the respondent.
134
Place sensitive questions towards the end of a questionnaire, in order
of increasing threat. Why?
• Minimizes:
• Early termination of the interview
• Failure to complete the questionnaire

• The degree of threat presented by particular questions can be determined


empirically by asking subjects ‘how uneasy most people’ would feel about
particular topics of questioning in a questionnaire.

• Sensitive questions should not be placed on the last page of a self-administered


questionnaire.
• Questions on the last page are highly visible if the subject peruses the questionnaire.

135
Place relatively easy-to-answer questions at end
• In long or difficult questionnaires, respondents get tired → they answer the last
questions carelessly or not at all.
• Place demographic questions (age, income, gender, and other background
characteristics) at the end because these can be answered quickly.

Avoid many items that look alike


• Twenty items, all of which ask the respondent to agree or disagree with
statements,
• May lead to fatigue or boredom → the respondent may give up

• To minimize loss of interest,


• Group questions and provide transitions that describe the format or topic.
• E.g.: say or print something like: “The next set of questions ask about your use of
health services.”

136
Questions should appear to reasonable people to be in a logical order
• Do not switch from one topic to another unless you provide a transitional
statement to help the respondent make sense of the order.

Order of questions affect responses


• Sometimes the answer to one question will affect the content of another.
• When this happens, the value of the measure may be diminished
• Which question should come first?
a. How efficient is the managerial staff at UoG? Or
b. Which improvements in management do you recommend?

Question b should come before question a.


 If it does not, the respondent might offer suggestions for the improvement
of the managerial staff’s efficiency merely because improvements have been suggested.
137
Logical sequence

• Questions should follow a logical sequence


• The sequence that the respondents might be expected to follow in thinking
about the topic.

• E.g.: in collecting job histories, it is usual to proceed chronologically


beginning with the present occupation and proceeding to successively earlier
ones.

• Why this backwards chronological approach?


• Gives more time to recall events of the more distant past

138
Questionnaire structure
• Every questionnaire should contain:
Introduction

Instructions including skip patterns

Linking phrases between topics

Conclusion

139
Introduction
• In an interview, the introduction takes the form of a standard statement read by the interviewer

• It is usually part of the letter soliciting respondent cooperation

• Serves both to:


• Elicit participation
• Discharge investigator’s ethical obligations to the subjects

140
General instructions
• Should be short and simple

• Form part of interviewer training and an interviewer’s manual rather than


the questionnaire.

• Instructions relating to specific questions should appear with those questions


in the body of the questionnaire
• These include instructions to:
• ‘Mark all that apply’
• When more than one response may be appropriate

• Skip instructions

• Instructions about the meaning of specific questions


141
Linking statements
• Break the subject’s concentration on a particular topic
• Provide a brief pause
• Establish concentration on a new topic
• Used to break the monotony of a long series of questions on one topic

• Should not be:


• Unnecessarily long
• Appear to be demanding, or
• Give unwarranted importance to the succeeding questions
• Contain words or phrases that may bias the succeeding responses.

142
• The following are some uses and examples of linking statement.

To signify a major change in questioning:


• “The food we eat is an important part of our everyday lives. I would now
like to ask some questions about the foods that you usually eat and the
amounts of them that you eat.

To break the monotony of a series of (food) frequency questions:


• Next, I would like to ask about bread and breakfast cereals.

To introduce demographic questions at the end of a questionnaire:


• Finally, I would like to ask a few questions about your demography for
statistical purposes.

143
Skipping(branching)
• Necessary where some succeeding questions are not applicable to all
respondents.

• Failure to follow skip patterns: major source of missing data


• CAI (computer-assisted interviewing) greatly reduces missing data due to failure to follow the proper skip patterns.

• Paper and pencil questionnaires must rely on good instructions to follow skip
patterns
• Skip instruction:
• Should be placed immediately after the answer that leads to the branch point in the
questionnaire
• Most important requirement of skip instructions
• Should always be worded positively rather than negatively
• ‘Go to question 3’ rather than ‘Skip question 2’

144
• Skip patterns may be confusing to people
• Should be avoided in self-administered printed questionnaires

• To ensure accuracy, interviewers must be trained to follow skip patterns

• Online questionnaires are effective vehicles for branching


• Because you can design the software so that the respondent is automatically guided to the
appropriate branch.

• E.g.: if the questionnaire tells the respondent, “If no, go to question 6,” the respondent who
answers “no” will automatically be sent to question 6.

145
• It is more important to make the path clear for those who are to complete the
sub-questions than for those who should skip them.
• Have you ever smoked?
□ No
□ Yes

If yes, describe how you started smoking in the space provided

• Complex branching designs are usually only possible in interviewer-administered questionnaires.
• An “Inapplicable” category may be included to avoid skips in a self-administered questionnaire
• How often do you cut the fat off meat before you cook or eat it?
Never □
Less than half the time □
More than half the time □
Always □
I never eat meat □
• Provides an alternative answer for everyone and eliminates the need for a skip
146
Questionnaire length
• Topics to be covered and details covered in a questionnaire are limited by the
length of time that subjects are willing to spend on the questioning process.

• The maximum time that can be spent administering a questionnaire:


• 1–2 hours by face-to-face interview and
• 30–40 minutes by telephone interview (general rule)

• Self-administered questionnaires are at an added disadvantage in that the subject


can gain an impression of the size of the response task before deciding whether to
embark on it.

• Response rates are reduced with longer mailed questionnaires.

147
Questionnaire formats: Standard components
Items:
• Main building blocks of a questionnaire

• Units composed of a question with:


• Instructions
• Response options, and
• Answering spaces

• Items about a common theme are arranged in clearly delineated sections and
linked through alphanumerical sequencing, combined with skip instructions
when appropriate.

148
Spaces:
• Serve administrative or quality control purposes
• Each single page of a questionnaire has a header section that identifies, as a
minimum the:
• Study
• Questionnaire within the study (if several exist)
• Page number
• Participant identification number, and
• Date of completion

• Participant numbers and dates of completion may be pre-printed.

• All instructions are traditionally given in italics.

• A small footer indicates the version of the questionnaire and the printing date.
149
150
Questionnaire Format: Principles
Different typefaces for questions, responses, and instructions
• Lead interviewer or respondent to the correct parts of the question

Use of capital letters


• Provides a consistent cue to the interviewer of what is to be read aloud

Put specific instructions or prompts next to the question as needed


• Eliminates the need for complex initial instructions and serves as a reminder at
the time the instruction is needed.

151
Principles…
Record responses to closed-ended questions by check boxes
• Easy to understand

Use vertical response formats (except for scales)
• Makes it clear which box goes with which response
• E.g.:
20-30 □
31-40 □
41-50 □
51+ □

Increase space between questions
• Avoids questionnaire congestion
• Enables one to see the path of questions easily
• Exception: scale for an attitude or for degree of pain


• Categories may be better visualized on self-administered questionnaires if
they are on a horizontal line.

Provide spaces or boxes for coding open-ended questions
152
Principles…
Consider data capture methods when making design decisions

• When responses are to be key-entered, the response options need to be
pre-coded into numbers to facilitate key-entry
• E.g.:
Yes □1
No □2

Formats need to be tailored to specific requirements of other data-capture
methods, including:
• CAI
• Web surveys
• Self-administered questionnaires

153
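Pre-coding pays off at data entry: only the numeric code is keyed, and a codebook translates it back during processing. A minimal sketch (hypothetical codebook and function name):

```python
# Codebook matching the pre-coded options: Yes □1, No □2.
YES_NO = {1: "Yes", 2: "No"}

def decode(code, codebook):
    """Translate a keyed numeric code to its response label; flag typos."""
    return codebook.get(code, "INVALID CODE")

print(decode(1, YES_NO))  # → Yes
print(decode(7, YES_NO))  # → INVALID CODE (keying error to resolve)
```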
Principles…
 Pages and questions should be numbered consecutively
 Sub-sections of questions should be indented and identified with
letters rather than numbers.

 Question numbers should extend to the left of questions to stand out

 Questions should not extend over more than one page

 Clear skip patterns are important

Certain visual changes, such as underlining words or using capital letters, can
be used to emphasize a change in concept
• Not be over-used
154
Principles of formatting individual questions
• Interviewer-administered questionnaires:
• CAPITAL LETTERS for questions
• Bold face for alternative responses that are not to be read to the respondent
• Bold CAPITAL LETTERS for alternative responses to be read
• Italics for instructions not to be read
• CAPITAL italics for instructions to be read

• Self-administered questionnaires:
• Bold face for questions
• Regular typeface for alternative responses
• Italics for instructions

155
Formatting self-administered questionnaires

• Presentation of the questionnaire should ease use and give an authoritative
appearance to encourage response.

• Questionnaire should be printed in booklet form so that it will open flat on a table.

• First page should have title of the project and instructions for completing the
questionnaire.

• Include a graphic and/or color for visual interest

• Use of colored paper may increase response rates by a small degree.

156
Formatting self-administered…

• A two-column format for the questions


• Easier to read because participants may skip words when reading longer
lines of text
• Allows more questions per page

• Consider coloured background with white response boxes

• Consecutively number pages and questions

• Make skip patterns clear, e.g. through use of arrows after response and
instructions

157
Aids to recall
• Disease may be influenced by exposures which occurred many years before
diagnosis.
• Aids may be used to assist subjects in recalling information
• Recall may be aided by:
 Allowing subjects some time to think about the question: simple aid to recall

 Providing longer introductions or redundant wording: allows more seconds
for retrieval of information.

 Asking subjects to refer to personal records where they may be relevant

 Supplying a list of alternative answers as part of a questionnaire

 Showing respondent photographs


158
 Life events calendar
• Subjects asked to place personal landmarks
• e.g. marriage, birth of children and/or job and residential histories
• Improves the:
• Accuracy of recall of dates of events
• Recall of the events

• Research on memory: most events are not stored with dates.


• Instead, people attach dates to events by relating them to:
• Datable events (e.g. marriage) and
• Time periods (college, jobs, places of residence)

159
• Use of a life events calendar is an example of the use of autobiographical
sequences to aid recall.
• Autobiographical sequences
• Groups of events clustered in time, often organized within some wider
framework (e.g. a job or illness), within which memory appears to be
organized.

• Thus any means of entry into an autobiographical sequence, whether by


way of calendar time, place lived, or through some highly salient event,
such as childbirth or illness, may assist in recall of events or behaviors of
low salience

• E.g.: Asking first about illnesses which may have been indications for the
use of particular medications may assist in recall of those medications.
160
• Telephone interviews are at a disadvantage with regard to recall because:
• Silent pauses are more awkward on the telephone, and
• Subjects might respond too quickly to retrieve the memory fully

• This is more easily done in a mailed questionnaire, where the respondent can
refer to the records, than in a personal interview

• NB: records may be made available in an interview by notifying subjects in


advance of the proposed lines of questioning and asking that they collect together
whatever records they may have.

161
Pre-testing
• Essential part of all questionnaire development
• Even when substantially based on previous questionnaires

• Objective: to identify questions that are:


• Poorly understood
• Ambiguous,
• Evoke hostile or other undesirable responses
• Etc.

• Some questions a pre-test should answer are:


• Are all the words understood?
• Are questions interpreted similarly by all respondents?
• Does each closed-ended question have an answer that applies to each respondent?
• Are some questions not answered?
• Do some questions elicit uninterpretable answers?
162
• Techniques of pre-testing questionnaires

 Traditional methods such as:


o Expert reviews
o Debriefing meetings with interviewers
o Item distribution analysis including percentage non-response

 Newer cognitive and behavioral approaches


 Include:
• Cognitive or intensive interviews with respondents
• Interactive coding of interviewer & respondent behaviors during interview
• Testing different questionnaire versions in small experiments

163
• Pretesting typically involves several of these methods applied to different versions of the
questionnaire as it is revised and gets closer to its final form.

• Some of the techniques are more appropriate for earlier stages of pre-testing:
• Expert reviews
• Cognitive interviews
• Experiments

• Others would be done on questionnaires closer to finalization:


• Item response distribution analysis
• Validity studies

• Some authors use the word pre-test for the early testing of the questionnaire and pilot test for
later testing of the study field methods, including:
• Selecting subjects
• Recruitment, and
• Data collection

• Methods such as intensive interviews and interactive coding might be too costly in
terms of time and budget.
164
• When time and budget are restricted, one should at a minimum:
• Seek expert review of the questionnaire (e.g. from colleagues)
• Conduct a pre-test on at least 20 test subjects followed by debriefings, and
• Perform an item distribution analysis
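The item distribution analysis recommended above can be as simple as tabulating non-response per item. A minimal sketch, assuming pre-test answers are stored as dicts with None marking a skipped item (hypothetical function name and data):

```python
def nonresponse_rates(records):
    """Fraction of records with a missing (None) answer, per item."""
    items = records[0].keys()
    n = len(records)
    return {item: sum(r[item] is None for r in records) / n for item in items}

pretest = [
    {"q1": "Yes", "q2": None},
    {"q1": "No",  "q2": None},
    {"q1": None,  "q2": "Fair"},
    {"q1": "Yes", "q2": "Good"},
]
print(nonresponse_rates(pretest))  # q2 missing in half the pre-test → revise it
```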

• Interviewer debriefing sessions and monitoring of the distribution of item


responses should be done at the beginning of actual data collection, and further
changes to the questionnaire or protocol may be made to resolve problems.

• Subjects in questionnaire pre-tests should be similar to the target population of the


parent epidemiological study in terms of age, education, and study eligibility
criteria (ideally).

• For a first pre-test, one could use a small sample of convenience: co-workers, friends

• Mode of administration should be the same as it will be in the parent
epidemiological study.
165
Methods of pre-testing questionnaires
o Expert/peer review
• For content, wording, and format

• Experts in the content areas or in questionnaire development asked to


review the questionnaire and make comments.

• Often professional colleagues can fill this role

• Can help determine whether all necessary items are included to meet study
aim, and provide advice on wording, format, etc.

• Data analysts also need to review content and format before data collection
begins, to identify problems which might otherwise arise at the data
processing and analysis phase.
166
o Cognitive/intensive interviews with respondents
• To observe how items were understood and answered

• Useful tool in refining questions on new or complex exposure areas

• Gather detailed information about how respondents formulate answers to key


questions

• Several cognitive steps:


• Understanding the question
• Retrieving information from memory
• Judgement (such as estimation when an exact number cannot be recalled)
• Censoring the response (typically to be more socially desirable), and
• Providing answer in the response format requested
167
• Interviewer asks respondents:
• To ‘think out loud’ while they are coming up with their response, or
• To explain how they came to their response after the answer has been given.
• Aids researcher in understanding whether a question is misunderstood or
whether it is complex and needs to be decomposed into simpler questions.

• Cognitive interviews can also use item-specific probes to understand how specific
concepts are understood.

• E.g.: the following questions would be appropriate for the question, ‘How many
people are there in your household?’,
• Did the respondent include him/herself in the count?
• To what period did the respondent think the question related?
• i.e. if a household member had been temporarily away, would he/she have been included?
• How did the respondent interpret the term household?
168
o Interviewer and respondent debriefings
• To identify items with problems

• Can be:
• Interviews conducted immediately or soon after the questionnaire is completed, or
• Group meetings

• Can be with the interviewer or respondent

• Interviewers should always be debriefed after they have pre-tested the
questionnaire on a number of participants.
• Can be asked:
• Which items caused the most problems in terms of obtaining adequate answers from
participants, and
• For those items, about what percentage of the time there was a problem.
169
• Respondents should be told that:
• They are pilot participants before they complete questionnaire or interview, and
• Their help is needed to identify any problems with the questionnaire

• In the debriefing, respondent should be asked about:


 Which questions were confusing or hardest to answer, and why
 For which questions the answer they wanted to give was not among the response alternatives
 Whether any questions were offensive to them
 Whether they found it easy to follow the skip patterns
 Specific items about which the researcher has concerns
• E.g.: did the participant notice that the time reference changed for a particular question?
 Any additional comments and suggestions

170
o Interviewer–respondent interaction coding scheme
• Interviewer behaviors in question-asking
• Substantive change: makes a substantive change in reading question
• Incorrect prompt: repeats question not as written or suggests answer
• Skips question: skips applicable questions
• Reads wrong question: reads question that was not supposed to be read

• Respondent behaviors
• Interrupt: Interrupts question with an answer
• Uncertain: expresses uncertainty about question, requests clarification
• Uncodeable: response does not meet question objectives, uncodeable
• Don’t know: Offers a ‘don’t know’ response
• Refusal: refuses to answer

171
o Interaction or behaviour coding
• A monitor listens to the interview (usually a tape recording) and codes specific
behaviors of the interviewer and the respondent for each question.

• Length of pause between end of question and answer (reaction time or response
latency) is sometimes coded as an indicator of difficulty in answering the
question.

• By analyzing frequencies of behaviors for each question, one can determine problem
questions
• E.g.: which questions interviewers do not read as worded (often to attempt to
improve the meaning of the question).

• Questions with a moderate proportion (e.g. ≥10%) of respondent codes for uncertainty,
uncodeable answers, or interruptions can be identified as problematic and could be
phrased better.
172
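The flagging rule above can be sketched in a few lines of Python. The coded interviews and question names here are invented for illustration; the problem codes and the ≥10% threshold follow the slide.

```python
# Flag problem questions from behavior codes: any question where >= 10% of
# interviews drew an 'uncertain', 'uncodeable', or 'interrupt' code.
# The coded interviews below are invented for illustration.

PROBLEM_CODES = {"uncertain", "uncodeable", "interrupt"}
THRESHOLD = 0.10

# One dict per interview: question id -> behavior code observed by the monitor.
coded_interviews = [
    {"q1": "ok", "q2": "uncertain", "q3": "ok"},
    {"q1": "ok", "q2": "uncodeable", "q3": "ok"},
    {"q1": "interrupt", "q2": "ok", "q3": "ok"},
    {"q1": "ok", "q2": "uncertain", "q3": "ok"},
]

def flag_problem_questions(interviews):
    """Return questions whose share of problem codes meets the threshold."""
    counts, totals = {}, {}
    for interview in interviews:
        for question, code in interview.items():
            totals[question] = totals.get(question, 0) + 1
            if code in PROBLEM_CODES:
                counts[question] = counts.get(question, 0) + 1
    return sorted(q for q in totals
                  if counts.get(q, 0) / totals[q] >= THRESHOLD)

print(flag_problem_questions(coded_interviews))  # ['q1', 'q2']
```

In practice the codes would come from a monitor listening to recorded interviews, and flagged items would go back for rewording or interviewer retraining.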
o Item non-response and response distributions
• Distribution of responses to each question should be reviewed
• After representative pilot participants completed the pre-test

• Percentage of non-response for each question is a particular concern.

• Two types of non-response:


• Missing/refusal
• Explicit ‘don’t know’

• Subjects not following the skip patterns


- Major source of missing data in self-administered questionnaires
• Improving format of questionnaire to make skip pattern clearer could
reduce the amount of missing data.
173
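As a minimal sketch (with invented pilot data), the item-level review above can be automated by tabulating the two types of non-response separately for each question:

```python
# Per-item non-response review after a pre-test: count missing/refusal
# and explicit 'don't know' answers separately for each question.
# Pilot responses are invented; None marks a missing/refused item.

pilot = [
    {"q1": "yes", "q2": None,         "q3": "no"},
    {"q1": "no",  "q2": "don't know", "q3": None},
    {"q1": "yes", "q2": None,         "q3": "yes"},
    {"q1": "yes", "q2": "yes",        "q3": "no"},
]

def nonresponse_rates(responses):
    """Return {question: (% missing/refusal, % explicit don't know)}."""
    n = len(responses)
    rates = {}
    for q in responses[0]:
        missing = sum(1 for r in responses if r[q] is None)
        dont_know = sum(1 for r in responses if r[q] == "don't know")
        rates[q] = (100 * missing / n, 100 * dont_know / n)
    return rates

for q, (miss, dk) in nonresponse_rates(pilot).items():
    print(f"{q}: {miss:.0f}% missing/refusal, {dk:.0f}% don't know")
```

A high missing rate on a question that follows a skip instruction would point to the skip-pattern problem described above.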
• Another reason for missing/refusal is the sensitivity of the question.
• Techniques for asking sensitive questions might reduce non-response.

• Reviewing item response distributions is also important for examining the
frequency of each response category when questions are closed-ended.

• Response categories should be changed if some responses were


selected by a very low or high percentage of respondents
• An exception to this is when certain extreme response categories are
given to encourage reporting of socially undesirable behaviors.

174
o Experiments
• As part of pre-testing, two (or more) versions of each question can be tested
• usually with each questionnaire version given to a different group of subjects.

• The versions of each question can be different phrasing of the question or


different categories for the answers for closed-ended questions.

• The two versions can be compared based on:


• Interviewer debriefings
• Respondent debriefings
• Item analysis
• Behavior coding, or
• Comparison with more accurate measures of the exposure
• e.g. records or diaries
175
• Interviewer debriefing questions could include:
• Which form of the questionnaire was easier to administer?
• Why was this form easier and the other more difficult?

• From this information, the researcher can select the best alternative of each
question for the final questionnaire.

176
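A minimal sketch of comparing two question versions tested on separate pilot groups, using the share of ‘don’t know’ answers as one criterion (the pilot answers are invented; a real study would also weigh debriefings and behavior coding, and consider whether any difference could be due to chance):

```python
# Compare two phrasings of the same question, each tested on a different
# pilot group, using the share of 'don't know' answers as the criterion.
# The pilot answers are invented for illustration.

version_a = ["yes", "don't know", "no", "don't know", "yes", "no"]
version_b = ["yes", "no", "yes", "no", "don't know", "yes"]

def dont_know_rate(answers):
    return answers.count("don't know") / len(answers)

rate_a, rate_b = dont_know_rate(version_a), dont_know_rate(version_b)
better = "A" if rate_a < rate_b else "B"
print(f"A: {rate_a:.0%}, B: {rate_b:.0%} -> prefer version {better}")
```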
o Revising the questionnaire
• Typically a questionnaire is tested and revised several times before being used in
the field.
• Once problems are identified during a pre-test, they can be resolved through
changes in:
• Questionnaire wording
• Questionnaire format, or
• Interviewer training
• However, often a revised question which solves problems identified by some
respondents will lead to problems for other respondents.

• Often adding explanations to make the question clearer to some respondents will
make the question longer, more burdensome, or even confusing to others.

• Thus it seems reasonable to modify only questions that are problematic for a
moderate proportion of subjects (e.g. ≥10%) and/or have a simple solution.
177
o Think-aloud techniques
o Paraphrasing the question by respondent
o Specific probes about how question was answered
o Debriefing questionnaires
o Group sessions with respondents
o Focus groups
o Observation of interviews and possible interaction coding
• To identify interviewer and respondent behaviors for each question, such
as rewording of question by interviewer or uncertainty about the answer
expressed by respondent

o Distribution of valid responses


o Percentage ‘don’t know’
178
Questionnaire translation
• Translation into another language is a problem that has to be addressed.

• Intra-method reliability of some questions is greater when they are administered


in the respondent’s mother tongue, even when the respondent is multilingual.

• International multicenter studies often necessitate translation into a language
other than the one in which the questionnaire was first developed.
• It is quite probable that English will not be the first language of a significant proportion of respondents.
• Possible alternatives
o Respondents can be eliminated from the study
• Unrepresentative sample…………. limited generalizability
o Translate to the languages most commonly used within the catchment area encompassed
by the study.
• Translation may also be required when a population contains ethnic minority groups.
• Translating an instrument is as time-consuming as developing a new tool.
179
• Similarities and differences in results must be interpreted with extreme caution.
• Goal of translation:
• To achieve equivalence between original and translated versions of a scale.
 Establish equivalence
• Different types of equivalencies proposed
• Most agree that there are five or six key ones
 Conceptual equivalence
• Do people in two cultures see the concept in the same way?
• Extremes:
• Both source and target cultures completely agree on what elements
constitute the construct
• → Translators can proceed to the next step
• Concept may not exist in the target culture
• Concept exists in the target culture, but differ with constituent elements
or weight given to each element
• Concepts that either do not exist in other cultures or take different forms are
difficult or impossible to translate meaningfully
• Concepts in other cultures that have no direct counterpart
180
• Conceptual equivalence can be determined in a number of ways:
• A review of the ethnographic and anthropological literature about the target group
• Interviews and focus groups
• Consultations with a broad range of experts

Item equivalence
 If conceptual equivalence exists
 Determines relevance and acceptance of specific items in the target population.

 E.g.:
o It does not make sense to ask about a person’s ability to climb stairs if the
questionnaire will be used in a setting consisting solely of single-story dwellings
o It may be taboo in some cultures to inquire about certain topics
• These questions may have to be reworded or replaced before the translation begins

 Established in much the same way as conceptual equivalence

181
Semantic equivalence
 Refers to the meaning attached to each item.
 E.g.:
• In China, white is the color of mourning; in other cultures it connotes purity.
• Such problems exist even within the same language as spoken in different countries.

 Idioms do not translate well.

 Can be established in a number of ways.


o Cognitive testing
• People can be given the translated version and asked to rephrase the question in their
own words, or say what they think the item means

o Translation and back translation


• Uncover discrepancies in interpretation between the two versions.

182
• One problem with translation and back-translation process has to do with the
translators themselves.

• They are usually:
• Better educated
• Of a higher reading level than those who will ultimately be completing
the scale
• Their familiarity with the two languages
• May make it easier for them to grasp the intended meaning of the question

• Consequently, it is necessary to have the translation checked by a group of
unilingual people, similar in terms of sociodemographic characteristics to the
target population.

183
Operational equivalence
• Goes beyond the items themselves
• Looks at whether the same format of:
• The scale
• The instructions, and
• The mode of administration can be used in the target population

• Self-administered scales would be totally inappropriate in places with low


literacy levels.

• Even the format of the items may present difficulties.


• Elderly people may have difficulty grasping the concept of putting an X on a VAS
corresponding to their degree of discomfort.
• They were able to use the scale reliably only when it was turned on its side and made
to resemble a thermometer that could be filled in with a red marker.
184
• Remember that when gathering data involving dates, formats are not consistent
between countries or even within the same country.

• The number 09/02/08 could variously refer to:


• September 2, 2008
• February 9, 2008
• February 8, 2009, or
• August 2, 2009

• It is best to explicitly ask for the day, month, and year.

185
Measurement equivalence

• Investigates whether the psychometric properties of a test—its various forms


of reliability and validity—are the same in both versions.

• Can be done only after the test has been translated.

• Translators should translate into their native tongue and should be aware of the
intent of each item and the scale as a whole.

• This allows them to go beyond a strictly semantic translation and to use
more idiomatic language that will be better understood by the
respondents.

186
Respondent burden
• Concerns level of demand placed on respondent necessary to answer questions.

• Length of the questionnaire is one aspect of respondent burden.

• Additional contributors to respondent burden are:


o Length of the period of time over which recall is requested, and its distance in the past
o Salience (or impact) to the subject of the topic of questioning, including its
sensitivity
o Frequency of the event
o Complexity or detail of the data sought

187
• In general, the following will all add to respondent burden

o Recall over a long period of time or from the distant past

o Topics of low salience or impact

o Questioning regarding frequent events (such as eating)

o Complex questions (e.g. full occupational history with details of exposure to


hazards in each occupation)

o Being a proxy respondent for someone else.

188
• Consequences of increased burden on respondent:

o Increases risk of termination of the interview

o Increases risk of non-completion of a self-administered questionnaire

o Reduces quality of data

o Threatens response rate

o Alienates the population from research and reduces cooperation in future studies


• A particular problem for longitudinal studies requiring recurrent surveys in one
population.

189
• There is often a conflict between:
• Collecting the information necessary to the objectives of a study
• Keeping the questionnaire to an acceptable length, and
• Minimizing respondent burden

• To resolve these conflicts:


• Collect the amount of information necessary to the objectives of the study
• Ensure questionnaire length and respondent burden are kept to levels that
do not threaten subject participation or cause increase in measurement error

190
General approach to questionnaire development
Avoid anything that could confuse, bore, embarrass, or otherwise burden either the interviewer or
respondent.
• Element encompasses:
• Making questionnaire as clear, short, simple, friendly, and attractive as possible, and
• Making all possible efforts to keep motivation high

Account for what is known about psychological response stages and influences of personal
characteristics.

Draw from what is known already about the validity of specific questions.
• It is unwise to produce a questionnaire item de novo if a suitable version of the item:
• Is known to exist
• Has been used in other studies, and
• Has produced reliable and accurate information, except when there are reasons to believe
that a translation, update, or cultural adaptation is necessary.
Make maximal use of opportunities to promote data integrity after the questionnaire has been completed.
191
Scaling responses
• Once a set of questions has been devised, a method must be chosen by which
responses will be obtained

• Choice of method is dictated, at least in part, by nature of the question asked.


• E.g.:
• ‘Have you ever gone to church?’
• Leads directly to a response method consisting of two boxes, labelled ‘yes’ and ‘no’

• ‘How religious are you?’


• Does not dictate a simple two-category response

• ‘Do you believe that religious instruction leads to racial prejudice?’


• Require the use of more subtle & sophisticated techniques to obtain valid responses.

192
1. Visual analog scale (VAS)
• Graphic rating method

• Consists of a line of fixed length, usually
100 mm, with two described endpoints
representing the least and most possible
amounts of an attribute at the extreme ends,
with no words describing intermediate positions

• Extensively used in medicine to assess a
variety of constructs: pain, mood,
functional capacity, etc.

• Has been used for the measurement of change

• Respondents are required to place a mark, usually an ‘X’
or a vertical line, on the line corresponding to their
perceived state.
193
• Weaknesses of VAS
o Respondent may not find it as simple and appealing as researchers

o Little thought is given to wording of end-points, yet patients’ ratings of pain


are highly dependent on the exact wording of the descriptors.

o Optimal wording to describe endpoints can be a problem and a source of


variation.
o E.g.: an endpoint described as ‘the worst possible anger’ may mean totally
different things to different respondents depending on their experiences
and imagination.

o While the lower limit is often easy to describe (none of the attributes being
measured), the upper end is more problematic.

194
• The reliability of a scale is directly related to the number of items in the scale
• So that the one-item VAS test is likely to demonstrate low reliability in
comparison to longer scales.

• Strengths of VAS
• Its simplicity contributes to its popularity

195
4. Ordinalized scales
• Sometimes the measured attribute is continuous but the scale for
measurement is ordinalized
• Optimal number of levels is usually in the range of 5–7
• Ordinalized scales include the following types:
4.1. Horizontal options lists with circles

A horizontal options list with circles showing incremental values


196
2. Adjectival scales
• Use descriptors along a continuum

• Widely used rating scales


• e.g.
• Satisfaction
• Unsatisfactory/satisfactory/excellent

• Self-reported health
• Excellent/very good/good/fair/poor

• Some degree of reading ability is required (major problem)

197
3. Juster scale
• Variant of adjectival scale
• Combines descriptors of probabilities with numerical ones
• Used mostly for subjectively estimating the probability of an event
• Ordinal levels described by a numerical probability combined with a worded
interpretation of that same probability

198
4.2. Likert scales (‘agree–disagree’ scale)

• Used to measure subjective levels of:


• Agreement
• Acceptance, or
• Perceived likelihood

• Characterizes the presence of levels of opinion in either


direction away from a neutral opinion

• The neutral opinion may or may not be mentioned as a


separate level, but it usually is

• Respondents presented with a series of attitude


dimensions, and are asked:
• Agree or disagree
• Extent of agreement or disagreement

• Easy to administer in self-completion questionnaires, either paper
or electronic
199
• Be aware of the following interrelated issues while using Likert scales:

 Acquiescence (‘yea-saying’)
• Tendency for respondents to agree with statements rather than to disagree
Central tendency
• Reluctance of respondents to use extreme positions
Pattern answering
• Respondent falls into a routine of ticking boxes in a pattern
• Straight down the page or diagonally across it
• Often a symptom of fatigue or boredom
• Best way to avoid: keep interview interesting
• To minimize: include both positive and negative statements
• Conflicting answers from the same respondent will identify where pattern answering occurred
Order effect
• Arises from the order in which the responses are presented.
• There is a bias toward the left on a self-completion scale.
200
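To make pattern answering detectable, negatively worded Likert statements are reverse-scored before summing, so a respondent ticking the same box down the page produces an internally inconsistent total rather than a plausible one. A minimal sketch (the item names and the choice of negative items are illustrative assumptions):

```python
# Reverse-score negatively worded Likert items before summing a scale.
# 5-point scale: 1 = strongly disagree ... 5 = strongly agree.
# Which items are negatively worded is an illustrative assumption.

NEGATIVE_ITEMS = {"q2", "q4"}  # negatively worded statements
SCALE_MAX = 5

def score_likert(responses):
    """Return the scale total, reverse-coding negative items (1<->5, 2<->4)."""
    total = 0
    for item, value in responses.items():
        if item in NEGATIVE_ITEMS:
            value = SCALE_MAX + 1 - value
        total += value
    return total

# A respondent who ticks '5' straight down the page (pattern answering)
# is reverse-coded down on the negative items:
patterned = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
print(score_likert(patterned))  # 5 + 1 + 5 + 1 = 12, not 20
```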
• Likert scales vs adjectival scales

• Adjectival scales: unipolar


• Descriptors range from none or little of the attribute at one end to a lot or the maximal
amount at the other.

• Likert scales: bipolar


• Descriptors most often tap agreement (strongly agree to strongly disagree)

• Some degree of reading ability is required (major problem)


• Inappropriate for young children
• Difficult for those with cognitive disorder

• Solution: use faces to measure primarily pain or unhappiness.


201
5. Face scales
• Ordinal levels are represented by faces expressing a range
of moods or of pain

• Feasible for children and for those with reading


difficulties

• Can be seen as a special form of Likert scale

202
One pole or two
• Number of ‘poles’ of the factor being assessed: major difference between VAS,
adjectival scales and Likert scales

• VAS and adjectival scales: unipolar


• They assess the magnitude of a feeling or belief from zero or little to very much
or a maximum.

• Likert scales: bipolar


• One end reflects strong endorsement of an idea, and the other end strong
endorsement of its opposite.

• What is being evaluated usually determines which format to use


• Amount of pain or ability to perform some action would require a unipolar format
• Endorsement of a belief or attitude is often bipolar—strongly agree or in favor to strongly
disagree or reject.
• Some attributes are not as clear-cut, and there are significant implications in the choice that
is made.
203
Interviewer: finding them
o Interviewers:
- Similar to respondents in gender, age, or other demographic characteristics
- Should avoid flamboyant clothes, haircuts, and so on
- Be able to speak clearly and understandably
- Unusual speech patterns or accents may provoke unnecessarily favorable or
unfavorable reactions.

- Interviewer’s attitude toward the study and the respondent will influence the
results.

- Monitor interviewers systematically and frequently to get the most accurate


data possible

204
Interviewer: Training
• Training is key
• Overall goal of training :
o To produce interviewers who know what is expected of them and how to
answer questions
o Know where to turn if problems arise unexpectedly in the field

• It is important to find time to meet interviewer/s:


• To develop a standard vocabulary
• To share problems encountered in the field

• Prepare a manual
- Most efficient way to make sure trainees have all information they need to perform their job
- Can explain what they are to do and when, where, why, and how they are to do it.

205
Conducting interviews
Guidelines for conducting interviews:
• Make a brief introductory statement that will:
 Describe who is conducting the interview
e.g. “Dr/Mr/Mrs/Prof xx yy from the University of Gondar”

 Tell why the interview is being conducted


E.g. “to find out how satisfied you are with our after surgery program”

 Explain why the respondent is being called


e.g. “We’re asking a random sample of people who were discharged from the hospital in
the last 2 months”, and
 Indicate whether or not answers will be kept confidential
e.g. “Your name will not be used without your written permission”.
206
• Try to impress the person being interviewed with the importance of the interview
and of the answers.
- People are more likely to cooperate if they appreciate the importance of the subject matter.
• Check the hearing and “literacy” of the respondent
- A few people may have trouble hearing and understanding some of the questions
- If that happens, ???
o Reappraise the eligibility of the respondent
- Perhaps an interview is not the best method of obtaining reliable data from this
respondent
- Other methods may be more appropriate
o Another option is to speak more clearly and slowly.

• Ask questions as they appear in the interview schedule.


- It is important to ask everyone the same questions in the same way or the results will not be
comparable.

207
Monitoring interview quality
• To assure getting the most accurate data possible
• Go with an interviewer or spend time with interviewers to make sure what they
are doing is appropriate for the study’s purposes.

• To prevent problems, take some or all of the following steps:


• Establish a hot line
- Having someone available to answer any questions that might occur immediately, even at the
time of an interview.
• Provide written scripts
- If interviewers are to introduce themselves or the study, give them a script or set of topics to
cover.
- The script may have to be approved by an Institutional Review Board!

208
• Provide extra copies of all supplementary materials.
- If data collectors are to mail completed interviews back, make sure to give them extra forms and
envelopes.
• Provide an easy-to-read handout describing the purpose of the interview and the
content of the questions.

• Provide a schedule and calendar so that interviewers can keep track of their
progress
• Consider providing the interviewer with visual aids.
- Extremely important when interviewing people in-person whose ability to speak or read may be
limited.
• Consider the possibility that some interviewers may need to be retrained and
make plans to do so.

209
Questionnaire Administration
• For questionnaire administration it is important to keep in mind that anything
that can:
• Confuse
• Distract
• Bore
• Embarrass
• Burden the respondent or the interviewer tends to adversely affect accuracy
and completeness of the recorded responses.

• The conclusion of either a self-administered questionnaire or an interview should


include
• An expression of appreciation, and
• Should provide an opportunity for the respondent to make comments.

210
• A self-administered questionnaire could also include a request for the
respondent to check that all questions have been answered.

• The conclusion should also contain the address for the return of a mailed
questionnaire; while an addressed return envelope will usually be included, it
may have become separated from the questionnaire.

• Interviewer-administered questionnaires should also provide for the entry of the


interviewer’s comments.

211
• The important choices to make include:

• Self-administered vs. interviewer-administered

• Face-to-face vs. internet vs. telephone vs. mixed administration

• Administration at home vs. clinical care settings vs. other

• Proxy-respondents vs. interviewing enrolled study subjects

212
• One should make sure to always record the type of respondent used
• E.g.:
• Self-about-self
• Mother-about-child
• Other-caregiver-about-child
• Etc.

• When an adult is reporting about a child, especially in environments with


extended care-giving practices, it may be necessary to define the relationship of
the adult to ensure validity of responses.

• Generally speaking, proxy-respondents must be avoided as much as possible if


the enrolled subject is capable of providing accurate answers.

213
Main styles of interviewing
• Style of interviewing tends to influence the accuracy of the responses

• Standardized interviewing
• All interactions with respondent are:
• Prescribed, and
• Written in the interviewer’s guide as a step-by-step process
• Rules out most interviewer influences on responses.

• Conversational interviewing
• Allows interviewers to interact freely with respondents
o Minimizes errors due to poor understanding of a question by the respondent
o Introduces some interviewer variance

214
• Conversationally flexible interviewing

• Combines both standardized and conversational interviewing styles:


• A standardized part and a free part to each question.

• There can be a standardized approach for one question and a conversational one for
another question.

• Leads to:
• The same accuracy as standardized interviewing when the question is easy to answer
• Better accuracy than standardized interviewing when the question is difficult

215
Training of questionnaire administration
• Provide detailed instructions in a user’s manual
• Train each interviewer, who should have the manual at hand during each interview, including on:
• Moving through the questionnaire at an appropriate pace
• Writing legibly
• Using permanent ink, etc.
• Sufficient training should ensure that the interviewer establishes rapport

• User’s manual should be constantly referred to, with special training on:
• Use of code lists
• Skip patterns
• Uniform date recording
• Items that require complex probing, e.g., Age or date assessments based on a calendar of
local events
• For items involving free text: specificity of terminology, length of text, etc.
216
Questionnaire user’s manual/Interviewer’s Guide/Instruction Sheet
• Contains detailed instructions on the use of the questionnaire form

• General guidelines and question-specific sections

• Content is influenced largely by the chosen style of interviewing.

• Consider providing a library of pre-coded answer sets in the user’s manual


• e.g., occupational categories

• Each interviewer should be trained extensively on how and when to use the instruction sheets.

• It should be a formal obligation for the interviewers to have the instruction sheets available for
consultation during each interview.

• Prepare a Standard Operating Procedure based on the User Manual and field logistics
• Prevents deviation from the study protocol
217
Questionnaire administration: Ethical considerations

• During questionnaire administration:


• Ensure privacy and avoid non-intended disclosures, to optimize accuracy and
limit item non-response rates.

• Protocols should include details on emergency counseling and professional


services when the subject matter is anticipated to reveal emotionally sensitive
issues, such as
• Partner violence
• Mental distress

• Adherence to source document standards is another ethical imperative

218
Questionnaire administration: Methods
• Having developed a questionnaire how to administer it is the next consideration.

• This is an issue which affects:


• Costs
• Response rates
• Which questions can be asked and in what format

• The four methods commonly used to administer questionnaires are:


• Face-to-face interviews (f2f)
• Over the telephone
• By mail
• By computer

219
• Face-to-face interviews
• A trained interviewer administering the questionnaire on a one-to-one basis
• Either in office or subject’s home (more usually)
• Interview at home serves to:
• Put the respondents at ease….. familiar surroundings, and
• Increase compliance……………..subjects do not have to travel
• Involves greater cost to the investigator
• Possibility of interruptions…………. Telephones, family members…

220
o Advantages of face-to-face interview
• Interviewer is sure who is responding
• Not the case with telephone or mail administration
• Anyone in the household can answer or provide a second opinion for the respondent
• Allows non-verbal communication: can motivate respondent to reply

• Interviewer can determine if the subject is having any difficulty understanding items
• Whether due to:
• Poor grasp of the language
• Limited intelligence
• Problems in concentration
• Boredom
• Allows interviewer to rephrase question in terms the person may better understand, or to probe for
a more complete response.

• Flexibility afforded in presenting items, as questions in interview can range from ‘closed’ to ‘open’

221
• Since many immigrants and people with limited education understand the spoken language
better than they can read it, and read it better than they can write it, fewer people will be
eliminated because of these problems.

• Closed questions, which require only a number as a response, such as the person’s age,
number of children, or years of residence, can be read to the subject.

• If it is necessary for the respondent to choose among three or more alternatives, or to give a
Likert-type response, a card with the possible answers could (and most likely should) be
given to the person so that memory will not be a factor.

• Open questions can be used to gather additional information, since respondents will
generally give longer answers to open-ended questions verbally rather than in writing.
• This can sometimes be a disadvantage with verbose respondents.

222
Disadvantages of face-to-face interviews ?
• More expensive to administer than any other method

• Distortion of questions’ meaning (if interviewer training is insufficient)

• Number of possible interviews that can be done in one day may be limited.
• Many people work during the day, and only evening interviews are convenient

• Difficulty gaining access to homes or apartments that have security procedures

223
• If the target language is not the native language for a sizable proportion of the
respondents:
• Translation into one or more foreign languages is required, and bilingual (or multilingual)
interviewers must be found.
• This may not be unduly difficult if there are only a few major linguistic cultures, but can
be more of a problem in cities that attract many immigrants from different countries.

• Attributes of the interviewer may affect the responses given.

• Can be caused by two factors:
• Biases of the interviewer
• Interviewer’s social or ethnic characteristics
• Differences between interviewer and respondent, especially in race, also
have an effect.

224
Telephone questionnaires
• Interviewing subjects over the phone rather than meeting them in person

• Savings in time and transportation, and therefore money (major advantage)

• Using a telephone directory as a sampling frame:

• Systematically underrepresents poorer people (those without telephones)
• Homes with more than one telephone number have multiple chances of
being selected, a bias favouring more affluent households.
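One standard remedy for the multiple-listing bias is inverse-probability weighting: a household listed under k numbers is roughly k times as likely to be drawn, so its response is down-weighted by 1/k. A minimal sketch (the function name and numbers are illustrative, not from the source):

```python
# Hypothetical illustration of correcting directory-sampling bias:
# a household with k listed lines has ~k chances of selection, so
# weight each responding household by 1/k in the analysis.

def directory_weight(num_lines: int) -> float:
    """Inverse-probability weight for a household listed num_lines times."""
    if num_lines < 1:
        raise ValueError("household must have at least one listed line")
    return 1.0 / num_lines

# A two-line household is twice as likely to be drawn, so it counts half.
weights = [directory_weight(n) for n in (1, 2, 3)]  # [1.0, 0.5, 0.333...]
```

This addresses only the multiple-listing bias; it cannot correct for households with no telephone at all, who never enter the frame.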

225
• Advantages ??
• Reduction in the number of omitted items

• Skip patterns followed by the interviewer rather than the respondent

• Open-ended questions can be asked

• A broad, representative sample can be obtained

• Interviewer can be prompted by a computer (CATI: computer-assisted telephone interviewing)

• Interviewer can determine if the person is having problems understanding the
language in general or a specific question in particular

• Some basic demographic information such as age, marital status, or education
can be obtained even if the person is not willing to participate
• Allows the researcher to determine if there is any systematic bias among those who decline
to participate in the study.
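A skip pattern simply means that the answer to one item determines which item comes next; in CATI the software makes that decision, so the respondent never sees the branching. A minimal sketch (the item names "smokes", "cigs_per_day", and "exercise" are hypothetical, not from the source):

```python
# Hypothetical CATI-style skip logic: the program, not the respondent,
# decides which item is asked next based on the answer just given.

def next_question(current: str, answer: str) -> str:
    """Return the id of the next item to administer."""
    if current == "smokes":
        # Non-smokers skip the smoking detail items entirely.
        return "cigs_per_day" if answer == "yes" else "exercise"
    if current == "cigs_per_day":
        return "exercise"
    return "end"
```

Because the branching is handled in software, skip-pattern errors (a respondent answering items that should have been skipped, or skipping items that should have been answered) are largely eliminated.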
226
• At least three areas in which the telephone may be superior to face-to-face interviews:

1. Bias caused by the appearance of the interviewer (e.g. skin color, physical deformity)
is eliminated.
• One interviewer characteristic that cannot be masked by the telephone is gender

2. Nationwide surveys can be conducted out of one office
• Lowers administrative costs
• Facilitates supervision of the interviewers to ensure uniformity of style

3. People may report more health-related events in a telephone interview than in a
face-to-face interview
• Not clear whether the higher figure is necessarily more accurate
227
• Disadvantages ?
• No assurance who the person is at the other end of the line

• Biased sample unless a specific respondent is chosen beforehand


• Higher probability that parents with young children, those who work at
home, shift-workers, the ill, or the unemployed will be reached

• Primacy effect
• Likely to occur with subjects tending to endorse categories that are read
towards the beginning rather than towards the end of the list

228
Mailed questionnaires
• Advantages
• By far the cheapest of the three methods

• Can be coordinated from one central office, even for national or international
studies
• In contrast, personal interviews usually require an office in each major city, greatly
increasing the expense.

• Social desirability bias tends to be minimized


• No interviewer present, either in person or at the other end of a telephone line

229
• Disadvantages
• Subjects may omit some of the items

• No assurance that the subjects read the items in order

• Possibility of the questionnaires being delayed by a postal strike

230
How to increase the return rate?
• Many techniques although not all have proven to be effective
1. A covering letter
o Most important part of a mailed questionnaire
o Often determines whether the form will be looked at or thrown away
o Every detail of the letter and its contents therefore deserves careful attention.
o Should begin with a statement emphasizing:
• Why the study is important, and
• Why that person’s responses are necessary to make the results interpretable, in that order
o Include a:
- Promise of confidentiality
- Description of how the results will be used, and
- Mention of any incentive
o Should be signed by hand
o With the name block under the signature indicating the person’s title and affiliation.
o Should be printed on letterhead
o Subjects are more likely to respond if the research is being carried out by respected organization
o Should fit onto one page
231
2. Advance warning that the questionnaire will be coming.

• Introductory letter prepares respondent for the questionnaire, and helps


differentiate it from junk mail.

• To overcome the skepticism that often greets such unsolicited arrivals.

232
3. Giving a token of appreciation
• Use of incentive is predicated on ‘social exchange theory’,
• Which states that even small incentives are effective because they inculcate a sense of social
obligation on the respondent.

• Most often, this is a sum of money, which significantly increases the return rate.

• Relationship between amount of incentive and return rate flattens out quite quickly;
• Amounts as low as $0.50 or $1.00 double it, but $15.00 increases the return rate only 2.5
times
• Sharp increase in the odds of return up to $1.00, then a smaller increase until $5.00, and no
further increase after that.
• It doesn’t make sense for financial incentives to exceed $5.00, and even $1.00 is
sufficient in many cases.
• The explanation for this somewhat paradoxical result is that when the value of the
incentive starts approaching the actual value of the task, then ‘social exchange’
becomes more like an ‘economic exchange’, and the person feels less of a social
obligation to reciprocate.
233
• Other incentives that have been used with varying degrees of success have included:
• Lottery tickets
• A chance to win a savings bond or prize
• Pens or pencils
• Tie clips
• Unused stamps
• Diaries
• Donations to charity
• Key rings
• Golf balls
• Letter openers
• But these seem to be much less powerful than cold, hard cash

234
4. Anonymity
• Evidence on effect of anonymity on response rate is contradictory

• Assurances of confidentiality improve response rates for sensitive information


• Promises of confidentiality for non-sensitive material do not increase
compliance
• When data are not sensitive, such assurances may make people more suspicious and result
in an increased refusal rate

• If it is necessary to identify the respondent,
• To link the responses to other information, or
• To determine who should receive follow-up reminders,
• then the purpose of the identification should be stated along with guarantees that:
• Person’s name will be thrown away when it is no longer needed, and
• Kept under lock and key in the meantime; and
• No subject will be identifiable in the final report

235
5. Personalization
• Some people see a personalized greeting using their name as an invasion of privacy and a threat to
anonymity.

• This problem can be handled in a number of ways.

• First, the letter can be addressed to a group, such as ‘Dear Colleague’, ‘Resident of . . . neighborhood’,
or ‘Member of . . . ’
• Adding a handwritten ‘thank you’ note at the bottom of a covering letter increases response
rate by 41%.

• Another method to balance anonymity and personalization is to have the covering letter
personalized, and to stress the fact that the questionnaire itself has no identifying information on
it.
• Be aware that personalization may have some detrimental effects on questionnaires and surveys
sent by e-mail or over the Internet
• Other aspects of personalization include typed addresses rather than labels, stamps rather than
metered envelopes, and regular envelopes rather than business reply ones.
236
6. Enclosing a stamped, self-addressed envelope
• Asking the respondents to complete a questionnaire is an imposition on their time;
asking them to also find and address a return envelope and pay for the postage is a
further imposition, guaranteed to lead to a high rate of non-compliance.

237
7. Length of the questionnaire
• It seems logical that shorter questionnaires should lead to higher rates of return
than longer ones.
• However, the research is mixed and contradictory in this regard.
• When the questionnaire is long (over roughly 100 items or 10 pages), each
additional page reduces the response rate by about 0.4 per cent.

• Up to that point, the content of the questionnaire is a far more potent factor
affecting whether or not the person will complete it.

• In fact, there is some evidence that lengthening the questionnaire by adding


interesting questions may actually increase compliance and lead to more valid
answers.

• Thus, it seems that once a person has been persuaded to fill out the form, its
length is of secondary importance.
238
8. Pre-coding the questions
• Although this does not appear to appreciably increase compliance

• Pre-coding does serve a number of useful purposes


• Open-ended questions must at some point be coded for analysis
• Subjects are more likely to check a box rather than write out a long explanation
• Handwritten responses may be illegible or ambiguous

• On the other hand, subjects may feel that they want to explain their answers, or
indicate why none of the alternatives apply (a sign of a poorly designed question).

• The questionnaire can make provisions for this, having optional sections after each
section or at the end for the respondent to add comments.

239
9. Follow-ups
• To maximize returns
• Four -step process:
1. 7–10 days after the first mailing
• postcard should be sent, thanking those who have returned the questionnaire
• Reminding others of the study’s importance
• Indicate to those who have mislaid original where they can get another copy of
questionnaire.

2. 2–3 weeks later


• second letter is sent, emphasizing why that person’s responses are necessary to the study.
• Include another questionnaire and return envelope

3. Send another letter, questionnaire, and envelope via registered or special delivery mail.

4. Call those who have not responded to the previous three reminders
• May be impractical for studies that span an entire country
• May be feasible for more local ones
240
Use of diaries

241
• Diaries
• refer to detailed prospective records of exposure kept by the subject.
• Used:
• To measure
• Physical activity
• Sexual activity
• Alcohol consumption
• Dietary intake(food records)
• Symptoms
• Minor illnesses
• Medication use, and
• Medical care (health diaries)

• Other frequent exposures

242
Forms of diaries
• Open-ended (generally)
• Allow more accurate specification of the type of exposure

• Take the form of a booklet


• Subject records each occurrence of a particular behavior at the level of detail requested by the
researcher.
• Usually take the form of a journal with one entry per line
• With columns indicating the details needed
• A new page started each day
• One food or ingredient to be recorded per line, with columns for recording the:
• Meal
• Place of food preparation
• Food description, and
• Amount
• Incorporates detailed instructions for recording the required information.
• Need to have spaces for coding
• Sample completed page is helpful for most types of diaries
243
• Closed-ended or partly closed-ended
• E.g.: types of physical activities may be printed on the form, with columns for
the subject to prospectively record his/her daily frequencies of the listed
activities.
• Reduce the amount of coding required

• Monthly calendar
• Diaries in which few or no entries are expected for most days
• E.g.: diary of doctor’s visits

• Ledger format
• When disparate behaviors are being recorded (e.g. both symptoms and doctor’s visits),
diaries can use a ledger format with separate sections for the different types of entries.

244
• Electronic diaries
• Can be used instead of paper-and-pencil diaries in many situations

• Data collection can be accomplished via touch-tone telephones or hand-held


electronic devices (‘palm’ computers)

• Can be used
o For frequent simple behaviors
• Drinking
• Use of medication
o For subjective measures
• Symptoms

• Some devices can be programmed to ‘beep’ to remind subjects to record
information in their diary
245
• Particularly useful for diaries of pain or symptoms that are to be recorded at
specified or random times during the day, rather than diaries that are to be
recorded after specific behaviors such as drinking

• Some electronic or mechanical devices can automate data capture with little
subject involvement, such as:
• Pill containers that monitor medication compliance or
• Motion sensors that record physical activity

• These could be considered a hybrid of diaries and objective measures.

246
Example of an exposure diary: food records sample page

247
Example of diary recording instructions for subjects: food records

248
Advantages and limitations of diaries

Advantages:
• Highly accurate in measuring current behavior
• Do not rely on memory

• The use of prospective recording:


• Eliminates telescoping
• Facilitates collection of information on events of low salience that are quickly forgotten

• Allow collection of greater detail about exposure than is possible by questionnaire.


• E.g.:
• Foods can be weighed or measured by the subject before consumption, or
• Recreational physical activities can be timed

• Do not require the subject to summarize pattern of behavior


• E.g. pattern of alcohol drinking can vary greatly from day to day or from week to week.
• A diary kept for a sufficient time period can capture this kind of variation.
249
Limitations
o Only current exposure can be measured (primary limitation of the use of diaries)

o A measure of past exposure only if current and past behavior are highly correlated

o Accurate measures of average current behavior only if sufficient number of days or weeks are
captured

o Demand more time and skills from subjects than do other methods

o Subjects need basic measurement and recording capabilities

o Training of subjects in the skills needed to keep an accurate diary can be time consuming for
both subjects and study staff

o Subjects need the motivation to maintain the diary over the required time period.

• These limitations may make it difficult to recruit a representative sample of the population
of interest and to obtain a high response rate.
250
• Response rates across studies of health diaries range from 50 to 96%.

• Participation rates and rates of full completion of diaries among participants have been found to
be lower for those with:
• Less than a high-school education
• Those of lower social class
• Those who are over age 65, and
• Those who have experienced recent stressful life events

• Complexity of processing the information is another disadvantage of diaries


• For diet diaries, for example, each food item recorded must be numerically coded, food portions must
be standardized, and a computer program and associated database are needed to convert the wide
range of foods to nutrients.

• Training and monitoring of subjects and the lengthy coding procedures tend to make the use of
diaries expensive.

251
• These disadvantages have led to limited use of diaries in epidemiology.

• Diaries have been used primarily as a comparison method for validation studies of
questionnaires or other methods.

• Diaries could be an appropriate method for exposure measurement in prospective cohort or


cross-sectional studies.

• Advantage of increased accuracy needs to be weighed against the disadvantages of:


• Subject burden
• Costs of participant training
• Staff monitoring
• Review and
• Coding
• Diaries might be able to be used in large-scale studies
252
Reliability and validity of measures from diaries

• Diaries are generally more accurate than questionnaires if they are collected over a sufficient time
period.

• Health diaries are clearly superior to interview,


• With some studies showing twice as many minor illnesses reported in the diary as by interview.

• The validity of nutrients computed from a one-week diet diary is somewhat greater than that of a
retrospective food frequency questionnaire, using as the standard three other one-week diaries
completed over a one-year period.

• The correlation of calorie-adjusted fat intake from the one-week diary with the standard was 0.64,
while the correlation of fat intake from the questionnaire with the standard was 0.52.

• A seven-day diet diary was much more accurate than a food frequency questionnaire when each was
validated against urinary nitrogen and potassium (r = 0.55–0.67 for the diary versus r = 0.29–0.32
for the questionnaire).
253
• Automated diary methods are generally thought to be more accurate than paper diaries.

• Electronic diaries are more accurate for pain or symptoms because they allow the time of data
entry to be captured, whereas paper diaries can be filled in retrospectively hours (or days) after
the instructed times of diary entries.

• Completeness of record keeping (e.g. as measured by number of completed diary days) has also
been found to be higher with electronic diaries.

• In a comparison of medication adherence by self-completed diaries versus an electronic


monitoring device, compliance was less (and presumably more accurate) using electronic
monitoring.

• However, objective ‘diary’ measures via devices are not always more accurate than self-reported
diaries.

• E.g.: in studies of measuring physical activity in children, there have been conflicting results
as to whether activity monitors (accelerometers) are more accurate than diaries or recalls
when each was compared with heart rate monitors.
254
Sources of error and quality control procedures for diaries
• Although diaries are generally more accurate than questionnaires, they are still subject to a range
of errors.

• Development of a:
• Detailed study procedure manual
• Pre-testing of the procedures, and
• Monitoring of data collection are important in any data collection effort

• Several sources of error specific to diaries need to be considered too including:


• Time period covered by the diary may not be sufficient to reflect the subject’s ‘true exposure’.
• The act of keeping the diary may affect the behaviors being recorded.
• Errors may be introduced because the subjects serve as the primary data collectors.
• Coding of diaries may be more complex than for other methods.

255
Selection of diary recording period

• Although diaries can only directly measure a few days or weeks of exposure, they
are usually intended to reflect the subject’s exposure over some longer period of
time.

• Thus, while a one-day diary might be perfectly accurate as a measure of exposure


during that day, the validity of that measure depends on how well it captures the
true variable of interest, for example exposure over the preceding year.

• The diary should include a sufficient number of days and a sufficient spread of
days over time to account for day-to-day, weekday-to-weekend, month-to-month,
or season-to-season variation in exposure.

• E.g.: nutrient intake has been shown to differ on weekends compared with
weekdays and to vary by season.
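When diary days are spread over weekdays and weekends, a usual-exposure estimate can weight each type of day by its share of the week (5/7 and 2/7). A minimal sketch, with purely illustrative intake values:

```python
# Hypothetical example: combining weekday and weekend diary means into an
# estimate of usual daily intake, weighting by each day type's share of
# the week (5 weekdays, 2 weekend days).

def usual_daily_mean(weekday_mean: float, weekend_mean: float) -> float:
    """Week-weighted average of a quantity recorded on diary days."""
    return (5 * weekday_mean + 2 * weekend_mean) / 7

# e.g. 2000 kcal/day on weekdays, 2700 kcal/day on weekend days
estimate = usual_daily_mean(2000, 2700)  # 2200.0 kcal/day
```

The same weighting idea extends to seasonal variation: diary periods sampled across seasons can be averaged with weights proportional to the length of each season.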
256
Reactivity

• Diaries are intended to measure the subject’s usual behavior over some time period.

• One concern with diaries is that the act of keeping a diary may lead to a change in behavior.
• This is an example of reactivity
• i.e. that ‘the process of measuring may change that which is being measured’.

• Record keeping may sensitize subjects to their actions or feelings, or may lead them to change
towards more socially desirable or health-conscious behaviors.
• E.g.: recording recreational exercise might lead to an increase in physical activity during
the diary period, or recording symptoms may sensitize a subject to recognize minor
symptoms which might otherwise go unnoticed.

• Reactivity may be a particular problem in randomized trials, because participants may be
particularly compliant with the intervention on diary days.

257
• Subjects may also change behaviors in order to reduce record keeping.

• There is also empirical evidence that the number of items recorded in diaries drops over time,
although this may be due to under-reporting of behaviors due to fatigue with the study rather
than subjects actually modifying their behavior.
• Discussion of the problem of reactivity during training of subjects might reduce this source of
error.

• Subjects could be told that for scientific reasons it is important to assess their usual behaviour.

• For example, part of the written instructions for a diet diary could include the following.
• Don’t change what you usually eat
• Eat as you normally do
• Give a complete, true record
• No one is judging what you eat.
258
Inaccuracies due to study subjects as data collectors

• When diaries are used to measure exposure, the study subjects


themselves are the primary data collectors.
• Subjects may record information inaccurately because of deception (e.g.
under-reporting socially undesirable behaviours), lack of understanding
of the recording techniques, or lack of motivation.
• For example, studies which used an objective measure of energy intake,
doubly labelled water, found that food records underestimate actual
caloric intake by about 20 per cent.

259
• Many of the quality control procedures previously outlined to minimize errors by data collectors can
and should be adapted to the situation where the subjects are the data collectors.
• In particular, study subjects should be trained, preferably in person, in the diary recording techniques.
• An overview of the diary recording methods is presented, followed by detailed specific examples.
• The trainers and those who develop the written instructions must be familiar with the coding scheme,
so that the level of detail needed for accurate coding of items is recorded by the subjects.
• For example, if the database for coding food records has different codes for fresh, canned, and frozen
vegetables, then subjects must record this information.
• Subjects should also be taught how to handle any unusual situations, such as illness, or food eaten
outside the home which cannot be weighed.
• The training session would also include review of examples of completed forms, and practice in
completing the diary. To reduce the staff time needed to train each subject, part of the instruction
could be presented on video.

260
• Detailed written instructions should also be given to each subject.
• In addition to the material covered in the training session, the instructions should
include the date to start the diary, the date to end the diary, and the name and
telephone number of the person to call with any questions.
• Subjects should also be instructed in how frequently to record information. Frequent
recording reduces errors due to poor memory, but increases the subject burden.
• Diaries requiring periodic entries during the day should take the form of a small
booklet or device that can be carried in a handbag or pocket.
• Beyond understanding the data recording procedures, subjects need to be motivated
to spend the time and effort to record information accurately.
• Enthusiastic trainers who can explain the importance of each subject’s involvement
can help to motivate subjects.
261
• After initial training, the data collection needs to be monitored.
• The quality control principles of continued training and motivation of the data collectors
also apply to diaries.
• Phone calls often need to be made to remind subjects of the day to begin, and then again
to ask about any problems within the first few days of keeping the diary.
• Subjects should be asked to review their records for accuracy and completeness each day.
• After the diaries are returned, they should be reviewed immediately by a study editor for
unclear entries or missing data.
• Long diaries should be reviewed periodically.
• The editor should review any specific problems with the subject, and should also discuss
any general recording problems the subject appears to be having if future diaries are to
be collected.

262
Example of topics covered in training subjects to complete a
diary: food records

• Importance of accurate records (to motivate subjects not to change behaviour)


• Scope of records
• General instructions to record the type of food, brand name, method of storage (fresh, frozen, canned), method of
preparation, and amount eaten
• How and when to record (e.g. record immediately after eating or drinking, list only one food per line)
• Recording foods
• Examples of recording milk products, meats, desserts, packaged foods, supplements
• How to weigh or measure foods
• Recording recipes
• Recording in unusual situations: illness, travel, restaurant foods which cannot be weighed or measured
• Reviewing records for accuracy and completeness
• Importance of subject to study (to motivate subjects to keep accurate diary)
• Logistics
• When to start and stop recording
• Whom to call with problems
• Review of sample completed diary form
• Practice in completing form

263
• Errors in coding
• Open-ended diaries can yield a large amount of information which requires detailed coding.
• For example open-ended food diaries require code numbers for thousands of types of foods
which relate to a database of nutrients in foods.
• Because coding of diaries can be a complex task, quality control procedures specific to this
step need to be developed.
• Coders should be selected who are familiar with the exposure (e.g. nutritionists for food
diaries) and are meticulous in dealing with details.
• As in any type of study, there is a need for training of coders and practice sessions, and for
monitoring of coders’ work by periodic re-coding by another coder.
• In large-scale studies, coders may be given an examination to become ‘certified’ in the
coding procedures; this improves accuracy and standardization across coders.
• Coders should refer uncertain situations to the lead coder or editor for resolution, and staff
meetings should include exercises and discussion of items that have led to coding problems.
• The codebook containing the codes and detailed coding procedures should be updated by
the lead coder whenever changes are made.
264
• An alternative to manual numeric coding of all diary entries is direct key
entry of the text (or appropriate key words).

• Sophisticated computer programs can then automate the coding and


analysis.

• Reliability studies of coding have identified several sources of error


• Incomplete description of items which require coder judgement
• Hand conversions by coders (e.g. cups to weight)
• Transposed numbers.
265
• Incomplete description of items can be avoided by appropriate training of subjects and by
call-backs to subjects.
• To avoid errors in hand calculation, all calculations should be performed by computer.
• For example, a program to convert a diet diary to nutrients should have the capability to
convert cups to weight or food dimensions to weight when necessary.
• A check digit can be included in each code, so that transposition errors can be identified
by the data entry program.
• Computer range and logic checks on the raw data (e.g. quantity of each food) and the
computed variables (e.g. kilocalories per day intake) can also identify some coding errors.
• In one study of coding four-day diet diaries, continued improvement of documentation,
automated conversions, computerized edit checks, and increased experience of coders
resulted in an improvement of inter-coder reliability of dietary fat intake from 0.92 to
0.98.
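The check-digit idea works because positional weights make the checksum sensitive to digit order: transposing two adjacent digits changes the weighted sum, so the recomputed check digit no longer matches. A minimal sketch using a simple weighted mod-10 scheme (the scheme and the food code "4172" are illustrative, not from the source):

```python
# Hypothetical weighted mod-10 check digit for numeric item codes.
# Positional weights (1, 2, 3, ...) ensure that swapping two adjacent
# digits changes the checksum, so transpositions are caught at data entry.

def check_digit(code: str) -> int:
    """Weighted sum of digits, modulo 10."""
    return sum((i + 1) * int(d) for i, d in enumerate(code)) % 10

def valid(code_with_check: str) -> bool:
    """Recompute the check digit and compare with the last character."""
    return check_digit(code_with_check[:-1]) == int(code_with_check[-1])

# Append the check digit to a 4-digit food code before keying it in.
good = "4172" + str(check_digit("4172"))   # "41725"
swapped = "1" + "4" + good[2:]             # "14725": first two digits transposed
```

Here `valid(good)` is true while `valid(swapped)` is false, so a transposition made during keying is flagged immediately by the data entry program, mirroring the range and logic checks described above.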
266
Summary
• The use of diaries may be a highly accurate method of measuring
present common behaviours.
• The limitations of diaries, compared with interview methods, are the
greater burden on subjects, which may lead to a poorer response rate,
and the greater cost for subject training and for coding of the data.
• The accuracy of diary information can be enhanced by using multiple
diary days spread over a sufficient time period, and by careful training
of subjects and coders.

267
References
• Ellenberg S, et al. Measurement in Medicine: Practical Guides to Biostatistics and Epidemiology
• Terwee CB, et al. Quality criteria were proposed for measurement properties of health status
questionnaires. Journal of Clinical Epidemiology 60 (2007) 34–42
• Cordier S, et al. Handbook of Epidemiology: Exposure Assessment. Second Edition
• Lee E-H, et al. Evaluation of Studies on the Measurement Properties of Self-Reported Instruments.
Asian Nursing Research 14 (2020) 267–276
• White E, et al. Principles of Exposure Measurement in Epidemiology: Collecting, Evaluating, and
Improving Measures of Disease Risk Factors. Chapter 1, pages 1–50
• Mokkink LB, et al. The COSMIN study reached international consensus on taxonomy, terminology,
and definitions of measurement properties for health-related patient-reported outcomes. Journal of
Clinical Epidemiology 63 (2010) 737e
• COSMIN Taxonomy of Measurement Properties: COSMIN. https://
www.cosmin.nl/tools/cosmin-taxonomy-measurement-properties/
• Di Malta et al. An Application of the Three-Step Test-Interview (TSTI) in the Validation of the
Relational Depth Frequency Scale. Journal of Humanistic Psychology 00(0)
268