Missing Data and Multi Imputation

research methods
& reporting
Multiple imputation for missing data in epidemiological and
clinical research: potential and pitfalls
Jonathan A C Sterne,1 Ian R White,2 John B Carlin,3 Michael Spratt,1 Patrick Royston,4 Michael G Kenward,5
Angela M Wood,6 James R Carpenter5
Most studies have some missing data. Jonathan Sterne and colleagues describe the
appropriate use and reporting of the multiple imputation approach to dealing with them
1 Department of Social Medicine, University of Bristol, Bristol BS8 2PR
2 MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR
3 Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, and University of Melbourne, Parkville, Victoria 3052, Australia
4 Cancer and Statistical Methodology Groups, MRC Clinical Trials Unit, London NW1 2DA
5 Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London WC1E 7HT
6 Department of Public Health and Primary Care, Institute of Public Health, Cambridge
Correspondence to: J A C Sterne jonathan.sterne@bristol.ac.uk
Accepted: 30 January 2009
Cite this as: BMJ 2009;338:b2393
doi: 10.1136/bmj.b2393

Missing data are unavoidable in epidemiological and clinical research but their potential to undermine the validity of research results has often been overlooked in the medical literature.1 This is partly because statistical methods that can tackle problems arising from missing data have, until recently, not been readily accessible to medical researchers. However, multiple imputation, a relatively flexible, general purpose approach to dealing with missing data, is now available in standard statistical software,2-5 making it possible to handle missing data semiroutinely. Results based on this computationally intensive method are increasingly reported, but it needs to be applied carefully to avoid misleading conclusions.
In this article, we review the reasons why missing data may lead to bias and loss of information in epidemiological and clinical research. We discuss the circumstances in which multiple imputation may help by reducing bias or increasing precision, as well as describing potential pitfalls in its application. Finally, we describe the recent use and reporting of analyses using multiple imputation in general medical journals, and suggest guidelines for the conduct and reporting of such analyses.

Consequences of missing data
Researchers usually address missing data by including in the analysis only complete cases (those individuals who have no missing data in any of the variables required for that analysis). However, results of such analyses can be biased. Furthermore, the cumulative effect of missing data in several variables often leads to exclusion of a substantial proportion of the original sample, which in turn causes a substantial loss of precision and power.
The risk of bias due to missing data depends on the reasons why data are missing. Reasons for missing data are commonly classified as: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (box 1).6 This nomenclature is widely used, even though the phrases convey little about their technical meaning and practical implications, which can be subtle. When it is plausible that data are missing at random, but not completely at random, analyses based on complete cases may be biased. Such biases can be overcome using methods such as multiple imputation that allow individuals with incomplete data to be included in analyses. Unfortunately, it is not possible to distinguish between missing at random and missing not at random using observed data. Therefore, biases caused by data that are missing not at random can be addressed only by sensitivity analyses examining the effect of different assumptions about the missing data mechanism.
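As a rough illustration of how the three mechanisms named above affect a complete case analysis, the following sketch simulates systolic blood pressure measurements and deletes values under each mechanism. Everything in it is an assumption made for illustration: the age range, the regression coefficients, the missingness probabilities, and the function name `complete_case_means` are invented and are not taken from the article or from any study.

```python
import numpy as np


def complete_case_means(n=200_000, seed=0):
    """Simulate systolic blood pressure (SBP) under three missingness mechanisms.

    All numbers here (age range, coefficients, missingness probabilities)
    are hypothetical and chosen only to make the contrast visible.
    """
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, size=n)
    sbp = 100 + 0.5 * age + rng.normal(0, 10, size=n)  # population mean is 125

    # MCAR: every measurement has the same 30% chance of being missing.
    miss_mcar = rng.random(n) < 0.30
    # MAR: younger people are missing more often; depends on observed age only.
    p_mar = 0.6 - 0.5 * (age - 20) / 60
    miss_mar = rng.random(n) < p_mar
    # MNAR: missingness depends on the unobserved SBP value itself.
    p_mnar = 1 / (1 + np.exp(-(sbp - 125) / 10))
    miss_mnar = rng.random(n) < p_mnar

    def age_adjusted_mean(missing):
        # Fit SBP on age in the complete cases, then average the
        # predictions over everyone (age is fully observed).
        slope, intercept = np.polyfit(age[~missing], sbp[~missing], 1)
        return float(np.mean(intercept + slope * age))

    return {
        "true": float(sbp.mean()),
        "mcar": float(sbp[~miss_mcar].mean()),
        "mar": float(sbp[~miss_mar].mean()),
        "mar_age_adjusted": age_adjusted_mean(miss_mar),
        "mnar": float(sbp[~miss_mnar].mean()),
        "mnar_age_adjusted": age_adjusted_mean(miss_mnar),
    }


if __name__ == "__main__":
    for name, value in complete_case_means().items():
        print(f"{name:>18}: {value:6.2f}")
```

In this simulated setting the unadjusted complete case mean should be close to the truth only under MCAR. Under MAR, using the observed variable (age) should remove most of the bias, whereas under MNAR the same adjustment should not, which is why the observed data alone cannot distinguish the last two mechanisms.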
Box 1 | Types of missing data*
• Missing completely at random: There are no systematic differences between the missing values and the observed values. For example, blood pressure measurements may be missing because of breakdown of an automatic sphygmomanometer
• Missing at random: Any systematic difference between the missing values and the observed values can be explained by differences in observed data. For example, missing blood pressure measurements may be lower than measured blood pressures but only because younger people may be more likely to have missing blood pressure measurements
• Missing not at random: Even after the observed data are taken into account, systematic differences remain between the missing values and the observed values. For example, people with high blood pressure may be more likely to miss clinic appointments because they have headaches
* When one variable has missing data

Statistical methods to handle missing data
A variety of ad hoc approaches are commonly used to deal with missing data. These include replacing missing values with values imputed from the observed data (for example, the mean of the observed values), using a missing category indicator,7 and replacing missing values with the last measured value (last value carried forward).8 None of these approaches is statistically valid in general, and they can lead to serious bias. Single imputation of missing values usually causes standard errors to be too small, since it fails to account for the fact that we are uncertain about the missing values.
When there are missing outcome data in a randomised controlled trial, a common sensitivity analysis is to explore “best” and “worst” case scenarios by replacing missing values with “good” outcomes in one group and “bad” outcomes in the other group. This can be useful if
there are only a few missing values of a binary outcome, but because imputing all missing values to good or bad is a strong assumption the sensitivity analyses can give a very wide range of estimates of the intervention effect, even if there are only a moderate number of missing outcomes. When outcomes are quantitative (numerical) such sensitivity analyses are not possible because there is no obvious good or bad outcome.
There are circumstances in which analyses of complete cases will not lead to bias. When missing data occur only in an outcome variable that is measured once in each individual, then such analyses will not be biased, provided that all variables associated with the outcome being missing can be included as covariates (under a missing at random assumption). Missing data in predictor variables also do not cause bias in analyses of complete cases if the reasons for the missing data are unrelated to the outcome.9 10 In these circumstances, specialist methods to address missing data may lessen the loss of precision and power resulting from exclusion of individuals with incomplete predictor variables but are not required in order to avoid bias.
If we assume data are missing at random (box 1), then unbiased and statistically more powerful analyses (compared with analyses based on complete cases) can generally be done by including individuals with incomplete data. Sometimes this is possible by building a more general model incorporating information on partially observed variables: for example, using random effects models to incorporate information on partially observed variables from intermediate time points11 12 or by using bayesian methods to incorporate partially observed variables into a full statistical model from which the analysis of interest can be derived.13 Other approaches include weighting the analysis to allow for the missing data,14 15 and maximum likelihood estimation that simultaneously models the reasons for missing data and the associations of interest in the substantive analysis.13 Here, we focus on multiple imputation, which is a popular alternative to these approaches.

What is multiple imputation?
Multiple imputation is a general approach to the problem of missing data that is available in several commonly used statistical packages. It aims to allow for the uncertainty about the missing data by creating several different plausible imputed datasets and appropriately combining results obtained from each of them.
The first stage is to create multiple copies of the dataset, with the missing values replaced by imputed values. These are sampled from their predictive distribution based on the observed data; thus multiple imputation is based on a bayesian approach. The imputation procedure must fully account for all uncertainty in predicting the missing values by injecting appropriate variability into the multiple imputed values; we can never know the true values of the missing data.
The second stage is to use standard statistical methods to fit the model of interest to each of the imputed datasets. Estimated associations in each of the imputed datasets will differ because of the variation introduced in the imputation of the missing values, and they are only useful when averaged together to give overall estimated associations. Standard errors are calculated using Rubin’s rules,16 which take account of the variability in results between the imputed datasets, reflecting the uncertainty associated with the missing values. Valid inferences are obtained because we are averaging over the distribution of the missing data given the observed data.
Consider, for example, a study investigating the association of systolic blood pressure with the risk of subsequent coronary heart disease, in which data on systolic blood pressure are missing for some people. The probability that systolic blood pressure is missing is likely to decrease with age (doctors are more likely to measure it in older people), increasing body mass index, and history of smoking (doctors are more likely to measure it in people with heart disease risk factors or comorbidities). If we assume that data are missing at random and that we have systolic blood pressure data on a representative sample of individuals within strata of age, smoking, body mass index, and coronary heart disease, then we can use multiple imputation to estimate the overall association between systolic blood pressure and coronary heart disease.
Multiple imputation has potential to improve the validity of medical research. However, the multiple imputation procedure requires the user to model the distribution of each variable with missing values, in terms of the observed data. The validity of results from multiple imputation depends on such modelling being done carefully and appropriately. Multiple imputation should not be regarded as a routine technique to be applied at the push of a button; whenever possible specialist statistical help should be obtained.

Pitfalls in multiple imputation analyses
A recent BMJ article reported the development of the QRISK tool for cardiovascular risk prediction, based on a large general practice research database.17 The researchers correctly identified a difficulty with missing data in their database and used multiple imputation to handle the missing data in their analysis. In their published prediction model, however, cardiovascular risk was found to be unrelated to cholesterol (coded as the ratio of total to high density lipoprotein cholesterol), which was surprising.18 The authors have subsequently clarified that when they restricted their analysis to individuals with complete information (no missing data) there was a clear association between cholesterol and cardiovascular risk. Furthermore, a similar result was obtained after using a revised, improved, imputation procedure.19 It is thus important to be aware of problems that can occur in multiple imputation analyses, which we discuss below.

Omitting the outcome variable from the imputation procedure
Often an analysis explores the association between one or more predictors and an outcome but some of the predictors have missing values. In this case, the outcome carries information about the missing
values of the predictors and this information must be used.20 For example, consider a survival model relating systolic blood pressure to time to coronary heart disease, fitted to data that have some missing values of systolic blood pressure. When missing systolic blood pressure values are imputed, individuals who develop coronary heart disease should have larger values, on average, than those who remain disease free. Failure to include the coronary heart disease outcome and time to this outcome when imputing the missing systolic blood pressure values would falsely weaken the association between systolic blood pressure and coronary heart disease.

Dealing with non-normally distributed variables
Many multiple imputation procedures assume that data are normally distributed, so including non-normally distributed variables may introduce bias. For example, if a biochemical factor had a highly skewed distribution but was implicitly assumed to be normally distributed, then imputation procedures could produce some implausibly low or even negative values. A pragmatic approach here is to transform such variables to approximate normality before imputation and then transform the imputed values back to the original scale. Different problems arise when data are missing in binary or categorical variables. Some procedures21 may handle these types of missing data better than others,13 and this area requires further research.22 23

Plausibility of missing at random assumption
“Missing at random” is an assumption that justifies the analysis, not a property of the data. For example, the missing at random assumption may be reasonable if a variable that is predictive of missing data in a covariate of interest is included in the imputation model, but not if the variable is omitted from the model. Multiple imputation analyses will avoid bias only if enough variables predictive of missing values are included in the imputation model. For example, if individuals with high socioeconomic status are both more likely to have their systolic blood pressure measured and less likely to have high systolic blood pressure then, unless socioeconomic status is included in the model used when imputing systolic blood pressure, multiple imputation will underestimate mean systolic blood pressure and may wrongly estimate the association between systolic blood pressure and coronary heart disease.
It is sensible to include a wide range of variables in imputation models, including all variables in the substantive analysis, plus, as far as computationally feasible, all variables predictive of the missing values themselves and all variables influencing the process causing the missing data, even if they are not of interest in the substantive analysis.24 Failure to do so may mean that the missing at random assumption is not plausible and that the results of the substantive analysis are biased.

Data that are missing not at random
Some data are inherently missing not at random because it is not possible to account for systematic differences between the missing values and the observed values using the observed data. In such cases multiple imputation may give misleading results. For example, consider a study investigating predictors of depression. If individuals are more likely to miss appointments because they are depressed on the day of the appointment, then it may be impossible to make the missing at random assumption plausible, even if a large number of variables is included in the imputation model. When data are missing not at random, bias in analyses based on multiple imputation may be as big as or bigger than the bias in analyses of complete cases. Unfortunately, it is impossible to determine from the data how large a problem this may be. The onus rests on the data analyst to consider all the possible reasons for missing data and assess the likelihood of missing not at random being a serious concern.
Where complete cases and multiple imputation analyses give different results, the analyst should attempt to understand why, and this should be reported in publications.

Computational problems
Multiple imputation is computationally intensive and involves approximations. Some algorithms need to be run repeatedly in order to yield adequate results, and the required run length increases when more data are missing. Unforeseen difficulties may arise when the algorithms are run in settings different from those in which they were developed, for example, with high proportions of missing data, very large numbers of variables, or small numbers of observations. These points are discussed more fully elsewhere.25

Practical implications
The imputation models that were used in the original and revised versions of the QRISK cardiovascular risk prediction tool discussed above have been clarified.26 The main reasons for the unexpected finding of a null association between cholesterol level and cardiovascular risk were omission of the cardiovascular disease outcome when imputing missing cholesterol values and calculation of the ratio of cholesterol to HDL based on imputed cholesterol and HDL values, which led to extreme values of the ratio being included in estimations. The impact of these pitfalls was increased by the high proportion of missing data (70% of HDL cholesterol values were missing).

Reporting in recent literature
Multiple imputation usually involves much more complicated statistical modelling than the single regression analyses commonly reported in medical research papers. However, constraints on the length of medical research papers mean that the
details of the imputation procedures are often reported briefly, or not at all. Peer reviewers’ lack of familiarity with multiple imputation may make it difficult for them to ask appropriate questions about the methods employed.
To examine recent use and reporting of multiple imputation, we searched four major general medical journals (New England Journal of Medicine, Lancet, BMJ, and JAMA) from 2002 to 2007 for articles reporting original research findings in which multiple imputation had been used. Articles were located by using search facilities on each journal’s website to search for the phrase “multiple imputation” in the full text of all articles published during the specified period. We found 59 articles, and the reported use of multiple imputation roughly doubled over the six years.
The table summarises the results of our survey. Various methods for multiple imputation were used, with the specific method often reported only vaguely (for instance with a book reference). Thirty six papers reported at least some information on the amount of missing data, but only seven fully or partially reported comparisons of distributions of key variables in individuals with and without missing data. The number of imputation based datasets was reported in 22 papers. Results of both imputed and complete cases analyses were fully reported in only seven papers, with one reporting sensitivity analyses. It was thus rarely possible to assess the impact of allowing for missing data. The variables used in imputation models were rarely listed, and the plausibility of the missing at random assumption was rarely assessed or discussed.

Reporting of multiple imputation in 59 papers published in general medical journals from 2002 to 2007*
Reported characteristic                                              No of papers
Amount of missing data
  No                                                                 23
  Partially                                                          6
  Yes                                                                30
Comparison of distribution of key variables in individuals with and without missing data
  No                                                                 52
  Partially                                                          2
  Yes                                                                5
No of imputations
  No                                                                 35
  Yes                                                                22
  Unclear†                                                           2
Results
  Both multiple imputation and complete case results tabulated       7
  Multiple imputation results tabulated:
    Complete case results not reported                               28
    Complete case results in text                                    1
    Complete case results stated to be similar                       2
  Complete case results tabulated:
    Multiple imputation results not reported‡                        1
    Multiple imputation results in text                              4
    Multiple imputation results stated to be similar                 11
  Stated no significant difference from multiple imputation          4
  Sensitivity analysis done                                          1
Variables used in imputation
  No                                                                 53
  No but normality discussed                                         1
  Yes                                                                5
Plausibility of the missing at random assumption
  No                                                                 56
  Invalid discussion                                                 2
  Yes (sensitivity analysis)                                         1
*One paper that used multiple imputation to conduct sensitivity analyses rather than to deal with missing data was excluded from the table.
†Both papers used hotdeck imputation. One referred to a paper on multiple imputation but gave no further details, the other stated that “1000 imputation sequences” were used.
‡The methods section reports that a range of multiple imputation techniques were used to assess the robustness and sensitivity of conclusions, but no results are reported.

Suggested reporting guidelines
In the era of online supplements to research papers, it is feasible and reasonable for authors to provide sufficient detail of imputation analyses to facilitate peer review, without distracting from the substantive research question. Box 2 lists the information that should be provided, either as supplements or within the main paper. This extends guidance provided as part of the STROBE initiative to strengthen the reporting of observational studies,27 and complements suggestions for reporting of analyses using multiple imputation in the epidemiological literature.28
Box 3 relates the suggested guidelines to the use of multiple imputation in a published paper that compared the cost effectiveness of chemotherapy with that of standard palliative care in patients with advanced non-small cell lung cancer.

Summary
We are enthusiastic about the potential for multiple imputation and other methods14 to improve the validity of medical research results and to reduce the waste of resources caused by missing data. The cost of multiple imputation analyses is small compared with the cost of collecting the data. It would be a pity if the avoidable pitfalls of multiple imputation slowed progress towards the wider use of these methods. It is no longer excusable for missing values and the reason they arose to be swept under the carpet, nor for potentially misleading and inefficient analyses of complete cases to be considered adequate. We hope that the pitfalls and guidelines discussed here will contribute to the appropriate use and reporting of methods to deal with missing data.

We thank Lucinda Billingham for checking our description of the article described in box 3.
Contributors: JACS, IRW, JBC, and JRC wrote the first draft of the paper. MS conducted the review of the use of multiple imputation in medical journals and analysed the data. All authors contributed to the final draft and subsequent redrafts of the paper. JACS, IRW, and JRC will act as guarantors.
Funding: Funded by UK Medical Research Council grant G0600599. IRW was supported by MRC grant U.1052.00.006 and JBC by NHMRC (Australia) grant 334336.
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.

1 Wood A, White IR, Thompson SG. Are missing outcome data adequately handled? A review of published randomised controlled trials. Clin Trials 2004;1:368-76.
2 Royston P. Multiple imputation of missing values. Stata J 2004;4:227-41.
3 Royston P. Multiple imputation of missing values: update of ice. Stata J 2005;5:527-36.
4 Multiple Imputation Online. Software. www.multiple-imputation.com.
5 SAS Institute. The MI procedure. http://support.sas.com/rnd/app/papers/miv802.pdf.
6 Little RJ, Rubin DB. Statistical analysis with missing data. 2nd ed. New York: Wiley, 2002.
7 Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. Am J Epidemiol 1991;134:895-907.
8 Carpenter JR, Kenward MG. A critique of common approaches to missing data. In: Missing data in randomised controlled trials: a practical guide. Birmingham: National Institute for Health Research, 2008. www.pcpoh.bham.ac.uk/publichealth/methodology/projects/RM03_JH17_MK.shtml.
9 Steyerberg EW, van Veen M. Letter: Imputation is beneficial for handling missing data in predictive models. J Clin Epidemiol 2007;60:979.
10 Allison PD. Multiple imputation for missing data. A cautionary tale. Sociol Methods Res 2000;28:301-9.
11 Carpenter JR, Kenward MG. MAR methods for quantitative data. In: Missing data in randomised controlled trials: a practical guide. Birmingham: National Institute for Health Research, 2008. www.pcpoh.bham.ac.uk/publichealth/methodology/projects/RM03_JH17_MK.shtml.
12 Goldstein H, Carpenter J, Kenward MG, Levin K. Multilevel models with multivariate mixed response types. Stat Modelling (in press).
13 Schafer JL. Analysis of incomplete multivariate data. London: Chapman and Hall, 1997.
14 Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semiparametric non-response models. J Am Stat Assoc 1999;94:1096-120.
15 Carpenter JR, Kenward MG, Vansteelandt S. A comparison of multiple imputation and inverse probability weighting for analyses with missing data. J R Stat Soc [Ser A] 2006;169:571-84.
16 Rubin D. Multiple imputation for nonresponse in surveys. New York: Wiley, 1987.
17 Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ 2007;335:136.
18 Peto R. Doubts about QRISK score: total/HDL cholesterol should be important [electronic response to Hippisley-Cox J, et al]. BMJ 2007. www.bmj.com/cgi/eletters/335/7611/136#172067.
19 Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. QRISK: authors’ response [electronic response]. BMJ 2007. www.bmj.com/cgi/eletters/335/7611/136#174181.
20 Moons KG, Donders RA, Stijnen T, Harrell FE. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol 2006;59:1092-101.
21 Van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999;18:681-94.
22 Horton NJ, Kleinman KP. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat 2007;61:79-90.
23 Bernaards CA, Belin TR, Schafer JL. Robustness of a multivariate normal approximation for imputation of incomplete binary data. Stat Med 2007;26:1368-82.
24 Collins LM, Schafer JL, Kam CM. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods 2001;6:330-51.
25 Carpenter J, Kenward M. Brief comments on computational issues with multiple imputation. www.missingdata.org.uk/mi_comp_issues.pdf.
26 Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Brindle P. QRISK cardiovascular disease risk prediction algorithm: comparison of the revised and the original analyses. Technical supplement 1. 2007. www.qresearch.org/Public_Documents/QRISK1%20Technical%20Supplement.pdf.
27 Von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, STROBE initiative. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ 2007;335:806-8.
28 Klebanoff MA, Cole SR. Use of multiple imputation in the epidemiologic literature. Am J Epidemiol 2008;168:355-7.
29 Horton NJ, Lipsitz SR. Multiple imputation in practice: comparison of software packages for regression models with missing variables. Am Stat 2001;55:244-54.
30 Demirtas H, Schafer JL. On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Stat Med 2003;22:2553-75.
31 Carpenter JR, Kenward MG, White IR. Sensitivity analysis after multiple imputation under missing at random: a weighting approach. Stat Methods Med Res 2007;16:259-75.
32 Burton A, Billingham LJ, Bryan S. Cost-effectiveness in clinical trials: using multiple imputation to deal with incomplete cost data. Clin Trials 2007;4:154-61.

Box 2 | Guidelines for reporting any analysis potentially affected by missing data
• Report the number of missing values for each variable of interest, or the number of cases with complete data for each important component of the analysis. Give reasons for missing values if possible, and indicate how many individuals were excluded because of missing data when reporting the flow of participants through the study. If possible, describe reasons for missing data in terms of other variables (rather than just reporting a universal reason such as treatment failure)
• Clarify whether there are important differences between individuals with complete and incomplete data, for example by providing a table comparing the distributions of key exposure and outcome variables in these different groups
• Describe the type of analysis used to account for missing data (eg, multiple imputation), and the assumptions that were made (eg, missing at random)
For analyses based on multiple imputation
• Provide details of the imputation modelling:
  • Report details of the software used and of key settings for the imputation modelling
  • Report the number of imputed datasets that were created (although five imputed datasets have been suggested to be sufficient on theoretical grounds,10 11 a larger number (at least 20) may be preferable to reduce sampling variability from the imputation process29)
  • What variables were included in the imputation procedure?
  • How were non-normally distributed and binary/categorical variables dealt with?
  • If statistical interactions were included in the final analyses, were they also included in imputation models?
• If a large fraction of the data is imputed, compare observed and imputed values
• Where possible, provide results from analyses restricted to complete cases, for comparison with results based on multiple imputation. If there are important differences between the results, suggest explanations, bearing in mind that analyses of complete cases may suffer more chance variation, and that under the missing at random assumption multiple imputation should correct biases that may arise in complete cases analyses
• Discuss whether the variables included in the imputation model make the missing at random assumption plausible
• It is also desirable to investigate the robustness of key inferences to possible departures from the missing at random assumption, by assuming a range of missing not at random mechanisms in sensitivity analyses. This is an area of ongoing research30 31

Box 3 | Example of use of multiple imputation
Burton et al32 used data from a randomised controlled trial to compare the cost effectiveness of chemotherapy with that of standard palliative care in patients with advanced non-small cell lung cancer. Costs were obtained for a subset of 115 patients but were complete for only 82 patients.
They gave the extent and distribution of missing data in table 1 of their paper. Patient and tumour characteristics were stated to be comparable in those with complete and incomplete data, but the effect of treatment on survival was stated to differ. The authors used the multiple imputation procedure in SAS statistical software (PROC MI) to impute the missing data. Variables included in the imputation models were listed. Five imputed datasets were created. A total run length of 12 500 iterations was used with imputations made after every 2500th iteration. Log and logit transformations were used to deal with non-normality, and a two stage procedure was used to deal with variables with a high proportion of zero values (semicontinuous distributions). Complete data were transformed back to their original scales before analysis.
The complete case analysis resulted in a higher mean cost for chemotherapy compared with palliative care (£2804 (€3285; $4580), 95% confidence interval £1236 to £4290) than did the analyses using multiple imputation (£2384, 95% CI £833 to £3954). The complete case analyses implied that chemotherapy was not cost effective (mean net monetary benefit −£3346), but the multiple imputation analyses implied that it was cost effective (mean net monetary benefit £1186), although confidence intervals were wide.
In the discussion, the authors noted the multiple imputation analysis “assumes that the incomplete cost data are missing at random such that the missingness of the cost components are associated only with the observed data, either the observed covariates or effectiveness.” They did not, however, discuss whether the missing at random assumption was plausible or conduct sensitivity analyses investigating the robustness of the findings to assumed missing not at random mechanisms.
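The combining step described under “What is multiple imputation?” can be sketched in a few lines. The article cites Rubin’s rules16 without writing them out; the form used below is the standard one as we understand it: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance, where m is the number of imputed datasets. The function name `pool_rubin` and the five example estimates are hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one estimate and one variance (squared SE) per imputed dataset.

    Sketch of Rubin's rules under the assumptions stated in the lead-in;
    degrees of freedom for confidence intervals are not computed here.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputed datasets with matching inputs")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the per-dataset variances.
    within = sum(variances) / m
    # Between-imputation variance: spread of the estimates across datasets.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total_variance": total,
        "se": math.sqrt(total),
    }


if __name__ == "__main__":
    # Hypothetical log hazard ratios and squared standard errors from
    # five imputed datasets (invented numbers, not from any study).
    log_hr = [0.21, 0.25, 0.19, 0.27, 0.23]
    se_sq = [0.004, 0.005, 0.004, 0.005, 0.004]
    print(pool_rubin(log_hr, se_sq))
```

Because the between-imputation term is added to the average within-imputation variance, the pooled standard error is never smaller than the one obtained by ignoring the variation between imputed datasets, which is the sense in which single imputation gives standard errors that are too small.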
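The pragmatic approach described under “Dealing with non-normally distributed variables” (transform, impute, transform back) can be illustrated with simulated data. This is only a sketch under loudly stated assumptions: the skewed “marker” variable, the covariate, and the 30% missingness are invented, and the helper `impute_once` treats the regression coefficients and residual standard deviation as known, so it does not propagate all the uncertainty that a proper multiple imputation procedure must. It is meant to show the negative value problem, not to serve as an imputation method.

```python
import numpy as np


def impute_once(y, x, rng):
    """Fill NaNs in y from a linear regression on x plus normal noise.

    Simplification (assumption): coefficients and residual SD are plugged
    in as if known, so this is not a proper multiple imputation draw.
    """
    missing = np.isnan(y)
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)
    resid = y[~missing] - design[~missing] @ beta
    sigma = resid.std(ddof=2)
    filled = y.copy()
    filled[missing] = design[missing] @ beta + rng.normal(0, sigma, missing.sum())
    return filled


def demo(n=5000, seed=1):
    """Return imputed values from raw-scale and log-scale imputation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    marker = np.exp(0.5 * x + rng.normal(size=n))  # skewed, strictly positive
    missing = rng.random(n) < 0.3                  # hypothetical 30% missing
    observed = np.where(missing, np.nan, marker)

    # Impute on the raw scale, wrongly assuming normality.
    raw_scale = impute_once(observed, x, rng)
    # Impute on the log scale, then transform back to the original scale.
    log_scale = np.exp(impute_once(np.log(observed), x, rng))
    return raw_scale[missing], log_scale[missing]


if __name__ == "__main__":
    raw, logged = demo()
    print("negative imputed values, raw scale:", int((raw < 0).sum()))
    print("negative imputed values, log scale:", int((logged < 0).sum()))
```

Back-transformed values are positive by construction, whereas imputing the skewed variable directly on its original scale will typically generate some impossible negative values in a setting like this one.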
