
Missing Data

Multiple Imputation
Proposed Research
Handling Data with Three Types of Missing Values
Jennifer Boyko
Department of Statistics
University of Connecticut
Storrs, CT
Jennifer Boyko Handling Data with Three Types of Missing Values 1 / 33
Outline
1. Missing Data
   Problem
   Characterization
   Methods for Handling
2. Multiple Imputation
   Standard MI
   Two Stage MI
3. Proposed Research
   Procedure
   Combining Rules
   Ignorability and Rates of Missing Information
   Application
4. Conclusion
The Missing Data Problem
Present in many areas of research
Small amounts can cause issues (Belin, 2009)
Most statistical packages default to complete case analysis
Problems include
bias
inefficiency
unrealistic standard errors
Pattern of Missingness
Maps which values are missing in a data set
Figure: Schafer & Graham (2002)
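As a rough illustration of what a missingness pattern map records, the sketch below tabulates the distinct patterns in a small data set. The data values and the helper name `missingness_patterns` are hypothetical and do not come from the figure; `None` stands for a missing value, and 1 marks a missing entry while 0 marks an observed one.

```python
from collections import Counter

def missingness_patterns(rows):
    """Count how often each missingness pattern occurs.

    A pattern is a tuple with 1 where the value is missing (None)
    and 0 where it is observed.
    """
    return Counter(tuple(1 if v is None else 0 for v in row) for row in rows)

# Hypothetical data: 5 units, 3 variables, None marks a missing value.
data = [
    [5.1, 3.5, 1.4],
    [4.9, None, 1.4],
    [4.7, 3.2, None],
    [4.6, None, None],
    [5.0, 3.6, None],
]
patterns = missingness_patterns(data)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```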
Mechanisms of Missingness
Let
$Y$ be the complete data, partitioned as $(Y_{obs}, Y_{mis})$
$R$ be an indicator of whether each value of $Y$ is observed or missing
$\theta$ be the parameter of interest
$\phi$ be the parameter of the missing data process
$M^{+}$ be a matrix the same size as $Y$ containing 0s and 1s, corresponding to observed and missing values of $Y$, respectively
Mechanisms of Missingness
Missing At Random (MAR)
$P(R \mid Y, \phi) = P(R \mid Y_{obs}, \phi)$
Missingness depends on observed values of $Y$ only
Missing Completely At Random (MCAR)
$P(R \mid Y, \phi) = P(R \mid \phi)$
Missingness does not depend on observed or unobserved values of $Y$
Special case of MAR
Missing Not At Random (MNAR)
Occurs when the MAR condition is violated
Missingness depends on $Y_{mis}$ or on some unobserved covariate
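A small simulation can make the three mechanisms concrete. The sketch below is an illustration under assumed settings, not part of the proposed research: the outcome model (Y = X + noise, true mean 0), the missingness probabilities, and the function name `complete_case_mean` are all hypothetical. It deletes Y under each mechanism and reports the mean of the values that remain.

```python
import random
from statistics import mean

def complete_case_mean(mechanism, n=20000, seed=0):
    """Mean of the observed Y values when Y is deleted under the given mechanism."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0, 1)        # fully observed covariate
        y = x + rng.gauss(0, 1)    # outcome with true mean 0
        if mechanism == "MCAR":
            p_missing = 0.5                      # ignores x and y
        elif mechanism == "MAR":
            p_missing = 0.8 if x > 0 else 0.2    # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = 0.8 if y > 0 else 0.2    # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            observed.append(y)
    return mean(observed)

results = {name: complete_case_mean(name) for name in ("MCAR", "MAR", "MNAR")}
for name, value in results.items():
    print(name, round(value, 3))
```

In this setup the complete-case mean stays near 0 only under MCAR. Under MAR it is biased as well, because simply dropping incomplete cases does not use the observed covariate that drives the missingness.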
Ignorability
A missing data mechanism is classified as ignorable if two conditions are met:
1. The data must be MAR or MCAR
2. $\theta$ and $\phi$ must be distinct
$P(\theta, \phi) = P(\theta) P(\phi)$
Joint parameter space is the Cartesian cross-product of the individual parameter spaces
Ignorability represents the weakest set of conditions under which the distribution of $R$ does not need to be considered in Bayesian or likelihood-based inference about $\theta$ (Rubin, 1976)
Older Methods
Complete Case Analysis (CCA)
Can produce biased results
Default in many statistical packages
Loss of information
Single Imputation
Fills in missing values with plausible values
Imputing unconditional means
Hot deck imputation
Conditional mean imputation
Last Observation Carried Forward (LOCF)
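Two of the single imputation methods listed above are simple enough to sketch directly. The function names and the example values are hypothetical; `None` marks a missing value. Both methods fill each gap with a single value, so analyses of the filled-in data treat imputed values as if they had been observed.

```python
from statistics import mean

def impute_unconditional_mean(values):
    """Replace every missing value with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def locf(values):
    """Last Observation Carried Forward for one unit's repeated measures.

    Leading missing values stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

mean_filled = impute_unconditional_mean([1.0, None, 3.0])
locf_filled = locf([4, None, None, 7, None])
print(mean_filled)
print(locf_filled)
```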
Alternative Methods
Maximum likelihood
Bayesian
Multiple imputation
Standard Multiple Imputation
Multiple imputation (Rubin, 1987) uses a three-step process to analyze incomplete data sets:
1. Imputation
2. Analysis
3. Combination
Imputation Stage
Idea: fill in m > 1 plausible values for the missing data to
account for model uncertainty
Create m complete data sets by drawing from the posterior
predictive distribution of the missing values
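The sketch below shows one way a posterior predictive draw could look in the simplest possible setting. It assumes a single normally distributed variable, MCAR missingness, and a standard noninformative prior; none of these modelling choices come from the slides, and the function name `impute_normal` and the data are hypothetical. Each completed data set first draws the variance and mean from their posterior, then draws the missing values given those parameters.

```python
import random
from statistics import mean, variance

def impute_normal(values, m, seed=0):
    """Create m completed copies of `values` (None marks a missing entry)."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n_obs = len(observed)
    y_bar, s2 = mean(observed), variance(observed)
    completed = []
    for _ in range(m):
        # sigma^2 | data: (n_obs - 1) * s^2 / chi-square(n_obs - 1);
        # a chi-square(k) draw is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n_obs - 1) / 2, 2.0)
        sigma2 = (n_obs - 1) * s2 / chi2
        # mu | sigma^2, data: Normal(y_bar, sigma^2 / n_obs)
        mu = rng.gauss(y_bar, (sigma2 / n_obs) ** 0.5)
        # Missing values given the drawn parameters
        completed.append(
            [rng.gauss(mu, sigma2 ** 0.5) if v is None else v for v in values]
        )
    return completed

data = [2.1, None, 3.4, 2.9, None, 3.8, 2.5]
completed = impute_normal(data, m=5)
for dataset in completed:
    print([round(v, 2) for v in dataset])
```

Drawing the parameters afresh for every completed data set is what lets the imputations reflect uncertainty about the model, not only residual noise.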
Analysis Stage
Analyze each of the m data sets using complete data methods
Let $Q$ denote the parameter of interest
Let $\hat{Q}$ be the complete data estimate
Let $U$ be the variance of $\hat{Q}$
Assumption: $(\hat{Q} - Q)/\sqrt{U} \sim N(0, 1)$
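Combination Stage
$$\bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}^{(j)}$$
$$\bar{U} = \frac{1}{m} \sum_{j=1}^{m} U^{(j)}$$
$$B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \hat{Q}^{(j)} - \bar{Q} \right)^2$$
$$T = \bar{U} + (1 + m^{-1}) B$$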
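The combining rules on this slide translate almost line for line into code. The sketch below is a minimal version for a scalar estimand; the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances.

    Returns (Q-bar, U-bar, B, T) for a scalar estimand.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, u_bar, b, t

q_bar, u_bar, b, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, u_bar, b, t)
```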
Combination Stage
$$\frac{\bar{Q} - Q}{\sqrt{T}} \sim t_{\nu}$$
$$\nu = (m - 1) \left[ 1 + \frac{\bar{U}}{(1 + m^{-1}) B} \right]^2$$
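The degrees of freedom formula can be checked numerically with a few lines. This is a sketch for a scalar estimand; the function name `rubin_df` and the inputs are hypothetical, and the between-imputation variance is assumed to be strictly positive.

```python
def rubin_df(m, u_bar, b):
    """Degrees of freedom of the t reference distribution for standard MI."""
    return (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

nu = rubin_df(m=3, u_bar=0.5, b=1.0)
print(nu)
```

When the between-imputation variance is small relative to the within-imputation variance, the degrees of freedom grow large and the reference distribution approaches the normal.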
Benefits of Multiple Imputation
Adds variability to the imputed values
Uses standard data analysis procedures after imputation
Can be very efficient
Can use the same set of imputations for several analyses
Two Stage Multiple Imputation
Two stage multiple imputation (Harel, 2009) considers a situation
where we can have data missing for two different reasons
Dropout in a longitudinal study vs. intermittent missing follow-up
Refusal to answer a question vs. a "don't know" response
Latent variable vs. missing planned observed values
Death vs. dropout for other reasons
Unit nonresponse vs. item nonresponse
Computational Efficiency
Originally developed by Shen (2000) with the intention of
improving computational efficiency.
[Figure: data matrix with columns Y1 to Y5; missing entries are marked "?"]
Procedure
Imputation step is broken into two stages:
1. First draw m imputations of $Y^{A}_{mis}$
2. Conditioned on $Y^{A}_{mis}$, draw n imputations of $Y^{B}_{mis}$
Yields a total of mn completed data sets
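The nesting in the two stages can be sketched as a pair of loops. The sampler arguments `draw_a` and `draw_b_given_a` are hypothetical placeholders for whatever imputation models are used at each stage; the toy normal draws below only stand in for them.

```python
import random

def two_stage_impute(draw_a, draw_b_given_a, m, n, seed=0):
    """Return m nests, each holding n (y_a, y_b) imputation pairs.

    All pairs inside a nest share the same first-stage draw y_a.
    """
    rng = random.Random(seed)
    nests = []
    for _ in range(m):
        y_a = draw_a(rng)                                    # stage 1
        nests.append([(y_a, draw_b_given_a(y_a, rng))        # stage 2, given y_a
                      for _ in range(n)])
    return nests

# Toy samplers standing in for real imputation models.
nests = two_stage_impute(
    draw_a=lambda rng: rng.gauss(0, 1),
    draw_b_given_a=lambda y_a, rng: rng.gauss(y_a, 1),
    m=3,
    n=2,
)
print(sum(len(nest) for nest in nests))  # total number of completed data sets, m * n
```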
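Two Stage MI Combining Rules
$$\bar{Q} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \hat{Q}^{(j,k)}$$
$$\bar{U} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} U^{(j,k)}$$
$$B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \bar{Q}_{j\cdot} - \bar{Q}_{\cdot\cdot} \right)^2$$
$$W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{k=1}^{n} \left( \hat{Q}^{(j,k)} - \bar{Q}_{j\cdot} \right)^2$$
$$T = \bar{U} + (1 + m^{-1}) B + (1 - n^{-1}) W$$
Here $\bar{Q}_{j\cdot} = \frac{1}{n} \sum_{k=1}^{n} \hat{Q}^{(j,k)}$ is the mean of nest $j$ and $\bar{Q}_{\cdot\cdot} = \bar{Q}$.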
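As a check on the two stage rules, the sketch below computes each component for a scalar estimand from nested lists. The function name `pool_two_stage` and the input numbers are hypothetical; nests are assumed to be balanced, with n imputations in each of the m nests.

```python
from statistics import mean

def pool_two_stage(estimates, variances):
    """estimates[j][k] and variances[j][k] hold Q-hat^(j,k) and U^(j,k).

    Returns (Q-bar, U-bar, B, W, T).
    """
    m, n = len(estimates), len(estimates[0])
    nest_means = [mean(nest) for nest in estimates]      # Q-bar_{j.}
    q_bar = mean(nest_means)                             # Q-bar_{..} for balanced nests
    u_bar = mean(u for nest in variances for u in nest)
    b = sum((qj - q_bar) ** 2 for qj in nest_means) / (m - 1)      # between nests
    w = sum(
        (q - qj) ** 2 for nest, qj in zip(estimates, nest_means) for q in nest
    ) / (m * (n - 1))                                              # within nests
    t = u_bar + (1 + 1 / m) * b + (1 - 1 / n) * w
    return q_bar, u_bar, b, w, t

q_bar, u_bar, b, w, t = pool_two_stage(
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
)
print(q_bar, u_bar, b, w, t)
```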
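Two Stage MI Combining Rules
$$\frac{\bar{Q} - Q}{\sqrt{T}} \sim t_{\nu}$$
$$\nu^{-1} = \frac{1}{m(n-1)} \left[ \frac{(1 - 1/n) W}{T} \right]^2 + \frac{1}{m-1} \left[ \frac{(1 + 1/m) B}{T} \right]^2$$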
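The two stage degrees of freedom can be evaluated with a short function. This is a sketch for a scalar estimand; the function name `two_stage_df` and the inputs are hypothetical, and the formula is read as giving the reciprocal of the degrees of freedom.

```python
def two_stage_df(m, n, b, w, t):
    """Degrees of freedom for two stage MI from B, W, and T."""
    inv_nu = (
        ((1 - 1 / n) * w / t) ** 2 / (m * (n - 1))
        + ((1 + 1 / m) * b / t) ** 2 / (m - 1)
    )
    return 1 / inv_nu

nu = two_stage_df(m=2, n=2, b=2.0, w=0.5, t=3.75)
print(nu)
```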
Benefits
Can simplify imputation computationally
Able to quantify how much missing information is due to each
type of missing value, which can help in planning future studies
Allows for different mechanisms of missingness for each type
of missing value (one ignorable and one nonignorable type of
missing data)
Proposed Research
1. Multiple imputation in three stages, including derivation of combining rules
2. Ignorability and rates of missing information
3. Application of methodology to cognitive functioning data
Benefits
Extend the benefits of two stage MI to allow for greater
specificity regarding the data analysis
Allows for missing data to be of three different types
Allows for three different assumptions of the mechanisms of
missingness
Can quantify the variability and missing information due to
each type of missing value
Example 1
Example of missing data due to dropout, intermittent missingness,
and a missing covariate
[Figure: two copies of a data matrix with columns Y1 to Y5; the first marks missing entries with "?", the second labels each missing entry by type A, B, or C]
Example 2
Example with missing values due to item nonresponse, unit
nonresponse, and latent class
[Figure: two copies of a data matrix with columns Y1 to Y5; the first marks missing entries with "?", the second labels each missing entry by type A, B, or C]
Process
Same as standard and two stage MI but with three stages in the
imputation step and different combining rules
1. Impute L values of $Y^{A}_{mis}$
2. Conditioned on $Y^{A}_{mis}$, impute M values of $Y^{B}_{mis}$
3. Conditioned on $Y^{A}_{mis}$ and $Y^{B}_{mis}$, impute N values of $Y^{C}_{mis}$
Yields a total of LMN completed data sets
A second, but equivalent, method draws simultaneously from the
joint distribution of $Y^{A}_{mis}$, $Y^{B}_{mis}$, and $Y^{C}_{mis}$
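The three stage process can be sketched as three nested loops that index each completed data set by (l, m, n). The sampler arguments `draw_a`, `draw_b`, and `draw_c` are hypothetical placeholders for the stage-specific imputation models; the toy normal draws below only stand in for them.

```python
import random

def three_stage_impute(draw_a, draw_b, draw_c, L, M, N, seed=0):
    """Return a dict mapping (l, m, n) to one (y_a, y_b, y_c) imputation triple."""
    rng = random.Random(seed)
    completed = {}
    for l in range(L):
        y_a = draw_a(rng)                      # stage 1
        for m in range(M):
            y_b = draw_b(y_a, rng)             # stage 2, given y_a
            for n in range(N):
                y_c = draw_c(y_a, y_b, rng)    # stage 3, given y_a and y_b
                completed[(l, m, n)] = (y_a, y_b, y_c)
    return completed

# Toy samplers standing in for real imputation models.
completed = three_stage_impute(
    draw_a=lambda rng: rng.gauss(0, 1),
    draw_b=lambda y_a, rng: rng.gauss(y_a, 1),
    draw_c=lambda y_a, y_b, rng: rng.gauss(y_a + y_b, 1),
    L=2, M=3, N=4,
)
print(len(completed))  # L * M * N completed data sets
```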
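Three Stage MI Combining Rules
$$\bar{Q} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \hat{Q}^{(l,m,n)}$$
$$\bar{U} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} U^{(l,m,n)}$$
$$B = \frac{1}{L-1} \sum_{l=1}^{L} \left( \bar{Q}_{l\cdot\cdot} - \bar{Q}_{\cdot\cdot\cdot} \right)^2$$
$$W_1 = \frac{1}{L(M-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \left( \bar{Q}_{lm\cdot} - \bar{Q}_{l\cdot\cdot} \right)^2$$
$$W_2 = \frac{1}{LM(N-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \left( \hat{Q}^{(l,m,n)} - \bar{Q}_{lm\cdot} \right)^2$$
Here a dot denotes averaging over the replaced index, so $\bar{Q}_{lm\cdot} = \frac{1}{N} \sum_{n=1}^{N} \hat{Q}^{(l,m,n)}$, $\bar{Q}_{l\cdot\cdot} = \frac{1}{M} \sum_{m=1}^{M} \bar{Q}_{lm\cdot}$, and $\bar{Q}_{\cdot\cdot\cdot} = \bar{Q}$.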
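The variance components on this slide can be computed from a three-level nested list. The sketch below handles a scalar estimand with balanced nesting; the function name `three_stage_components` and the input numbers are hypothetical.

```python
from statistics import mean

def three_stage_components(estimates, variances):
    """estimates[l][m][n] and variances[l][m][n] hold Q-hat^(l,m,n) and U^(l,m,n).

    Returns (Q-bar, U-bar, B, W1, W2).
    """
    L, M, N = len(estimates), len(estimates[0]), len(estimates[0][0])
    q_lm = [[mean(cell) for cell in block] for block in estimates]   # Q-bar_{lm.}
    q_l = [mean(row) for row in q_lm]                                # Q-bar_{l..}
    q_bar = mean(q_l)                                                # Q-bar_{...}
    u_bar = mean(u for block in variances for cell in block for u in cell)
    b = sum((ql - q_bar) ** 2 for ql in q_l) / (L - 1)
    w1 = sum(
        (q_lm[l][m] - q_l[l]) ** 2 for l in range(L) for m in range(M)
    ) / (L * (M - 1))
    w2 = sum(
        (estimates[l][m][n] - q_lm[l][m]) ** 2
        for l in range(L) for m in range(M) for n in range(N)
    ) / (L * M * (N - 1))
    return q_bar, u_bar, b, w1, w2

estimates = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
variances = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
q_bar, u_bar, b, w1, w2 = three_stage_components(estimates, variances)
print(q_bar, u_bar, b, w1, w2)
```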
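Three Stage MI Combining Rules
$$T = \bar{U} + (1 + L^{-1}) B + (1 - M^{-1}) W_1 + (1 - N^{-1}) W_2$$
$$\nu^{-1} = \left[ \frac{\left(1 + \frac{1}{L}\right) B}{T} \right]^2 (L-1)^{-1} + \left[ \frac{\left(1 - \frac{1}{M}\right) W_1}{T} \right]^2 (L(M-1))^{-1} + \left[ \frac{\left(1 - \frac{1}{N}\right) W_2}{T} \right]^2 (LM(N-1))^{-1}$$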
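The total variance and degrees of freedom for three stages can be evaluated as below. This is a sketch for a scalar estimand; the function name `three_stage_total_and_df` and the input numbers are hypothetical.

```python
def three_stage_total_and_df(L, M, N, u_bar, b, w1, w2):
    """Return (T, nu) for three stage MI from the variance components."""
    t = u_bar + (1 + 1 / L) * b + (1 - 1 / M) * w1 + (1 - 1 / N) * w2
    inv_nu = (
        ((1 + 1 / L) * b / t) ** 2 / (L - 1)
        + ((1 - 1 / M) * w1 / t) ** 2 / (L * (M - 1))
        + ((1 - 1 / N) * w2 / t) ** 2 / (L * M * (N - 1))
    )
    return t, 1 / inv_nu

t, nu = three_stage_total_and_df(L=2, M=2, N=2, u_bar=1.0, b=8.0, w1=2.0, w2=0.5)
print(t, nu)
```

Each term of the reciprocal pairs one variance component with the divisor used when that component was estimated, which mirrors the structure of the two stage formula.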
Ignorability
Extension of Rubin's theory of MAR and ignorability as
presented in Rubin (1976)
Harel & Schafer (2009) present an extension to two types of
missing values
Conditional ignorability; possible to define weaker conditions
under which $M^{+}$ can be ignored in one or more stages
Rates of Missing Information
Helps with determination of number of imputations required
at each stage
Small numbers of imputations are required when the main
concern is relative efficiency of point estimates
Estimates for rates of missing information can be noisy for
small numbers of imputations
Derivation of the asymptotic distribution of rates of missing
information
I will derive the estimates and asymptotic distribution for the
rates of missing information for three types of missing values
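The point about small numbers of imputations can be illustrated with the relative efficiency result for standard, single stage MI from Rubin (1987): with m imputations and a fraction of missing information gamma, the point estimate has efficiency 1 / (1 + gamma / m) relative to infinitely many imputations, on the variance scale. The slides do not state this formula, so treat it as background under that assumption; it does not cover the three stage rates still to be derived. The function name is hypothetical.

```python
def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many (variance scale)."""
    return 1.0 / (1.0 + gamma / m)

# Hypothetical rates of missing information and numbers of imputations.
for gamma in (0.1, 0.3, 0.5):
    print(gamma, [round(relative_efficiency(gamma, m), 3) for m in (3, 5, 10, 20)])
```

Even with half of the information missing, five imputations already give about 91% efficiency, which is why point estimation alone rarely calls for many imputations.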
Application
Cognitive functioning data
Three types of missing values will be dropout due to
dementia, dropout due to death unrelated to dementia, and
an intermittently missing covariate
Large amounts of missing data are common in studies of
cognitive functioning (Coley et al., 2011)
Conclusion
Applicable in analysis of many types of data sets
Allows researchers to quantify amount of variance attributable
to each type of missing value
Informative in analysis of data and planning of future studies
Belin, T. (2009). Missing data: what a little can do and what researchers can do in response. American Journal of Ophthalmology 148, 820–822.
Coley, N. et al. (2011). How should we deal with missing data in clinical trials involving Alzheimer's disease patients? Current Alzheimer Research 8, 421–433.
Harel, O. (2009). Strategies for Data Analysis with Two Types of Missing Values: From Theory to Application. Saarbrücken, Germany: Lambert Academic Publishing.
Harel, O. & Schafer, J. L. (2009). Partial and latent ignorability in missing-data problems. Biometrika 96, 37–50.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Hoboken, New Jersey: John Wiley & Sons, Ltd, 1st ed.
Shen, Z. (2000). Nested Multiple Imputation. Ph.D. thesis, Department of Statistics, Harvard University, Cambridge, MA.