Imputation

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 10

Iancu Dorina

MDE
GR 2

Imputation
Dealing with missing data

Imputation is the process of replacing missing data with substituted values.


Because missing data can create problems for analyzing data, imputation is seen as a way
to avoid pitfalls involved with cases that have missing values.
Imputation preserves all cases by replacing missing data with an estimated value based
on other available information. Once all missing values have been imputed, the data set can then
be analyzed using standard techniques for complete data.
You put time and money into a research study. You do what you can to prevent missing
data and dropout, but missing values happen and you have to deal with it.
How do you address that lost data?
With all of these issues, you also need to determine if the data is missing:
First, determine the pattern of your missing data. There are three types of missing data:

Missing completely at Random: There is no pattern in the missing data on any


variables. This is the best you can hope for.
Missing at Random: There is a pattern in the missing data but not on your primary
dependent variables such as likelihood to recommend or SUS Scores.
Missing not at Random: There is a pattern in the missing data that affect your primary
dependent variables. For example, lower-income participants are less likely to
respond and thus affect your conclusions about income and likelihood to recommend.
Missing not at random is your worst-case scenario. Proceed with caution.

MAR can never be proved or falsified using data alone, as NMAR assumption asserts that
something is not available to the researchers from the observed data. It is possible, however, to
test if data are MCAR in many situations: if meaningful differences exist between those with and
without missing data for some variables, this provides evidence against MCAR.
Why do we use imputation?
All analytical software packages including SAS require complete case for an
observation to be included in the analysis. This means that if there are 100 variables and an
observation is missing just ONE value, the entire case is removed from the analysis.
And, you lose the other 99 perfectly good values.
We need a way to replace those values logically.

Seven things you can do about that missing data:


1.
List wise Deletion:
Delete all data from any participant with missing values. If your sample is large enough, then
you likely can drop data without substantial loss of statistical power.
Be sure that the values are missing at random and that you are not inadvertently removing a
class of participants.
2.
Recover the Values:
You can sometimes contact the participants and ask them to fill out the missing values.
For in-person studies, we've found having an additional check for missing values before the
participant leaves helps.
3.
Educated Guessing:
It sounds arbitrary and isn't your preferred course of action, but you can often infer a missing
value.
For related questions, for example, like those often presented in a matrix, if the participant
responds with all "4s", assume that the missing value is a 4.
4.
Average Imputation:
Use the average value of the responses from the other participants to fill in the missing value.
If the average of the 30 responses on the question is a 4.1, use a 4.1 as the imputed value.
This choice is not always recommended because it can artificially reduce the variability of
your data but in some cases makes sense.
5.
Common-Point Imputation:
For a rating scale of 5, we use the middle point or most commonly chosen value. For
example, on a five-point scale, substitute a 3, the midpoint, or a 4, the most common value (in
many cases).
This is a bit more structured than guessing, but it's still among the more risky options. Use
caution unless you have good reason and data to support using the substitute value.
6.
Regression Substitution:
You can use multiple-regression analysis to estimate a missing value. Regression substitution
predicts the missing value from the other values. In the case of missing SUS data, we had enough
data to create stable regression equations and predict the missing values automatically in the
calculator.
Instead of deleting any case that has any missing value, this approach preserves all cases by
replacing the missing data with a probable value estimated by other available information. After
all missing values have been replaced by this approach; the data set is analyzed using the
standard techniques for a complete data.
In regression imputation, the existing variables are used to make a prediction, and then the
predicted value is substituted as if an actual obtained value.

7.

Multiple Imputation:

The most sophisticated and, currently, most popular approach is to take the regression idea
further and take advantage of correlations between responses.
In multiple imputation, software creates plausible values based on the correlations for the
missing data and then averages the simulated datasets by incorporating random errors in your
predictions. It is one of a number of examples where computers continue to change the statistical
landscape.
Most statistical packages like SPSS come with a multiple-imputation feature. More on
multiple imputation.
In a multiple imputation, instead of substituting a single value for each missing data, the
missing values are replaced with a set of plausible values which contain the natural variability
and uncertainty of the right values.
This approach begins with a prediction of the missing data using the existing data from other
variables. The missing values are then replaced with the predicted values, and a full data set
called the imputed data set is created. This process iterates the repeatability and makes multiple
imputed data sets (hence the term "multiple imputation"). Each multiple imputed data set
produced is then analyzed using the standard statistical analysis procedures for complete data,
and gives multiple analysis results. Subsequently, by combining these analysis results, a single
overall analysis result is produced.
The benefit of the multiple imputation is that in addition to restoring the natural variability of
the missing values, it incorporates the uncertainty due to the missing data, which results in a
valid statistical inference. Restoring the natural variability of the missing data can be achieved by
replacing the missing data with the imputed values which are predicted using the variables
correlated with the missing data. Incorporating uncertainty is made by producing different
versions of the missing data and observing the variability between the imputed data sets.
Multiple imputation has been shown to produce valid statistical inference that reflects the
uncertainty associated with the estimation of the missing data. Furthermore, multiple imputation
turns out to be robust to the violation of the normality assumptions and produces appropriate
results even in the presence of a small sample size or a high number of missing data.
With the development of novel statistical software, although the statistical principles of
multiple imputation may be difficult to understand, the approach may be utilized easily.
Because I work in the research department Im always dealing with missing data. For
example, I want to know the prices for milling wheat, in three regions, for July. For that Ive
asked farmers to give me a price in order to obtain an average. Unfortunately I have a lot of
missing data. I decided to use the SPSS program to deal with my missing data. So, here is my
table where we can observe the missing data.
In this way, using SPPS program I tried to find the best predicted values for my missing data.
So, I started to follow the steps.

Step 1: Put the data in the program and define the variables.

Step 2:Analyse patterns

Step 3:Missing value analyses

Step 5: Imputed data


Missing data is like a medical concern: ignoring it doesn't make it go away. Ideally your data
is missing at random and one of these seven approaches will help you make the most of the data
you have.

You might also like