Creating Dummy Variables in SPSS Statistics
Introduction
If you are analysing your data using multiple regression and any of your independent variables
were measured on a nominal or ordinal scale, you need to know how to create dummy
variables and interpret their results. This is because nominal and ordinal independent variables,
more broadly known as categorical independent variables, cannot be directly entered into a
multiple regression analysis. Instead, they need to be converted into dummy variables. The
exception is ordinal independent variables that are entered into a multiple regression as
continuous independent variables, which do not need to be converted into dummy variables.
Therefore, in this guide we show you how to create dummy variables when you have categorical
independent variables.
First, we set out the example we use to show how to create dummy variables in SPSS Statistics,
before explaining how to set up your data in the Variable View and Data View windows of
SPSS Statistics so that you can create dummy variables. If you are unfamiliar with the use of
dummy variables, we recommend that you then read about some of the basic principles of
dummy variables and dummy coding, including: (a) the number of dummy variables you need to
create in your analysis; and (b) how to create dummy variables and dummy coding. In
the Procedure section that follows, we set out the simple, 3-step Create Dummy
Variables procedure in SPSS Statistics that can be used to create dummy variables. Finally, we
explain the SPSS Statistics output after running the Create Dummy Variables procedure,
including how your dummy variables will now be set up in the Variable View and Data
View windows of SPSS Statistics.
Note: If you find that the procedures in this guide do not cover the type of dummy variables
you want to create, please contact us. We may be able to add another guide to the site to help.
SPSS Statistics
Example used in this guide
In this guide we will be using the example of 10 triathletes who were asked to select
their favourite sport from the three sports they perform when doing a
triathlon: swimming, cycling and running. Their answers were recorded in the nominal
independent variable, favourite_sport, which has three categories: "swimming", "cycling" and
"running". This nominal independent variable, favourite_sport, was to be included in a multiple
regression analysis that also had a number of continuous independent variables. Since this
independent variable was categorical (i.e., nominal variables and ordinal variables can be
broadly classified as categorical variables), dummy variables had to be created before it could
be entered into the multiple regression analysis.
Important: Notice that favourite_sport is a nominal variable, but you can also create dummy
variables for an ordinal variable. Furthermore, the process for creating dummy variables is
the same irrespective of whether you have an ordinal or nominal variable, with the exception
of one small change you have to make when setting up your data, which is explained below.
Setting up your data in SPSS Statistics
When creating dummy variables, you will start with a single categorical independent variable
(e.g., favourite_sport). To set up this categorical independent variable, SPSS Statistics has
a Variable View where you define the types of variable you are analysing and a Data
View where you enter your data for this variable. In this section, we first show you how to set up
a categorical independent variable in the Variable View window of SPSS Statistics, before
showing you how to enter your data into the Data View window. We do this using our
categorical independent variable, favourite_sport, which has three categories: "swimming",
"cycling" and "running".
To open the Variable View window, click the Variable View tab in the bottom left-hand corner
of the SPSS Statistics software.
The name of your categorical independent variable should be entered in the cell under
the Name column (e.g., "favourite_sport" in the first row, to represent our
categorical independent variable, favourite_sport). There are certain "illegal" characters that
cannot be entered into the Name cell. Therefore, if you get an error message and you
would like us to add an SPSS Statistics guide to explain what these illegal characters are,
please contact us.
Note: For your own clarity, you can also provide a label for your variables in
the Label column. For example, the label we entered for "favourite_sport" was
"Triathlete's favourite sport".
The cell under the Values column should contain the information about the categories of
your categorical independent variable (e.g., "swimming", "cycling" and "running"
for favourite_sport). To enter this information, click into the cell under the Values column
for your independent variable. A button will appear in the cell. Click on this button and
the Value Labels dialogue box will appear. You now need to give each category of your
independent variable a "value", which you enter into the Value: box (e.g., "1"), as well as a
"label", which you enter into the Label: box (e.g., "swimming"). By clicking the Add button,
the coding will appear in the main box (e.g., 1.00 = "swimming" for favourite_sport). The setup
for our categorical independent variable is shown in the Value Labels dialogue box below:
You have now successfully entered all the information that SPSS Statistics needs to know about
your categorical independent variable into the Variable View window. In the next section, we
show you how to enter your data into the Data View window.
Your categorical independent variable will be displayed in the first column since this was the
order we entered the variable into the Variable View window. In our example, the responses of
the 10 triathletes are presented under the favourite_sport column. Now, you simply have to
enter your data into the cells under this first column. Remember that each row represents one
case (e.g., a case could be a single participant). Therefore, in the first row of our example,
the first case represented a triathlete whose favourite sport was "swimming". Since these cells
will initially be empty, you need to click into the cells to enter your data. You will notice that
when you click into the cells under the favourite_sport column, SPSS Statistics will give you a
drop-down option with your categories already populated.
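As a rough analogy outside SPSS Statistics, the value/label setup described above can be pictured as a lookup table from stored values to category labels. The Python sketch below is illustrative only: the guide shows the coding for "swimming" (1) alone, so the values 2 and 3 for "cycling" and "running", the name VALUE_LABELS and the short responses list are all assumptions made up for this example, not the guide's data.

```python
# Value labels pictured as a lookup table: stored value -> category label.
# Only 1 = "swimming" appears in the guide; 2 and 3 are assumed here.
VALUE_LABELS = {1: "swimming", 2: "cycling", 3: "running"}

# Hypothetical stored values for four cases (not the guide's data).
responses = [1, 3, 2, 1]

# Decode each stored value to its label, one label per case (row).
labels = [VALUE_LABELS[value] for value in responses]
print(labels)
```

Each entry in responses corresponds to one row (case) in the Data View window.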
Now that you have set up your data in the Variable View and Data View windows of SPSS
Statistics, we recommend reading the next section: Understanding dummy variables and dummy
coding, where we explain the basic principles of dummy variables and dummy coding. However,
if you are already familiar with the fundamentals of dummy variables and dummy coding, you can
skip this section and go straight to the Procedure section where we set out the Create Dummy
Variables procedure in SPSS Statistics that is used to create dummy variables.
Understanding dummy variables and dummy coding
As we mentioned in the Introduction, if you are analysing your data using multiple regression
and any of your independent variables were measured on a nominal or ordinal scale, you need
to know how to create dummy variables and interpret their results. This is because categorical
independent variables (i.e., nominal and ordinal independent variables) cannot be directly
entered into a multiple regression. Instead, they need to be converted into dummy variables. The
exception is ordinal independent variables that are entered into a multiple regression as
continuous independent variables, which do not need to be converted into dummy variables. In
the sections below, we explain: (a) the number of dummy variables you need to create; and
(b) how to create dummy variables and dummy coding.
No. | Name of the categorical independent variable | Type of variable | Number of categories | Number of dummy variables
1 | Gender | Nominal | Two (Males & Females) | One = Males; "Females" is the reference category
3 | Ethnicity | Nominal | Three (African American, Caucasian & Hispanic) | Two = African American & Caucasian; "Hispanic" is the reference category
4 | Physical activity level | Ordinal | Three (Low, Moderate & High) | Two = Low & Moderate; "High" is the reference category
5 | Profession | Nominal | Four (Surgeon, Doctor, Nurse & Therapist) | Three = Surgeon, Doctor & Nurse; "Therapist" is the reference category
… | … | … | … Strongly disagree) | …; "Strongly disagree" is the reference category
7 | Subject area | Nominal | Five (Business studies, Psychology, Biological sciences, Engineering & Law) | Four = Business studies, Psychology, Biological sciences & Engineering; "Law" is the reference category
As shown in the table above, a multiple regression only needs one fewer dummy variable than the
number of categories in your categorical independent variable, and this is the number of dummy
variables you should transfer into the analysis. However, there are good reasons to create a
dummy variable for every category of the categorical independent variable: (a) it is more flexible
and (b) it allows multiple comparisons to be made (see the note below). In other words, if your
categorical independent variable has three categories you would create three dummy
variables, not just two.
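The "one fewer than the number of categories" rule can be checked with a few lines of Python. This is a minimal sketch rather than anything SPSS Statistics runs: the function name dummies_needed is hypothetical, and the category lists are taken from the table above.

```python
def dummies_needed(categories):
    """Number of dummy variables to enter into a multiple regression:
    one fewer than the number of categories."""
    if len(categories) < 2:
        raise ValueError("a categorical variable needs at least two categories")
    return len(categories) - 1


# Category lists copied from the table above.
examples = {
    "Gender": ["Males", "Females"],
    "Ethnicity": ["African American", "Caucasian", "Hispanic"],
    "Profession": ["Surgeon", "Doctor", "Nurse", "Therapist"],
}

for name, categories in examples.items():
    print(name, dummies_needed(categories))
```

Note that this counts the dummy variables entered into the regression, not the number you may choose to create.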
It is more flexible:
When you have created a dummy variable for every category of your categorical independent
variable, you can then consider any category as a reference category. In our example, we
considered the "running" category as the reference category, which means we would have
transferred the "swimming" and "cycling" dummy variables into the multiple regression equation.
If we had created only those two dummy variables and later changed our mind about our choice
of reference category, we would have to run the dummy variable procedure again. This is not
necessary when a dummy variable exists for every category, which is what the Create Dummy
Variables procedure in SPSS Statistics version 22 or above produces. For example, let's assume
we now wanted to consider the "cycling" category as the reference category. We could now
transfer the "swimming" and "running" dummy variables into the multiple regression equation
because we also have the "running" dummy variable.
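To make the flexibility argument concrete, the sketch below picks which dummy variables to enter into the regression for a given reference category, assuming a dummy variable already exists for every category. It is written in Python purely for illustration; the helper name predictors_for is hypothetical and is not an SPSS Statistics feature.

```python
def predictors_for(reference, categories):
    """Return the dummy variables to enter into the regression when
    `reference` is treated as the reference category."""
    if reference not in categories:
        raise ValueError(f"unknown category: {reference!r}")
    # Every category except the reference is entered as a dummy variable.
    return [category for category in categories if category != reference]


categories = ["swimming", "cycling", "running"]

print(predictors_for("running", categories))  # the guide's original choice
print(predictors_for("cycling", categories))  # after changing our mind
```

Switching the reference category only changes which existing dummy variables are selected; nothing has to be recreated.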
Explanation: Dummy variables are simply new variables that act as "placeholders" for a
particular coding scheme. They do not contain any data at all, per se. Instead, data/values need to
be added to these dummy variables so that they can fulfil their purpose of representing the
categories of your categorical independent variable. There are many different types of coding
scheme that will dictate the values that are entered into dummy variables, but we use a very
common coding scheme called dummy coding or, alternatively, indicator coding (N.B., do not
get confused because dummy variables and dummy coding are not the same thing). Dummy
coding works by using each dummy variable to identify a specific category of a categorical
independent variable with the exception of a reference category, which we explain below.
Let's start by considering our example categorical independent variable, favourite_sport, which
has three categories: "swimming", "cycling" and "running". Since there are three categories,
there need to be two dummy variables representing two of the categories, and a reference
category representing the third category.
For example, let dummy variable #1 represent the "swimming" category and dummy variable #2
represent the "cycling" category. This leaves no dummy variable for the "running" category. This
"missing" category is the reference category and it is not needed. Furthermore, it is entirely
your decision which category you want to use as the reference category. We could have just as
easily chosen the "swimming" category as the reference category rather than the "running"
category. The only reason we didn't is that by default SPSS Statistics uses the last category you
have coded in the Variable View for your categorical independent variable as the reference
category (see the note below).
When you create dummy variables you should give them a meaningful name. Since each of our
dummy variables represents a category of our categorical independent variable, it is customary to
refer to each dummy variable by the name of the category it represents. Therefore, we have
called dummy variable #1 "swimming" as it represents the swimming category. Similarly, we
have called dummy variable #2 "cycling" as it represents the cycling category. By creating these
two dummy variables, we will have two new columns in our data set in SPSS Statistics, as
shown below:
Published with written permission from SPSS Statistics, IBM Corporation.
Now that we have created two dummy variables and given them appropriate names, we need
to enter values into these variables so that each dummy variable really does represent its
category of the categorical independent variable. With dummy coding this is very simple. You
enter a "1" to represent any case (e.g., a participant in your data set) that has the category and
enter a "0" (zero) if they do not have the category. First, consider the "swimming" dummy
variable, as shown below:
If one of the triathletes stated that "swimming" was their favourite sport, we would enter a "1"
into the cell under the swimming dummy variable column for that triathlete. Alternatively, if one
of the triathletes stated that "cycling" or "running" was their favourite sport, we would enter
a "0" into the cell under the swimming dummy variable column, because swimming was not that
triathlete's favourite sport. This is highlighted below for all 10 triathletes:
We repeat this process for the other dummy variable, "cycling", as shown below:
If one of the triathletes stated that "cycling" was their favourite sport, we would enter a "1"
into the cell under the cycling dummy variable column for that triathlete. Alternatively, if one
of the triathletes stated that "swimming" or "running" was their favourite sport, we would enter
a "0" into the cell under the cycling dummy variable column, because cycling was not that
triathlete's favourite sport. This is highlighted below for all 10 triathletes:
By entering "1"s and "0"s into your dummy variables in this manner, you will have created a set
of dummy variables that you can enter into a multiple regression analysis. In
the Procedure section that follows, we show you how to create these dummy variables using
the Create Dummy Variables procedure.
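The dummy coding rule described in this section (a "1" if the case has the category, a "0" if it does not, and no dummy variable for the reference category) can be sketched in plain Python. This is an illustration only, not what SPSS Statistics executes: the function name dummy_code is hypothetical, and the five responses are made up because the guide's 10 responses appear only in its screenshots.

```python
def dummy_code(values, categories, reference):
    """Return {category: [0/1, ...]} with one column per category,
    leaving out the reference category."""
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical responses (not the guide's 10 triathletes).
favourite_sport = ["swimming", "running", "cycling", "swimming", "running"]

dummies = dummy_code(
    favourite_sport,
    categories=["swimming", "cycling", "running"],
    reference="running",
)
for name, column in dummies.items():
    print(name, column)
```

A case whose favourite sport is "running" has a 0 in both columns, which is how the reference category is represented without a dummy variable of its own.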
Procedure in SPSS Statistics to create dummy variables
There are two procedures in SPSS Statistics to create dummy variables: the Create Dummy
Variables procedure and the Recode into Different Variables procedure. In this guide, we
show you how to use the Create Dummy Variables procedure, which is a simple 3-step
procedure. However, it is only available if you have SPSS Statistics version 22 or later,
with version 26 (and the subscription version of SPSS Statistics) being the latest version of
SPSS Statistics. If you are unsure which version of SPSS Statistics you are using, see our
guide: Identifying your version of SPSS Statistics. If you have SPSS Statistics version 21 or
earlier or are interested in making multiple comparisons when carrying out your multiple
regression analysis, please see the Note below:
To create dummy variables when you have SPSS Statistics version 22 or later, follow the
3-step Create Dummy Variables procedure below:
1. Click Transform > Create Dummy Variables on the main menu, as shown below:
You will be presented with the Create Dummy Variables dialogue box, as shown
below:
2. Transfer the categorical independent variable, favourite_sport, into the Create Dummy
Variables for: box by selecting it (by clicking on it) and then clicking on the arrow button.
Also, enter a "root" name that can represent all of the new dummy variables into the Root
Names (One Per Selected Variable): box in the –Main Effect Dummy Variables– area. We
entered the root name "fs" as an abbreviation for our categorical independent variable,
"favourite_sport", as shown below:
Also, the root name you enter into the Root Names (One Per Selected
Variable): box cannot be the same as the name of your categorical independent
variable, as shown below (i.e., where we have entered the root name, "favourite_sport",
to illustrate what we could not call our root name):
If the root name you enter is the same as the name of your categorical independent
variable, as shown above, when you click on the OK button, you will get the
following warning:
After carrying out the 3-step Create Dummy Variables procedure above, you will have created
dummy variables for your categorical independent variable. In the next section, we highlight the
output that is created in the Variable View and Data View of SPSS Statistics after running
this Create Dummy Variables procedure.
Output and data setup in SPSS Statistics after creating dummy
variables
After creating your dummy variables, SPSS Statistics produces the following Variable
Creation table in its IBM SPSS Statistics Viewer:
The Variable Creation table confirms that you have successfully created dummy variables.
There should be as many rows as there are new dummy variables. Since we
created three dummy variables, there are three rows in the table, "fs_1", "fs_2" and "fs_3",
which reflect the root name and sequential numbering entered in Step 2 of the Create Dummy
Variables procedure in the previous section. For each of these dummy variables, a label is
provided in the table to make it clear which category of the categorical independent variable each
dummy variable represents. For example, the label, "favourite_sport=swimming", is provided
for "fs_1", indicating that "fs_1" is the dummy variable for the "swimming" category of the
categorical independent variable, favourite_sport.
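The naming pattern in the Variable Creation table (root name plus a sequential number, with a "variable=category" label) can be mimicked in Python as a rough sketch. Several things here are assumptions rather than facts from the guide: the function name create_dummy_variables is hypothetical, the values 2 and 3 and the pairing of "fs_2" with "cycling" and "fs_3" with "running" are inferred from the coding order, and the sketch raises an error where SPSS Statistics shows a warning.

```python
def create_dummy_variables(variable, root, value_labels):
    """Return (name, label) pairs, one per category, numbered in the
    order of the stored values."""
    if root == variable:
        # The guide says the root name cannot match the variable name.
        raise ValueError("root name must differ from the variable name")
    return [
        (f"{root}_{index}", f"{variable}={value_labels[value]}")
        for index, value in enumerate(sorted(value_labels), start=1)
    ]


# Only 1 = "swimming" is shown in the guide; 2 and 3 are assumed.
VALUE_LABELS = {1: "swimming", 2: "cycling", 3: "running"}

table = create_dummy_variables("favourite_sport", "fs", VALUE_LABELS)
for name, label in table:
    print(name, label)
```

The first pair matches the row the guide describes: "fs_1" with the label "favourite_sport=swimming".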