PREFACE

The Statistical Package for the Social Sciences (SPSS) is a comprehensive statistical analysis
software program that is widely used by behavioral researchers. The program can calculate
virtually any univariate or multivariate statistic used in behavioral research, and can also create
charts and tables for presentation of data. Furthermore, as you will see, SPSS is very easy to
use. Data are easily entered into a spreadsheet data editor where they can be modified and
transformed, and the necessary statistics can be chosen using a user-friendly set of pull-down
menus.

This manual is designed to help you learn the basics of SPSS for Windows. We will cover the
techniques of entering and saving data and working with the SPSS data editor in Chapter 1.
In Chapters 2, 3 and 4 we will review some methods of data definition, variable manipulation and
case manipulation. In Chapter 5 we will consider some procedures used to conduct statistical
analyses. In Chapter 6 we will consider graphs and pivot tables. If you follow this
manual carefully and work through the practice exercises and assignments, you will become a
proficient user of the basics of SPSS.

In this manual we will consider some examples of statistical procedures. Although the output is
sometimes explained to you, it is assumed that you have a good background in statistics. For
instance, you should be aware of the general uses of the various descriptive statistics that
we will be computing and you should understand the basics of inferential statistics.
Understanding SPSS output is only possible in conjunction with knowledge of statistical
procedures.

The following 11 SPSS data files will be used to conduct your practice analyses.
They can be found in My Courses on the Student Portal (EBS2038).

- 1991 US General Social Survey.sav

- Automatic-recode-uk.sav

- Bier.sav

- Count-uk.sav

- Employee data.sav

- Esteem.sav contains the scores of a group of 100 college students on the Rosenberg
self-esteem scale.

- GSS93 Subset.sav

- Hsb75.sav contains responses from 28.240 high school seniors. A random sample of 75
seniors was selected for the purposes of this manual.

- Hwj100.sav contains information on 100 applicants for management positions at a large
corporation. The data come from a study of applicant profiles.

- Ode.sav lists information from 94 school districts in Northwest Ohio. The data come
from the Ohio Department of Education.

- Survey.sav represents a hypothetical survey from a group of 25 college students.


CHAPTER 1 : THE BASICS

Starting SPSS for Windows

SPSS for Windows runs under any of the Windows operating systems. Windows-based software
is accessed by double-clicking with the left mouse button on the SPSS for Windows icon. This
icon will be found on your computer or computer network, probably in a folder that contains
database or other software programs. Once you find the icon, simply double-click on it and wait
for the program to load.

The Data Editor

When SPSS begins, an image of the SPSS Statistics Data Editor will appear on your screen.
The SPSS data editor is a spreadsheet into which data to be analyzed can be entered by hand,
can be saved to a file, and into which data files that have already been created and stored can
be retrieved. The names of the variables will be listed across the top of the screen and the
individuals to be analyzed (they are known as the cases or the observations) will be listed down
the side. When the data editor is originally opened the variables do not yet have names (they are
only indicated as var). Each observation will be entered on one line of the matrix. There is no
limit to how many variables or observations can be entered.

Using the Menu Bar

Across the top of the screen is the menu bar, which contains the pull-down menus that form the
basis of all commands. The choices for these menus include file, edit, view, data, transform,
analyze, direct marketing, graphs, utilities, add-ons, window and help. The basic uses of these
menus are summarized in Box 1.
Box 1 : Summary of SPSS Pull-down Menus

Use the File menu to open and save data files, to read in spreadsheet or database files
created by other software programs, and to print the contents of the Data Editor.
Use the Edit menu to cut, copy, and paste data values; to find data values; and to change
options settings.
Use the View menu to turn toolbars and the status bar on and off, to turn grid lines on and
off, and to control the display of value labels and data values.
Use the Data menu to make global changes to SPSS data files, such as transposing
variables and cases, creating subsets of cases for analysis, and merging files.
Use the Transform menu to make changes to selected variables in the data file and to
compute new variables based on the values of existing ones.
Use the Analyze menu to select the various statistical procedures you want to use, such as
cross tabulation, analysis of variance, correlation, and linear regression.
Use the Direct Marketing menu to understand your contacts, to improve your marketing
campaigns or score your data.
Use the Graphs menu to create bar charts, pie charts, histograms, scatter plots, and other
full-colour, high-resolution graphs.
Use the Utilities menu to get information about variables in the working data file and to
control the list of variables that appears in dialog boxes.
Use the Add-ons menu to go to the statistics guides, services or other applications.
Use the Window menu to switch between SPSS windows or to minimize all open SPSS
windows.
Use the Help menu to access online help on the many features available in SPSS.

Because the pull-down menus form the basis of everything that we are going to be doing, we use a
notational system that indicates how to use them. For instance, the first
command that we are going to use opens an existing data file. This is done by clicking (once)
with the left mouse button on the word file in the menu bar at the top of the screen, then clicking
(again once) on the Open sub-menu. We will notate this procedure as: - - file, open - - to indicate
that you begin with the file menu and proceed to the open sub-menu. This chapter will primarily
discuss commands located in the File and the Data pull-down menus. The other chapters will
discuss using the Data, Transform, Analyze and Graphs pull-down menus. The Help menu has a
wide variety of information about SPSS that you can refer to as you need to.

Opening an Existing Data File

As with all Windows-based software, the names of data files in SPSS are indicated using a file
name and a file extension. For instance, the file survey.sav is said to have the file name
survey and the file extension sav. SPSS data files have a file extension of .sav. In some
cases the data that you will want to work with have already been entered and saved for you. For
instance, let's open the SPSS data file survey.sav. First go to ELEUM, download the 11
data files and save them on your own drive. Then use the - - file, open, data - - menu
sequence, after which the Open File box will appear. Double-click on the file survey.sav
and wait for the data to load.

The data editor now contains all of the data in the file survey.sav. You can see that the variable
names across the top now reflect the variables in the survey data file. These include, for instance,
the sex, ethnicity, and age of the participants, as well as variables indicating their life satisfaction
and family income. The data may not all fit on the screen, but you can use the scroll
bars at the bottom and the side of the data window to move through the data file (you may
want to confirm that there are 25 cases in the data file). At this point the data can be modified in
many different ways if you wish. For instance, we can delete or add variables, and
delete or add cases. For your information, the techniques for making these modifications are
summarized in Box 2. You do not have to make these modifications now.

Box 2 : Summary of Data Editing Commands

Delete
  Variable
    single-click on variable name in the top row
    press delete key
  Case
    single-click on case number in far left column
    press delete key

Add
  New variable
    click on an entry in a variable column
    - - edit, insert variable - - (or click on the right mouse button)
  New case
    click on an entry in a variable row
    - - edit, insert case - - (or click on the right mouse button)

Move a variable
    create a new column for the variable (- - edit, insert variable - - )
    click on variable name to highlight column containing variable
    - - edit, cut - - the variable column
    click on variable name in new column
    - - edit, paste - - the variable in place


Creating a New Data File

In many cases, the data that you need to analyze will not already exist as a file, and you will need
to enter the data yourself. Let's begin to do so by choosing - - file, new, data - -. At this point, if
you already have a data file open and you want to close this file, click on - - file, close - -. If you
have modified it, you will be asked to choose whether the changes should be saved or discarded.
If you have been using survey.sav and you have modified the file, let's just discard the changes
(save changes to the following dataset: NO!). We are now back to an empty data editor into which
we want to insert our new data.

Defining Variables

Open the file survey.sav again. To define a new variable, we click on the Variable View tab at the
bottom of the data editor. Let's begin by defining a new variable for the sixth column in your data
editor (so the sixth row in your variable view). We want to give the variable a name. To do so,
simply type a name. Then click on the button Type to indicate which type you would like to use.

It is also possible from the Variable View to make other definitions or changes in the variables that
you enter into SPSS. For instance, you can click on the Label cell to create a longer label for your
variable. Although most of the variables that you will want to analyze are numeric variables, it is
also possible to enter other types of variables, such as those that contain letters (string type), by
changing the variable type using the Type button. If your variable is a nominal variable such as
the sex of the participant, you may either enter the variable as words (male and female) or as
numbers. However, it is generally better to use numbers to define values of variables, because
SPSS requires numbers as input to many of its statistical procedures. Once you have entered the
numbers, you may label the different numeric values with words. This is accomplished by
using the button Values. For instance, you could indicate that 0 indicates female and 1
indicates male (Value = 1, Value Label = male and Add; Value = 0, Value Label = female and
Add). Look at the values for the first variable by simply clicking on its Values cell.
At this point, return to the data editor (tab Data View) and you can enter the values for each
of your cases in the column under the new variable name. To continue adding variables, simply
continue double-clicking on the variable name boxes, changing each to the name you desire,
adding labels if you wish, and then entering the data for your variables. If you made a new
variable (sixth column), delete this column in the data editor now. The option Width in the
Variable View indicates the number of characters used for the values of a variable, and the option
Columns indicates the width of the column in the data editor.

Saving Data Files

Saving data files is essentially the same procedure as opening them, except we begin with - - file,
save - -. If you have created a brand new data file, you will need to type the name of the file in
the box at the top of the save file box. Remember to save the data file with an extension of .sav.
If you have already opened a file, the file name will automatically be there and you can choose
whether you wish to replace the file including the changes that you have made. You should
naturally be sure to save your data often in case there is a malfunction with the computer, and
also be certain that you do not write over data files that you do not wish to destroy. When in
doubt, make a new copy of the data file with a new name. This is accomplished using - - file, save
as - - and typing a new name for the file in the Save Data As box.

Printing Data in the Data Editor

You may wish to make a printed copy of the data in the data editor either for your own records or
to serve as a backup to your diskette files. You can print either the entire spreadsheet, including
all of your data or you can select a subset of those data to be printed.
To print the entire file, choose - - file, print - -. To print a subset of the data, first place the
mouse cursor in the upper left corner of the section of data that you wish to print, click on that cell
with the left mouse button (and hold it down), and then drag the mouse down to highlight the
block that you wish to print. When you then choose - - file, print - - you can choose to print
only the selected block.

Selecting Variables for Analysis

Open the data file survey.sav. As we begin to select variables for analysis in the following
chapters, we will need to learn how to choose the ones we wish to use. This is very easy. In each
case, when we select the statistical procedure we want to use, a box opens that allows us to
indicate the variables we want to analyze. For instance, use the sequence - - analyze, descriptive
statistics, descriptives - - and see the Descriptive Statistics dialogue box, a typical box for
entering variable names. Choose the variable you wish to enter (for instance SEX) and then click
on the arrow in the center of the box to move the variable into the right box. It is also possible to
remove variables by highlighting them in the right box and using the (reversed) arrow to
move them back to the variable list on the left. Continue until you have chosen the variables you
need, and then click on OK. Only the variables in the right box will be used in the analyses.

The Output Navigator Window

Output from SPSS procedures, such as the output from a descriptive statistics command, appears
in the Output Navigator Window. The output navigator window will appear whenever a procedure
is run, or you can move to it at any time by using the - - window, output 1 - - sequence in the menu
bar (just before Help) or by switching windows at the bottom of the screen. The output navigator
(or SPSS Statistics Viewer) contains
two panels. On the left side of the screen is a panel that lists the recent SPSS procedures that
you have run. For instance, you can see that the descriptive statistics procedure has been run.
The printed output of the procedure is in the right panel. You can locate output in the right panel
by clicking on its label in the left panel, or you can locate the output by scrolling up and down in
the right panel. To print output, highlight the part of the printout that you wish to print on either
panel and then choose - - file, print - - . To quickly delete output that you do not want, highlight it
in the left window, and press the delete key. It is also possible to add new information to the
output, for instance a title, by using the - - insert - - sequence while you are in the left panel, and
typing in your title. After viewing your output, if you need to return to the data editor, use the - -
window - - sequence, or click on the Survey.sav button on the toolbar below.

Practice Exercises

1. Retrieve the file Hsb75.sav from the data files. How many variables and how many
participants are in this data file?

2. Enter the following data into a new SPSS data file. Define each of the variable names and
save the file with the file name newfile.sav. Notice that the first column is a participant
number that is used to identify each of the observations, and that the answers of
participants 5 and 9 are missing. Enter the variable SEX using a numeric coding system
that you create (for instance 0 = female and 1 = male), and use the button Values under
the Variable View to label the values for the variable SEX. Define the Decimals for each
column.

ID    STUDY HOURS    GPA    SEX
1     32             3,6    1
2     16             3,5    0
3     21             2,8    1
4     23             3,7    0
6     8              3,5    0
7     4              3,7    0
8     10             2,5    1
10    15             2,3    0
11    31             3,0    0
12    40             3,9    1
13    5              3,1    0
14    28             2,7    1
15    15             2,3    0

3. Retrieve the data file newfile.sav that you created in #2, and add the following two
cases to the data file. Keep them ordered by participant number by using the insert case
command to add new cases as shown in Box 2.

ID    STUDY HOURS    GPA    SEX
5     13             3,9    1
9     21             3,1    0

CHAPTER 2 : WORKING WITH DATA : DATA DEFINITION

The aim of this chapter is to define data. Open the data base 1991 US General Social
Survey.SAV.
As an example, request a frequency distribution of Race of Respondent (- - Analyze,
Descriptive Statistics, Frequencies - - variable race + OK). The table is not very readable. What
do the codes 1, 2 and 3 mean? We would like to see White instead of 1, Black instead of 2 and
Other instead of 3 in the frequency table.

2.1 Attribute values

So we are going to assign value labels to the variable race. Go back to the data editor and
work in the Variable View. Select the variable which has to be labelled by clicking on the variable
name.

Double-click on the cell under LABEL.

Under Variable Label, describe what the variable race means, namely Race of Respondent.
Now click on the button VALUES:
type 1 after Value and White after Value Label. Add this to the list by using the button ADD.
type 2 after Value and Black after Value Label + Add.
type 3 after Value and Other after Value Label + Add + OK.
Request a frequency distribution of the variable race again (- - Analyze, Descriptive Statistics,
Frequencies - -). Now we see the Value Labels (White, Black and Other) instead of 1, 2 and 3.
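
The effect of value labels on a frequency table can also be sketched outside SPSS. The short Python example that follows is only an illustration of the idea, not SPSS syntax: the codes are made-up data rather than values from the survey file, and the names race_codes and value_labels are hypothetical.

```python
from collections import Counter

# Made-up example codes for the variable race (not taken from the survey file).
race_codes = [1, 1, 2, 1, 3, 2, 1]

# Value labels as they would be defined under VALUES in the Variable View.
value_labels = {1: "White", 2: "Black", 3: "Other"}

# Frequency table keyed by numeric code: what we see before labels are defined.
freq_by_code = Counter(race_codes)

# The same table keyed by label: what we see after labels are defined.
freq_by_label = {value_labels[code]: n for code, n in sorted(freq_by_code.items())}

print(dict(freq_by_code))
print(freq_by_label)
```

The data themselves stay numeric; only the way the categories are displayed changes.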

2.2 Attribute missing values

In some cases you may find that not all data are available for all of your participants. When data
are missing, the entry in the data editor that would contain the value is usually left blank, and
SPSS indicates this by inserting a period into the cell. SPSS refers to this entry as a system
missing value. However, it is also possible to enter a value for the missing variable (such as 99 or
999) to indicate that this value means that the variable is missing (a user-defined missing value).
If you choose the latter option, you must then tell SPSS that your code represents a missing
value. This is accomplished through the Missing button under the Variable View label (Discrete
missing values).
When some values of a variable are missing, not all analyses can be
performed on all participants. In some cases, such as when the information about the individual's
experimental condition is missing, the individual cannot be used in any analyses.
In other cases (for instance when data are only missing on some, but not all, dependent
measures) the data from the individual can be used except for analyses involving that variable.

Go to the Variable View and click on the cell None under MISSING for the variable sex.
Choose Discrete Missing Values and fill in the value 999, for example. Confirm with OK.
The missing value can also be described under VALUES (see 2.1 Attribute values). There we enter
the missing value 999 and give it a description, for example 999 means not
answered or missing.
(Hint: check the Width in the Variable View!)
We can also define the type of the variable and the format of the column with TYPE, COLUMNS,
and ALIGN.
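
Why a missing-value code must be declared can be shown with a small sketch. The Python example that follows is an illustration only, not SPSS syntax. It uses made-up ages rather than the variable sex, so that a mean is meaningful, and it assumes that None stands for a blank cell (system missing) and that 999 is the user-defined missing code.

```python
# Made-up ages; 999 is the user-defined missing code and None stands for
# a blank cell (a system-missing value).
ages = [23, 31, 999, 45, None, 27]
USER_MISSING = {999}

def valid_values(values, user_missing):
    """Return only the values that are neither blank nor coded as missing."""
    return [v for v in values if v is not None and v not in user_missing]

# Mean over the valid cases only, as intended.
valid = valid_values(ages, USER_MISSING)
mean_valid = sum(valid) / len(valid)

# Mean when 999 has NOT been declared as missing: the code is treated as data.
non_blank = [v for v in ages if v is not None]
mean_naive = sum(non_blank) / len(non_blank)

print(mean_valid, mean_naive)
```

If the code 999 is not declared as missing, it is treated as an ordinary value and distorts every statistic computed on the variable.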

2.3 Make a template

If we create new data bases that often use the same kind of variable, and we want to avoid
repeatedly defining the same column format, type and so on for each variable, we can make a
template (a kind of predefined variable).

Go to the Variable View. Select as an example a variable which you will often use. Highlight this
variable. Then EDIT COPY. Go to the end of your variables (end of all the rows) and highlight the
next empty row. Then EDIT PASTE. You just have to change the name of your new variable.

CHAPTER 3 : WORKING WITH DATA : VARIABLE MANIPULATION WITH TRANSFORM

Once the data have been retrieved from a file or once you have entered them by hand, you may
need to make some transformations to them to develop the variables that you wish to use for
statistical analyses. Some examples of the things that you might wish to do would be to recode
the initial values of the variables into other values, to add or multiply variables together or to
subtract them, or to select only certain cases for analysis. SPSS provides a full range of data
transformation capabilities.

3.1 Computations: compute variable

In many cases the user needs to compute new variables as mathematical combinations of old
variables. SPSS has many built-in arithmetic functions, statistical functions, distribution functions,
and string functions for you to choose from. For instance, we might want to take the sum of all of
the scores after some of the variables have been reverse scored.

Simple example: open the data base Employee data.SAV. Do not save the contents/changes
you made in the data base 1991 US General Social Survey.SAV.
Suppose every person is paid in proportion to the number of years of education (variable educ).
Let's say that for each year of education one gets 500 Euro.
So a person with 10 years of education earns 5.000. A person with 11 years of education earns
5.500. To calculate a person's wage we have to compute the following formula:
wage = 500 * educ.
The new variable will be called wage.

Select TRANSFORM COMPUTE VARIABLE.


Fill in the new variable name wage into the Target Variable box in the upper left corner (you may
provide a label for the variable if you wish, using the Type&Label button), and fill in the formula
(500 * educ) under Numeric Expression. Confirm with OK.

This new variable (at the right-hand side of your data base) can be used like any other variable,
for example to request its mean or a histogram.
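
The computation itself is simple arithmetic applied to every case. As a minimal sketch in Python (an illustration only, not SPSS syntax; the list of education values is made up and the function name wage is hypothetical):

```python
RATE_PER_YEAR = 500  # euros per year of education, as in the example above

def wage(educ_years):
    """Wage implied by the formula: 500 times the years of education."""
    return RATE_PER_YEAR * educ_years

# Made-up years of education for four cases.
educ = [10, 11, 12, 16]
wages = [wage(years) for years in educ]

print(wages)
```

The cases with 10 and 11 years of education reproduce the amounts of 5.000 and 5.500 mentioned in the text.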

Another example.
-- Transform, Compute Variable --.
The new variable will be called sum (in the Target Variable box).
We want the sum of the existing variables salary and salbegin.
We create the expression by scrolling down in the Functions box (ALL) until we find the SUM()
function, and then use the upward-facing arrow to paste it into the Expression box. Then we
paste in the names of the variables (salary, salbegin) by highlighting the variables in the left box
and using the right arrow to move them into the expression. You should have a comma between
the variable names, and you will have to delete the ? signs: SUM(salary,salbegin). Once we are
finished, we press OK and the transformation is computed (see last column in data editor).
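
One point worth knowing is how a sum behaves when a value is missing. To our understanding, the SUM() function adds up the non-missing arguments, whereas a plain addition (salary + salbegin) gives a missing result as soon as one operand is missing. The Python sketch that follows illustrates that difference under this assumption; it is not SPSS syntax, None stands for a missing value, the salaries are made up, and the function names are hypothetical.

```python
def spss_like_sum(*values):
    """Sum the non-missing arguments; None stands for a missing value.

    Returns None only when every argument is missing.
    """
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None

def plus(a, b):
    """Plain addition: the result is missing if either operand is missing."""
    if a is None or b is None:
        return None
    return a + b

# Made-up (salary, salbegin) pairs; the second case has no beginning salary.
rows = [(57000, 27000), (40200, None)]

sums = [spss_like_sum(salary, salbegin) for salary, salbegin in rows]
additions = [plus(salary, salbegin) for salary, salbegin in rows]

print(sums, additions)
```

For complete cases both approaches give the same result, so the choice only matters when data are missing.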

3.2 Make a classification: recode

Recoding variables involves changing one value of a variable into another value. This procedure
is accomplished through the - - transform, recode into different variables - - sequence, which
brings up the Recode Variables box. Variables can be recoded either using an If ... statement, or
by using the Old and New Values approach. In this section we will consider the Old and New
Values approach. Using an If statement, in which transformations only occur under certain
conditions, is covered in section 3.4.

In some cases, the desired transformation of the data is a simple recoding of one value into
another value.

Suppose you have the variable educ (number of years of education) but you want to make
a new variable classedu (years of education classified).
As an introduction, have a look at the frequency distribution of educ.
Suppose you want to make the following classification:
1 = 1 - 12 years of education
2 = 13 - 16 years of education
3 = 17 or more years of education.

Select TRANSFORM - RECODE INTO DIFFERENT VARIABLES.


Fill in educ as Numeric Input Variable. Fill in the new variable name, namely classedu, under
Output Variable, if desired with a label, for example years of education classified.
Click CHANGE.
Now choose the option Old and New Values and make the classification.
Under Old Value you indicate the values which have to be combined. In this case choose range
lowest through 12 and fill in 1 after New Value - Value. Confirm with ADD.
Then indicate range 13 through 16 and fill in Value 2 + Add. Indicate range 17 through highest
and fill in Value 3 + Add. (No overlap in the new ranges!)
Leave this window with CONTINUE and confirm with OK.
Note that we have used numerical codes for the different categories, which can be given value
labels afterwards. We can also use text (string values), for example the categories low, middle and
high.
Attention: the use of string values has to be indicated under the option Old and New Values
(output variables are strings!). However, the philosophy of SPSS is: keep everything numerical.
See the difference by requesting a new frequency distribution of classedu.
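
The classification above is a mapping from non-overlapping ranges to codes. A minimal Python sketch of that logic (an illustration only, not SPSS syntax; the function name classify_education and the example values are hypothetical, and years of education are assumed to be whole numbers):

```python
def classify_education(years):
    """Recode years of education into the classes 1, 2 and 3.

    1 = lowest through 12, 2 = 13 through 16, 3 = 17 through highest.
    A missing value (None) stays missing.
    """
    if years is None:
        return None
    if years <= 12:
        return 1
    if years <= 16:
        return 2
    return 3

# Made-up values chosen to sit on both sides of each boundary.
educ = [8, 12, 13, 16, 17, 21]
classedu = [classify_education(years) for years in educ]

print(classedu)
```

Because the ranges do not overlap, every value of educ falls into exactly one class.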

Another example. Some of the items of a scale may have been coded such that a higher
number indicates that the individual has high self-esteem, whereas other items may have been
coded such that a lower number indicates high self-esteem. We therefore need to reverse the
scores of some of the variables. Let us consider the variables on the scale of self-esteem (esteem1
to esteem10) in the file esteem.sav. Do not save the contents/changes in the data base Employee
data.SAV.
We want to change the scores on the variables esteem3, esteem5, esteem8, esteem9 and
esteem10, because they are scored in the opposite direction, for example (1 = 4), (2 = 3), (3 = 2),
(4 = 1) and (5 = 5). After the recoding, all of the variables will be scored in the same direction,
such that a higher number means higher self-esteem.

--Transform - Recode - Into Different Variables--


Move the variables that are needed to the right side, and then indicate what the new names for
the variables will be after they are recoded. The latter can be done by highlighting the variable
(esteem3) in the right box, typing the new name (esteem3r) into the Name box, and then clicking
on Change. We now have indicated the new variables with an r to show that they are reversed.
When we are finished, the old variables as well as the reversed variables will both be in the data
file.
Once the names of the variables are indicated, click on the Old and New Values button.

Here we have to indicate each pair of old and new values (e.g. 1,4 - 2,3 - 3,2 - 4,1 and 5,5) and
then click on Add after each one. In addition to entering specific values to be recoded, you may
wish to use the Range button to indicate that a range of values should be recoded (for instance
we could indicate on a variable of salary that the range from 0 to 30.000 should be recoded into a
new value of 1, whereas the range from 30.000 to highest should be given a new value of 2).
Missing values can also be specified on both the old and new variables. Once you click on
Continue, and then OK, the recoding will occur, and the new variables will appear in the data file
(see last columns in data editor).
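
Reverse scoring is a lookup from each old value to its new value. The Python sketch that follows illustrates this with the pairs given in the text; it is not SPSS syntax, the scores are made up, and the names REVERSE_MAP and recode are hypothetical. It assumes that an old value without a matching pair becomes missing (None) in the new variable, which is to our understanding what happens when no rule covers a value.

```python
# Old value -> new value, exactly the pairs listed in the text.
REVERSE_MAP = {1: 4, 2: 3, 3: 2, 4: 1, 5: 5}

def recode(value, mapping):
    """Return the new value for an old value; unlisted values become None."""
    return mapping.get(value)

# Made-up scores on esteem3 for four participants.
esteem3 = [1, 4, 2, 3]

# The reversed variable is stored next to the original one.
esteem3r = [recode(score, REVERSE_MAP) for score in esteem3]

print(esteem3, esteem3r)
```

The original variable is left untouched, which matches the behaviour of recoding into different variables.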

3.3 Automatic numbering of string values: automatic recode

Use the data base Automatic-recode-uk.SAV. Do not save the contents/changes in the data
base Esteem.SAV.

We can make a numeric version of an alphanumeric (string) variable. The string values are
converted into numeric codes.

Select TRANSFORM - AUTOMATIC RECODE.


Select civilsta (civil status) and move it to the right.
Fill in a name after New Name, for example civil. Click on Add New Name and confirm with OK.
In the data window you can see the new variable. The two variables appear to be the
same; however, the new variable is numerically coded and has the civil status as its Value Labels.
You can check this with -- View, Value Labels -- or by clicking on the white-red A/1 button.
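
What automatic recode does can be sketched as follows, assuming that consecutive codes are handed out in alphabetical order of the string values. The Python example is an illustration only, not SPSS syntax; the civil status values are made up and the function name automatic_recode is hypothetical.

```python
def automatic_recode(values):
    """Turn string values into consecutive numeric codes.

    Codes start at 1 and follow the alphabetical order of the distinct
    strings. Returns the list of codes and the value labels (code -> string).
    """
    categories = sorted(set(values))
    code_of = {label: code for code, label in enumerate(categories, start=1)}
    codes = [code_of[v] for v in values]
    value_labels = {code: label for label, code in code_of.items()}
    return codes, value_labels

# Made-up civil status values for four cases.
civilsta = ["married", "single", "divorced", "married"]
civil, civil_labels = automatic_recode(civilsta)

print(civil)
print(civil_labels)
```

The original strings are kept as value labels, which is why the two columns look identical when value labels are displayed.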

3.4 Computations with a condition: if

In still other cases the transformations that you wish to compute should only be carried out under
certain conditions. If you want this approach, you have to indicate the conditions under which
the transformation is to be made, using a conditional statement.

If is comparable to compute, except that the computation is only carried out
for those cases which fulfil a certain condition.

Open the data base 1991 US General Social Survey.SAV. Do not save the contents/changes in
the data base Automatic-recode-uk.SAV.

Select TRANSFORM COMPUTE VARIABLE.


The IF-button is present here with all possibilities.

Some relational operators are:

>=    larger than or equal to
<=    smaller than or equal to
>     larger than
<     smaller than
=     equal to
~=    unequal to

For example the condition to make the transformation will be : age <= 25.

The Target Variable (new name) is for example newage.


The Numeric Expression will be age.
Click on the IF-button. Then Include If Case Satisfies Condition.
Select: age<=25. CONTINUE and confirm with OK.
See the last column in your data base.

You can also combine conditions with logical operators via Compute Variable. See the window
COMPUTE ... IF ...

And    &
Or     |
Not    ~

Example: sex=1 & age<=25.
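
A conditional computation can be sketched as a computation that is applied only to the cases that satisfy the condition. The Python example that follows is an illustration only, not SPSS syntax. The cases are made up, the names compute_if, newage and young1 are hypothetical, and it assumes that cases which fail the condition receive a missing value (None) in the newly created variable.

```python
def compute_if(cases, target, expression, condition):
    """Set case[target] = expression(case) where condition(case) holds.

    Cases that fail the condition get None, standing in for a missing
    value in a newly created variable.
    """
    for case in cases:
        case[target] = expression(case) if condition(case) else None
    return cases

# Made-up cases; the coding of sex is only an assumption for illustration.
cases = [
    {"sex": 1, "age": 22},
    {"sex": 2, "age": 24},
    {"sex": 1, "age": 40},
]

# newage = age, only if age <= 25
compute_if(cases, "newage", lambda c: c["age"], lambda c: c["age"] <= 25)

# The combined condition sex = 1 & age <= 25
compute_if(cases, "young1", lambda c: c["age"],
           lambda c: c["sex"] == 1 and c["age"] <= 25)

newage = [c["newage"] for c in cases]
young1 = [c["young1"] for c in cases]

print(newage, young1)
```

With the combined condition, only the first case satisfies both parts, so only that case receives a value.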

3.5 Make an index variable: count

Count is useful to make an index variable that shows for every case how often (one or more)
values occur for (one or more) variables.

Example with the data base Count-uk.SAV. Do not save the contents/changes in the data base
1991 US General Social Survey.SAV.
Suppose our cases are students who answered 3 multiple-choice questions.
Alternative 1 was the right answer for each question. We would like to have one variable, let's say
Correct, showing us the number of correctly answered questions.

Select TRANSFORM COUNT VALUES WITHIN CASES.


Under Target Variable you have to indicate the new variable name, Correct. You can directly give
a description by the option Target Label: Total Correct Answers.
Under Numeric Variables you have to indicate those variables for which you want to count the
correct answers per case, so: question 1, question 2 and question 3.
With the button Define Values you can indicate the value that the variables must have: value = 1
(right answer for each question).
Click on ADD to move this to the window Values to Count.
CONTINUE and confirm with OK. The new variable Correct will be shown.
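
Counting values within cases amounts to checking, for each case, how many of the chosen variables hold one of the chosen values. A minimal Python sketch (an illustration only, not SPSS syntax; the answers are made up, and the variable names quest1 to quest3 and the function name count_values are hypothetical):

```python
def count_values(case, variables, values):
    """Count how many of the given variables hold one of the given values."""
    return sum(1 for var in variables if case[var] in values)

# Made-up answers of three students to three multiple-choice questions.
students = [
    {"quest1": 1, "quest2": 1, "quest3": 1},
    {"quest1": 1, "quest2": 2, "quest3": 3},
    {"quest1": 2, "quest2": 3, "quest3": 2},
]

QUESTIONS = ["quest1", "quest2", "quest3"]

# Alternative 1 is the right answer for every question.
correct = [count_values(s, QUESTIONS, {1}) for s in students]

print(correct)
```

Each student gets one number: the count of questions answered with alternative 1.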

Suppose now that the right answer to question 1 = 1 and to question 2 = 1 but the right answer to
question 3 = 2. When you fill in the values to count 1 and 2 under Define Values, SPSS will check
each question on both values. This means that the number of correct answers will be equal to
three if someone answered alternative 2 for all questions. This is not the intention. Nevertheless,
we can count the number of correct answers with the option Count. To do so we will have a look
at the syntax, which is placed in the Syntax Window with Paste:

Select TRANSFORM COUNT VALUES WITHIN CASES


Under Define Values fill in answer 2 too (add value 2 to count). This is namely the correct answer
to question 3.
Fill in a new variable name under Target Variable, let's say Exact with the label Exact Correct
Answers. To get the syntax window, click on the PASTE button in the window (next to OK). Now
we are going to change the old text in the syntax, namely

COUNT
exact = quest1 quest2 quest3 (1) quest1 quest2 quest3 (2) .
VARIABLE LABELS exact 'exact correct answers' .
EXECUTE .

into

COUNT
exact = quest1 quest2 (1) quest3 (2) .
VARIABLE LABELS exact 'exact correct answers' .
EXECUTE .

In the syntax window, choose Run (next to Add-ons) and then All, and close this window to go
back to your data base. You do not have to save the contents of your syntax. See the last column
for the Exact Answers.
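
The difference between the two versions of the syntax can be made concrete with a student who answered alternative 2 to every question. The Python sketch that follows is an illustration only, not SPSS syntax; the variable names quest1 to quest3 and the function names are hypothetical, and the answer key is the one described in the text.

```python
# Right answers as described above: 1, 1 and 2.
ANSWER_KEY = {"quest1": 1, "quest2": 1, "quest3": 2}

def count_naive(case):
    """Check every question against both values 1 and 2 (the old syntax)."""
    return sum(1 for var in ANSWER_KEY if case[var] in {1, 2})

def count_exact(case):
    """Check each question against its own right answer (the changed syntax)."""
    return sum(1 for var, right in ANSWER_KEY.items() if case[var] == right)

# A student who chose alternative 2 for all three questions.
all_twos = {"quest1": 2, "quest2": 2, "quest3": 2}

naive_score = count_naive(all_twos)
exact_score = count_exact(all_twos)

print(naive_score, exact_score)
```

The first count credits this student with three correct answers, whereas only the answer to question 3 is actually right.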

Practice Exercises

1. Open the data file survey.sav (do not save the contents of Count-UK.sav) and use the
transform - recode into different variables (old and new values) procedure to recode the
variable ethnic (into the variable ethnic2), such that on the new variable all white students
have the value '1' whereas all other students have the value '2'.

2. Open the data file Ode.sav (do not save the contents of survey.sav), and recode the
variable income into a new variable called income2. Give income2 two values (1 and 2),
such that the value 1 represents individuals who have an income less than $30,000,
whereas the value 2 represents individuals who have an income greater than $30,000. Hint:
use the recode variables approach, set the new value 1 using the range option
'lowest through' and the new value 2 using the range option 'through highest', and make
sure the two ranges never overlap!

3. Open the file Ode.sav and compute a new variable (called ratio) that represents the
teachers' salary (salary) divided by the number of students in the district (students).
Do not save the contents of Ode.sav.

CHAPTER 4 : WORKING WITH DATA : CASE MANIPULATION WITH DATA

4.1 Sort cases

Suppose we want to sort all persons in the data file Employee data.sav by their educational
level in years.

Select DATA - SORT CASES.


Move the variable Educational Level (educ) to Sort By.
Confirm with OK.
The order of ID numbers has changed!

You can undo the sorting by selecting DATA - SORT CASES again: move the variable educ back to
the list on the left and sort by the variable Employee Code (id) in ascending order.
Confirm with OK. The order of ID numbers is now 1, 2, 3, ... again.
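
The two sorts above can also be written as syntax. The lines below are a minimal sketch of what the PASTE button in the Sort Cases box would roughly produce, assuming the variable names educ and id from Employee data.sav:

```spss
* Sort by educational level, ascending.
SORT CASES BY educ (A).

* Restore the original order by employee code.
SORT CASES BY id (A).
```

Replacing (A) with (D) sorts in descending order.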

4.2 Select cases

You may not always want to include all of your cases in all of your analyses. For
instance, you may want to select only the participants who indicated that they are married, or
only those who have an annual income greater than $30,000, for a given analysis. SPSS provides
several ways of selecting a subgroup of cases. Open the file survey.sav. Do not save the
contents of the data editor esteem.sav. Choose the -- data, select cases -- sequence and the
Select Cases box will appear. One common way of selecting is to use an 'if' statement. That is,
we may wish to select participants only if their family income is greater than $30,000. To do
so, we click on the option 'If condition is satisfied', then on the If button, and create an
expression that selects a case only if 'income > 30000'. After closing the expression box with
Continue and then closing the Select Cases box with OK, the selection is complete.

At this point the cases that do not meet the selection criteria will be either filtered or deleted
from the data file. It is safer to choose the option Filter out unselected cases rather than the
option Delete unselected cases in the Output section at the bottom of the box. With filtering,
the unselected cases are indicated with a slash through the row number in the Data Editor and
will not be used in any further analyses, but they are not deleted from the file.
If the Delete option is chosen, however, the cases are removed from the file (be careful not to
save the file, or the cases will be lost). To turn filtering off and include all cases in your
analyses again, select DATA - SELECT CASES - All cases. The selection is then gone (no slashes
through the row numbers), but the new variable filter_$ is still present in your Data Editor.
You can use it again if you later need to analyze only the participants whose family income is
greater than 30000 (in our example).

You can cancel the selection as follows: DATA - SELECT CASES - All cases. Confirm with
OK.
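
As a sketch, the filter described above corresponds roughly to the following syntax. It is close to what PASTE generates in the Select Cases box, although the exact lines can differ between SPSS versions; income is the variable from survey.sav.

```spss
* Keep all cases in the file, but use only those with income above 30000.
USE ALL.
COMPUTE filter_$ = (income > 30000).
FILTER BY filter_$.
EXECUTE.

* Turn the filter off again and analyze all cases.
FILTER OFF.
USE ALL.
EXECUTE.
```

The option Delete unselected cases corresponds to the command SELECT IF, which removes the cases from the working file. This is why filtering is the safer choice.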

4.3 Analyses per subgroup: split files

Open the file employee data.sav. If you have to do a certain analysis for each group separately,
then it is not necessary to do this with Select Cases for one group and then again for another
group. With Split File you can do this all at once.

Suppose you want to run a separate analysis for men and for women. You then have to split the
data file by gender.

Select DATA - SPLIT FILE.

Click on Organize output by groups.
Move gender to Groups Based on and confirm with OK.
Ask for a frequency distribution, for example for current salary. The results are given in two
separate tables, one for men and one for women.

Split File can be cancelled by selecting DATA - SPLIT FILE again and choosing Analyze all cases,
do not create groups. Then restore the original order with DATA - SORT CASES, sorting by
Employee Code (id).
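
The split-file steps above can be sketched in syntax as follows, assuming the variable names gender, salary and id from Employee data.sav:

```spss
* The file must be sorted by the grouping variable before it is split.
SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.

* One frequency table per group.
FREQUENCIES VARIABLES=salary.

* Cancel the split and restore the original order.
SPLIT FILE OFF.
SORT CASES BY id (A).
```

SEPARATE BY corresponds to the option Organize output by groups; LAYERED BY corresponds to the option Compare groups.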

Practical Exercises

1. Open the database bier.sav. Sort the data so that the prices of the beers are in
descending order. Which beer is the most expensive one?
Undo this sorting (names of the beers ascending).

2. First, select all American beers by using the variable herkomst (Land van herkomst).

herkomst = 'Verenigde Staten'

(please note that the text has to be placed between quotation marks and spelled exactly the
same way as in your Data View: 'Verenigde Staten' is not the same for SPSS as
'verenigde staten'.)

and then ask for a frequency distribution (herkomst). How many of all our beers are
American (frequency)? Undo this selection.

3. Split the data into American (Verenigde Staten), Dutch (Nederland) and Belgian (Belgie)
beers by using the variable herkomst (Land van herkomst).
Then ask for a frequency distribution of quality by using the variable Kwaliteit (oordeel).
What valid percentage of the Belgian beers (Belgie) scored excellent (= uitstekend)?

CHAPTER 5 : ANALYSE WITH ANALYZE

In this chapter we discuss some statistical analyses.

5.1 Describe: descriptive statistics

Descriptive statistics contains a number of popular procedures like frequencies, crosstabs and
descriptives. Descriptives presents important descriptive measures like means, minimum and
maximum.
Descriptives is only useful for variables for which a mean can be meaningfully calculated, for
example age or salary. For names of cities or color of hair a mean is meaningless.
Another menu entry of Descriptive Statistics is Explore, which lets you examine a variable in
more detail.
Open the data file 1991 US General Social Survey.sav.
As an example, we look at the variable Highest Year of School Completed (educ) and the 95%
confidence interval for its mean.

-- Analyze, Descriptive Statistics, Explore --


Move educ to the Dependent List and choose a 95% Confidence Interval for the Mean under
Statistics. Continue and OK. This is useful if you want to make statements about the
population on the basis of a random sample. Here, the average number of years of education in
the random sample is 12.88 years. If we estimate the average number of years of education in
the population, we can say with 95% confidence that the population mean lies between 12.73 and
13.03.
Moreover, with Explore you can make a so-called boxplot, a graph in which outliers are
displayed. This can be done, for example, for the variable age as follows.
-- Analyze, Descriptive Statistics, Explore, PLOTS, boxplot (factor levels together) --
Continue and OK. The line inside the box marks the median, so you can see in the boxplot that
the median age of the respondents is approximately 40 years.
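
The two Explore requests above can be sketched in syntax with the EXAMINE command (the command behind the Explore box). This is a minimal sketch; the syntax that PASTE produces usually contains a few more default subcommands.

```spss
* Descriptives for educ, including the 95% confidence interval for the mean.
EXAMINE VARIABLES=educ
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95.

* Boxplot for age, without the table of statistics.
EXAMINE VARIABLES=age
  /PLOT BOXPLOT
  /STATISTICS NONE.
```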

The goal of frequency distributions and descriptive statistics is to summarize the data so that they
are easy to see and understand. These procedures also help the user determine whether the
data have been entered correctly and are thus ready for further analysis. Let's consider some
possibilities for summarizing the variables in the dataset survey.sav, which contains hypothetical
raw data from 25 participants on five variables collected in a sort of mini-survey. The variables
include the respondents' sex, ethnicity (ethnic), age, a measure of rated life satisfaction (satis)
and family income (income). Do not save the contents of 1991 US General Social Survey.sav.

Descriptive statistics are numbers that summarize the pattern of scores observed on a measured
variable. This pattern is called the distribution of the variable. Most basically, the distribution can
be described in terms of its central tendency - that is, the point in the distribution around which
the data are centered - and its dispersion or variability. Dispersion refers to whether the scores
are all tightly clustered around the central tendency, or whether they are more spread out away
from it. Central tendency is summarized using the mean, the median, and the mode, and
dispersion is summarized using the range, the variance, and the standard deviation.

Let's open the data file survey.sav, and compute descriptive statistics on the three variables, age,
satis and income. Use the - - analyze, descriptive statistics, descriptives - - sequence to open the
Descriptive Statistics box. Then move the three variables into the variables box and then click on
OK.

The output of the descriptive statistics procedure includes the mean, the standard deviation, and
the minimum and the maximum values (if we had wanted to obtain the range and/or the variance,
we would have clicked on the Options box in the Descriptive Statistics window and chosen these
statistics before running the command).
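
In syntax, the Descriptives request above looks roughly as follows. This is a sketch, using the variable names age, satis and income from survey.sav:

```spss
* Default output: mean, standard deviation, minimum and maximum.
DESCRIPTIVES VARIABLES=age satis income
  /STATISTICS=MEAN STDDEV MIN MAX.

* The same request with range and variance added, as under Options.
DESCRIPTIVES VARIABLES=age satis income
  /STATISTICS=MEAN STDDEV MIN MAX RANGE VARIANCE.
```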

The median is used as an alternative measure of central tendency when the distribution is not
symmetrical. The median is obtained in SPSS using the - - analyze, descriptive statistics,
frequencies - - sequence, and then clicking on the Statistics box and selecting Median.
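
As a sketch, the median request could be written in syntax as below. The variable income is chosen here only for illustration, since the text does not name one; NOTABLE suppresses the frequency table itself.

```spss
* Median only, without the frequency table.
FREQUENCIES VARIABLES=income
  /FORMAT=NOTABLE
  /STATISTICS=MEDIAN.
```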

Frequency Distributions

Nominal variables, such as sex or ethnicity, can be summarized using a frequency distribution. A
frequency distribution is a table that indicates how many, and in most cases what percentage of,
individuals in the sample fall into each of a set of categories.

Let's compute a frequency distribution on the variable ethnic. After opening the dataset
survey.sav, the - - analyze, descriptive statistics, frequencies - - sequence will bring you to the
Frequencies dialogue box. At this point the variable ethnic has to be moved into the variables
box, to indicate that we wish to analyze only this variable. We don't need to choose any particular
statistics at this point, as the frequency distribution is the default output.
The frequency distribution shows that there are 3 African-American, 7 Asian, 5 Hispanic, and
10 White participants in the sample. The percentage of the total sample in each category
is also indicated, as is the cumulative percentage. In this case, however, the cumulative
percentage is not meaningful, because the variable is nominal. The valid percent column will
differ from the percent column only if some cases have missing values.

Data Charts

The frequency distribution can be displayed visually using a bar chart, by clicking on the Charts
button and choosing Bar charts in the frequencies dialogue box. If you choose to create a bar
chart, a visual display of the frequency distribution will be produced. When summarized using a
frequency distribution or a bar chart, the characteristics of the sample are easily seen.
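
A minimal syntax sketch of the frequency distribution with a bar chart, assuming the variable ethnic from survey.sav:

```spss
* Frequency table for ethnic plus a bar chart of the frequencies.
FREQUENCIES VARIABLES=ethnic
  /BARCHART FREQ.
```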

5.2 Compare group means: compare groups

Open the data base 1991 US General Social Survey.sav.

Under Compare Means you can find a lot of popular tests.

For example ANALYZE - COMPARE MEANS - MEANS.


With this option you get means broken down by group.
Under Dependent List we enter educ, under Independent List sex.
In the output you can see that, in the random sample, men have on average 13.23 years of
education and women 12.63 years.
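
The same table can be requested in syntax. This is a sketch; the CELLS subcommand lists the statistics that the Means box shows by default.

```spss
* Mean years of education, broken down by sex.
MEANS TABLES=educ BY sex
  /CELLS=MEAN COUNT STDDEV.
```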

What does this difference in the random sample tell us about the population?

A suitable test for this question is the t-test.

Select ANALYZE - COMPARE MEANS - INDEPENDENT SAMPLES T-TEST.


Drag educ to Test Variable and sex to Grouping Variable.
Define Groups (sex = 1 and 2) as follows: Group 1 = 1 (males) and group 2 = 2 (females).
CONTINUE and confirm with OK.

The output consists of two tables.


In the first one you find the descriptive statistics, in the second one the actual test.
We see that, on average, men have significantly more years of education than women: according
to the confidence interval of the difference, at least .298 and at most .906 years more.
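
As a sketch, the t-test above can be written in syntax as follows. The codes 1 and 2 are the values of sex given under Define Groups; the 95% level is the SPSS default and is stated here as an assumption.

```spss
* Independent-samples t-test of educ for sex = 1 (males) against sex = 2 (females).
T-TEST GROUPS=sex(1 2)
  /VARIABLES=educ
  /CRITERIA=CI(.95).
```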

5.3 Correlation and regression

Often one is interested in the relationship between certain variables. For variables such as
age and salary, correlation is the suitable technique.
As an example we will look at the correlations between the education variables.

Is it true that the more education the father of a respondent has, the more education the
respondent has?
Or: the more education the respondent has, the more education his/her partner has?

Select ANALYZE - CORRELATE - BIVARIATE.


The variables are educ, maeduc, paeduc and speduc.
We calculate the Pearson Correlation Coefficient.
In the output we see that all Pearson correlations are positive.
So more education on one variable goes together with more education on the others.
The highest correlation is between the education of the father and the education of the mother,
namely .672.
The respondent's education correlates particularly with the partner's education, namely .619.
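
A syntax sketch of this correlation request, close to what PASTE produces in the Bivariate Correlations box:

```spss
* Pearson correlations with two-tailed significance for the four education variables.
CORRELATIONS
  /VARIABLES=educ maeduc paeduc speduc
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```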

In this kind of analysis one often wants to go further than correlations.


Can we predict one variable on the basis of another variable?
For example, can we predict the number of years of education of the respondent on the basis of
the number of years of education of the father?
The suitable technique for this is regression.

Select ANALYZE - REGRESSION - LINEAR.

Drag educ to Dependent Variable and paeduc to Independent Variable.


In the output we see under B the value 9.926 for (Constant) and the value .322 for Highest Year
School Completed, Father.

The regression equation is then


years of education of the respondent =
9.926 + .322 * years of education of the father.

Interpretation of the output:

if the father has 0 years of education, we still expect the respondent to have 9.926,
let's say 10, years of education;
for two respondents whose fathers' education differs by 1 year, we expect their own education
to differ by only .322 years (about 0.3).
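
The regression above can be sketched in syntax as follows. The second command applies the fitted equation to every case; the variable name pred_educ is a hypothetical name chosen for this illustration.

```spss
* Simple linear regression of educ on paeduc.
REGRESSION
  /DEPENDENT educ
  /METHOD=ENTER paeduc.

* Predicted years of education from the fitted equation (pred_educ is a hypothetical name).
COMPUTE pred_educ = 9.926 + 0.322 * paeduc.
EXECUTE.
```

For a father with 12 years of education, for instance, the equation gives 9.926 + .322 * 12 = 13.79 years.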

The Pearson Correlation Coefficient

The Pearson product-moment correlation coefficient (Pearson's r) is used to specify the direction
and magnitude of linear association between two quantitative variables. The correlation
coefficient can range from r = -1.00 to r = +1.00. Positive values of r indicate that the relationship
is positive linear, and negative values indicate negative linear relationships. The strength of the
correlation coefficient is indexed by the absolute value of the correlation coefficient.

Let's practice computing some Pearson correlation coefficients using the dataset Hsb75.sav.
Pearson correlations are obtained using the - - analyze, correlate, bivariate - - sequence, after
which the Bivariate Correlation Dialogue box appears. Let's calculate the correlations among
three quantitative variables (visual, mosaic, and mathach), which represent the high school
students' scores on tests of visual and spatial ability, and their mathematical achievement,
respectively. Simply move the names of these three variables into the variables box and click OK
(Pearson correlation has already been selected for you).

When there are more than two correlation coefficients to be reported, it is common to place them
in a correlation matrix. The output shows the correlation matrix produced by the Bivariate
Correlation Procedure. Note that in addition to the correlation coefficients, r, the two-tailed
significance level (p-value) and the sample size (N) are also printed. For instance, we can see
from the output that the correlation between visual ability (VISUAL) and mathematics
achievement (MATHACH) is r = .423 and is highly significant (p = .000). Note that in SPSS
output a p-value displayed as .000 is not exactly zero: it is smaller than .001 (report it as
p < .001), and thus highly statistically significant. We can see that the sample size is 75.

Contingency Tables

A contingency table or cross tabulation displays the number of individuals who have each value
on each of two or more nominal variables (we'll only consider the case of two nominal variables
here). The size of the contingency table is determined by the number of values on the variable
that represents the rows of the matrix as well as the number of values on the variable that
represents the columns of the matrix. For instance, if there are two values of the row variable and
three values of the column variable, the table is a 2 x 3 contingency table.
Consider as an example the relationship between sex of student (sex) and whether the student
has taken geometry (geo) for the high school students whose data are in the data file Hsb75.sav.
After opening the data file, choose the sequence - - analyze, descriptive statistics, crosstabs - -
from the menu bar, and then select the two variables to be crossed. At this point you should click
on the Statistics button in the Crosstabs dialogue box to indicate which of the many available
statistics you wish to use to analyze the contingency table.
For this example, let's request the chi-square statistic for analysis by placing a check in the
Chi-square box (other analysis options are described in statistics or research methods textbooks).
Continue and OK.

The output shows the results of the analysis. The 2 x 2 contingency table displays the number
of men and women in the sample (34 men and 41 women) as well as the number of students who have
and have not taken geometry. These totals are known as the row marginal frequencies and the
column marginal frequencies, respectively. The contingency table also indicates, within each of
the boxes (they are called the cells), the observed frequencies or observed counts (that is, the
number of each sex who have and have not taken geometry).
The chi-square test of independence, symbolized as χ², is most commonly used to assess the
degree of association between two nominal variables. The output shows that the Pearson
chi-square statistic (χ² = 12.714) is highly significant in our case, with a significance level
of p < .001. The degrees of freedom for the chi-square statistic is 1. This result indicates that
there is an association between the sex of the student and whether or not they have taken
geometry. If we examine the frequencies in the contingency table (-- Analyze, Descriptive
Statistics, Crosstabs, Cells - Percentages (Total) -- Continue and OK), we can see in the
crosstabulation that more boys (24, or 32% of all students) than girls (12, or 16% of all
students) have taken geometry.
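
As a sketch, the crosstabulation with the chi-square test and total percentages can be requested in syntax as follows, using the variables sex and geo from Hsb75.sav:

```spss
* 2 x 2 table of sex by geometry with chi-square and total percentages.
CROSSTABS
  /TABLES=sex BY geo
  /STATISTICS=CHISQ
  /CELLS=COUNT TOTAL.
```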

5.4 Non-parametric tests

Open the data base 1991 US General Social Survey.sav. Do not save the contents/changes of
Hsb75.sav.
We have seen that many classical tests have been developed for testing relationships between
quantitative variables like age and salary. If we have categorical variables we may not use
these tests. In that case we use non-parametric tests.

Select ANALYZE - NONPARAMETRIC TESTS.

Example: Is our survey representative of the population with respect to sex? Suppose, for
example, that we know that half of the American population is male. We then expect that, in a
representative random sample, half of the respondents are male too. Is this the case?

Select the option LEGACY DIALOGS - BINOMIAL.
Move sex to the Test Variable List. We test against one half, so the Test Proportion is .50.

In the output we see under Asymp. Sig. a significance value of .000 (remember: small
significance values (< .05) indicate that the observed distribution differs from the
hypothesized distribution). In other words, with respect to sex, our random sample is not
representative of the American population.
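
A minimal syntax sketch of this binomial test, close to what PASTE produces in the legacy Binomial box:

```spss
* Test whether the proportion in the first category of sex equals .50.
NPAR TESTS
  /BINOMIAL (.50)=sex.
```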

Practice Exercises

1. Compute a frequency distribution for the number of jobs that the job applicants in the
dataset Hwj100.sav have had. The variable is jobs. Do not save the contents of 1991 US
General Social Survey.sav. How many job applicants (frequency) have already had 5 jobs?

2. Compute a bar chart for the frequency distribution in exercise 1.

3. Compute, by using Frequencies, the mean, median, mode, standard deviation and
variance for the teacher salaries in the dataset Ode.sav and do not display the frequency
table (see the option in the frequency distribution sequence box).
The variable is salary. Do not save the contents of Hwj100.sav.

CHAPTER 6 : GRAPHS AND PIVOT TABLES

SPSS also has a graphics component, which offers a number of ways to present your data
graphically. See the menu GRAPHS - LEGACY DIALOGS.

6.1 A pie chart / bar chart / line chart

Suppose you want to represent how many persons are Very Happy, how many Pretty Happy
and how many Not Too Happy.
Open the data base 1991 US General Social Survey.SAV.

Select GRAPHS - LEGACY DIALOGS - PIE. The most used options are:


Summaries for Groups of Cases : EACH category of a variable you want to represent will
be a SEPARATE slice of the pie.
Summaries of Separate Variables : MORE variables will be represented in a pie chart,
EACH variable will be a separate slice.

In our example we will represent each category of a variable.

Select GRAPHS - LEGACY DIALOGS - PIE - Summaries for Groups of Cases.


Click on Define.
Under Define Slices By you indicate the variable whose categories you want to use. Move
General Happiness (happy) to the right and confirm with OK.
We see that EACH category of the variable is represented by ONE slice.

Bar charts and line charts can be made in the same way.
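
As a sketch, the legacy chart dialogs correspond to the GRAPH command. The pie chart above, and a bar chart and line chart of the same variable, could be requested like this:

```spss
* Pie chart: one slice per category of happy.
GRAPH
  /PIE=COUNT BY happy.

* The same counts as a simple bar chart and as a simple line chart.
GRAPH
  /BAR(SIMPLE)=COUNT BY happy.
GRAPH
  /LINE(SIMPLE)=COUNT BY happy.
```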

6.2 The menu chart

Double-clicking on the graph opens the CHART EDITOR, in which you can edit the graph as
desired.
Options to change the chart, like text layout, text style, fill and border, title and footnote,
can be found under Options. Other options, like explode slice and show data labels, can be
found under Elements or by clicking on the right mouse button. Important: always first select
the slice(s) you want to change!

6.3 A scatter plot

SPSS contains different scatter plots (option Scatter/Dot under Graphs - Legacy Dialogs - in main
menu), for example
simple: one variable on the X-axis (horizontal) and one on the Y-axis (vertical).
overlay: one or more variables on the X-axis, more on the Y-axis.
matrix: each variable crossed against another variable. The variable names are mentioned
on the diagonal.
3D: one variable on the X-axis, one on the Y-axis and one on the Z-axis.

Example of a simple scatterplot. The independent variable age will be placed on the X-axis and
the dependent variable educ on the Y-axis.
If you need further information on a certain point in the scatter plot, then double click on the
graph to open the Chart Editor. Then click on the button Data Label Mode in the toolbar and then
on a point in the scatter plot. Now you see the corresponding case number in your data editor.
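
A minimal syntax sketch of the simple scatter plot described above. In the GRAPH command the variable before WITH goes on the X-axis and the variable after it on the Y-axis:

```spss
* Simple scatter plot: age on the X-axis, educ on the Y-axis.
GRAPH
  /SCATTERPLOT(BIVAR)=age WITH educ.
```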

6.4 A histogram

Select, for example, the variable age and indicate in the -- Graphs - Legacy Dialogs - Histogram --
window that you want to display the normal curve. In the output you see the histogram of the
age of the respondents (std.dev. = 17.808, mean = 45.63 and the number of respondents =
1514). If you want to edit this histogram, then double click on the graph to open the Chart Editor.
(Options: toolbar or right mouse button).
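
As a sketch, the histogram with a normal curve corresponds to the following syntax:

```spss
* Histogram of age with the normal curve superimposed.
GRAPH
  /HISTOGRAM(NORMAL)=age.
```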

PIVOT TABLES (to edit your tables)

6.5 Edit/open a pivot table

Open the database 1991 US General Social Survey.SAV.


Make a frequency distribution of the variable happy.
The output of frequencies consists of: title, notes, active dataset, statistics and the frequency
distribution itself. We will start with the table statistics.
Double click with the left button of your mouse on the frequency distribution table. Now the table
has been selected. Click once on the right button of your mouse to get options like SPSS
Pivoting Trays and Formatting Toolbar.

6.6 The menu pivot

Transpose rows and columns and cancel by using reset pivot to defaults
Click on the right mouse button and then on Pivoting Trays.
On the three borders of the frame you can see which dimension the table has. This table has
three dimensions (columns, rows and layers) but you only see two pivots: one pivot for the row
dimension (below, at the left) and one pivot for the column dimension (top, at the right).
The pivot in the row represents the variable happy and the pivot in the column the statistics.
Statistics can form a dimension too.

Move pivots
Select, for example, the pivot in the row dimension. Drag the pivot to the column dimension. See
the results: the layout of your table changes (or use the option - - Pivot, transpose rows and
columns - -).

6.7 Formatting in general

First double click on the table to open the OUTPUT VIEWER. Then select - - Toolbar - - with the
right mouse button.
To start we will make the column in which Frequency is represented much larger. Put the cursor
on the vertical dividing line and drag the mouse to the right.

To change the alignment of the text in the columns, select the text and use the different icons at
the right in the toolbar. Click somewhere outside the cell to de-select the cell.

The type and the size of the text can be changed too: see the different icons in the toolbar.
Change text: always first click in the corresponding cell.
For more possibilities like value, margins, alignment and shading see Format - Cell Properties
(right mouse button).
To adjust the borders (style and color of lines) see Format - Table Properties (right mouse button
- - Borders --).

ASSIGNMENTS

DO NOT SAVE THE CONTENTS / CHANGES IN THE DATA FILES

Data base : EMPLOYEE DATA.sav


1. Ask for a frequency distribution of the variable educational level. Make a bar chart of this
variable. Which number of years of education occurs most often in this chart?
2. Ask for a frequency distribution of the variable salary and ask for the minimum and maximum.
How many persons have a salary of $ 20,850?
3. Make a crosstab of gender and employment category. How many men are managers?

Data base : BIER.sav


4. The prices are now in US Dollars. Make a new variable, called Euro, which indicates the
price (prijs) in Euro (let's say that 1 USD is equal to 0.695 Euro).
5. Ask for all statistics of the variables prijs and kosten (mean, median, mode, standard
deviation, variance, range, minimum and maximum). Do not show the frequency
distribution.
What is the price of the most expensive beer? And of the cheapest beer?
And what is the exact difference between the most expensive and cheapest beer?
Do the different beers vary more in price or in costs?
6. Ask for a frequency distribution of alcohol and show this distribution. Study this frequency
distribution to find the median of the variable alcohol. What can you find in this table?

Data base : GSS93 SUBSET.sav


7. Take the variable age. Make a new variable, for example newage, which represents the
age in months. Use as label age in months. How old (in months) is the respondent in row
number 10?
8. Take the variable age. Make a new variable, for example agecat (categories for age : 10-
20 = 1, 21-30 = 2, 31-40 = 3, 41-50 = 4). In which new category of age can you find
respondent number 21?
9. Take the variable childs. Sort the cases by the number of children (ascending). How
many children does the respondent in row number 415 have? Undo this sorting!
10. Take the variable marital status. Select all married persons. The unselected cases have
to be filtered out. Which respondent (row number) is the first married one, that is, the
first one not filtered out? Undo this selection!
11. Take the variables sex and childs. Split the data base. Make a frequency distribution. How
many men have 4 children and how many women have 4 children? Undo the splitting!

Data base : EMPLOYEE DATA.sav
12. Make a classification of educational level (new variable: neweduc), with the categories
8-14 = low, 15-18 = middle, and 19-21 = high. Ask for a frequency distribution of the new
variable. What percentage of the respondents have a low educational level?
13. Make a classification of the current salary (new variable: newsalar), with the categories $0
- 19,999, $20,000 - 39,999, $40,000 - 59,999, $60,000 - 79,999, $80,000 - 99,999,
$100,000 - 119,999, and $120,000 - 140,000.
14. Take the variable previous experience and compute a new variable year (years instead of
months).
15. Make a crosstab of employment category and gender and ask for the chi-square. What is
the value of Pearson's chi-square?

Data base : GSS93 SUBSET.sav


16. We would like to know the average (mean) age of the respondents, as well as the
average number of times the respondents read a newspaper. What are the standard deviations?
17. Make a bar chart of the variable race. The bars should represent the % of cases. Make
the title race and show data labels in the bars.

Data base : BIER.sav


18. Suppose you think that the costs (kosten) of beer determine the price (prijs) of beer.
Regression is a good technique to study this supposition. What is the
regression equation here? How do you interpret it?
19. Suppose we think that the costs (kosten) are not the only factor influencing the price
(prijs), but that the % of alcohol matters too (so we suppose that different beers with the
same costs but a different % of alcohol will differ in price too). Check with (multiple)
regression whether the variable alcohol is of importance for the price (prijs). Which variable
is the most important: costs (kosten) or alcohol?

Data base : EMPLOYEE DATA.sav


20. In 3 steps, we will study if there is a linear connection between salary and
(1) educational level,
(2) educational level and previous experience,
(3) educational level and previous experience and begin salary.
At each of the 3 analyses please mention the R-square and the standardized coefficients.
Which of the 3 used variables shows the greatest linear connection with the variable
salary: educational level, previous experience, or begin salary?
