SPSS Tutorials
Module 1
Type of file
- Introduction
o To open a file: file > open > data (there are also other types of file)
o Then select the dataset and click open > a dataset window and an output
window will appear
o DATA.sav: data view and variable view
o OUTPUT.spo: each output file is shown in its own output window
o SYNTAX.sps: shown in its own syntax window; it can hold a script in the SPSS
coding language (e.g. useful when the same command has to be repeated with
small changes)
- Data
o Data view: reports each case in a row and the values of each variable in
columns
o Variable view: for each variable (in rows), there are different types of info
reported in columns
Clicking on the row of a variable moves us to its position within the
data view
o Variable view info: name, type (e.g. numeric, string…), label (describes the
variable in more detail), values (reports the legend of the levels, e.g. 1 “yes”,
2 “no”), missing (reports the codes used for missing values, e.g. -1 “no answer”),
measure (reports the scale of measurement of the variable, i.e. scale, ordinal, or
nominal)
- Output
o Output window: reports everything that we asked SPSS to do and contains
the results that it provides us for the analysis
o Analyze > descriptive statistics > frequencies > drag variable and click OK
o The output contains the log and the analysis (e.g. frequencies); the analysis is
divided into title, notes, statistics, and the table for the variable, headed by
its question (e.g. do you have a phone?)
o Can be saved: file > save or save as...
- Syntax
o In the output file, the log reports the code corresponding to the command
launched through the drop-down menu
We can use a syntax file to draft the script, writing the command lines of
the analysis we want to perform in the SPSS language
o Analyze > descriptive statistics > frequencies and drag the variable to analyze
> if you click “paste” instead of “ok”, the command will appear in the
syntax window
> from there, you can select it, click on the green play button, and the
results of the analysis will appear in the output window
In the syntax window, the command name is blue, the variable name is black,
the parameter is red, and the option is green
Comments can be added by starting the line with *
Commands can be copied and pasted, e.g. changing the variable name and
running the analysis again
- The four main menus
o In the graphical user interface of SPSS, there are 4 very relevant menus: data,
transform, analyze, and graphs
1. Data: collects commands that perform operations at the dataset level, e.g.
merging datasets, sorting cases, filtering cases, extracting a smaller portion of
the dataset by selecting some variables…
2. Transform: commands for managing and preparing the dataset before doing
an analysis, e.g. recoding a variable, creating a new variable through a math
operation on other variables, collapsing some variables, etc.
3. Analyze: commands to perform analysis (descriptive statistics and model
computation)
4. Graphs: commands to ask SPSS to plot a graph
Data management
- Creating a smaller dataset
o File > save as… > variables… (a new window appears – only the selected
variables will be saved) > “drop all” and select the needed variables >
continue (and paste or save)
- Selecting cases
o Data > select cases > “if condition is satisfied” > “if…” (a new window
appears) > select the needed variables, drag them and write the condition
(e.g. PESEX=2) > continue > choose the final output
“Filter out unselected cases”: from the variable view, double-click on the
row of the variable PESEX to jump to it in the data view, where the cases
with a value different from 2 are crossed out
If you launch a command, e.g. analyze > descriptive statistics >
frequencies > PESEX variable chosen > “ok”, the analysis will be
run only on the cases with PESEX=2
“Copy selected cases to a new dataset” > “ok” > a new dataset will
appear with only the cases where PESEX=2 (remember to save this
window, as it won’t be saved automatically)
“Delete unselected cases” (not recommended, because those cases are
deleted from the dataset and cannot be recovered)
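The difference between filtering and copying can be illustrated outside SPSS. The following is a minimal Python sketch, not SPSS syntax; the cases and the AGE variable are invented for illustration, and only the condition PESEX=2 comes from the notes.

```python
# Hypothetical cases: PESEX is the variable used in the notes, AGE is invented.
cases = [
    {"PESEX": 1, "AGE": 34},
    {"PESEX": 2, "AGE": 51},
    {"PESEX": 2, "AGE": 27},
    {"PESEX": 1, "AGE": 45},
]


def condition(case):
    """The selection condition from the notes: PESEX = 2."""
    return case["PESEX"] == 2


# Filtering: every case stays in the dataset, but each one carries a flag
# and an analysis uses only the flagged cases.
filter_flags = [condition(c) for c in cases]
filtered_view = [c for c, keep in zip(cases, filter_flags) if keep]

# Copying to a new dataset: a separate dataset is created and the
# original dataset is left untouched.
new_dataset = [dict(c) for c in cases if condition(c)]

print(len(cases), len(filtered_view), len(new_dataset))
```

In both cases the original four cases are still there, which is why these two options are safer than deleting the unselected cases.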
- Transforming a variable
o Transform > compute variable
o “Target variable” (to give it a name) = “Numeric expression” (functions can be
found under “function group”, e.g. for the natural log, arithmetic > Ln)
To fill in “numeric expression”, either use the blue arrow or type
the function yourself (the same goes for selecting the variable), e.g.
LN(PTC1Q10) > then click “ok”
The command is reported in the output window and the new variable appears
in the data view
In variable view, you can write a label for the new variable
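What the expression LN(PTC1Q10) computes for each case can be sketched in Python. This is an illustration with invented values, not SPSS syntax; treating non-positive input as missing is an assumption made here so that the sketch never raises an error.

```python
import math

# Hypothetical values of PTC1Q10; None stands in for a missing value.
ptc1q10 = [1.0, 10.0, 250.0, None]


def ln_or_missing(value):
    """Natural log of a positive value; missing or non-positive input stays missing."""
    if value is None or value <= 0:
        return None
    return math.log(value)


# One new value per case, in the same order as the original variable.
ln_ptc1q10 = [ln_or_missing(v) for v in ptc1q10]
print(ln_ptc1q10)
```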
- Recoding the values of a variable
o E.g. PEEDUCA is a categorical variable with many categories, so we might
want to reduce them (have only 3) in order to lower complexity
o Transform > “recode into different variables…” > drag chosen variable > on the
right, write name and label and press “change”
o Click “old and new values” (a new window appears) > “system-missing” to
“system-missing” and add, “system- or user-missing” to “system-missing”
and add (don’t forget to do this!)
Then, old “value” e.g. 31, to new “value” e.g. 1 and click add
Then, old “range” e.g. 32 through 40, to new “value” e.g. 2
Then, old “range, value through highest” e.g. 41, to new “value” e.g. 3
Click “continue” (you’ll go back to the previous window) and “ok” >
results will appear in the output window (reporting the syntax) and in the
data view (the new variable will show)
o Useful when creating a dummy variable: identifying that the case belongs to a
category of interest, with value 1 (of interest) or 0 (not of interest)
Transform > “recode into different variables…” > “reset” to clean up
Drag chosen variable and write name & label > “old and new values…”
Do everything as before with the system-missing and user-missing
Old “value” e.g. 31 to new “value” e.g. 1 (value of interest), and add
Old “all other values” to new “value” e.g. 0, “add” > “continue” >
“change” > “ok”
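The two recoding rules above (three levels, then a dummy) can be written as small functions. This is a minimal Python sketch with invented PEEDUCA values, not SPSS syntax; the function names are hypothetical, and None stands in for system- or user-missing values.

```python
def recode_three_levels(value):
    """Rule from the notes: 31 -> 1, 32 through 40 -> 2, 41 and above -> 3."""
    if value is None:        # missing stays missing
        return None
    if value == 31:
        return 1
    if 32 <= value <= 40:
        return 2
    if value >= 41:
        return 3
    return None              # values not covered by any rule become missing


def dummy_for_31(value):
    """Dummy variable: 1 for the category of interest (31), 0 for all other values."""
    if value is None:        # missing stays missing, it is not turned into 0
        return None
    return 1 if value == 31 else 0


# Hypothetical PEEDUCA codes for six cases.
peeduca = [31, 35, 40, 41, 46, None]

recoded = [recode_three_levels(v) for v in peeduca]
dummy = [dummy_for_31(v) for v in peeduca]
print(recoded)
print(dummy)
```

Handling the missing values first is the point of the "don't forget to do this!" step: without it, the "all other values" rule of the dummy would turn missing cases into 0.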
Evaluating association
- Crosstabs
o To study the association between two variables that are categorical, or
numerical with a small number of values, we can use crosstabs
E.g. understand if visits to an art museum or gallery are related to
gender
o Analyze > descriptive statistics > crosstabs
Put one variable in rows and the other in columns
To evaluate the presence of association, we need conditional
frequencies: click on “cells” > click “row” (relative frequencies of the
column variable conditional on the row variable) and “column”
(for the opposite) > “continue”
To carry out a test of independence, click on “statistics” and
“chi-square” > “continue” and “ok”
o The output will show a new table, “crosstabulation”; within each cell:
First row (count) = joint absolute frequencies
Second row shows relative frequencies by row
Third row shows relative frequencies by column
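To make the three rows of each cell concrete, here is a minimal Python sketch that computes row percentages, column percentages, and the Pearson chi-square statistic for a 2x2 table. The counts are invented for illustration and are not taken from the tutorial dataset; the closed-form p-value used here holds only for 1 degree of freedom.

```python
import math

# Hypothetical counts (invented for illustration).
# Rows: sex (male, female); columns: visited a museum (yes, no).
counts = [
    [30, 70],
    [45, 55],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# "Row" percentages: distribution of the column variable within each row.
row_pct = [[100 * cell / row_totals[i] for cell in row]
           for i, row in enumerate(counts)]

# "Column" percentages: distribution of the row variable within each column.
col_pct = [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
           for row in counts]

# Pearson chi-square: compare observed counts with the counts expected
# under independence (row total * column total / n).
chi_square = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# With 1 degree of freedom the p-value has a closed form; larger tables
# need a chi-square distribution function instead.
p_value = math.erfc(math.sqrt(chi_square / 2)) if df == 1 else None

print(row_pct, col_pct, round(chi_square, 3), df, p_value)
```

If the row percentages differ clearly between the rows and the p-value is small, the two variables are unlikely to be independent.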
Module 3
Odds ratio
- For binary categorical variables, the odds ratio can be used to evaluate association
o E.g. association between the response variable (visits to art museums) and the
variable sex
- Analyze > descriptive statistics > crosstabs
o Response variable in columns and independent variable in rows
o Cells > percentages: row > continue
o Statistics > risk and chi-square > continue > ok
- Output: the risk estimate table shows the odds ratio of visiting an art museum or
gallery, comparing males against females
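The odds ratio itself is a short calculation on the 2x2 table of counts. The following is a minimal Python sketch with invented counts; which category ends up in the numerator depends on how the variables are coded, so the ordering used here (males first, "yes" first) is an assumption.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]]: (a / b) / (c / d)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts (invented for illustration).
# Rows: males, females; columns: visited (yes), did not visit (no).
table = [
    [30, 70],
    [45, 55],
]

odds_males = table[0][0] / table[0][1]      # odds of visiting among males
odds_females = table[1][0] / table[1][1]    # odds of visiting among females
ratio = odds_ratio(table)

print(odds_males, odds_females, ratio)
```

A ratio below 1 means the odds of visiting are lower in the first row than in the second; a ratio of 1 means no association.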
Module 4
Factor analysis
- To create a new variable equal to an existing one, e.g. N_jazz_zero_new: transform >
compute variable > choose and drag “number of live jazz performances” and “ok”
o Then, change the missing values of the new variable: transform >
recode into same variables > choose and drag N_jazz_zero_new > select “old
and new values” (system- or user-missing gets new value = 0) > continue
o Select “if” > “include if case satisfies condition” > choose and drag “attended
a live jazz performance” > set it equal to 2 > continue > ok
- To perform a factor analysis, analyze > dimension reduction > factor
o Under variables, select and drag variables to include in the analysis
o Descriptives > under “statistics” select univariate descriptives and initial
solution, under “correlation matrix” select coefficients, significance levels,
inverse (for partial correlations), anti-image (for the measure of sampling
adequacy of each variable), and KMO and Bartlett’s test (to check whether the
correlations are strong enough for a factor analysis) > continue
o Extraction > select scree plot as well (can also change “extract” to “fixed
number of factors” and choose a number) > continue
o Rotation > Varimax > continue
o Options > select “sorted by size” and “suppress small coefficients” (can
choose absolute value, e.g. 0.3) > continue
o Scores > select “save as variables” (saving factors created as new variables –
will appear in variable view window) > continue > ok
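The variable-preparation step at the start of this section (copy the variable, then set missing values to 0 only for the cases that satisfy the condition) can be sketched in Python. The cases are invented, the short key names are hypothetical, and the coding 1 = attended, 2 = did not attend is an assumption; only the name N_jazz_zero_new and the condition "= 2" come from the notes.

```python
# Hypothetical cases. "attended": 1 = yes, 2 = no (assumed coding).
# "n_jazz": number of live jazz performances, None = missing.
cases = [
    {"attended": 1, "n_jazz": 3},
    {"attended": 2, "n_jazz": None},
    {"attended": 1, "n_jazz": None},
    {"attended": 2, "n_jazz": None},
]

for case in cases:
    # Compute step: the new variable starts as a copy of the existing one.
    case["N_jazz_zero_new"] = case["n_jazz"]
    # Recode step with an "if" condition: missing becomes 0 only for the
    # cases where attended = 2; other missing values are left as they are.
    if case["attended"] == 2 and case["N_jazz_zero_new"] is None:
        case["N_jazz_zero_new"] = 0

new_values = [case["N_jazz_zero_new"] for case in cases]
print(new_values)
```

The third case stays missing because it does not satisfy the condition, so a genuine non-response is not mistaken for zero performances.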
Cluster analysis
- Analyze > classify > K-means cluster
o Can be run on only the variables created by the factor analysis (the saved factors)
o Choose and drag variables and choose number of clusters
- Output window: the iteration history could say “…iterations failed to converge…” >
this could be a problem for the result > increase the maximum number of iterations
for more robust results
o Analyze > classify > K-means cluster
o Iterate > maximum = e.g. 20 > continue
o Options > ANOVA table > continue > ok
- New output: the iteration history says “convergence ratio reached…” = good (the
change in cluster centers also became 0 at the last iteration)
o The “number of cases in each cluster” table shows an uneven distribution, with
too many cases in cluster 2 > need to run the analysis again
o Analyze > classify > K-means cluster > change number of clusters to e.g. 5
o Save > cluster membership (to describe clusters) > continue > ok
- New output: iteration history is good + ANOVA p-values are significant + number of
cases in each cluster is still not balanced but it’s the best option
o Under variable view, there will be a new variable (QCL_1): can analyze it, e.g.
analyze > descriptive statistics > frequencies and choose the new variable
o Analyze > descriptive statistics > crosstabs (new variable in rows and another
one in columns)
Cells > click row and column percentages
Statistics > chi-square (to see if there is association)
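Why the maximum number of iterations matters can be seen in a bare-bones k-means loop. This is a minimal Python sketch of the general algorithm, not the SPSS procedure, which differs in details such as how the initial centers are chosen and how convergence is judged. The points (standing in for two saved factor scores per case) and the starting centers are invented for illustration.

```python
import math
from collections import Counter


def k_means(points, centers, max_iter):
    """Plain k-means. Returns (centers, labels, converged)."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its current members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)   # an empty cluster keeps its center
        change = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if change == 0:                   # centers stopped moving: converged
            return centers, labels, True
    return centers, labels, False         # ran out of iterations


# Hypothetical factor scores for six cases, and deliberately poor starting centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [(0, 0), (1, 0)]

_, _, converged_short = k_means(points, start, max_iter=2)
centers, labels, converged = k_means(points, start, max_iter=20)

cluster_sizes = Counter(labels)           # analogue of "number of cases in each cluster"
print(converged_short, converged, labels, dict(cluster_sizes))
```

With only 2 iterations allowed the centers are still moving when the loop stops; with 20 the change reaches 0 and the run reports convergence, which mirrors the two iteration-history messages described above.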