How To Enter Data in SPSS
1
Statistical Software Packages Most Commonly Cited in the
NEJM and JAMA between 1998 and 2002
SAS 302
SPSS 87
STATA 80
Epi Info 49
SUDAAN 43
S-PLUS 33
StatXact 18
BMDP 9
StatView 9
Statistica 8
2
Before you perform analysis in SPSS, let’s set up the following option.
Go to Edit, Options.
3
SPSS Windows has 3 windows:
Data Editor
Data View window, which displays data from the active file in
spreadsheet format
4
SPSS Data View
5
SPSS Variable View
6
1.2 Data Entry into SPSS
7
Figure 1. Data from Hell
8
Data from Heaven
9
How to move from Hell to Heaven (1):
1. Add a patient ID number.
2. Delete the first row with the title of the project
3. Delete the 2 rows under the variable name.
4. Delete the 2 rows between the groups.
5. Delete the row of average at the bottom.
6. Add a variable called group and code the first 10 with Drug A as 1 and the
next 10 as 2.
7. Change the variable names to 8 characters or fewer, with no spaces
(digits are allowed, but not as the first character; avoid symbols).
8. Insert 2 columns before BP as SYSBP and DIASBP. Delete the BP text column.
9. Change missing values, NA, unknown, ?, to blanks.
10. Change age of 6 months to 0.5 (years). Fix errors.
11. Code males=1 and females=2.
12. Code complications as 0 for no and 1 for yes
13. Go back to the source and complete the missing information
14. If a column was entered as a string (words), you may have to select
the column and format the cells to change it to numeric.
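Steps 8, 9, 11 and 12 above can also be scripted before the data reach SPSS. The following is a minimal Python sketch, not part of the original slides. The column names (BP, SEX, COMPL), the list of missing-value markers, and the helper names are all assumptions made for illustration, and it assumes blood pressure was typed as text such as 120/80.

```python
# Assumed markers for missing data; adjust to whatever the source sheet uses.
MISSING_TOKENS = {"", "na", "n/a", "unknown", "?", "*"}

def blank_if_missing(value):
    """Step 9: turn NA, unknown, ? and similar markers into blanks."""
    text = value.strip()
    return "" if text.lower() in MISSING_TOKENS else text

def clean_row(row):
    """Return a cleaned copy of one patient record (a dict of strings)."""
    out = {key: blank_if_missing(val) for key, val in row.items()}
    # Step 8: split the BP text column into SYSBP and DIASBP, then drop BP.
    bp = out.pop("BP", "")
    sys_bp, _, dias_bp = bp.partition("/")
    out["SYSBP"], out["DIASBP"] = sys_bp.strip(), dias_bp.strip()
    # Step 11: code males=1 and females=2 (unrecognized values become blank).
    sex_codes = {"m": "1", "male": "1", "f": "2", "female": "2"}
    out["SEX"] = sex_codes.get(out["SEX"].lower(), "")
    # Step 12: code complications as 0 for no and 1 for yes.
    out["COMPL"] = {"no": "0", "yes": "1"}.get(out["COMPL"].lower(), "")
    return out

# Made-up record, for illustration only.
example = {"ID": "1", "SEX": "Male", "BP": "120/80", "COMPL": "NA"}
print(clean_row(example))
```

Values that cannot be coded are left blank rather than guessed, which matches step 13: go back to the source to complete them.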
10
General guidelines for data entry
4. Give each patient a unique, sequential case number (ID). Place this
ID number in the first column on the left
11
5. Each variable should be in its own column.
12
7. Each patient should be entered on a single line or row. Do not copy a
patient’s information to another row to perform subgroup analysis.
13
9. For yes/no questions, enter “0” for no and “1” for yes. Do not leave
blanks for no. Do not enter “?”, “*”, or “NA” for missing data because
this indicates to the statistical program that the variable is a string
variable. String variables cannot be used for any arithmetic
computation.
10. Put ordinal variables into one column if they are mutually exclusive.
Avoid:                        Preferred:
Pain                          Pain
Mild   Moderate   Severe
1      0          0           1
0      1          0           2
0      0          1           3
11. Do not make columns wider than 8 characters, unless absolutely essential.
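Guideline 10 can be illustrated with a short Python sketch that collapses the three "Avoid" columns into the single "Preferred" pain code. The function name collapse_pain is hypothetical, and returning None for blank or contradictory rows is an assumption, not something the slides specify.

```python
def collapse_pain(mild, moderate, severe):
    """Collapse three mutually exclusive 0/1 columns into one ordinal code.

    Returns 1 (mild), 2 (moderate) or 3 (severe), or None when no category
    or more than one category is marked.
    """
    flags = [mild, moderate, severe]
    if sum(flags) != 1:
        return None  # missing or contradictory entry; check the source record
    return flags.index(1) + 1

# The three rows from the Avoid/Preferred example above.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
pain = [collapse_pain(*r) for r in rows]
print(pain)  # [1, 2, 3]
```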
14
Entering Date in Excel.
In Excel, go to:
Format, Cells, select Date under Category,
Choose Type for a format you like
15
Entering Time in Excel.
In Excel, go to:
Format, Cells, select Time under Category,
Choose Type for a format you like
16
Entering Date / Time in Excel.
In Excel, go to:
Format, Cells, select Time under Category,
Choose a Date/Time format
17
Entering Date, Time in SPSS
In SPSS, open Variable View, click Type for the variable you want to
assign a date format to, click on Date, and select a format of your choice.
18
Importing data from Excel spreadsheet into SPSS.
In SPSS, go to:
File, Open, Data
Select Type of file (for example, Excel) you want to open
Select File name you want to open
19
Exporting data from SPSS to Excel.
In SPSS, go to:
File, Save As,
Select Type of file (for example, Excel) you want to save into
Give File name you want to save into
20
Data merging in SPSS (1)
1. Make sure that both files are sorted by Key variable in ascending order
2. In SPSS, open Data from Hell to Heaven.sav
3. Select Add Variables under Data, Merge Files
21
Data merging in SPSS (2)
4. Select the dataset you want to merge into the working file.
22
Data merging in SPSS (3)
5. Click on Match cases on key variables in sorted files,
6. Click on Both files provide cases
7. Highlight ID in the Excluded Variables box, then click ► next to the
Key Variables box
23
Note on Data merging in SPSS (3)
Cases must be sorted in the same order in both data files. If one or
more key variables are used to match cases, both data files must
be sorted in ascending order of the key variable(s).
Variable names in the second data file that duplicate variable names in
the working data file are excluded by default because Add Variables
assumes that these variables contain duplicate information. Before you
merge data files, therefore, carefully check any two variables
that share a name. If the two variables contain different information,
SPSS still automatically excludes the variable that comes from the file
being merged in (Birthday.sav).
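The matching logic described in this note can be sketched in Python. This is an illustration of the idea only, not how SPSS implements Add Variables: the function add_variables and all data values are made up, and each file is represented as a list of dictionaries.

```python
def add_variables(working, second, key="ID"):
    """Match cases on a key variable, with both files providing cases.

    Variables in `second` whose names already exist in `working` are skipped,
    mimicking the default exclusion described in the note above. Returns the
    merged rows and the list of excluded variable names.
    """
    working_vars = {name for row in working for name in row}
    second_vars = {name for row in second for name in row}
    excluded = sorted((working_vars & second_vars) - {key})

    working_by_key = {row[key]: row for row in working}
    second_by_key = {row[key]: row for row in second}

    merged = []
    # Walk the union of keys in ascending order, as in sorted files.
    for k in sorted(set(working_by_key) | set(second_by_key)):
        row = {key: k}
        row.update(working_by_key.get(k, {}))
        for name, value in second_by_key.get(k, {}).items():
            if name not in working_vars:
                row[name] = value
        merged.append(row)
    return merged, excluded

# Made-up example files; SEX appears in both and is therefore excluded.
working = [{"ID": 1, "AGE": 34, "SEX": 1}, {"ID": 2, "AGE": 51, "SEX": 2}]
second = [{"ID": 1, "BIRTHDAY": "1970-03-02", "SEX": 1},
          {"ID": 3, "BIRTHDAY": "1965-11-20", "SEX": 2}]

merged, excluded = add_variables(working, second)
print(excluded)
for row in merged:
    print(row)
```

In this sketch the case with ID 3 exists only in the second file, so its SEX value is lost when the duplicate name is excluded. That is the kind of silent loss the note warns about.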
24
1.3 Data Cleaning in SPSS
25
Data cleaning in SPSS (1): Recoding existing variables (1)
Old New
ID Group Group
1 A 0
2 A 0
3 B 1
4 B 1
26
Data cleaning in SPSS (1): Recoding existing variables (2)
From the SPSS menu bar, go to:
Transform
Recode
Into Same Variables
27
Data cleaning in SPSS (1): Recoding existing variables (3)
1. Move Group from the variable list into the String Variables box
2. Click on Old and New Values to proceed
28
Data cleaning in SPSS (1): Recoding existing variables (4)
1. Type the old value and the new value you want to convert it into
2. Click on Add (to change or remove an entry, click on Change or Remove)
3. Repeat until all values are listed in the Old --> New box, then click Continue
4. Click OK to execute the command.
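The recoding itself is a simple old-value to new-value lookup. Here is a minimal Python sketch of the Group example (A becomes 0, B becomes 1); the function name recode is hypothetical, and mapping unlisted values to None (blank) is an assumption.

```python
RECODE = {"A": 0, "B": 1}  # old value -> new value, as in the Old/New table

def recode(values, mapping):
    """Return the recoded values; anything not in the mapping becomes None."""
    return [mapping.get(v) for v in values]

group = ["A", "A", "B", "B"]
print(recode(group, RECODE))  # [0, 0, 1, 1]
```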
29
Data Cleaning in SPSS (2)
Creating a new variable for diastolic blood pressure (DiasBP):
In SPSS, go to Variable View,
then type DiasBP in the first empty row under
Name.
Go back to Data View and type each patient's diastolic blood pressure into
DiasBP to separate it from SysBP. For ease of data entry, you can move DiasBP
right after SysBP. Then edit SysBP so that it holds only the systolic value.
30
Data Cleaning in SPSS (3)
Computing patient’s age from birthday and date enrolled into the study.
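The arithmetic behind this step can be sketched in Python. This is an illustration only and not necessarily the formula used on the slide: both function names are hypothetical, and the 365.25-day year in the fractional version is a common approximation.

```python
from datetime import date

def age_in_years(birthday, enrolled):
    """Completed years between birthday and enrollment date."""
    years = enrolled.year - birthday.year
    # Subtract one year if the birthday has not yet occurred in the enrollment year.
    if (enrolled.month, enrolled.day) < (birthday.month, birthday.day):
        years -= 1
    return years

def age_fractional(birthday, enrolled):
    """Approximate age in fractional years (e.g. about 0.5 for 6 months)."""
    return (enrolled - birthday).days / 365.25

# Made-up dates: enrolled one day before the 35th birthday.
print(age_in_years(date(1970, 3, 2), date(2005, 3, 1)))  # 34
print(round(age_fractional(date(1970, 3, 2), date(2005, 3, 1)), 2))
```

Completed years suit adult studies; the fractional version is closer to the earlier advice to enter an age of 6 months as 0.5.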
31
Data Cleaning in SPSS (4): Data labeling and formatting (1)
Specifying Type of Variable
HT: 61.00, 68.00, 47.00, 66.00, 72.00, 67.00, 72.00, 72.00, 66.00, 60.00, 61.00, 59.00, 73.00, 65.00, 71.00, 68.00, 69.00, 66.00, 66.00, 68.00
32
Data Cleaning in SPSS (4): Data labeling and formatting (2)
Data Labeling
33
Data Cleaning in SPSS (4): Data labeling and formatting (3)
Variable Formatting
34
Data Cleaning in SPSS (4): Data labeling and formatting (4)
35
Data Cleaning in SPSS (4): Data labeling and formatting (5)
Measurement category
36
Retrieve data property from existing files in SPSS (1)
Now let’s create a copy of “Data from heaven.sav” after deleting
the formats and labels you just created. Save it as “Data
from hell to heaven without format.sav”.
Note: Before you perform these commands, make sure that the variable
types match between the two datasets.
37
Retrieve data property from existing files in SPSS (2)
38
Retrieve data property from existing files in SPSS (3)
39
Using syntax in SPSS:
The great advantage of SPSS is that it produces high-quality graphs and
statistical analyses through easy point-and-click operations. However,
some people criticize SPSS because analyses conducted this way are
hard to reproduce. In fact, SPSS has a capable syntax language, and
syntax can be saved and run repeatedly.
Throughout the course, I will provide a “how to” box for every
analysis used in class; here I will show how to save your
commands as syntax. I highly recommend using syntax to keep a
better record of what has been done.
40
Using syntax in SPSS (1): Creating a new syntax file
41
Using syntax in SPSS (2): Editing a syntax file
42
Using syntax in SPSS (3): Saving a syntax file
43
Using syntax in SPSS (4): Opening an existing syntax
44
Using a syntax in SPSS (5): Example Syntax
I find syntax very handy especially when you get tired of clicking so many times!
45
Using syntax in SPSS (6): Recording syntax from the command dialog box
You can in fact use command dialog box (point and click method) as your
main tool and still save what you did with point and click into syntax.
Then later you can simply execute the syntax to repeat the analysis.
Step 1
46
Step 2: Saved syntax from the previous PASTE command
47
Using syntax in SPSS (7): Executing the syntax
48
Data confidentiality
49
Communication with a biostatistician:
Most statisticians prefer to have data submitted in SPSS format or
in the format of the statistical software they use. An advantage of
entering data directly into a statistical package such as SPSS is that
one can enter variable labels and value labels in the file.
When communicating with a biostatistician, also describe the research
problem, study hypothesis, and the primary comparison that you are
interested in. Explain any variables that need to be controlled for.
Explain the code used for missing values.
Also answer the following questions:
What is the name of your study?
What is the purpose of your study?
What is the type of your study?
Will all subjects be included in the analysis?
Were there any matched (repeated) measures?
How will outliers be defined and handled?
Has the data been cleaned?
What is our goal and deadline for this goal?
50