
Data Entry

DATE: 13-11-2022
Learning Outcomes
At the end of this class, students will be able to:

❑ Describe the process of data entry in SPSS

❑ Create variables in SPSS

❑ Enter values into the SPSS Data View window

❑ Enter values for categorical variables

❑ Compute descriptive statistics

❑ Interpret analysis outputs

❑ Create graphical representations of descriptive statistical analyses

❑ Calculate and interpret z-scores

❑ Define, calculate, and interpret Pearson’s correlation

❑ Create a scatter plot in SPSS


Introduction to Data Entry in SPSS
❑ Data entry is the first essential task in any SPSS analysis.

❑ Data may exist in any form: written on a piece of paper or typed into a computer as raw data.

❑ Before entering data, start SPSS.

❑ As soon as SPSS opens, a window called the Data Editor appears, showing the Data View.

❑ If the dataset is small, the data can be entered manually.

❑ However, when the dataset is large, manual data entry is impractical.

❑ In such cases, the data can be imported into SPSS from formats such as MS Excel, CSV (comma-separated values), SAS, and Stata.


Creating Variables in SPSS
Step 1: Launch SPSS application

A few things to know about naming a variable in SPSS:

• No spaces. Blank spaces are not allowed in a variable name. For example, First name is not allowed, but First_name is. If you want to use 2 or more words in a variable name, join them with an underscore.
• A variable name cannot begin with a special character such as #, @, & and so on.
• A variable name cannot begin with a number. It must begin with a letter.
• Each variable name must be unique.
• A variable name cannot be more than 64 characters long.
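The naming rules above can be checked mechanically. The sketch below is a hypothetical Python helper (`check_spss_name` is our own name, not part of SPSS) that tests only the rules listed on this slide. SPSS itself enforces further rules (for example, reserved keywords), so treat this as an illustration, not a complete validator.

```python
MAX_NAME_LENGTH = 64  # limit stated on the slide

def check_spss_name(name: str, existing: set) -> list:
    """Return the slide's naming rules that `name` breaks; [] means it passes."""
    if not name:
        return ["is empty"]
    problems = []
    if " " in name:
        problems.append("contains a space (use an underscore instead)")
    if not name[0].isalpha():
        # covers names starting with a digit or a special character such as #, @ or &
        problems.append("must begin with a letter")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if name.lower() in {n.lower() for n in existing}:
        # compared case-insensitively here (assumption: Height and height clash)
        problems.append("duplicates an existing variable name")
    return problems

print(check_spss_name("First name", set()))  # ['contains a space (use an underscore instead)']
print(check_spss_name("First_name", set()))  # []
```

A name like `2nd_score` or `#id` would be reported as "must begin with a letter".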
Step 2
Click on Variable View -> Enter the variable names as shown below

❑ Height and Weight are both quantitative variables.
❑ They measure an amount of something.
❑ Both are set to Scale in the Measure column because they are ratio-level variables.
Step 3
Click on Data View -> Enter the values as shown below
Step 4
❑ Enter values for your categorical variables as numeric codes.
❑ In this example, Gender is a categorical variable coded 1 = Male and 2 = Female. The labels for these codes can be defined in the Values column of Variable View.
❑ Note that when a categorical variable has only 2 categories, we call it dichotomous.
Step 5

❑ Next, we analyze these data.

❑ One of the simplest things that we can do is count how often things occur.

❑ For example, how many males and females participated in the survey? We want their frequencies.

❑ To do this, click on Analyze -> Descriptive Statistics -> Frequencies.


Step 6
You should see the window shown below.

❑ All the variables in our dataset are on the left, and the variables that we want to analyze go on the right.

❑ You can select a variable for analysis by clicking on its name and then clicking on the arrow between the boxes.

❑ Alternatively, you can drag and drop or double-click the variable.

❑ Your window should look like the one shown below.

❑ Click OK to see how easy it is to run an analysis in SPSS.
Step 7
❑ What you are seeing now is the output window.
❑ Compared with other statistical analysis tools, SPSS gives you copious amounts of output, often more than you really need, so you need to know how to interpret it.
❑ In SPSS, it is easy to run an analysis, but interpreting the output requires practice.
❑ First, we see a summary of the variables in the box labeled “Statistics”.
❑ We have 12 valid scores for gender with no missing data, but for height we only have scores for 10 people, with 2 missing values. “Valid” means participants for whom we have scores.
❑ The first frequency table is for gender. The total tells us that we have 12 valid scores: 5 males and 7 females.
❑ Notice the columns for “Percent” and “Valid Percent”. They are exactly the same because we have no missing values for gender.
❑ The second frequency table is for weight. Remember that 2 of our participants have missing weight values, so the valid total is 10. The 2 missing values are listed as Missing, and the total is 12.
❑ We see that the Percent column is different from the Valid Percent column. The Percent column is calculated on the total sample size of 12, while the Valid Percent column is calculated on the valid n of 10 participants for whom we actually have data.
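To see how Percent and Valid Percent differ, here is a minimal Python sketch of the arithmetic behind a frequency table. The gender codes match the 5 males and 7 females described above; the height values are hypothetical, with `None` standing in for the 2 missing values.

```python
from collections import Counter

# Coding from the text: 1 = Male, 2 = Female (5 males, 7 females).
gender = [1] * 5 + [2] * 7
# Hypothetical heights in cm; None marks the 2 missing values.
height = [170, 168, 175, 180, 167, 164, None, 160, 171, None, 169, 166]

def frequency_table(values):
    """Map each value to (count, Percent, Valid Percent), as in SPSS Frequencies."""
    total = len(values)                            # all cases, missing included
    valid = [v for v in values if v is not None]   # cases that have a score
    counts = Counter(valid)
    return {
        v: (c, 100 * c / total, 100 * c / len(valid))
        for v, c in sorted(counts.items())
    }

for code, (n, pct, valid_pct) in frequency_table(gender).items():
    print(code, n, f"{pct:.1f}", f"{valid_pct:.1f}")
```

For height, each value occurs once, so its Percent is 8.3 (1 of 12) while its Valid Percent is 10.0 (1 of 10).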
Step 8
❑ To make a graphical representation of our analysis, click on Analyze -> Descriptive Statistics -> Frequencies.

❑ You should see a window like the one shown below.

❑ Click on Charts -> Bar charts -> Percentages -> Continue -> uncheck Display frequency tables since you already have them -> OK
Step 9

❑ In the output window, you can see the chart for gender, and it really looks good.

❑ We have two distinct bars, one for male and one for female, and we can estimate the percentage of each.
Step 10
❑ When we look at the bar chart for height, the result doesn’t look as good, because a scale variable like height has many different values and each one gets its own bar.
❑ We can do better by running another analysis.

❑ Click on Analyze -> Descriptive Statistics -> Frequencies -> Charts -> Histograms -> check Show normal curve on histogram -> Continue


Step 11
❑ Click on Statistics -> choose other options as desired -> Continue
❑ Now click OK
Step 12
❑ Your output window should display as shown below.
❑ You will notice that all the statistics we asked for are displayed in the Statistics table.
❑ The important thing to learn here is that you should choose the statistics and the graphs that are appropriate to your data.

❑ A nominal variable like gender should be reported with frequencies and a bar chart.

❑ Scale variables like height should be reported with a mean, standard deviation, and a histogram.


❑ If we want to examine the differences in height based on gender, click on Analyze -> Compare Means -> Means.

❑ Drag and drop gender into the Independent List box and Height into the Dependent List box -> OK
❑ Now we can see the mean and the standard deviation for males and females, separately and together.

❑ Overall, frequency counts, charts, and descriptive statistics are a great way to take a peek at your data and see just what you have.

❑ It is a good idea to do this before running any other kind of analysis, because they describe what your dataset looks like.

❑ There are 5 each for males and females, 10 total.

❑ We see that males were a few cm taller on average than females.

❑ The total mean and standard deviation here are the same as the values that we got earlier using the Frequencies command.
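The Means procedure essentially groups the cases by gender and summarizes each group. A rough Python equivalent is sketched below, using hypothetical heights chosen so that 5 males and 5 females have valid scores; it uses the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

gender = [1] * 5 + [2] * 7  # 1 = Male, 2 = Female, as coded in the text
# Hypothetical heights in cm; the 2 missing values belong to females.
height = [170, 168, 175, 180, 167, 164, None, 160, 171, None, 169, 166]

def means_by_group(groups, values):
    """Mean, sample SD and N for each group and overall, skipping missing values."""
    buckets = {}
    for g, v in zip(groups, values):
        if v is not None:
            buckets.setdefault(g, []).append(v)
    report = {g: (mean(vs), stdev(vs), len(vs)) for g, vs in sorted(buckets.items())}
    all_valid = [v for vs in buckets.values() for v in vs]
    report["Total"] = (mean(all_valid), stdev(all_valid), len(all_valid))
    return report

for group, (m, sd, n) in means_by_group(gender, height).items():
    print(group, f"mean={m:.1f}", f"sd={sd:.2f}", f"N={n}")
```

The "Total" row pools both groups, which is why it matches the overall mean and standard deviation from the Frequencies output.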
❑ To get more descriptive statistics, click on Analyze -> Descriptive Statistics -> Descriptives.

❑ Move the variables as shown below.

❑ Click on Options.

❑ Here we are presented with all kinds of options for descriptive statistics.

❑ Some of these options are already checked by default. Others are available to check if we want them.

❑ We just go with the default options -> Continue -> OK


❑ We have some useful information in the output window.

❑ For example, we see the number of valid scores for each variable.

❑ We also see the Valid N (listwise), which is the number of complete cases with no missing data.

❑ For gender, we see that the minimum is 1 and the maximum is 2. That is useful at least for checking that we do not have any data entry errors, but the mean and standard deviation for gender are meaningless.

❑ The average of 1.58 for male and female doesn’t really tell us anything.

❑ However, the mean and standard deviation for height and weight can be very useful.

❑ For example, the average weight was 58.50 kg.

❑ The standard deviation was 9.49 kg. If weight is roughly normally distributed, this tells us that about two-thirds of our participants weigh within 9.49 kg of the mean of 58.50 kg, that is, between 49.01 kg and 67.99 kg.
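Two of the numbers above can be checked with simple arithmetic. The sketch below uses the gender counts and the weight mean and standard deviation reported in the text; the "about two-thirds" figure assumes weight is roughly normally distributed.

```python
from statistics import NormalDist

# Gender codes from the text: 5 males coded 1 and 7 females coded 2.
gender = [1] * 5 + [2] * 7
gender_mean = sum(gender) / len(gender)
# For a 1/2 code the mean is just 1 + (proportion coded 2), so it adds
# nothing beyond the frequency table.
proportion_female = gender.count(2) / len(gender)

# Weight summary reported in the text (kg).
weight_mean, weight_sd = 58.50, 9.49
low, high = weight_mean - weight_sd, weight_mean + weight_sd

# Share of a normal distribution lying within 1 SD of its mean.
standard_normal = NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"gender mean = {gender_mean:.2f}")                # gender mean = 1.58
print(f"mean +/- 1 SD: {low:.2f} kg to {high:.2f} kg")   # mean +/- 1 SD: 49.01 kg to 67.99 kg
print(f"normal share within 1 SD = {within_one_sd:.3f}")  # normal share within 1 SD = 0.683
```

The 0.683 share is where "about two-thirds" comes from.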
❑ Another method to get descriptive statistics, which gives even more detail about each variable and more options for plotting and statistics, is described below.

❑ To do this, go to Analyze -> Descriptive Statistics -> Explore.

❑ Move height and weight into the Dependent List box as shown below -> Statistics -> Outliers -> Percentiles -> Continue

Outliers are data points that are far from other data points, and they can distort statistical results.

❑ Click on Plots -> Histogram -> Continue -> OK


❑ Now we have every kind of descriptive statistic you could dream of.

❑ We have a special box for Percentiles.

❑ We have another box that identifies any outliers or extreme values.

❑ The output window also includes a histogram, a small stem-and-leaf plot, and a new graph called a box plot.

❑ You can see that we got a lot of information here.

❑ We can even split all of this by gender.

❑ To do this, go to Analyze -> Descriptive Statistics -> Explore -> move gender into the Factor List box as shown below -> OK

❑ Now you will notice that the same statistics have been calculated separately for male and female.
❑ The box plots are now side by side, so that we can make comparisons. Each box plot has these parts:
• The tails are called the whiskers. The end of the lower whisker marks the lowest weight observation for males and for females.
• The bottom of the box is Quartile 1. It separates the lowest 25% of observations from the rest of the observations.
• The line in the middle is the median. The median divides the lower 50% of observations from the upper 50% of observations.
• The top of the box is Quartile 3. It divides the upper 25% of observations from the rest of the data.

❑ Knowing about these options for descriptive statistics can help us visualize our data, depending upon the level of information that we need.

❑ If you need only the basics, use the Descriptives command.

❑ If you want flexibility to choose exactly what output you get, use Frequencies.

❑ If you want to know the exquisite details or to split the analysis by a categorical variable, use Explore.
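The box-plot parts described above can be computed directly. This Python sketch uses hypothetical weights and the common 1.5 × IQR and 3 × IQR rules for outliers and extreme values, which, as far as we know, matches the convention of SPSS box plots. SPSS draws the box from Tukey's hinges and has its own default percentile method, so its quartiles may differ slightly from `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical weights in kg for one group (not the class data).
weights = [48, 52, 55, 57, 58, 60, 61, 63, 66, 85]

def box_plot_summary(data):
    """Quartiles, whisker ends and flagged points using 1.5 x IQR and 3 x IQR rules."""
    q1, med, q3 = quantiles(data, n=4)  # default 'exclusive' method: (n + 1) * p positions
    iqr = q3 - q1                       # length of the box

    def beyond(x, k):
        # True if x lies more than k box-lengths outside the box
        return x < q1 - k * iqr or x > q3 + k * iqr

    extremes = [x for x in data if beyond(x, 3)]
    outliers = [x for x in data if beyond(x, 1.5) and not beyond(x, 3)]
    inside = [x for x in data if not beyond(x, 1.5)]
    return {
        "Q1": q1, "median": med, "Q3": q3, "IQR": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers, "extremes": extremes,
    }

print(box_plot_summary(weights))
```

Here Q1 = 54.25, the median is 59 and Q3 = 63.75, so 85 kg lies beyond the upper 1.5 × IQR fence (78 kg) but within the 3 × IQR fence: it would be drawn as an outlier, and the upper whisker stops at 66 kg.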
Standard Score (z-score)
❑ A z-score is a numerical measurement that describes a value's relationship to the mean of a group of values.

❑ A z-score is measured in terms of standard deviations from the mean.

❑ If a z-score is 0, the data point's score is identical to the mean score.
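In symbols, the z-score of a raw score x is given below, assuming the sample mean and the sample standard deviation s (the n - 1 version reported in the descriptive statistics tables). For example, with hypothetical values x = 180 cm, mean 170 cm and s = 5 cm, z = (180 - 170) / 5 = 2, so the score lies 2 standard deviations above the mean.

```latex
z = \frac{x - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```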


Z-score Calculation in SPSS
❑ Next is how to convert raw scores into z-scores.

❑ To standardize values, go to Analyze -> Descriptive Statistics -> Descriptives.

❑ Move gender out of the Variable(s) list -> check Save standardized values as variables -> OK
❑ In the output window, we see exactly the descriptive statistics table we had earlier.

❑ If you minimize the output window and go back to your spreadsheet (the Data View window), you will notice two new variables, called “ZHeight” and “ZWeight”.

❑ Those are the z-scores.

❑ SPSS converts each height measurement into a standardized score that tells us how many standard deviation units the score is away from the mean.

❑ A negative z-score means that a raw score is below average.

❑ A positive z-score means that it is above average.

❑ That is a quick and easy way to get z-scores in SPSS.
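The Save standardized values option can be sketched in Python as follows. The heights are hypothetical, and the sketch assumes SPSS standardizes with the mean and sample standard deviation of the valid cases, leaving missing cases missing.

```python
from statistics import mean, stdev

# Hypothetical heights in cm; None = missing.
height = [170, 168, 175, 180, 167, 164, None, 160, 171, None, 169, 166]

def standardize(values):
    """z = (x - mean) / SD, using the sample mean and sample SD of valid values."""
    valid = [v for v in values if v is not None]
    m, s = mean(valid), stdev(valid)
    # A missing raw score stays missing in the new z-score variable.
    return [None if v is None else (v - m) / s for v in values]

z_height = standardize(height)
for raw, z in zip(height, z_height):
    print(raw, "missing" if z is None else f"{z:+.2f}")
```

Heights below the mean of 169 cm get negative z-scores and heights above it get positive ones; the valid z-scores always have a mean of 0 and a standard deviation of 1.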


Correlation Calculation in SPSS
❑ The correlation that we are going to compute is called Pearson’s r.
❑ A Pearson’s correlation describes the relationship between two variables.
❑ Pearson’s r ranges between -1.00 and +1.00.
❑ 0 indicates no linear relationship at all.
❑ The closer the correlation is to either +1 or -1, the stronger the relationship between the variables.
❑ In this example, we are interested in the relationship between height and weight.
❑ Notice how the data have been set up.
❑ Each person has a pair of scores. Your height should be paired with your weight; it is very important that each pair stays together.
❑ We have 10 pairs of scores, so our sample size is 10. Sample size (n) = number of pairs. Each pair counts as one case.
❑ The 2 people without height and weight scores are not going to be included in this analysis. SPSS will simply ignore those cases with missing values.
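Pearson's r can be computed by hand to see what SPSS is doing. This is a minimal Python sketch (the helper name `pearson_r` is ours, and the data are hypothetical) that keeps each height and weight pair together and drops any pair with a missing score, as described above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r and n, using only cases where both scores are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)  # joint variation
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy), n

# Hypothetical data: the same 2 participants lack both height and weight.
height = [170, 168, 175, 180, 167, 164, None, 160, 171, None, 169, 166]
weight = [62, 70, 58, 75, 55, 60, None, 50, 66, None, 54, 57]
r, n = pearson_r(height, weight)
print(f"r = {r:.3f}, n = {n}")
```

Because each pair is filtered as a unit, a person missing either score drops out entirely, and n counts pairs rather than individual scores.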
❑ To calculate the correlation, go to Analyze -> Correlate -> Bivariate as shown below. A Pearson’s r correlates 2 variables, so we choose Bivariate.

❑ Drag and drop Height and Weight into the Variables box as shown below -> OK.
❑ The box shown below is called a correlation matrix.

❑ It shows the correlation coefficient for every combination of variables.

❑ In this quadrant of the matrix, we see the correlation coefficient between height and itself.

❑ It is 1, which means it is a perfect correlation.

❑ The same applies to weight, which also correlates perfectly with itself.

❑ We have 2 rows, one for height and one for weight, and 2 columns, one for height and one for weight.

❑ Where each row and column intersect, we see the correlation coefficient between the two variables.
❑ SPSS compares every combination of variables, including each variable with itself.

❑ We already know that every variable will always correlate with itself at +1, no matter the variable.
❑ The interesting correlations are in the off-diagonal cells as shown below.
❑ In the top-right cell, the top value is the correlation coefficient. It will always be between -1 and +1.
❑ Below that is the significance level. Significance levels smaller than 0.05 are statistically significant.

❑ Below that is N, the sample size, which is our 10 pairs of scores.

❑ Notice that the off-diagonal correlations are the same (.127), because height correlates with weight exactly the same as weight correlates with height.

❑ In this case, r = .127, which is a weak positive correlation, and it is not significant: p (.728) > .05. With a sample size of only 10, even fairly large correlations can fail to reach significance.

❑ For the same r, you are more likely to find significance with a larger sample size.

❑ If this correlation were significant, we would see asterisks next to the coefficient.
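The significance level reported for r comes from a t test with n - 2 degrees of freedom. Computing the exact p-value needs a t-distribution function, so this Python sketch only computes the t statistic for the r = .127 and n = 10 reported above and compares it with the tabulated two-tailed critical value for df = 8 at alpha = .05 (2.306). It also converts that critical value into the smallest |r| that would be significant with 10 pairs.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing 'no correlation', with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

T_CRIT_DF8 = 2.306  # two-tailed critical t for alpha = .05, df = 8 (standard t table)

t = t_from_r(0.127, 10)
# Solving t = r * sqrt(df) / sqrt(1 - r^2) for r gives r = t / sqrt(t^2 + df).
r_crit = T_CRIT_DF8 / sqrt(T_CRIT_DF8 ** 2 + 8)

print(f"t = {t:.3f}, significant: {abs(t) > T_CRIT_DF8}")
print(f"smallest significant |r| with n = 10: {r_crit:.3f}")
```

The t value comes out at about 0.36, far below 2.306, which is consistent with p = .728. With only 10 pairs, |r| must reach roughly .63 before it becomes significant.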
In the Case of More Than 2 Variables
❑ You can correlate more than 2 variables at a time, and you can even use correlation with a nominal variable as long as it has only two levels.
❑ This is called a point-biserial correlation.
❑ To do this, go to Analyze -> Correlate -> Bivariate as shown below.
❑ Drag and drop gender into the Variables box as shown below -> OK
❑ Now we get another correlation matrix, but this time it is bigger.

❑ It has three rows and three columns.

❑ The correlation between height and weight is exactly the same as before (.127), but we also have correlations with gender (-.167 and -.293).

❑ Because these correlations are negative, as one variable goes up, the other goes down.

❑ Remember that we coded males as 1 and females as 2, so males have the smaller code.

❑ In this negative correlation, the smaller code is associated with larger height and weight values.

❑ So, in this sample, the males tended to be taller and to weigh more.

❑ However, p (.645) > .05 shows there is no significant relationship between weight and gender.
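A point-biserial correlation is simply Pearson's r computed with a two-level variable. The Python sketch below uses hypothetical weights for 5 males and 5 females. It shows that Pearson's r with the 1/2 coding matches the textbook point-biserial formula, r_pb = (M_high - M_low) / s × sqrt(p × q), and that recoding to 0/1 does not change the result.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical complete cases: 1 = Male, 2 = Female (coding from the text).
gender = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
weight = [65, 70, 62, 68, 60, 55, 50, 58, 52, 60]  # kg, illustrative only

def pearson_r(xs, ys):
    """Plain Pearson correlation for two equally long lists with no missing values."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def point_biserial(codes, ys, high_code=2):
    """r_pb = (M_high - M_low) / s * sqrt(p * q), with s the population SD of ys."""
    high = [y for c, y in zip(codes, ys) if c == high_code]
    low = [y for c, y in zip(codes, ys) if c != high_code]
    p = len(high) / len(ys)  # proportion in the higher-coded group
    return (mean(high) - mean(low)) / pstdev(ys) * sqrt(p * (1 - p))

r_pearson = pearson_r(gender, weight)
r_formula = point_biserial(gender, weight)
r_recoded = pearson_r([g - 1 for g in gender], weight)  # 0 = Male, 1 = Female
print(f"{r_pearson:.3f} {r_formula:.3f} {r_recoded:.3f}")
```

All three values agree and are negative, because the higher-coded group (females) has the lower mean weight in this made-up data.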
Scatter Plot
❑ A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables.
❑ The position of each dot on the horizontal and vertical axes indicates the values for an individual data point.
❑ Scatter plots are used to observe relationships between variables.
❑ Next is how to make a picture of the correlation statistics.
❑ The picture is called a scatter plot, and it is created using a tool called the Chart Builder.
❑ To do this, go to Graphs -> Chart Builder as shown below.
❑ Click on Scatter/Dot in the gallery as shown below.
❑ Now we see our nine options. If you hover your cursor over them, SPSS will tell you what they are.
❑ In this example, we want the first option: Simple Scatter.
❑ Click and drag it into the canvas as shown below.
❑ You will see that we now have three drop zones: one for the x axis, one for the y axis, and one for filter.
❑ So let’s use height to predict weight.
❑ Drag height to the x-axis drop zone -> drag weight to the y-axis drop zone as shown below.
❑ That is all you have to do. Click OK.
❑ The output window shows the scatter plot of all 10 pairs of scores.
❑ There is much more that could be done with correlation; for instance, we could format the scatter plot in APA style.

❑ We could do other types of correlations as per research requirements.

❑ We could even use some variables to predict other variables using a technique called regression.

❑ We will learn more about scatter plots, simple regression, and multiple regression in our next class.

❑ Correlations are about relationships between variables, but we might also be interested in differences between variables.

❑ In our next class we will learn how this can be done using t-tests.
Class Attendance
Please click on the link below to submit your class attendance.

https://forms.gle/SPizKfEhKFNGrbNh6
