SPSS BRM
SPSS/Statistical Component
DRAFT v0.10-0
Table of contents
Preface .................................................................................................................................................................................... 4
1 Introduction: SPSS........................................................................................................................................................... 5
1.1 SPSS case studies: A great way to learn! ................................................................................................................ 6
2 Introduction: Definitions................................................................................................................................................. 8
3 Introduction: Business “Reality” versus Experimental Quality ..................................................................................... 11
4 Survey Data ................................................................................................................................................................... 11
4.1 Coding Survey Data ............................................................................................................................................... 11
4.1.1 Codebook ...................................................................................................................................................... 11
4.1.2 Coding Open Text Response Items ............................................................................................................... 12
4.1.3 Coding Multiple Response (MR) Items ......................................................................................................... 13
4.1.4 Measure Types: Nominal, Ordinal, Scale ...................................................................................................... 15
4.1.5 Entering Data in SPSS: Variable View............................................................................................................ 16
4.1.6 Coding Missing Values .................................................................................................................................. 19
4.1.7 Entering Data in SPSS: Data View ................................................................................................................. 21
4.1.8 Multiple Response Items in SPSS .................................................................................................................. 21
4.1.8.1 Multiple Response: SPSS Variable Sets ..................................................................................................... 22
4.1.8.2 Multiple Response: SPSS Multiple Response (MR) Sets ........................................................................... 24
4.2 Analysis of Survey Data ......................................................................................................................................... 30
5 Statistical Experimentation Process: Overview ............................................................................................................ 32
6 Statistical Experimentation Process: Example .............................................................................................................. 33
7 SPSS: Selecting cases, and Splitting output .................................................................................................................. 34
7.1 Select Cases........................................................................................................................................................... 34
7.2 Split File ................................................................................................................................................................. 36
8 Statistical Summaries: Descriptive Statistics ................................................................................................................ 39
8.1 Frequency Tables .................................................................................................................................................. 39
8.2 Statistics ................................................................................................................................................................ 44
8.2.1 Middle/typical value: Mean, Median, Mode ................................................................................................ 46
8.2.2 Spread/variability/dispersion: Standard deviation, Quartile deviation, Percentile Ranges ......................... 46
8.2.3 Spread/variability/dispersion of a statistic! The key idea behind inferential statistics! .............................. 47
8.2.4 Meaningless Statistics: Just because you can, doesn’t mean you should! ................................................... 50
Page 1 of 102
8.3 Charts .................................................................................................................................................................... 51
8.3.1 Display or Comparison of Counts/Frequencies: Bar charts, Pie charts, Histogram ...................................... 51
8.3.2 Display or Comparison of Proportions from Counts/Frequencies: Pie charts, Percentage bar charts ........ 56
8.3.3 Display or comparison of (Scale) Values: Histogram, Line diagram, Error Bars ........................................... 56
8.3.4 Display of relationships between two values: xy-Scatter ............................................................................. 60
8.3.5 Editing Charts ................................................................................................................................................ 60
9 Statistical Tests: Inferential Statistics ........................................................................................................................... 62
9.1 Sampling Distributions: The Background to Hypothesis Testing & Inferential Statistics ..................................... 62
9.2 Hypotheses, Type I and Type II Errors, Power, Effect Size .................................................................................... 63
9.2.1 Type I Errors .................................................................................................................................................. 65
9.2.2 Type II Errors and Power............................................................................................................................... 66
9.2.3 Effect Size ...................................................................................................................................................... 67
9.3 Means ................................................................................................................................................................... 67
9.3.1 Central Limit Theorem and More on Sampling Distributions ....................................................................... 67
9.3.2 The t distribution and t tests ........................................................................................................................ 71
9.3.3 Single Mean: Mean that differs from given value ........................................................................................ 71
9.3.4 Two (Independent) Groups: Means that differ from each other ................................................................. 74
9.3.5 Paired (Dependent) Differences: Means that differ within subjects across 2 times .................................... 77
9.3.6 Three or more (Independent) Groups: Means that differ from each other with 3 or more groups: ANOVA ............. 80
9.3.7 Three or more (Dependent) Groups: Means that differ from each other with 1 group across 3 or more times: ANOVA ............. 80
9.4 Proportions ........................................................................................................................................................... 80
9.4.1 Single Proportion: Proportion that differs from given value ........................................................................ 81
9.4.2 Two (Independent) Groups: Proportions that differ from each other ......................................................... 85
9.4.3 Paired (Dependent) Differences: Proportions that differ within subjects across 2 times ............................ 85
9.5 Crosstabs and Frequencies: Chi-Squared tests ..................................................................................................... 85
9.5.1 Chi squared independence testing ............................................................................................................... 86
9.5.2 Chi squared in Excel ...................................................................................................................................... 92
9.5.3 Chi-squared goodness of fit testing .............................................................................................................. 93
9.5.3.1 Normal distribution test ........................................................................................................................... 93
9.6 Relationships: Regression and Correlation: An overview ..................................................................................... 97
9.6.1 Residual Analysis ........................................................................................................................................... 98
9.6.2 Correlation Coefficient.................................................................................................................................. 98
9.6.3 Multivariate Regression ................................................................................................................................ 99
10 Reading and Writing Statistical Results .................................................................................................................. 100
11 Further Study .......................................................................................................................................................... 100
12 References .............................................................................................................................................................. 101
13 Change History ........................................................................................................................................................ 102
Preface
This document is a work in progress, as the “DRAFT” at the start indicates. Feel free to e-mail me if you spot any
mistakes or feel something could/should be improved upon. This document is targeted at business students taking an
SPSS/statistics component of some sort. Much of the detail will be useful for anyone starting off with SPSS/statistics,
who has successfully completed a previous course in basic statistics.
Where the results of a statistical analysis may be of critical importance, I’d suggest getting a professional
statistician involved!! This document is only an introduction to SPSS and statistics!!
Nov/Dec-2012
© Colm McGuinness
Business Research Methods - SPSS/Statistical Component
1 Introduction: SPSS
SPSS is a powerful statistical software package, whose origins lie in the social sciences. It is capable of anything from
“simple” summaries of data using summary statistics, tables and charts, to advanced general linear model and data
mining techniques.
This document was written with V20 in mind. Screen shots are generally from V20, with the odd one from V19 or from a
trial of V21.
There is A LOT of help available within SPSS. Most dialogs have a help button, with generally detailed information and
examples available behind these help buttons. There is also a Tutorial, which is available from the application’s opening
screen or from the Help/Tutorial menu path.
There are also Case Studies and a Statistics Coach, available from the Help menu. The Statistics Coach starting screen is:
Actual functionality available in SPSS depends on the license. Individual modules have potentially separate licensing
requirements, depending on the “base package” chosen. Some “base packages” come with a number of add-on
modules, and some don’t! If you notice options missing from menu paths or from particular dialogs, then it could be
down to a license restriction.
If the immediate help (or the expanded tree of help under the option you land on) isn’t sufficient, then note which
module of SPSS the help brought you to above (Statistics Base here) … and use this to find a corresponding case study
(details below).
Navigate now to Help/Case Studies, and then into the case study section that corresponds to the SPSS module you
found above:
Here … the case studies are divided up into modules, so expand the module
tree corresponding to the SPSS module found from the direct help above:
“Statistics Base” here.
Now within this you will generally be able to find a case study of the topic you
are interested in, assuming it is a high level (ie important) command!
You can then click through details of a step by step analysis, with
comments/conclusions, and follow/repeat the steps yourself to see exactly
how it works. Very useful …
2 Introduction: Definitions
There are lots of terms whose definitions we might benefit from knowing. I’ve included some of the main ones below.
In reality I often double check on Wikipedia if I am in doubt!! I have generally found the statistical information there to
be quite good/useful.
Terms that are of particular importance to know + understand prior to conducting statistical analyses are the following:
- Null hypothesis
- P-value
- Significance
- Alternative hypothesis
- Confidence interval
- Outlier
- Influence
- Normal
Term Description
Alternative Hypothesis (H1) See Null Hypothesis first:
This is the alternative statement to the “null hypothesis”, and is often (but not always!)
the one we are interested in proving likely/true.
Central Limit Theorem (CLT) This theorem asserts that means (or in general “sums”) calculated from samples of
sufficient size (n>30?) will be approximately normally distributed. This is VERY useful
since it allows us to assume normality, for example of the sample means, with no
restriction on, or even knowledge of, the population data distribution!
Confidence Interval A range of values within which we are confident, at a certain specified level, that a
population parameter/statistic will lie. For example, a 95% confidence interval for a population
mean consists of a lower and upper limit value, derived from a sample, and we can be
95% confident that the population mean will lie within this range¹.
Exploratory Data Analysis (EDA) This is when we investigate a set of data, in an ad-hoc fashion, in an attempt to gain
insights about and from the data. Typically we might use statistics, tables and charts as
investigative tools. Might also be seen as part of inductive research, where we explore
the data to generate a hypothesis or theory: This could later be tested against a new set
of data, from a new experiment.
Hypothesis A testable statement, eg Drug X reduces total cholesterol by more than the current
standard treatment, drug Y.
Influence A data point is said to be influential if omitting it makes a significant difference to the
result(s). Clearly a result that is significantly changed by a change in one (or a small
number) of data point(s) is not likely to be reliable.
Normal A very commonly occurring distribution in statistics is the “normal distribution”. It is the
classic “bell curve”:
Figure: On the left is the histogram we might get with just 5 groups/bins, summarising a
relatively small amount of collected sample data. On the right is what we would get, if
the data were normally distributed, and we could gather sufficient data to “fill” many
bins. The normal curve shape is shown by the dashed line.
Many statistical tests assume normality either of the data, or of the distribution of the
statistic being tested, such as testing for differences in means. The central limit theorem
is very useful in this respect.
Null Hypothesis (H0) This is often the “nothing new going on here” hypothesis. It is the counterpart of the
alternative hypothesis, which is often (but not always!) the one we are actually most
interested in. The null hypothesis is often referred to in statistical packages as H0. See
also p-value below.
Outlier A point that we believe or can measure to be “extreme” in some way, large or small.
Such points often require special attention as part of statistical analysis.
p-value The probability of obtaining a statistic at least as extreme as the one calculated from
your data, assuming the null hypothesis, H0, is true. A low p-value is evidence against
H0. Note that it is not the probability that H0 is true, though it is often loosely read
that way!
Reliability This term has various levels of definition, but two important levels of reliability are:
- Internal reliability: The agreement within a questionnaire, or set of data, of
points that address the same topic.
- External, or test-retest, reliability: The agreement of results across a repeat of
the entire experiment.
See http://en.wikipedia.org/wiki/Reliability_(statistics), for example.
Robust Robust statistics are statistical methods that are not heavily influenced by small
departures from statistical assumptions.
Sampling Distribution Just like data being distributed in some way, such as shown for the “normal” term above,
sample statistics also have distributions. For example, we take a sample and calculate
some statistic, say the mean (though the same applies to any statistic!). We take a fresh
sample and calculate the statistic again, and then again, and again, etc. All of these
attempts will produce results that differ somewhat. This is just a result of random
variation amongst the samples. If we were to plot a histogram of the calculated statistics
we would see the “sampling distribution” for that statistic.
Because of the central limit theorem (CLT) we know that the sampling distribution for
the mean is (approximately) normal. It is also known that the sampling distribution of
differences between means is normal. Many other statistics have known distributions,
all of which is then used by statistical packages as part of statistical testing.
¹ This is not quite technically correct, but it is a fairly decent working understanding.
Significance Another term used to describe or interpret a p-value. Statistical significance (typically
seen from a low p-value, with “low” meaning below 0.05, for example) is separate from
practical significance! A new drug might be statistically significantly better than some
standard treatment, but if it is only marginally better, and is more expensive or has more
side effects, then its practical significance might be “not significant”.
Statistical significance seeks to determine whether there is a result that is not explainable
by random chance alone.
Standard Error (SE) Statistics, such as the difference in two means, are subject to variation if we were to take
additional samples. This variation can be the result of just random differences between
samples. The standard error for a statistic is a measure of how much variation we would
expect for that statistic. It is in fact the standard deviation of the statistic.
In SPSS if you were to test for a difference in means from two independent samples, you
will see headings such as “Std. Error Difference” in the result output. This is the
“standard error of the difference”.
Validity Like reliability above, this term has various levels of definition, but at a high level
represents the agreement of experiment and “reality” (if we could access it!!). For
example, does a questionnaire designed to measure intelligence actually measure
“intelligence” or is it measuring something else? Intelligence tests can be culturally
biased, for example, and hence might not in fact be measuring “intelligence” but rather a
“cultural exposure”. Actually measuring “intelligence” is not simple, and may not even
be possible!! (IMHO)
See http://en.wikipedia.org/wiki/Validity_(statistics), for example.
Variance This is a statistic that measures the spread/variability within data. The standard
deviation is the square root of the variance.
Table 1: Important and commonly used statistical terms.
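The sampling distribution and CLT ideas in Table 1 can be demonstrated outside SPSS. The following Python sketch (standard library only; the population is made up and deliberately non-normal) draws repeated samples and shows that the sample means cluster around the population mean with a spread (the standard error) of roughly sigma divided by the square root of n, just as the CLT asserts:

```python
import random
import statistics

random.seed(42)

# Made-up, deliberately non-normal population: uniform on [0, 100].
# Its mean is about 50 and its standard deviation about 28.9.
population = [random.uniform(0, 100) for _ in range(20_000)]

def sample_means(sample_size, n_samples=1000):
    """Draw n_samples fresh samples and return the mean of each one."""
    return [statistics.mean(random.sample(population, sample_size))
            for _ in range(n_samples)]

means = sample_means(sample_size=40)

# The means cluster around 50, and their standard deviation (the standard
# error of the mean) is close to 28.9 / sqrt(40), i.e. roughly 4.6, even
# though the underlying data are not remotely normal.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

Plotting a histogram of `means` reproduces the bell shape described under “Normal” above, which is exactly what statistical packages rely on when testing means.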
3 Introduction: Business “Reality” versus Experimental Quality
In reality this gold standard is unlikely to be appropriate for many business applications where time + money are
important limiting factors. However, where compromise is applied it is important to fully appreciate that results
obtained may have flaws of some sort. In addition, while flawed research might randomly not harm a business today, it
might do next time!! Under the pressure of “getting results” (quickly/cheaply), it is important to know what weight to
place on such results, and what flaws might be present as a result of “getting results”!!
Pragmatism and compromise are important in life … but always be aware of what was lost or left out! Don’t mistake
pragmatic reality for actual reality!!
4 Survey Data
Details for questionnaire creation and sampling techniques are not included here as they are covered elsewhere.
4.1 Coding Survey Data
After your questionnaire is designed you should then decide on how every possible response, including “No response”
(ie missing values) will be coded.
4.1.1 Codebook
A codebook is a document that specifies the details for each variable in SPSS: how each variable relates to
questionnaire question/response codes, and how missing values will be handled/coded.
If there is a standard codebook available then this should be used so that your analyses can be compared with any
previous analyses. If you were to code Male as 0, and Female as 1, whereas the codebook coded Male as 1 and Female
as 0, this could cause confusion if terms from certain analyses were compared. It does depend on the analysis. For some
analysis it makes no difference! But it is better to stick to a standard coding, if one is available. It can save a lot of work
and re-working in the longer term.
An example entry from a codebook for the gender survey question might be:
Questionnaire Name | Description | SPSS Variable | Responses | Missing Value | SPSS Measurement type
Gender | Gender of the respondent | gender | 0 = Male; 1 = Female | 9 | Nominal
SPSS Analyze/Reports/Codebook can be used to produce a codebook listing for selected variables. For example,
including the gender and level of education variables from the telco.sav sample file, and default output/statistics,
produces:
gender
Standard Attributes: Position 10; Label Gender; Type Numeric; Format F4; Measurement Nominal; Role Input
Valid Values: 0 = Male (483, 48.3%); 1 = Female (517, 51.7%)

ed
Standard Attributes: Position 7; Label Level of education; Format F4; Measurement Ordinal; Role Input
Valid Values: 1 = Did not complete high school (204, 20.4%); 2 = High school degree (287, 28.7%); 3 = Some college (209, 20.9%); 4 = College degree (234, 23.4%); 5 = Post-undergraduate degree (66, 6.6%)
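For readers who want the same kind of “Valid Values” listing outside SPSS, a rough equivalent can be sketched in Python (standard library only; the data and value labels here are made up to mirror the gender entry above, and none of this is SPSS itself):

```python
from collections import Counter

# Made-up coded responses (0 = Male, 1 = Female), mirroring the codebook entry.
gender = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
labels = {0: "Male", 1: "Female"}

def codebook_entry(values, value_labels):
    """Return (code, label, count, percent) rows, like SPSS's 'Valid Values' table."""
    counts = Counter(values)
    total = len(values)
    return [(code, value_labels[code], n, 100.0 * n / total)
            for code, n in sorted(counts.items())]

for code, label, n, pct in codebook_entry(gender, labels):
    print(f"{code} {label:8s} {n:3d} {pct:5.1f}%")
```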
Coding responses with a fixed number of possible responses is relatively straight-forward. To code missing values,
where a respondent didn’t answer a particular question, choose a value that is impossible from the given range of valid
responses. For a Likert scale, coded as 0 to 4 (0 = Strongly disagree, 4 = Strongly agree), a missing response could be
coded as 9 or 99.
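The reason the missing-value code must be impossible as a real answer becomes obvious the moment any statistic is calculated. A short Python sketch (the Likert data are made up; 9 is the sentinel as above):

```python
MISSING = 9  # sentinel chosen because it cannot occur among the valid codes 0..4

# Made-up Likert responses (0 = Strongly disagree .. 4 = Strongly agree);
# two respondents skipped the question and were coded 9.
raw = [3, 4, 9, 2, 0, 1, 9, 4, 3, 2]

valid = [r for r in raw if r != MISSING]

# Treating 9 as a real value would badly inflate the mean:
naive_mean = sum(raw) / len(raw)      # wrong: includes the 9s
clean_mean = sum(valid) / len(valid)  # right: missing values excluded
print(naive_mean, clean_mean)
```

This is exactly why SPSS must be told which codes are missing (section 4.1.6): otherwise the sentinel values silently distort every statistic.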
4.1.2 Coding Open Text Response Items
When you think of service quality, what is the first word that comes to mind?
___________________________________________________________________________________
These are coded by first making a list of all of the unique responses, and then coding them as normal. Say we had a total
of 5 unique responses as follows:
Word Coded as …
Fast 0
Friendly 1
Competent 2
Cheap 3
Focussed 4
Now we would re-review all of the responses and code them appropriately. So it is a two-pass process to code open text
responses.
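The two-pass process can be sketched in a few lines of Python (the responses are the made-up ones from the table above):

```python
# First pass: collect the unique responses and assign codes in the order seen.
responses = ["Fast", "Friendly", "Fast", "Competent", "Cheap",
             "Focussed", "Friendly", "Fast"]

codes = {}
for word in responses:
    if word not in codes:
        codes[word] = len(codes)   # Fast=0, Friendly=1, Competent=2, ...

# Second pass: re-review every response and record its code.
coded = [codes[word] for word in responses]
print(codes)
print(coded)
```

In practice the first pass is usually done by a human, since “Quick” and “Fast” should probably share one code; the sketch just shows the bookkeeping.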
4.1.3 Coding Multiple Response (MR) Items
Description | Example
Tick all that apply. | How would you like to pay for the service? Tick all that apply: Credit card; Cash; Cheque; Paypal.
Select single option for each of multiple questions. | Thinking of the service that is provided, how would you rate the following (circle your chosen response, 1 = Very Poor … 7 = Very Good): Speed: 1 2 3 4 5 6 7; Value for money: 1 2 3 4 5 6 7; Usefulness: 1 2 3 4 5 6 7.
Enter text with more than one response allowed; answers have no order to them. | What problems do you see with the service? ______________________
Enter text with more than one response allowed; answers have a specific order to them. | What are the top two improvements you would like to see made to the service? 1. ______________ 2. ______________
The initial coding of these involves creating a separate variable for each possible response, and treating these
individually like single response items. Within SPSS, however, each of the above may be handled differently. This will be
dealt with in section 4.1.8 below.
Example | Coded as …
How would you like to pay for the service? Tick all that apply: Credit card; Cash; Cheque; Paypal. | Create a variable for each possible response, so:
  Name | SPSS Variable | Responses | Measure type
  Credit card | payCC | 0 = No; 1 = Yes | Nominal
  Cash | payCash | 0 = No; 1 = Yes | Nominal
  Cheque | payCheque | 0 = No; 1 = Yes | Nominal
  Paypal | payPayPal | 0 = No; 1 = Yes | Nominal
Thinking of the service that is provided, how would you rate the following (circle your chosen response, 1 = Very Poor … 7 = Very Good): Speed; Value for money; Usefulness. | Create a variable for each possible response, so:
  Name | SPSS Variable | Responses | Measure type
  Service Speed | servSpeed | 1 to 7 | Ordinal
  Service Value | servValue | 1 to 7 | Ordinal
  Service Usefulness | servUseful | 1 to 7 | Ordinal
  If appropriate, missing could be coded as 9 for each variable.
What problems do you see with the service? | Review all responses, and create a variable for each of the unique themes presented across all responses. Create sufficient variables to hold respondents’ maximum number of themes/unique-responses … or limit to your opinion of their top three! Now code each theme as the following sample response themes show:
  Name | SPSS Variable | Responses | Measure type
  Too costly | servCostly | 0 = Not present; 1 = Theme present | Nominal
  Too slow | servSlow | 0 = Not present; 1 = Theme present | Nominal
  Not focussed | servNotFocussed | 0 = Not present; 1 = Theme present | Nominal
  It might be appropriate to code the responses on a scale: for example from 0 = not present, to 1 = mildly present, 2 …
What are the top two improvements you would like to see made to the service? | This is coded similarly to the second set of details above for the last open text response; however, cognisance must be paid in the analysis to the fact that the responses have a specific order.
  Name | SPSS Variable | Responses | Measure type
  Improve1 | servImprove1 | Response code | Nominal
  Improve2 | servImprove2 | Response code | Nominal
  This type of question could be coded using the first set of details from the last open text response above, but then the coding would have to indicate both the presence of an improvement theme and its position (first or second). This is awkward.
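The payCC/payCash/… coding in the first row above amounts to building one 0/1 indicator variable per option. A Python sketch with made-up respondents (this is just the bookkeeping, not SPSS itself):

```python
# Each respondent's ticks from the "How would you like to pay?" item.
options = ["Credit card", "Cash", "Cheque", "Paypal"]
answers = [
    {"Credit card", "Paypal"},   # respondent 1 ticked two options
    {"Cash"},                    # respondent 2 ticked one
    set(),                       # respondent 3 ticked nothing
]

# One indicator variable per option, as in the payCC/payCash/... coding above.
indicator_rows = [[1 if opt in ticks else 0 for opt in options]
                  for ticks in answers]
print(indicator_rows)
```

Each inner list is one row of the data file: four nominal 0/1 variables per respondent, exactly as the table prescribes.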
4.1.4 Measure Types: Nominal, Ordinal, Scale
Scale Variables that are scale are numeric, and the difference between values can be determined. There
are two types of scale data, Interval and Ratio, but discussion of this is beyond the scope of this
document.
NOTE: Do not confuse the coded values with the actual values. Coded values will typically be numeric, but this does not
then make the variable a scale variable!! A variable’s “measure type” is associated with the original uncoded data for
that variable.
4.1.5 Entering Data in SPSS: Variable View
The two tabs at the bottom, “Data View” and “Variable View”, give you access to the associated area within SPSS. First
we need to set up variables, so click on the “Variable View” tab to get something like the following view. The window
width has been extended so that all fields are visible, and the image has been reduced in size to make it fit on the page here:
Field Use/Meaning
Name SPSS variable name; Should contain no spaces, and begin with a letter. Can contain letters and numbers.
Type Tells SPSS what type of data you will be entering for this variable. Numeric; Date or String (ie text) are
probably the most common types. You can also set the “Width” and decimal places here.
Width Total number of digits to allow internally for a numeric. Total number of characters for a string/text
variable.
Decimals Out of “Width” how many digits to allow for decimals.
Label User friendly description of the SPSS variable. This will be displayed, rather than the name, for most output.
Values Allowed/valid code values, along with a user friendly description for each.
Missing How missing values will be entered, if any. See the missing values section 4.1.6 below.
Columns Total display width in SPSS data viewer. Has no effect on analysis.
Align Display alignment in SPSS data viewer. Has no effect on analysis.
Measure Scale, Ordinal or Nominal … See section 4.1.4 above.
Role How the variable is to be treated within some analysis. Values possible are Input, Target, Both, None,
Partition or Split. For some analyses a variable is an Input, whereas in others it is a Target meaning it is
what you want other variables in a model to attempt to predict in some way. We’ll be sticking with “Input”
here.
Here’s what the variable view screen looks like after the first six of the multiple response variables from section 4.1.3
above have been entered:
And so on for any other variables. I have only entered some of the MR variables here. Analyze/Reports/Codebook
output for servSpeed and servImprove1, after deselecting the Statistics (these can’t be generated yet, since there is
no data!!), produces:
servSpeed
Standard Attributes: Position 4; Type Numeric; Format F4; Measurement Ordinal; Role Input
Valid Values: 1 = Very poor; 2 = Poor; 3 = Slightly poor; 4 = Neutral; 5 = Slightly good; 6 = Good; 7 = Very good

servImprove1
Standard Attributes: Position 7; Label Service Improvement 1; Format F4; Measurement Nominal; Role Input
Valid Values: 1 = Speed up; 2 = Train staff technically; 3 = Charge less; 4 = Hire more staff; 5 = More flexible service delivery; 6 = Staff more courteous
Once all variables are defined in SPSS then you are ready to enter response data.
4.1.6 Coding Missing Values
Whatever missing values you use must not be possible legitimately as responses to questions‼ So if you had a “tick all
that apply” section, with 12 responses, coded 1, 2, …, 12, then you could NOT use 9 as a missing value! Typically 99
would be used here, but any value that is not possible as part of the main response coding can be used.
There are two missing values shown above, 99 and 88, but for many situations just one will suffice.
There are different types of missing values. There are questions that someone chooses not to respond to, even
though they “should have” from your point of view, eg someone leaves a gender Male/Female box unticked. In
addition there are questions which do not apply, and which are legitimately left empty rather than being “missing”.
For example, if someone ticks “yes” in question 1 in the following survey extract, then they will legitimately skip
question 2, ie it is not “missing” in the same sense that an empty question 1 would be:
All of this should ideally be coded into SPSS. Sometimes analysing the missing data can be as informative as the main
data‼
Where you have an item, or block of items (ie multiple response, such as “tick all that apply”) that someone should have
answered but they didn’t then I would suggest coding this as a 9/99/999/etc missing value. I say “I would suggest”
because coding is a subjective task, and can be done other ways, depending on your planned usage of the data. In fact
the planned usage is really key to both creating the survey itself, and coding and entering the data. The planned usage
should come first, and the survey design, coding and data analysis come to match the need of the task at hand!
If you have an item, or block of items, that the interviewee did not need to complete for some reason, such as in the
iPhone example above, then this is not “missing” in the sense that they didn’t do something that they should have! It is
missing in a way that you would have expected. I would suggest coding this differently to the scenario in the last
paragraph, and my suggestion would be to leave it empty, which is called “system missing”; or you could code it as
another missing value, such as 8/88/888/etc. Whichever way you do it, ideally there should be some way to differentiate
between the types of “missing”, so that you can potentially use this information later.
Finally … if you have a multiple response, eg “tick all that apply”, and someone ticks one item, then you do not code the
other items as “missing”. On a “tick all that apply” question, if they tick one box then they mean “yes” for this one, so
not ticking the others means “no” for those, and you would code it as such. Another scenario would be where you
present someone with 15 options and want their top 3. I would suggest NOT coding the unmarked 12 as “missing”
using 9/99/999/etc., but would probably again use system missing, ie blank, as they are legitimately “missing”. Or else
I would create an additional missing category, eg 7/77/777/etc., and use that.
The above are my broad suggestions, which cover most typical situations, but sometimes the above might not suit, and
you’ll have to think through what it is you are trying to achieve and what coding system might facilitate that. You might
start coding things one way, but end up later recoding things differently for some reason that only became apparent
when you had the actual data to work with! This is to be avoided, as it is costly from a time point of view, but it does
happen‼ A good pilot survey will often help avoid this, and save time later.
SPSS has menu options to help with mass changes, such as the Transform/Recode … menu options.
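Recoding in SPSS is menu-driven, but the underlying idea is easy to sketch outside it. Below is a minimal Python/pandas sketch, with entirely hypothetical data and the codes 99 (should have answered but didn’t) and 88 (did not apply) as the two user-missing values, showing how both can be excluded from numeric analysis while the reason for the missingness is kept:

```python
import pandas as pd
import numpy as np

# Hypothetical survey column using 99 = "should have answered but didn't"
# and 88 = "question did not apply" as user-missing codes.
servSpeed = pd.Series([5, 7, 99, 3, 88, 6])

# Keep the distinction: record WHY each value is missing before discarding
# the codes for analysis purposes.
missing_type = servSpeed.map({99: "unanswered", 88: "not applicable"})

# For numeric analysis, treat both codes as missing (like SPSS user-missing).
clean = servSpeed.replace({99: np.nan, 88: np.nan})

print(clean.mean())                        # mean of the four valid responses
print(missing_type.dropna().tolist())      # the two kinds of "missing"
```

The point is the same as in SPSS: the user-missing codes are excluded from the statistics, but the information about why a value is missing is not thrown away.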
You now enter each respondent’s data across a row. A row is called a “case” in SPSS. Note that SPSS will allow you
to type invalid values, such as codes outside the range of specified “values” for a variable!! Spotting any such
typos/errors is important to do before a full analysis of the data. The old adage of “Garbage in, garbage out”
applies!! The Analyze/Descriptive Statistics/Frequencies command is one way to try to find any errant values, as it
will list all values with their corresponding frequency, so it is easy to spot a value that shouldn’t be there!
For large data sets a lot of time may be spent on data cleaning: looking for any values that stand out in any unusual
way. This can effectively be a whole analysis in its own right!! It is clearly VERY important, since mistakes in the data
entered might completely change the results of any analysis … In fact this is another way to track down possibly
errant values: look for individual values that strongly influence the results of any analysis, such as outliers.
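As a sketch of the frequency-based check described above, here is the same idea in Python/pandas on hypothetical data (a 1–7 rating column with a typo slipped in):

```python
import pandas as pd

# Hypothetical 1-7 rating column with a typo (77) slipped in.
ratings = pd.Series([4, 5, 77, 6, 4, 7, 1])

# Equivalent of the Frequencies check: list every value with its
# frequency so out-of-range codes stand out.
print(ratings.value_counts().sort_index())

# Or flag anything outside the documented 1-7 coding directly.
errant = ratings[~ratings.between(1, 7)]
print(errant)   # the offending value(s), with their row positions
```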
The “Variable View + MRSs – Initial” and “Variable View + MRSs – Final” datasets, complete with data are available from
the web site at bbm.colmmcguinness.org/live/AdvancedStats.html. The initial version just has variables and data, with
no MR sets/variables created. The final version has the MR sets/variables created.
These are:

Analyze/Multiple Response/Define Variable Sets:
This menu path allows you to define “variable sets”. This essentially just gives a group name to the variables
included in the set. You can then access the Analyze/Multiple Response/Frequencies and
Analyze/Multiple Response/Crosstabs commands for this new group/set name. All output will contain the set
variables in a block. For cross comparison of responses from a MR question the blocking makes life easier.
Variable sets defined in this way are not available for general commands, such as Chartbuilder.

Data/Define Multiple Response Sets:
Sets defined here do something completely different to the variable sets just described! Here the command
effectively summarises the original set variables into a new variable, which is the MR set name. The entries in this
variable will be summaries of the individual variables included in the set!! MR sets are available for some general
commands, such as Chartbuilder. This command has two somewhat different ways of functioning, of which details
will be given below.
The variables from multline through to ebill specify whether the respondent has the associated service from the
Telecoms company. It would be useful to see output where these are grouped, so we access Analyze/Multiple
Response/Define Variable Sets to get (after some typing + clicking!):
3. MR variable set name, and label. Note the “Note” here, explaining that these sets are only available for two
commands.
Having created the MRServiceSet variable (which SPSS will stick a $ in front of, just to differentiate it from a “real”
variable), now access the Analyze/Multiple Response/Frequencies command. Select your $MRServiceSet variable set,
and click OK to get:
[Output: a Case Summary table, and a $MRServiceSet Frequencies table with columns for Responses (N, Percent) and Percent of Cases.]
You now have a “nice” grouped output summarising the frequencies, etc., for the individual variables in the set
(Multiple lines, Voice mail, and so on).
It is clear how only the Yes’s have been counted in the MRServiceSet. The 3740 is the total number of all Yes’s, across all
services. The Responses Percent is the percentage of Yes’s out of this total number of Yes’s. The “Percent of cases” is
similar to the Percent column for the individual variable output, but now with all of those with no Yes’s for any service
eliminated‼ Hence the Yes percentages are higher than those in the individual variable output. If you check (give it a
go!) you will find that there are 111 cases with no service, ie no Yes’s.
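To make the arithmetic concrete, here is a toy Python/pandas sketch of how a multiple dichotomy summary counts the Yes’s. The data here are hypothetical (three services, four cases), not the telco file:

```python
import pandas as pd

# Hypothetical 0/1 ("No"/"Yes") service columns, as in a multiple
# dichotomy set. (The real telco file has 1000 cases; this is a toy.)
df = pd.DataFrame({
    "multline": [1, 0, 1, 0],
    "voice":    [1, 1, 0, 0],
    "ebill":    [0, 1, 0, 0],
})

yes_counts = df.sum()                        # Yes's per service
total_yes = yes_counts.sum()                 # all Yes's across services
cases_with_any = (df.sum(axis=1) > 0).sum()  # cases with at least one Yes

pct_of_responses = 100 * yes_counts / total_yes     # "Responses Percent"
pct_of_cases = 100 * yes_counts / cases_with_any    # "Percent of Cases"
print(pct_of_responses)
print(pct_of_cases)
```

Because the no-service case is excluded from the denominator of the second calculation, the “Percent of Cases” figures are higher, exactly as described in the text.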
The overall dialog here will create a new variable (MRYesSet) that will
contain only summary/count entries for values matching the
“counted value”. These values can be either labelled by their original
variable name (the default, as selected here) or using the labels of the
values selected in the “counted value” (above), which would be “Yes”
here.
Click the “Add” button to create the MR Set, and then OK to complete the task.
Next use Chartbuilder to create a pie chart of $MRYesSet, and after some label additions (using the chart editor) you
should get:
[Codebook output: $MRYesSet — Label “MR Set with Yes’s”; Type Multiple Dichotomy Set. The frequency output begins: callid, Caller ID, 481, 48.1%.]
Can you see how the MR Set has “collapsed”/summarised all of the Yes’s down into this new variable called $MRYesSet?
Labels are the original variable names (which is what we selected). In fact it isn’t “collapsed” but is now a new variable
with 3740 entries, 481 of which are “Caller ID”, 485 of which are “Call waiting”, etc. This is clearly quite a different
summary compared to what happened in the last section!
The “Counted Value” can be any valid value, and only these matching values will be included. The set is dichotomous in
the sense that the counted value items are included and everything else isn’t!! So this dialog can be used to summarise
a set of variables several times, once for each of their individual categories. Say a set of MR variables had responses
from 1 (Very Poor) through to 7 (Very Good); then you could create 7 separate MR Sets, one matching each category
response. How this is used really comes down to what you want to produce for your reporting, or what particular idea
you are trying to investigate. As mentioned before, always start with the goals/ideas and then fit SPSS to that! Don’t
start with some fancy output from SPSS and try and force it into your work!!
Next, on to the second type of functionality available from Data/Define Multiple Response Sets. But first we need
another data set to make the new type of summary meaningful. Open the “Variable View + MRSs – Initial” data set
(available from web site???) …
Selecting “Categories” here tells SPSS to count the categories across all included variables. Unlike dichotomies, which
summarise/work within a variable, categories summarise/work across variables. It is thus useful when there is an
ordering to the separate but related responses: here servImprove1 is the respondent’s top suggestion, and
servImprove2 is their second suggestion.
The other textboxes and buttons work as before. Click Add to create the set, and then OK
to exit the dialog.
[Codebook output: $MRImprove — Type Multiple Category Set. The Multiple Response Categories frequency rows include: 1 Speed up 54 21.6%; 2 Train staff technically 92 36.8%; …; 99 23 9.2%.]
And for comparison, to work out what has happened here, here is the output from the Analyze/Descriptive
Statistics/Frequencies command for the two servImprove variables (Service Improvement 1 and Service Improvement 2,
each with Frequency, Percent, Valid Percent and Cumulative Percent columns).
Taking “Speed up” as an example, there is a total, across both variables, of 31 + 23 = 54 respondents who included this
response, and this gives the first entry for the $MRImprove variable. The $MRImprove “Percent” column seems odd to
me, as the percentages are out of the number of original cases (ie 250 here). They add up to 200%, since each case
contributed two responses.
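A toy sketch of the category-set arithmetic (hypothetical data, in Python/pandas rather than SPSS) shows why the Percent column sums to 200% when every case gives two responses:

```python
import pandas as pd

# Toy version of the category MR set: each case gives a top and a second
# suggestion; the set tallies categories ACROSS both variables.
df = pd.DataFrame({
    "servImprove1": [1, 2, 1, 3],
    "servImprove2": [2, 1, 3, 2],
})

n_cases = len(df)
combined = pd.concat([df["servImprove1"], df["servImprove2"]])
counts = combined.value_counts().sort_index()

pct = 100 * counts / n_cases   # SPSS-style "Percent": out of cases,
print(counts)                  # so the column sums to 200% when every
print(pct.sum())               # case contributed two responses
```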
Using Chartbuilder to create a pie of the $MRImprove variable, after some labelling changes gives:
Hopefully you can see that this is quite a different type of summary compared to the way the dichotomy option worked.
It is worth repeating the point from above: How this is used really comes down to what you want to produce for your
reporting, or what particular idea you are trying to investigate. As mentioned before, always start with the goals/ideas
and then fit SPSS to that! Don’t start with some fancy output from SPSS and try and force it into your work!!
While lots of analysis and charts might all be very interesting, it is crucial to remain focussed on the research
agenda/hypotheses, unless this is explicitly an exploratory data analysis.
For an exploratory data analysis (EDA) almost anything goes, and every tool might be used to gain greatest insight into
any information and patterns within the data.
For any data analysis it is an excellent idea to first investigate what others have done in a similar context. Try not to
waste time reinventing the wheel!! There are almost always going to be existing theses, journal articles, books, etc. that
will be useful for ideas and methods.
Action — How
Summarize/describe data — Frequencies, eg simple or grouped frequency distributions. Charts, eg a pie chart to see
proportions for ordinal or nominal data, or a histogram to see the distribution for scale data.
What information might address the need? — Exploratory research versus a specific hypothesis (or two/three!)?
Inductive versus deductive research.
Collect data
Check + explore data — Checking the data can involve double entry, for example. Also checking that all values
entered are “legal”, conforming to the codebook. Also checking for outlier, missing or unusual values. Checks can be
graphical and/or tabular (or whatever you can think of!). Work here can save you a LOT of time later! “A stitch in
time … !” Exploring the data, which can be an end in itself, means using tables, charts and descriptive statistics to
see patterns or information within the data. Possibly with a view to constructing and testing a more formal
hypothesis with a later experiment: an inductive approach.
Any pre-statistical test checks? — Some statistical tests have assumptions, such as “data are normally distributed” or
“all cells have frequency of 5 or higher”. These must be checked and reported on.
Any post-statistical test checks? — Some statistical tests have assumptions that can only be checked after the
statistical model is created, such as “residuals should be normally distributed” or “residuals should have constant
variance”. These must be checked and reported on.
The plan is to follow through an actual example, complete with tests and SPSS output.
Might be better to follow through an actual questionnaire example, rather than the statistical process example … or is
this section needed at all?? Could be overkill …
???
Use this command to select the cases that will be included for all further analysis/output, until a subsequent Select
Cases is executed.
The first four Select options perform the obvious/stated task. The “Use filter variable” option uses a variable whose
values should be 0’s and 1’s only. The 0’s are not selected, and the 1’s are selected. This allows you to create
complicated selections based on variables calculated using the Transform/Compute Variable command.
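The filter-variable idea can be sketched in Python/pandas — the variable names and cut-offs below are entirely hypothetical:

```python
import pandas as pd

# Sketch of the "Use filter variable" idea: a 0/1 variable computed
# from other columns, then used to select cases.
df = pd.DataFrame({"age": [25, 41, 33, 58], "income": [30, 72, 55, 64]})

# Equivalent of Transform/Compute Variable: 1 = selected, 0 = not.
df["filter_"] = ((df["age"] > 30) & (df["income"] > 50)).astype(int)

selected = df[df["filter_"] == 1]
print(selected)   # only the cases with filter_ == 1 remain
```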
If you are using the Select Cases command then it is important to keep track for each analysis/chart output as to what is
currently included/excluded!!
With a Split File command you can make all subsequent commands apply to sub-groupings within your data. So a single
Analyze/Descriptive Statistics/Frequencies command will in fact be applied to each sub-group based on the Split File
details currently active. As with the Select Cases command it is important to keep track for each analysis/chart output as
to what is the current split, if any!
The Compare Groups option applies subsequent commands to each sub-grouping within the active split file. For
example, applying the following Split File to the telco.sav data, and following this by creating a pie chart of “level of
education”, produces four pie charts: one for the level of education for Male, Unmarried; another for Male, Married;
another for Female, Unmarried; and a fourth for Female, Married.
This is clearly very useful for drilling into sub-group information, relevant to your analysis.
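For comparison, Split File’s Compare Groups behaviour corresponds to a group-by operation. A Python/pandas sketch with hypothetical data:

```python
import pandas as pd

# Split File's Compare Groups is essentially a group-by: one command,
# repeated per sub-group. Hypothetical gender x marital split:
df = pd.DataFrame({
    "gender":  ["M", "M", "F", "F", "M", "F"],
    "married": [0, 1, 0, 1, 1, 0],
    "educ":    [2, 3, 4, 2, 5, 3],
})

# Mean education level for each of the four sub-groups.
means = df.groupby(["gender", "married"])["educ"].mean()
print(means)
```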
With the Compare Groups option active, the following mean values are generated from the Analyze/Descriptive
Statistics/Descriptives command:
Whereas with the Organise output by groups option active, the same Descriptives command produces:
Frequency tables are often a very useful way to get a summary for nominal and ordinal data, with the
corresponding charts being bar and pie charts. For scale data it is usually best to group the data, eg using
Transform/Visual Binning, before creating a frequency table, with a histogram being the visual equivalent chart.
These charts are all available as options within the Frequencies dialog.
Tables can be produced with more levels of sub-grouping detail using the Analyze/Descriptive Statistics/Explore
command, or by using the Data/Select Cases command.
For nominal and ordinal data Analyze/Descriptive Statistics/Crosstabs can produce contingency tables of counts,
and carry out subsequent tests for independence (ie chi-squared tests). Even more sub-grouping details are
possible here using the “Layers” option.
Using the telco.sav data, here’s a Frequencies summary of the levels of education of the cases, with a pie chart
(from the Frequencies/Charts button) thrown in:
[Output: Statistics — Level of education: N Valid 1000, Missing 0; followed by the Level of education frequency table.]
Next let’s see how level of education breaks down by gender or retired status, using the Analyze/Descriptive
Statistics/Crosstabs command as follows:
[Partial crosstab output. Level of education by Gender (counts): Post-undergraduate degree — Male 33, Female 33, Total 66; Total — Male 483, Female 517, Total 1000. Level of education by Retired: Post-undergraduate degree — No 64, Yes 2, Total 66; Total — No 953, Yes 47, Total 1000.]
We are here getting a count for each level of education, with details for those that are Male, or Female and
separately details for those that are Retired or not.
Say we wanted to get level of education information, by gender AND marital status, then we use the Layers box:
[Partial layered crosstab output, eg for one marital-status layer: Post-undergraduate degree — Male 9, Female 21, Total 30.]
[And in the Total layer: Post-undergraduate degree — Male 33, Female 33, Total 66.]
This allows you to drill right down into the data, looking for any interesting “interactions” or patterns. As always
don’t get lost in the analyses … Start with your questions, and then produce analyses that attempt to address
these questions!
You could add more layers, to get an even finer level of detail. For example if I added Retired as Layer 2, I’d get
output as follows (only partial output shown here, eg Post-undergraduate degree — Male 9, Female 21, Total 30 in
one sub-group, and Male 22, Female 12, Total 34 in another):
[And among the sub-totals: Post-undergraduate degree — Male 31, Female 33, Total 64.]
There is loads more you can do within the various Analyze/Descriptive Statistics menu options. Detailing them all
would take too long … it is much better to open the telco.sav (or other) sample file for yourself and try out some
of the commands. See what insights you can gain into profiling who was surveyed, and in particular who churns²
and who doesn’t!
More details will be given in the sections that follow, but starting with the stats, rather than the menu options.
8.2 Statistics
A statistic is just any result calculated from a set of data. The maximum, minimum and total are all examples of
statistics. So “statistics” need not be all that fancy or technically complex … although they can be!!
In order to summarise data two key types of statistics are required: A measure of middle, and a measure of
spread. Either one of these alone is really quite a poor summary as the following examples will demonstrate.
There are many further statistics possible, but these two types are fundamental.
Say I take a sample of 100 cartons each from two breakfast cereal production lines. I find the mean weights of
cereal to be identical at 499.5g. Does this tell me that the production lines are operating identically? Well not if
the following are the associated distributions for the sampled data:
² Churning is the term used to describe customers switching service providers.
While these two distributions of the 100 sample values have the same middle/mean at 499.5g, it is clear that
the two distributions are NOT the same! And consequently the two production lines are not operating the same.
The x-axis scales have been set to be the same on both charts. Here this highlights that the production line 2
data are far more spread out. There is a lot more variability between cartons of cereal on production line 2
compared to production line 1. Thus a measure of spread is needed to reasonably summarise the data.
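The point can be checked numerically. In this hypothetical Python sketch, two small samples share a mean of 499.5g yet have very different spreads, so the mean alone hides the difference:

```python
from statistics import mean, pstdev

# Two hypothetical production-line samples with the same mean but
# clearly different spread: the mean alone hides the difference.
line1 = [499.0, 499.5, 500.0, 499.5, 499.5]
line2 = [490.0, 509.0, 499.5, 494.0, 505.0]

print(mean(line1), mean(line2))      # identical centres: 499.5 and 499.5
print(pstdev(line1), pstdev(line2))  # very different spreads
```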
Take a similar situation to the above example. We take a sample of 100 cartons and only measure
spread/variability using the standard deviation (or any measure of spread), without a measure of middle/centre.
Say we find that the standard deviation is 4.598g. Does this (alone) mean that the production lines are operating
similarly? Look again at these distributions:
This time, while these two distributions of the 100 sample values have the same spread as measured by the
standard deviation of 4.598g, it is clear that the two distributions are NOT the same! And consequently the two
production lines are not operating the same. The x-axis scales have been set to be the same on both charts.
Here this highlights that production line 4 is under-filling cartons (assuming the average should be 500g).
Thus a measure of centre/middle is needed to reasonably summarise the data!
The mean is probably the single most commonly used statistic. It is a key statistic when dealing with normally
distributed data as the population mean and population standard deviation fully determine the associated normal
distribution. The mean gives the centre of a normal distribution, and the standard deviation relates to the spread of the
distribution.
You can calculate the mean/median/mode for a variable from various menu options under the Analyze/Descriptive
Statistics menu path.
Since the mean is “biased by extreme values” another mean you will see at times in SPSS is the “5% Trimmed Mean”.
This is the mean that results after first eliminating the top and bottom 2.5% of data. If this differs from the “mean” then
it is indicating that there are some extreme and biasing values present. Other trimmed means are possible such as a
10% trimmed mean, but I haven’t seen this available in SPSS.
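The calculation behind a trimmed mean is simple. Here is a hypothetical Python sketch of the idea (the function and data are made up for illustration, not taken from SPSS):

```python
def trimmed_mean(values, prop=0.025):
    """Mean after dropping the top and bottom `prop` fraction of the
    sorted data -- the idea behind SPSS's 5% Trimmed Mean (2.5% cut
    from each end)."""
    data = sorted(values)
    k = int(len(data) * prop)            # number to cut from EACH end
    kept = data[k:len(data) - k] if k else data
    return sum(kept) / len(kept)

# An extreme value pulls the plain mean up; the trimmed mean resists it.
sample = list(range(1, 40)) + [1000]     # 40 values, one wild outlier
plain = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 0.025)    # cuts 1 value from each end
print(plain, trimmed)                    # 44.5 versus 20.5
```

When the plain mean and the trimmed mean differ this much, it is a strong hint that extreme values are present.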
The mean would be commonly associated with parametric statistics, whereas the median is more common when non-
parametric statistics are appropriate. The simplest example would be using the mean with (approximately) normally
distributed data, but using the median if the data are clearly not normally distributed. Note that this is not saying that
the mean isn’t useable with non-normal data, just that it is common with normal data, and not quite as common with
non-normal data.
Quartile deviation (QD) — Half the width of the central 50% of the data. Not biased by extreme values, but ignores
all but the Q1 and Q3 quartile values. Associated measure of middle: Median. Data: Scale (doable with raw scale
data, but might make more sense with data in groups/bins) or Ordinal.
Percentile Ranges — Basically a more flexible/precise version of the QD, since any range of values can be measured,
eg a 5 to 95 percentile range would be calculated from P95 − P5, which measures the central width of 90% of the
data. Associated measure of middle: Median. Data: As for QD.
The variance is simply the standard deviation squared. When dealing with statistics from a mathematical point of view
it is often the variance that naturally arises in calculations, and not the SD. A population variance would often be
written as σ², directly indicating it as the square of σ, the population SD.
It has been established theoretically that if your population data³ has mean μ and standard deviation σ, then the
standard deviation of means taken from repeated samples of size n will be σ/√n. This statistic is called the “standard
error of the mean”. It is a VERY important statistic, as it represents a measure of how much we might expect means to
vary by, between samples. We might only have one sample, but the theory allows us to know what could have
happened had we had (infinitely) many samples!!
Roughly speaking what we are talking about here is “imagining” what would happen if we could take multiple random
samples from a population: The theory work has been done, which we won’t be deriving here, but to give you at least a
clear idea, consider the following diagram of a population4 with some samples also depicted:
³ Typically Greek letters are used to represent statistics for a population, and the likes of x̄ and s are used to represent
statistics for a sample.
⁴ This is a fairly crude representation! The way the samples are depicted they look like cluster samples, and not random
samples!! For the stats to work they MUST be random samples. If you must deviate from a random sample then this
would lower the robustness of your research, and this must all be reported on!
[Diagram: an overall population with four samples drawn from it, with means x̄₁, x̄₂, x̄₃ and x̄₄. Dashed lines are a (crude!) depiction of the distances of each sample mean from the mean of means (which will be the population mean).]
Although only 4 samples are shown, you can imagine the setup if there were hundreds or thousands, etc. The “standard
error (SE) of the mean” is a measure of how much these means all vary from each other, or put more accurately how
they vary from the mean of the means!! This is the “natural variation” (sampling error/variation) that we can expect to
occur if we did take more than one sample. Luckily for us we will only need one sample to know the SE of the mean, at
least approximately. If we know σ then we know that the SE of the mean is σ/√n, but we generally won’t know σ, so we
will approximate the SE of the mean with s/√n. This approximation has consequences, which will be mentioned later.
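Using the numbers from the worked example below (s = 3.017, n = 30), the approximation is easy to verify in Python:

```python
from math import sqrt

# The single-mean example's figures: sample SD s = 3.017, sample size n = 30.
s, n = 3.017, 30
se = s / sqrt(n)          # approximate standard error of the mean
print(round(se, 2))       # 0.55, matching the text
```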
The following diagram depicts what we know, following a sample, and is a bit more technically accurate than the above
thought experiment!
[Diagram: the sampling distribution of the sample mean, N(μ, σ²/n).]
An absolutely key point behind many (all?) inferential statistics is the following:
We know that sample results/stats will vary from each other, and we will often (in theory) know some measure of this
sampling error/variation (eg the standard error of the mean above). So if we take a new sample and find a substantially
bigger variation than we would expect according to the sampling error/variation, then we know we have evidence for a
“statistically significant effect”. A great many statistical tests are based on this core idea.
Say our default theory (known as the null hypothesis in statistics) is that the mean for a certain population should be
100g (say). We take a sample, and find the mean is 98g for the sample, with a standard deviation of 3.017g. Graphically
what this example would look like is:
⁵ The example details here are taken from the “Single Mean: Mean that differs from given value” section.
[Chart annotation: the sampling distribution under the default theory is N(100, 3.017²/30). It is not fully explained here, but our sample mean of 98g ends up far out here (see below), for the default theory of 100g, and the value of s/√n = 3.017/√30, which is 0.55 (2dp).]
Our null hypothesis was a population mean of 100g, but the evidence (our sample) does not support this! The obtained
sample mean is actually highly improbable (which you will see when you perform the T Test later in section 9.3.3
below!). In this situation the sample mean is just too different from the expected mean; It is at the outer limits of the
normal/expected variation of means, so we would reject the null hypothesis! The evidence does not support a
population mean of 100g. We have found a statistically significant difference between the mean and the expected value
of 100g.
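The strength of that evidence can be sketched numerically: the one-sample t statistic measures how many standard errors the sample mean sits from the null value. A Python sketch using the figures above (this previews the arithmetic behind the T Test mentioned, not the full SPSS procedure):

```python
from math import sqrt

# One-sample t statistic for the example above:
# null mean 100g, sample mean 98g, s = 3.017, n = 30.
mu0, xbar, s, n = 100.0, 98.0, 3.017, 30
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 2))   # -3.63: the sample mean sits several standard
                     # errors out in the tail of the expected variation
```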
8.2.4 Meaningless Statistics: Just because you can, doesn’t mean you should!
Even when it is possible to calculate a particular statistic, a key question is “does it make sense?”!! For example with the
telco.sav sample data file, it is possible to calculate the mean of the Geographic indicator variable, as follows:
Analyze/Descriptive Statistics/Descriptives:
Producing output:
Descriptive Statistics
But does a mean geographic indicator value of 2.02 mean anything?! Probably not, I would expect, in this context.
8.3 Charts
Using charting to get an understanding of your data is generally very important. While it is easy to conduct a statistical
test to compare means (say), this test alone is not giving you a full picture of your data. Any statistically significant
difference detected is still (to some degree) just encouragement to go look at your data to see why there are
differences, and where there are differences! Charting is very important.
If you have set up your variable definitions fully then you can just click OK here. If you haven’t, then why haven’t you!?!
You know what to do … ! ;-)
You select the type of chart you want on the left. Here we’ll create a bar chart of level of educational information, so
you click on the bar chart you want (the top left one) and drag it into the preview area to get:
The “Element Properties” dialog appears automatically (on the left in the example above). This is a floating dialog, so
can be moved independently of the main dialog. If you close this you can easily retrieve it by clicking on the “Element
Properties …” button in the main dialog.
Next, drag the “Level of education” variable onto the “X-Axis?” box in the preview area. You’ll get an initial (and small)
preview of your bar chart. You can click on different elements here, and the “Element Properties” dialog will change to
show you options that you can apply to the selected element. You could change the type of y-axis from “Count” to
“Percentage”, for example. Make whatever changes you want, and then click OK. If you try to create a chart and the OK
button is greyed out, it means that you haven’t fully defined the chart. Having selected Percentage for the Y axis, and
then clicking OK, produces:
Say I wanted to break this chart down into male/female for comparison purposes then I could start as before,
but before clicking OK, click on the “Groups/Points ID” tab. For example:
8.3.2 Display or Comparison of Proportions from Counts/Frequencies: Pie charts, Percentage bar charts
The section title says it all here … these are the appropriate charts to use when trying to investigate, display or compare
proportions derived from ordinal or nominal data. For scale data it generally only makes sense after binning the data.
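The binning step can be sketched in Python/pandas — the income values and cut points below are hypothetical:

```python
import pandas as pd

# For scale data, bin first (cf. Transform/Visual Binning), then chart
# the proportions. Hypothetical income values binned into three groups:
income = pd.Series([12, 25, 38, 44, 57, 63, 71, 89])
bins = pd.cut(income, bins=[0, 30, 60, 100],
              labels=["low", "middle", "high"])

bin_counts = bins.value_counts().sort_index()
print(bin_counts)   # counts per group, ready for a pie or percentage bar
```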
8.3.3 Display or comparison of (Scale) Values: Histogram, Line diagram, Error Bars
Histogram and line diagram are fairly standard diagrams, so are left for you to investigate for yourself. Error bars might
be new, so …
Error bars are a way to display confidence interval (and other) data for sub-groups directly on a chart for overall
comparison. Say we wanted to graphically compare the mean “Household Income” by education level. We could just
have created a scatter/dot plot with level of education as the x-axis categories, and mean household income for y-axis
values:
When you initially drag the “Household Income” variable to the y-axis it would show
household incomes for all cases if you were to proceed to the output now.
In order to summarise down to just the mean within each educational level you must change
the “Statistic” from “Value” to “Mean” in the Element Properties window, and then click the
Apply button.
Interesting as this chart is, as was mentioned in section 8.2 above, it is more informative to include a measure of spread,
in addition to a measure of centre. We can do this in a number of ways, but a common approach is to include a
confidence interval. You can actually do this from the “Element Properties” window of the scatter/dot plot above!
However (for no particular reason) we will use the “Simple Error Bar” template from the Bar chart gallery as follows:
It automatically selects the “Display error bars” option, and defaults to a 95%
confidence interval.
This considerably changes how one might interpret the chart!! For example, the first chart seems to suggest a
considerable difference between “Did not complete high school” and “Post-undergraduate degree” groups, whereas
including the spread shows that there is a lot of variation amongst the latter group, to the extent that it is no longer
“obvious” that these groups differ significantly. We could now perform a T Test to check this statistically, which will be
left for you to check for yourself after you’ve completed section 9.3.4 below6!
Working with the chart editor is mostly a matter of just working out where the option/button you want is, so details are
left for your own trial and error!
6 You will find that the difference is in fact statistically significant, with a p-value (significance level) of 0.026. The confidence interval for the difference is [4.6, 68.7] (in thousands).
There are two broad types of inferential statistics: Parametric and Non-parametric statistics. Many statistical tests
make assumptions of various types. It is VERY important to check that any required assumptions are actually met,
otherwise any statistics and conclusions could be invalid/wrong! One common assumption is that the data or some
statistic will follow a particular distribution. Where a specific distribution is an assumption, then you are likely dealing
with parametric statistics. Part of your analysis work is in estimating the parameters of the relevant distribution. Often
this is fine, and the central limit theorem certainly helps when it comes to tests on means! However, sometimes you will
find that a distributional assumption is not valid, and then you’re stuck!
When a distributional assumption is not met, an alternative is the non-parametric approach. Non-parametric statistics
can also have distributional, and other, assumptions, but they are typically less restrictive than those of the
corresponding parametric statistical methods. There is typically some penalty for this, such as the test being less
powerful7.
To further complicate matters, some parametric tests have assumptions which are technically required, but which can
sometimes be relaxed under the “right” conditions. When this is possible the statistical test is said to be robust to
violations of that assumption.
This document mainly deals with parametric statistics, and doesn’t dwell on assumptions! It is worth reiterating from
the preface:
Where the results of a statistical analysis will possibly be of some critical importance, then I’d suggest getting a
professional statistician involved!! This document is only an introduction to SPSS and statistics!!
So what?!
Well, the SD is now an estimate of how this statistic randomly varies between samples. One could call this “natural”
variation, as it comes about as a natural result of taking random samples from the population.
7 This is “power” in the statistical sense: The ability to detect an actual difference.
8 Typically populations are assumed to be infinite, or at least much, much (10 times or more) bigger than our sample. When sampling from finite/“small” populations this needs to be taken into account using statistics that have a “finite population correction”. This is beyond the scope of this document.
9 Assuming that the population has variety present!
Say we have been contracted to manage a marketing campaign. Part of the contract is that we must produce statistical
evidence as to whether the campaign had an actual impact or not.
We could take a random sample from the target population before the campaign and measure (in some way) their level
of knowledge of the product to be marketed. We then carry out the campaign, and afterwards we take a second
random sample from the population and again measure level of knowledge.
There will likely be a difference between the two measurements, the before and the after. What we now really want to
know is: is this difference so small that it likely has nothing to do with the campaign, and is just part of sampling
variation (ie “natural”/inherent variation)? Or is the difference sufficiently big that we can be sure it is more than can be
accounted for by random sampling alone? If it is, then we can assert that the campaign has had a definite effect. And
this is what hypothesis testing does!
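The before/after comparison above can be sketched numerically. The following is a rough illustration in Python rather than SPSS; the awareness scores are invented, and the p-value uses a normal approximation to the t distribution (reasonable for larger samples):

```python
# Two independent random samples: awareness scores before and after the
# campaign (invented data for illustration).
import math
import statistics

before = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6, 5, 4, 6, 5, 5]
after  = [6, 7, 6, 8, 7, 6, 7, 8, 6, 7, 7, 8, 6, 7, 7]

m1, m2 = statistics.mean(before), statistics.mean(after)
v1, v2 = statistics.variance(before), statistics.variance(after)
n1, n2 = len(before), len(after)

# Welch t statistic: difference in means over its standard error
se = math.sqrt(v1 / n1 + v2 / n2)
t = (m2 - m1) / se

# Rough two-sided p-value via the standard normal CDF
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

print(round(t, 2), p < 0.05)
```

A tiny p-value here would say the observed difference is bigger than random sampling alone can plausibly account for, which is exactly the question posed above.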
The distribution of the T1, T2, ..., Tk is known as the sampling distribution, and the “natural” variation is known as
sampling error10. It is the distribution that results for a statistic from repeated random samples from a population. Using
hypothesis testing, as in the marketing example above, we can determine if the population has changed or not, for
example.
In addition to being able to conduct hypothesis testing as outlined above, knowledge of the sampling distribution for
our statistic (whatever it is!) will enable us to make inferences from our sample about the overall population. This,
“inferential statistics”, is VERY powerful!! If this wasn’t possible then a lot of medicine, for example, just wouldn’t be
possible ... Knowing that a drug worked to a certain level for any sampled group would tell us nothing about the rest of
us!! Luckily this is not so, and knowledge from a sample can be used to describe a range of likely effects on a wider
population. Similarly in marketing, for example, knowledge of the preferences of a sampled group can be used to infer
details of the likely preferences of a larger population. And so on for many areas of life!
To understand hypothesis testing let us consider a criminal trial. The default position is that the accused is innocent, and
the alternative is that they are not innocent (stating the obvious!). If someone is innocent then there are a range of
things, pieces of evidence, that we might expect to find, such as a good alibi. We have various actual pieces of evidence,
so we can see how many of these actual pieces of evidence match what we would expect given innocence. The extent
to which the pieces match up, expressed in probability terms, is called the p-value. So the p-value is the probability of
the evidence given innocence; more generally, it is the probability of obtaining a particular statistic (eg your particular
sample mean) from a certain default distribution of such statistics (eg the distribution of sample means if the default is
in fact true).
If lots of actual evidence matches up with what we would expect given innocence, then we reject the guilty accusation,
and accept innocence11. Statistically this corresponds to a “large” p-value, eg p=0.53, or in general a p-value greater
than our chosen significance level; typically p>0.05, for example, results in the alternative not being accepted.
If little or none of the actual evidence matches up under the assumption of innocence, then we reject innocence, and
accept guilt12. Statistically this corresponds to a p-value less than our significance level, for example p<0.05.
Even in the presence of evidence we may still make the wrong decision, either way, see Table 2 (below) for more
details.
10 Although it is not an “error” as such! It is just the natural result of taking a random sample.
11 Of course actual innocence may or may not be the true fact!!
12 Which again may or may not be the true fact!!
Many research questions are framed in the words of a (null) hypothesis, and an alternative. For example:
Hypothesis: The average amount spent by customers in the supermarket over the weekend was €100.
Alternative: The average amount spent was not €100¹³.
Hypothesis: Proportion of those surveyed agreeing or strongly agreeing with statement X was 0.5.
Alternative: Proportion agreeing or strongly agreeing with statement X was not 0.5¹⁴.
Hypothesis: New drug is no different in relieving pain than the standard drug.
Alternative: New drug is different (or better, or worse¹⁵).
Hypothesis: Marketing campaign has not raised awareness of product.
Alternative: Marketing campaign has raised awareness of product.
The default hypothesis is called the null hypothesis in statistics. The null hypothesis is typically a statement along the
lines of “nothing interesting/new here”. The alternative is called just that: The alternative hypothesis. These are
respectively referred to as H0 and H1 in statistics.
The null hypothesis tells us what sampling distribution we need to consider (ie distribution of means, or proportions?).
This corresponds to us considering what things are possible if H0 is indeed true. We can later compare this against
what we actually find, and see how likely our actual result is, if H0 is in fact true.
The alternative tells us the type of hypothesis testing we must do. See table above, and footnote 15. In most cases this
tells us whether we carry out a 1 or 2 sided test, although more complicated alternatives are possible.
It can take a little getting used to framing research questions in this type of format; however, this is really essential if
you want to use statistics to support or refute any such hypothesis. It is essential to be clear what the default
hypothesis is, and what the alternative might be. Typically the “default” hypothesis states some kind of status quo, and
it is often in proving the alternative that we are in fact interested16.
When viewing the following table remember that we do not (in general) know reality, we just know the evidence and
infer what we believe must be true from this. Behind what we infer and believe still lives the reality!
                        They are (in reality) innocent          They are (in reality) guilty
We reject innocence     We have found them falsely guilty!!     We’re right ... We found them guilty,
                                                                and they (in reality) are indeed guilty.
We accept innocence     We’re right ... We found them           We have found them falsely innocent.
                        innocent, and they (in reality) are
                        innocent.
The four outcomes from the above table are possible in any hypothesis test17 as follows:
13 Later questions obviously might be: Was it above or below the €100?
14 Again, later questions are obvious.
15 The statement of the alternative has significance in statistics, as it not only is the alternative to the initial hypothesis, but can also include other information. For example, if you somehow knew that the new drug was definitely not worse, then the initial/default hypothesis is as stated, but the alternative can include this extra info, and would become: The new drug is better than the standard drug.
16 For technical reasons it can be somewhat “simpler” if we can arrange to reject H0 if our experiment works out as “expected”, as we then won’t need to concern ourselves with Type II errors or Power issues.
17 Or any argument!!
It is important to be aware of the four possible outcomes, since this will help you both understand what you have from
your CI/hypothesis test and what issues (and resolutions) there might be.
The Type I (spoken as “type 1”) and Type II (spoken as “type 2”) errors go back to hypothesis testing where these are
the standard terms. If you examine the table you will see that:
18 A “p value” is the probability of getting the calculated sample test statistic (eg mean) assuming the null hypothesis is true. Think of it as evidence for H0. If p is “low” (ie p < α, the chosen significance level) then there isn’t evidence from your given sample supporting H0, and we might (perhaps) reject H0 in favour of H1. If p is “large” (ie p > α) then there is evidence that H0 is indeed true, and we don’t reject H0 at the given significance, for the given sample size.
The sampling distribution of means is N(μ, σ²/n), with the (unknown) population mean μ at its centre. There is a
probability of α/2 that the sample mean will be in either tail, giving an overall probability of α that the sample estimate
μ̂ is in a tail, and an overall confidence level of 1 − α.
The consequence of a Type I error is that we will incorrectly reject H0. As an example using proportions consider:
We calculate our 95% CI and it comes out as (say) (0.55, 0.75). This is entirely above 0.5 (the 50:50 point), so we would
conclude at the 95% level of confidence that we should reject H0. But you have to bear in mind that there is a 5%
chance that this is incorrect, and that it was in fact just a “freak” sample. These are “random” samples (or they should
be!!) so this is going to happen from time to time.
One approach to avoiding type I errors is to calculate your CI for a range of confidence levels, and see what the “best”
level of confidence you can achieve is. Here this would be the highest level of confidence at which the entire interval is
above 0.5. At 95% (above) the lower limit is only just over 0.5, so the 99% level might fail. This would highlight to us
that we may be in type I error territory. If so, it is just a matter of pointing this out in any analysis. If another sample
was possible then this could reassure us one way or the other.
Some level of type I error is typical in any analysis. It is just commented on as part of the analysis. Obviously the smaller
it is the better, which is why it is good to see what the highest level of confidence we can generate is.
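To sketch this idea outside SPSS, here is the normal-approximation CI for a proportion at several confidence levels; the sample values (p̂ = 0.56, n = 200) and the Python code are purely illustrative:

```python
# Normal-approximation confidence intervals for a sample proportion at
# several confidence levels (invented sample values).
import math

p_hat, n = 0.56, 200
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Standard normal critical values for common confidence levels
z = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

results = {}
for level, z_crit in z.items():
    lower = p_hat - z_crit * se
    upper = p_hat + z_crit * se
    # is the whole interval above the 50:50 point?
    results[level] = (round(lower, 3), round(upper, 3), lower > 0.5)

for level, (lo, hi, above) in results.items():
    print(f"{level}: ({lo}, {hi})  entirely above 0.5? {above}")
```

Here the interval stays above 0.5 at 90% but not at 95%, so 90% would be the “best” level of confidence achievable for this (invented) sample.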
While this is an important idea, we will not take it much further, as it is a more advanced topic.
The “best” thing to do is always set up your hypothesis so that you can reject H0, and then you can never have any type
II error at all!
If you fail to reject H0 then you must consider type II error, and test power.
Another way to avoid/lessen/remove type II error is to increase sample size and redo your sample, in the hope that a
larger sample will contain sufficiently precise and accurate19 information to reject H0.
9.3 Means
The mean is probably the single most commonly used (abused20?) statistic. It is used with scale data. It represents the
arithmetic centre of all of the data. It is affected by extreme values, and can become unrepresentative of a typical value
as a result.
This is VERY useful. A lot of real world data will not necessarily have a “nice” mathematically accessible distribution. If
we were relying on the distribution of the population to be “nice” then this could cause problems for doing statistics.
However the CLT tells us that we need not concern ourselves too much about the population distribution, if we intend
to base our statistics on means, as they will be normally distributed, once the sample size is “large enough”.
What is “large enough” is hard to say!! But many people take n>30 as sufficient to invoke the CLT. In fact it gets invoked
even for small-ish sample sizes!
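A small simulation (in Python rather than the simulation tool) shows the same effect; the choice of an exponential, ie very skewed, population is arbitrary:

```python
# CLT sketch: means of random samples from a skewed (exponential)
# population. The population mean of expovariate(1) is 1.0.
import random
import statistics

random.seed(42)

def sample_means(sample_size, n_samples=1000):
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means_5 = sample_means(5)    # 1000 means of samples of size 5
means_25 = sample_means(25)  # 1000 means of samples of size 25

# Both distributions of means centre on the population mean of 1.0 ...
print(round(statistics.mean(means_5), 2), round(statistics.mean(means_25), 2))

# ... and the larger sample size gives a tighter distribution of means.
print(statistics.stdev(means_25) < statistics.stdev(means_5))
```

Plotting histograms of means_5 and means_25 would show both looking roughly normal, the n = 25 one more so, mirroring the screenshots below.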
19 The terms precision and accuracy are often confused. Something is precise if the range of answers is narrow, even if they are not accurate. Accuracy refers to the correctness of a value.
20 Be careful not to calculate a mean where it is not the appropriate measurement to produce a typical/representative value! For example with data that has extremes, or indeed for data that is coded as a number but which isn’t in reality scale data! A simple example is Male/Female data coded as 0/1. It is possible to calculate the mean, but it makes no sense!
After 5 means have been calculated from each set of samples (ie 5 samples of size 5, and 5 samples of size 25) we get:
And it is still not obvious that there will be a pattern to the sampling distributions ... However, if I advance things with
1000 such random samples (ie 1000 of size 5, and 1000 of size 25) we get:
Notice that the two distributions of sample means below are both beginning to display
some level of similarity with a normal distribution ... as we know they MUST because we
know of the CLT!!
There have now been 11,010 sample means calculated for each distribution below,
random samples taken from the population above. Also included below is a normal
distribution curve, and you can hopefully see just how close the distribution of means
actually is to the normal curve!! Even for samples of size 5!!
This simulation tool is a great way to get to grips with sampling distributions. It is possible to display sampling
distributions for different statistics, with different sample sizes, and different populations.
So because of the CLT we know that means (from samples of “sufficient size”) will be normally distributed. This is great
because a lot is known about the normal distribution!!
This is why the first three tests in the following sections are called “t tests”.
Example: We want to check that a machine in a factory actually delivers an average of 100g of sweets per bag.
We take a sample of 30 bags, and weigh the contents, recording the results in SPSS. We find that the mean is 98g.
However this difference could be just sampling variation/error. We conduct a One Sample T Test:
One-Sample Statistics
One-Sample Test
One-Sample Statistics
The “Std. Error Mean” is the “standard error of the mean”, and it is the standard deviation of
the means that you would expect to get, if you were to take many separate samples, and for
each one calculate a new mean. It is thus a measure of the sampling error/variation that we
expect to get when calculating means here. Every sample statistic also has a standard error
statistic associated with it, to describe the variation we would expect to find from sample to
sample for this sample statistic.
Next to the test results table, which has the key information here from the t test:
df stands for “degrees of freedom”. Here it is just n − 1. The shape of the t distribution
is affected by the sample size, since the bigger the sample the better our estimate of
the population standard deviation will be. This is taken into account by this “df” term.
Typically (across different statistics) the higher the df the better.
Sig. stands for (statistical) significance. This is the “p-value” (See Table 1) here, and
represents the “evidence” that supports the null hypothesis, ie that the population of bags
actually has a mean of 100g, given the sample information/evidence.
We would “reject the null hypothesis” here, as the p-value indicates there is only a 1 in
1000 chance (ie 0.001) of getting a sample like this if the population mean is in fact 100g.
The “2-tailed” refers to how we state our “alternative hypothesis” (See Table 1). If we
believe that the alternative might be mean > 100 (or for that matter mean < 100) then we
would conduct a 1 sided test. Since it could be either way here, this needs a 2-tailed test!
One-Sample Test
This t statistic is (98 − 100) / 0.551 ≈ −3.63.
The “95% confidence interval” gives us a range of values within which we are 95% confident the actual difference
lies. Here we are 95% confident that the population mean is below the expected 100g by between 0.87g and 3.13g.
While many people just use hypothesis tests, and quote p-values, confidence intervals are more useful as they
contain more information. From the hypothesis test we just know that there is a 1 in 1000 chance of a sample like
ours if the population mean is in fact 100g, whereas the confidence interval actually tells us a range of sizes for the
difference. Other common confidence intervals are 90% or 99%.
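The numbers in this output can be reproduced by hand; here is a sketch in Python rather than SPSS. The sample standard deviation of 3.018 is an assumption chosen so that the standard error matches the 0.551 in the output, and 2.045 is the usual 95% critical t value for df = 29:

```python
# One-sample t test for the sweets example: H0 says the population mean
# is 100g; our sample of n = 30 bags has mean 98g.
import math

n = 30
sample_mean = 98.0
hypothesised_mean = 100.0
sample_sd = 3.018          # assumed; gives SE of about 0.551 as in the output

se = sample_sd / math.sqrt(n)                  # standard error of the mean
t = (sample_mean - hypothesised_mean) / se     # t statistic

t_crit = 2.045                                 # 95%, df = n - 1 = 29
diff = sample_mean - hypothesised_mean
ci = (diff - t_crit * se, diff + t_crit * se)  # 95% CI for the difference

print(round(t, 2))                             # -3.63
print(round(ci[0], 2), round(ci[1], 2))        # -3.13 -0.87
```

The CI of roughly (−3.13, −0.87) is the “Lower”/“Upper” pair in the One-Sample Test table: the machine appears to under-fill by somewhere between about 0.9g and 3.1g.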
9.3.4 Two (Independent) Groups: Means that differ from each other
Once one has the concepts for a single mean, from 9.3.3 above, the ideas are relatively easily extended to compare two
means. As it happens there are two quite different research designs that are possible which could result in comparing
two means, and it is CRUCIAL to be clear which one you are actually dealing with!
This section deals with the design where there is a sample taken from two independent groups. This is sometimes called
a “between subjects” design, as we will be investigating differences between two separate/independent groups.
The alternative, dealt with in section 9.3.5 below, is where the same group are sampled on both occasions. They are
thus not independent groups! This is sometimes called a “within subjects” design, as we will be investigating differences
within a single/dependent group.
As an example, say we have conducted a survey of student opinions of a maths web site(!!), where these are scored on
a 1 to 5 Likert scale21, with 1 being “very poor” and 5 being “very good”. We carried out a survey before making some
changes to the web site, and a second survey of a new random group of students (say the following year22), after making
changes to the site. We want to see if student opinions have changed, ie mean2 = mean1?
Note that although this example relates to possible changes across time, it is still a between subjects design, since we
are not using the same subjects for the second survey.
Statistics: Student opinion of maths web site

Before changes:  N (Valid) 22, (Missing) 0;  Median 2.50;  Mode 4
After changes:   N (Valid) 27, (Missing) 0;  Median 4.00;  Mode 5
21 See for example http://en.wikipedia.org/wiki/Likert_scale for a discussion on whether it is valid to treat Likert data as scale data, which is required for the t test here!
22 Asserting that any change in the opinions is attributable to the web site changes could easily be challenged here! Someone could argue that it is simply a new set of students, one year on, who happen to feel more positive. Research design should ideally consider how change might be attributed, so that such questions can also be answered!
Have a look at the various summary data. It certainly (to me!) seems to suggest that things have improved in the opinion
of the sampled students ... but ... could this difference just be down to sampling error, or is it something “new”? To find
out we conduct an “Independent-Samples T Test”, making sure to clear any “split file” command.
Group Statistics
NOTE: The main test results (next) have been split here to make the whole table visible within the page. In
SPSS this is a single table:
We’ll come back to the Levene’s test below, but first the main point is the “Sig.” information, which shows that there is
likely to be a real difference in opinion between the two groups. From the confidence interval information the real
difference could be as low as 0.5 of a point on the Likert scale (of little practical significance), up to 2 points on the scale.
When reviewing this type of information be careful to know what was actually calculated: Here it was
mean(before) – mean(after), so the minus signs in the CI are in the web site’s favour!! Had it been
mean(after) – mean(before) then negative CI values would mean things got worse in the opinion of the second group!
There are two sets of statistics shown, one row for “equal variances assumed” and the other for “equal variances not
assumed”. This is to do with how the population standard deviation approximation, used in the t test, is calculated,
and also how the t test itself is calculated. Levene’s test has a significance level of 0.642 here, indicating no real
evidence against the variances being equal, so we can reasonably safely use the first row of results as our actual
results. The “F” under the Levene heading refers to the F distribution, which is just another of a number of well
known distributions.
9.3.5 Paired (Dependent) Differences: Means that differ within subjects across 2 times
It is worth reading sections 9.3.3 and 9.3.4 above before this section.
Look in Help/Case studies, and then Statistics Base/T Tests/Paired Samples T Test. It is good experience to get used to
the case studies help files. Expand the tree item, and the sub-tree item, and you should see:
Carrying out the steps given in the case study should ultimately result in output as follows:
Paired Samples Test

                                        Paired Differences
                             Mean   Std. Deviation  Std. Error Mean    Lower     Upper        t   df  Sig. (2-tailed)
Pair 1  Triglyceride -      14.063      46.875          11.719       -10.915    39.040    1.200   15       .249
        Final triglyceride
Pair 2  Weight -             8.063       2.886            .722         6.525     9.600   11.175   15       .000
        Final weight
At this stage, having gone through the last two sections, the interpretation of this output should be fairly straight-forward.
9.3.6 Three or more (Independent) Groups: Means that differ from each other with 3 or more groups:
ANOVA
In section 9.3.4 above a method for comparing two means was presented. However, it is equally possible to have 3 or
more means that need to be compared. While this can be achieved via multiple T Tests, it is not good practice. Every
statistical test involves a potential error (Type I or Type II), so conducting many tests of any sort increases the
likelihood that the overall results/conclusions are incorrect/flawed. This is called the family-wise
error rate … the overall error from conducting several/many separate tests.
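The family-wise error rate for k independent tests, each run at significance level α, is 1 − (1 − α)^k, ie the chance that at least one of the tests gives a false positive. A quick sketch in Python:

```python
# Family-wise error rate for k independent tests at significance alpha:
# the probability that at least one test rejects H0 purely by chance.
def familywise_error(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 3, 10):
    print(k, round(familywise_error(k), 3))   # 1 0.05 / 3 0.143 / 10 0.401
```

So ten separate tests at the 5% level carry roughly a 40% chance of at least one spurious “significant” result, which is why a single ANOVA is preferred.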
It is possible to conduct a single statistical test involving many means, and this can be achieved via an ANOVA. The
acronym ANOVA stands for ANalysis Of Variance. The reason for this naming comes from the mathematical background
to ANOVAs.
There are many types of ANOVAs possible, and the details are beyond the scope of this document.
The relevant menu paths in SPSS are: Analyze/Compare Means/One-Way ANOVA, and Analyze/General Linear Model,
with all 4 sub-menu items being relevant in the latter path. There are further relationships between the general linear
model and regression, but this is again beyond the scope of this document.
9.3.7 Three or more (Dependent) Groups: Means that differ from each other with 1 group across 3 or
more times: ANOVA
Similarly to the last section, 9.3.6 above, coverage of this topic is beyond the scope of this document, but to give a
starting point: The relevant technique here is called a Repeated Measures ANOVA.
The relevant menu path in SPSS is: Analyze/General Linear Model/Repeated Measures.
9.4 Proportions
Wherever you might calculate a percentage out of some total number/count, then you might find it useful to carry out
statistical testing on proportions. All percentages are just proportions multiplied by 100%, so 33% is 0.33 as a
proportion.
A common use of statistics for proportions is the analysis of Likert scale data. Say you have a scale from 1 to 5, with 1 =
Very poor, 2 = Poor, 3 = Neutral, 4 = Good, 5 = Very good, rating a given service. From your data you find that 60% of
respondents rate the service as poor or very poor. While that is bad news, now you are wondering what might this
sample result tell you about the possible population proportion rating for your service!
To tackle the purely statistical hypothesis question of “The majority of the population rate my service poorly”, a
crosstab could be used (see “Crosstabs and Frequencies: Chi-Squared tests” section later). However a test that produces
a confidence interval (CI) is often “better”, since the CI conveys not only the information about the hypothesis, but also
a range of values that are likely for the statistic involved. The “better” here has to be quoted since there are always pros
and cons for any test, so the CI might not be “better” at all, if it lacked some other desired statistical property, such as
power. It depends. Let’s not get too distracted at this introductory level by too many subtleties23.
23 Just ignoring the possible presence of subtleties is NOT good. Knowing that they are there should at least make you wary of jumping to too many conclusions, or give you some healthy scepticism about conclusions drawn from results. Like many learning tasks, learning statistics is best viewed as an evolving process: no subtlety today; some awareness next week; even more in a year’s time!! And so on.
Tests for proportions are not directly available in SPSS, but there is an easy workaround so long as the sample is “large”,
and the test proportion isn’t near 0 or 1. Altman, et al., 2005, gives relatively straight-forward details for more accurate
testing of proportions.
The “trick” to get SPSS to carry out proportion testing is to create a new variable that has the following coding: 1 = Has
the characteristic we are interested in; 0 = Doesn’t have the characteristic we are interested in.
The reason that this method is approximate is that proportions calculated from counts are not continuous! Not every
value in between is actually possible, unless you can continue to get a larger and larger sample!
There are ways to calculate proportion statistics exactly, but they are beyond the scope of this document.
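The recoding trick can be sketched in miniature outside SPSS (Python here); the Likert answers are invented, with 4 and 5 counting as “will stay”, 1 and 2 as “won’t”, and the neutral 3s treated as missing:

```python
# Recode Likert answers into a 0/1 indicator: the mean of the indicator
# IS the sample proportion, so the usual one-sample machinery applies.
import math
import statistics

likert = [5, 4, 2, 4, 3, 5, 1, 4, 4, 2, 5, 3, 4, 2, 4, 5, 1, 4, 4, 5]

indicator = [1 if x > 3 else 0 for x in likert if x != 3]  # drop neutrals

n = len(indicator)
p_hat = statistics.mean(indicator)        # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)   # normal-approximation SE

# 95% CI for the population proportion (z = 1.96)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(n, round(p_hat, 2), round(ci[0], 2), round(ci[1], 2))
```

The same 0/1 variable fed into SPSS’s One-Sample T Test (with test value 0.5) is exactly the workaround described above.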
willStay
Position: 44
Format: F4
Measurement: Ordinal
Role: Input
Value 1, “No way”: 159 cases (15.9%)
What would be interesting (for example) here would be to find out if the population is likely to have a preference for
staying or not staying with the service provider. This corresponds to:
Null hypothesis: Proportion saying they will stay = 0.5 (ie there is nothing “interesting” here, and people have no
particular opinion).
We create a new variable, willStayY, which we compute from the existing willStay variable as follows:
Transform/Compute variable:
3. Use the “If” button to limit the cases that will result in
willStayY = 0 (from above). See next screenshot below.
Repeat the Transform/Compute variable steps again, but this time for willStay>3, and recode it to 1:
The neutral value here of willStay=3 can either be left as “system missing” (ie dots in these cells in the data view), or
else you can use the Transform/Compute variable one last time to recode the willStay=3 into willStayY=9, with 9
representing “missing value” here.
Go to the variable view, and change the willStayY variable to have 0 decimal places, and if you used 9 as missing value,
then enter this for the missing field.
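The Transform/Compute recoding above can also be sketched outside SPSS, in Python with numpy. The responses here are hypothetical 5-point ratings; NaN plays the role of "system missing" for the neutral value:

```python
# Recode a 5-point willStay rating into a 0/1 willStayY variable:
# 1, 2 -> 0 (won't stay); 4, 5 -> 1 (will stay); 3 -> missing.
import numpy as np

will_stay = np.array([1, 2, 3, 4, 5, 5, 2, 4])  # hypothetical responses

will_stay_y = np.full(will_stay.shape, np.nan)  # start everything as missing
will_stay_y[will_stay < 3] = 0                  # 1, 2 -> 0
will_stay_y[will_stay > 3] = 1                  # 4, 5 -> 1
# will_stay == 3 (neutral) stays NaN, ie "system missing"

print(will_stay_y)
```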
The output results are the One-Sample Statistics and One-Sample Test tables, the latter including the lower and upper bounds of the 95% confidence interval.
We can now see that the proportion does differ statistically significantly from 0.5, so respondents definitely had some
preference. The sample mean is 0.56, but the 95% CI tells us that the population proportion that are likely to say they
would stay is 0.5 + 0.03 up to 0.5 + 0.10, which is 0.53 up to 0.60.
Looking purely at the significance we’d just know that there was likely to be a preference, but having the CI tells us the
range of the preference, so is more useful and informative.
9.4.2 Two (Independent) Groups: Proportions that differ from each other
Same idea as in section 9.4.1 above: Recode the original variable into a new 0, 1 variable. Then use the
Analyze/Compare Means/Independent Samples T Test on the new variable.
9.4.3 Paired (Dependent) Differences: Proportions that differ within subjects across 2 times
Same idea as in section 9.4.1 above, except first recode the two original variables into two new 0, 1 variables. Then use
the Analyze/Compare Means/Paired Samples T Test on the two new variables.
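The two-group version of the idea can be sketched the same way in Python (again not SPSS, and with made-up group data): recode each group to 0/1, then run an independent-samples t-test on the codes.

```python
# Independent-samples comparison of two proportions via a t-test on
# 0/1 recoded variables. Hypothetical groups of 100 each.
import numpy as np
from scipy import stats

group_a = np.array([1] * 60 + [0] * 40)   # 60% have the characteristic
group_b = np.array([1] * 45 + [0] * 55)   # 45% have the characteristic

res = stats.ttest_ind(group_a, group_b)
print(group_a.mean(), group_b.mean(), res.pvalue)
```

For the paired case the only change is to run `stats.ttest_rel` on the two recoded variables instead.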
An interesting and useful question at times is: Does the pattern of counts that you get in a crosstab have any significance?
One common statistical approach to this question is a "chi-squared" test, which is sometimes written as a χ² test, χ being
the Greek letter "chi".
A chi-squared test is an approximate test (beyond this doc to explain why!), but is usually "good enough" provided none of
the cells in the crosstab has an expected count of less than 5.
A chi-squared test can be used in a number of different ways, which are named differently, but in essence they are all
tests of frequencies against some sort of expected values, under some sort of assumption. The two most common are the
chi-squared test of independence and the chi-squared test of goodness of fit.
Lots of people who never smoked get lung cancer, so someone who wants to defend smoking might argue that lung
cancer is NOT related to smoking, it is just something that people can get, and anyone can get it. To test this idea we
could gather some data on a cohort of people, some who smoke, and some who don’t. We test everyone for lung
cancer, and produce (for example) the following table:
A chi-squared test compares the table observed counts against counts that would be the expected counts if smoking
and lung cancer are NOT related … These are easy to calculate by hand, as follows:
If we ignore smoking then there are 100 people with a lung cancer diagnosis out of the 1000. So under
the assumption that smoking makes no difference we would expect 100/1000 of any group to
actually get lung cancer. We apply this fraction to the 475, and the 525, and this gives us the number
of lung cancer diagnoses we would expect to get if smoking makes no difference:
475 × 100/1000 = 47.5, and 525 × 100/1000 = 52.5
Similarly there are 900 out of 1000 that don't get lung cancer, so we can apply this fraction to the 475 and
525 to get the number we would expect not to have lung cancer, if smoking makes no difference:
475 × 900/1000 = 427.5, and 525 × 900/1000 = 472.5
24
I’ve made this example up!! But if you Google “smoking and lung cancer chi squared” you should be able to find actual
data/examples.
The test statistic is then χ² = Σ (Oi − Ei)² / Ei, where the Oi are the
observed counts, and the Ei are the expected counts, under the assumption of smoking making no
difference. This χ² value can then be compared against a "standard" chi-squared distribution to see if
it is what we would expect if the two tables of frequencies agreed.
We’ll come back to this example, and complete the details in Excel in section 9.5.2 below. You
probably won’t be surprised to find that the test will show that the assumption is NOT correct, so lung
cancer diagnosis here is NOT independent of smoking.
This idea can be applied to many situations. Taking the telco.sav sample file, we can check if churn is independent of
education level, gender, marital status, etc.
As shown above, enter level of education, gender and marital status for the rows, and churn for the columns (it actually
makes no difference which are which for the Chi-squared test). You can use the “Cells” button to output the expected
counts (and more besides) if you want. Click on “Statistics”, and (as shown below) select the “Chi-square” option:
Click continue and then OK.
I've just included the level of education * churn crosstab and chi-squared results below; others are omitted for brevity.
In the crosstab (churn No / Yes), for example, the Post-undergraduate degree row shows 38 / 28 (66 in total), and the
Total row shows 726 / 274 (1000 in total). The Chi-Square Tests table follows.
The significance here, shown having been calculated using various techniques, is less than 1 in 1000, ie < 0.001, so the
test is telling us that there is a significant difference in churn across different levels of education. There are further
options to investigate which cells in the crosstab are actually contributing most to the significance. Under the "Cells"
button you can tick the "Unstandardized" residuals option (see below), which will show you the individual Oi − Ei
results. The bigger these are (in absolute size, ignoring the sign for now), the more that cell is contributing to the
statistically significant result. It is better (IMO!) to actually look at the individual (Oi − Ei)² / Ei results to get a more
accurate view of what is producing the stats result, but that option is not available directly. It can easily be done by hand
using SPSS's observed and expected values:
Repeating the above crosstab now gives output including the following:
In the crosstab with residuals, for example, the Post-undergraduate degree row shows counts of 38 (No) and 28 (Yes),
66 in total, with residuals of −1.7 and 1.7. Calculating the individual (Oi − Ei)² / Ei values by hand gives:

                                Churn
                                No      Yes
Did not complete high school    3.86    10.22
High school degree              1.17    3.10
Some college                    0.02    0.05
College degree                  4.58    12.14
Post-undergraduate degree       2.05    5.41
Now look for what is "large" (ie contributes most to producing the significant result). Clearly "Did not complete high
school" and "College degree" seem to be the main contributors, as shown shaded in the table. These two alone produce
a χ² test p-value of 0.0002, which is already significant! If I were working in customer services for this company I would
try to target these two groups specifically to lower the churn rate within them. This also gives us a very clear
example of where statistics can be part of the analysis and solution to a very real business problem!
I would imagine other analysis might produce other groups or sub-groups that could be “targeted”.
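The by-hand contribution calculation generalises to any crosstab, and can be sketched in Python with scipy (the table below is hypothetical, not the telco data):

```python
# Per-cell (O - E)^2 / E contributions for a crosstab: the biggest
# contributions flag the cells driving a significant chi-squared result.
import numpy as np
from scipy import stats

observed = np.array([[30, 20],
                     [10, 40]])   # hypothetical 2x2 crosstab

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

contrib = (observed - expected) ** 2 / expected   # per-cell contributions
print(contrib)
print(chi2, p)
```

The contributions sum to the overall χ² statistic, so inspecting them is exactly the "which cells are producing the result" exercise done above for education and churn.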
You can use the “Layers” crosstab option to make SPSS produce crosstabs and chi-squared results within subgroups, so
for example including Gender as Layer 1:
With the Statistics option set to Chi-square and the cells left at their defaults, the output includes, amongst other things,
a Chi-Square Tests table where the chi-squared results are now broken down by Gender, in addition to the overall results.
As an example, there is a slight difference between the numbers of males and females in the sample with 483 males and
517 females. Is this evidence that there are in fact more women than men in the population?
Under the assumption that there is no difference in the numbers of men and women, the expected number of each would
be 500, ie half the total sample size. This gives the expected values. Transferring all this to Excel and doing some
formatting produces:
          Observed   Expected   Individual Chi-squared Contribution
Male      483        500.00     0.58
Female    517        500.00     0.58
Total     1000
p-value = 0.282296653
The p-value is produced using the CHISQ.TEST command (Excel 2007/2010). This takes the observed and expected
values as parameters and produces the p-value result of the test: The probability of getting the observed counts if the
expected counts are in fact the underlying correct counts, given the population proportions, which in this case are ½ per
category.
The p-value of 0.28 (2dp) here is telling us that there is insufficient evidence to reject the null hypothesis that there are
equal proportions of males and females. This is written in the cautious language of statistics: A less accurate but more
relatable version of this would be: The evidence supports the assumption of equal proportions.
And this also shows how easy it is to do all kinds of count comparisons within Excel, using the CHISQ.TEST function.
Anywhere you have an observed set of counts, and have a theory on what the expected counts should be, from the
population, you can use a chi-squared test to check the assumption, given the sample/observed data.
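The same goodness-of-fit calculation as Excel's CHISQ.TEST can be done in Python with scipy, using the gender counts from above:

```python
# Goodness-of-fit chi-squared test: observed gender counts against
# equal expected counts of 500 each.
from scipy import stats

observed = [483, 517]
expected = [500, 500]

res = stats.chisquare(observed, f_exp=expected)
print(res.statistic, res.pvalue)   # p is approximately 0.282, as in Excel
```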
This kind of test could easily be used to check “interesting” patterns in Likert scale data, for example:
- There are as many “Very dissatisfied” as there are “Very satisfied”
- There are as many “Very dissatisfied” or “Dissatisfied” as there are “Satisfied” or “Very Satisfied”.
Another approach to the Gender question above would be to do a proportions test against a specific value, ie a one
sample T Test for a proportion, with a test value of 0.5. This approach has the advantage of producing a confidence
interval for the proportion, which isn’t available from the chi-squared test. You will often find in statistics that there are
a number of approaches possible. Each will have its advantages and disadvantages, depending on the context.
If possible, a test that produces a confidence interval should generally be preferred over a test that doesn't. For
count data split into two categories a proportions test should be possible, so it would be preferred over a chi-squared
test. The CI gives a range of values for the population proportion, which is extra information not produced by a chi-
squared test. However, if there are multiple categories this would require multiple proportions T Tests, which is to be
avoided where possible, due to family-wise error.
It is a bit beyond the scope of this document to detail it, but you can use Excel to calculate the numbers of data points
that you would expect to find within each bin, and then compare them in the “usual way” (from above, in last section)
to carry out the test.
It isn’t the most powerful test to use, but it is easy to implement, and relatively easy to understand, and interpret
afterwards.
Move Age in years into the Dependent list. Click on Plots, and select the “Normality plots with test” option. Deselect
Stem-and-leaf, and select Histogram:
Click Continue, OK.
Tests of Normality: Kolmogorov-Smirnov and Shapiro-Wilk
There are two tests of normality here: the Kolmogorov-Smirnov test and the Shapiro-Wilk test. Try Googling
to see what the differences are! The significance for both is 0.000, ie p < 0.001: if the Age data really were
normally distributed, then data like these would be extremely unlikely. That would tend
to be fairly conclusive evidence!!
Sometimes these tests are more strict than is needed. Many statistical tests that technically require normality
are in fact not that sensitive to small to moderate deviations from normality! Rather than “blindly” trust
statistical tests alone here, it is generally a good idea to simply view the data as a histogram, and SPSS obliges
with the histogram output:
While the two statistical tests tell us that the data are unlikely to be normally distributed, the chart shows us
why that might be so. The data above look right skewed to me, for example.
A more statistical approach than simply “viewing the data” is the following Q-Q plot:
A Q-Q plot plots the observed quantiles of the data against the quantiles that would be expected if the data were in fact
normally distributed. If the data are normally distributed then the points all follow
the straight line! Where the points differ from the straight line, it is an indication that those points in particular
are "causing" the non-normality. Sometimes in analysis one then investigates these specific points to ensure
they are correct. Additionally, sometimes one proceeds with one's analysis with two sets of data: one that has
the outlier (or non-normal) points removed, and another that has the full data set. One then compares the
results to see whether the non-normal points are having a particular influence on the results. As mentioned
above, many statistical tests that require normality of the data are actually quite robust to departures from
normality, so it is worth conducting the analysis (if feasible) in this split way and then comparing results.
The chart below is basically showing the same info as the one above, except the line has been subtracted from
the data: The data are “detrended”:
The wave shape here is typical of skewed data.
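The same normality checks can be sketched in Python with scipy, on hypothetical right-skewed "ages" (a shifted gamma distribution, standing in for the real Age variable). `probplot` computes the pairs that a Q-Q plot draws, plus the fitted line:

```python
# Normality checks on hypothetical right-skewed data, mirroring the
# SPSS Explore output: Shapiro-Wilk, Kolmogorov-Smirnov, and Q-Q values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age = rng.gamma(shape=2.0, scale=15.0, size=1000) + 18  # right-skewed

sw_stat, sw_p = stats.shapiro(age)                       # Shapiro-Wilk
ks = stats.kstest(age, "norm",
                  args=(age.mean(), age.std(ddof=1)))    # Kolmogorov-Smirnov

# (theoretical, ordered-data) pairs for a Q-Q plot, plus the fitted line's
# slope, intercept and correlation r
(osm, osr), (slope, intercept, r) = stats.probplot(age, dist="norm")
print(sw_p, ks.pvalue, r)
```

Both tests reject normality here, and r below 1 on the Q-Q line reflects the skew, just as the detrended plot's wave shape does.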
Regression is one of the general techniques used to determine the nature of the relationship in such circumstances. It is
a very powerful technique. So-called "linear regression" is the most common form of regression, where the nature of
the relationship is assumed to be "linear" (no powers or cross products in it). Non-linear regression can also be
accommodated by introducing dummy variables. More on this later.
The simplest form of linear regression attempts to match an equation y = bx + a to a set of sample (x, y) data, where
the slope b and intercept a are determined to optimise the fit of the line to the data. The most common fitting
technique is called "Least Squares" (LS). This technique finds b and a so that the resulting line minimises the squared
distances of the points from the line: hence the name "least squares".
Correlation is a technique that measures relationships within data, or in the context of LS it measures how good a fit line
is to the data. It is quite possible to fit a LS line to any set of data, so you then look at correlation (and a decent
diagram!) to check to see whether using the LS line was actually a good idea: Does the line fit the data well?
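As a sketch (with made-up data), here is an LS line fit together with the correlation coefficient used to judge the fit, in Python with numpy:

```python
# Least-squares fit y = b*x + a, plus the Pearson correlation coefficient
# as a measure of how well the line fits. Hypothetical data, roughly y = 2x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, deg=1)    # slope b, intercept a
r = np.corrcoef(x, y)[0, 1]       # Pearson correlation

print(b, a, r)
```

Here r is very close to 1, confirming (along with a scatter diagram!) that the LS line really is a good description of these data.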
The y = bx + a model is the simplest form of LS. Where we are interested in more complex relationships we could have:
y = bk·xk + bk−1·xk−1 + … + b1·x1 + a
Here we are interested in how the factors x1 to xk relate to the y variable. For example we might wonder if end of
year results (y) could be predicted by looking at CAO points (x1), age (x2), results from the first CA (x3), and so on. Note
that here our data would consist of a y value and an x1, x2, x3 set of values for each case in the sample. This
potentially introduces an additional subscript, eg yi and x1i, x2i, x3i, for each case i. Sometimes there might be more
than one measurement for each of x1i, x2i, x3i, and this introduces another subscript: x1ij, x2ij, x3ij, which refers to the j-th
measurement for the i-th case.
There are various important assumptions that underpin aspects of LS regression, such as:
- The x's are assumed to have no error, so they are taken as exact measurements.
- For any given x (or any given x1ij, x2ij, ..., xkij in the multivariate case), it is assumed that the yi are normally
distributed about the LS line (see footnote 25).
- The variance of the y about the LS line is assumed to be constant across the range of x values (see footnote 25).
- The y's are assumed to be independent and identically distributed (see footnote 25).
Where the yi are not scale variables they cannot be normally distributed about the LS line, and alternative forms
of LS regression need to be employed, such as Logistic Regression for categorical data.
Where we want to include non-linear terms this can be done via so-called "dummy variables". For example, say we
want to model y = a + bx + cx²: we would introduce the "new" variables x1 = x and x2 = x², and now the model
becomes linear in x1 and x2: y = a + b·x1 + c·x2.
Once the coefficients from the model are determined they can be subjected to statistical testing, and confidence
intervals for them can be generated.
Having generated a model it is important to examine the residuals too, as these can be used to tell if model assumptions
have been met. In fact as noted in footnote 25 it is the residuals that should be used to check model assumptions have
been met.
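The dummy-variable trick for the quadratic model above can be sketched in Python with numpy (hypothetical, noise-free data so the fitted coefficients are exact):

```python
# Fit y = a + b*x + c*x^2 by introducing x2 = x^2 and running an ordinary
# (linear) least-squares fit on the columns 1, x1 = x, and x2 = x^2.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2        # exact quadratic, no noise

X = np.column_stack([np.ones_like(x), x, x ** 2])  # columns: 1, x1, x2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c = coef
print(a, b, c)
```

The fit recovers a = 1, b = 2, c = 0.5: the model is non-linear in x but perfectly linear in the derived variables x1 and x2.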
25
Technically it is the residuals that should be independent, identically normally distributed, with constant variance, but as an
overview introduction this will suffice.
A correlation coefficient ranges from −1 through to +1, where 0 means no (linear) relationship at all. A value of −1 means
that as x increases, y decreases: this is called negative correlation. A value of +1 means that as x increases, y also
increases: this is called positive correlation.
Correlation coefficients are used in a number of more advanced statistical techniques such as Principal Component
Analysis (PCA) which attempts to find variables that are correlated with each other, within a set of data, so that we
might eliminate some of the variables, and thus work with a smaller set of predictor variables.
Values near to −1 or +1 are referred to as strong correlation, and values nearer to 0 are referred to as weak (or no!)
correlation, depending on the closeness to 0.
It is possible to do a statistical test on a correlation coefficient, to test whether the value from a sample is likely to be
representative of the population. Additionally it is possible to generate a confidence interval for a correlation coefficient.
"Multivariate" refers to having more than one outcome variable, so-called Dependent Variables (DVs). "Multiple regression",
as in the example from the last section, refers to having just one DV but possibly multiple Independent Variables (IVs).
Multivariate statistics in general is a fairly complex advanced topic. All I’ll say here is that (a) it is possible, and (b) the
idea is to simultaneously use as much data/evidence as one can. So where there are multiple DVs it might well be
possible to analyse each DV separately using multiple regression, however this is likely to be less statistically capable of
detecting significant effects than a full multivariate analysis would be.
The correlation coefficient is relevant to assessing how two variables relate to each other. For more than two variables
the coefficient of determination, R², is used. You can think of R² as simply the square of the correlation coefficient,
which is what it is for just two variables.
The coefficient of determination measures how much of the variation in the dependent variable (DV) is actually
explained by the model. So R² = 0.6 would mean that 60% of the variation in the DV is explained by the current
independent variable (IV) model. Clearly the higher this is the better.
However, if one adds enough IVs into a multiple regression model then you will eventually get a model that matches the
data fully/100%, or at least as fully as is possible. In this case you would get R² = 1 (or close to it), but this
would be at the expense of an over-complicated model, with many, many IVs!! The statistic R²adjusted takes into account the
number of IVs in the model, and so is a more balanced measurement of how good a model is at matching the data. It
still has the same basic meaning of measuring how much of the DV variation is explained by the model, but the answer
is then "adjusted" to penalise adding in too many variables that don't really add much to the predictive power of the
model.
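Both statistics are easy to compute by hand, which this Python sketch shows on a hypothetical single-IV fit (here k = 1, the number of IVs):

```python
# Compute R^2 and adjusted R^2 from the residuals of a least-squares fit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.9])   # hypothetical data

b, a = np.polyfit(x, y, deg=1)
resid = y - (b * x + a)

ss_res = (resid ** 2).sum()                    # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()           # total sum of squares
n, k = len(y), 1                               # k = number of IVs

r2 = 1 - ss_res / ss_tot
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(r2, r2_adj)
```

The adjusted value is always at or below the raw R², and the gap widens as more IVs are added without a matching gain in explained variation: exactly the penalty described above.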
10 Reading and Writing Statistical Results
When writing up your own results the best advice I can give here is to follow the format that is normally used in the
publication or company you are writing for! Even before you design your own analysis it is advisable to make yourself
familiar with the target publication, as they might have preferred approaches/methods.
When reading statistics watch out for the p-value: the probability of getting a sample result/statistic at least as extreme
as the one observed, if the null hypothesis is true. Pay attention to sample size, since small sample results should generally be given less weight than
results derived from large samples. Notwithstanding this, also pay attention to bias: Ideally samples should be randomly
selected. Deviations from randomness mean that results might be less representative of the population, and so might
deserve less weight/significance. A large biased sample is probably worse than a small unbiased/random sample, since it
could lead to apparently weighty results that in fact could be biased! Typically sample size is written using the letter n,
for example: “A statistically significant difference (t = 5.3, p < 0.001, n1=20, n2=30) was found between the two groups.”
For this format it is typical to quote the actual statistic (t =5.3 here), along with the p-value and sample size. Other
information might also be included, if relevant/useful, such as the sample means and the SE of means for each group.
An alternative to “raw” p-values is to quote a confidence interval for a statistic. This tends to convey more information
than the p-value alone, as it gives the statistical significance, and also a range of values for the likely size of the effect
within the population. For example: “A statistically significant difference (95% CI [-5.2, -1.3], n1=20, n2=30) was found
between the two groups.”
11 Further Study
Watching MythBusters or other TV programmes on the Discovery channel, or elsewhere, you will see applications of
statistics to novel/interesting problems!!
This document attempts to provide a non-mathematical introduction to SPSS and some of the statistics therein. To
further your understanding I would recommend considering taking some of the Royal Statistical Society's professional
examinations: www.rss.org.uk. They conduct examinations at the following levels (quoted details copied from the RSS
website, Dec 2012):
- Ordinary Certificate: “The Ordinary Certificate is the entry level of the Society's professional examinations. Its
aim is to provide a sound grounding in the principles and practice of statistics, with emphasis on practical data
collection, presentation and interpretation. In terms of level, it is pitched between GCSE and A-level standard in
the English school system, but the nature of the syllabus is very different because of the emphasis on practical
statistical work. It is intended both as a first qualification, an end in itself; and as a basis for further work in
probability and statistics, as for example in the Society's Higher Certificate and Graduate Diploma examinations.
Holders of the Ordinary Certificate should be able to carry out supervised statistical work of a routine kind, or be
able to apply statistical methods, at an elementary level, within work of a more general nature.”
- Higher Certificate: “The Higher Certificate is the intermediate level of the Society's professional examinations. It
is intended both as an end in itself in respect of being a qualification in statistics more advanced than that of our
Ordinary Certificate, and as a basis for further work in statistics up to the highest undergraduate level, as for
example in our Graduate Diploma. It contains some work at the equivalent of A-level in the English school
system, but most of its material is similar to what would be found in the first year of a typical university course
in statistics. Indeed, some of its topics might be in the second year of a university course. It gives a thorough
introduction to statistical theory and inference at this level, stressing the importance of practical applications.”
- Graduate Diploma: “The Graduate Diploma is the highest level of the Royal Statistical Society's professional
examinations. It is of a standard equivalent to that of a good UK Honours Degree in Statistics, giving a thorough …”
These are self study qualifications. Local examinations are organised once per year. Lots of details including syllabus
details, past exam papers and solutions are available on the RSS web site.
Students having completed first year Business Mathematics and Statistics at ITB to a good standard would have a good
foundation for tackling the ordinary certificate, although further study would be required to build sufficiently on the
foundation to pass overall.
12 References
Altman, D. G., Machin, D., Bryant, T. N. & Gardner, M. J., 2005. Statistics with Confidence. 2nd ed. Bristol: Arrowsmith.
Chatfield, C., 1983. Statistics for Technology: A Course in Applied Statistics. 3rd ed. (revised). Boca Raton:
Chapman & Hall/CRC.
Dytham, C., 2011. Choosing and Using Statistics: A Biologist's Guide. 3rd ed. Oxford: Wiley-Blackwell.
Elliott, A. C. & Woodward, W. A., 2007. Statistical Analysis Quick Reference Guidebook: With SPSS Examples. Thousand
Oaks: SAGE Publications.
Field, A., 2009. Discovering Statistics Using SPSS. 3rd ed. London: SAGE Publications.
Pallant, J., 2010. SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS. 4th ed. Maidenhead: Open
University Press.
Salkind, N. J., 2004. Statistics for People Who (Think They) Hate Statistics. 2nd ed. London: SAGE Publications.