Steve Saffhill

Research Methods in Sport & Exercise

Difference Testing
Aims and Objectives
Introduce how to test for significant differences between two groups of data
Detail basic principles of difference testing
Introduce parametric and non-parametric tests
How to interpret SPSS outputs
Parametric Assumptions - Reminder
1. The data must be randomly sampled

2. The data must be high level data (interval/ratio, not nominal or ordinal)

3. The data must be normally distributed

- normal curve on histograms
- z scores between -1.96 & +1.96

4. The data must be of equal variance

• These assumptions are of progressive importance.

• If you do not meet #1 then use non-parametric inferential tests

• Some can be violated but you must justify doing so with supporting evidence! (Vincent, 2005)

• #4 is the least important

Basic Principles – Different Tests
• Statistical tests enable us to evaluate the effect of an independent variable (IV) on a dependent variable (DV).

• IV = the presumed cause of the effect being researched.

– The researcher controls the IV (e.g., differing levels of athletic ability)

• DV = the variable that can be explained by the effects of the IV

– What is actually being measured (e.g., performance changes that the researcher cannot control)!
PARAMETRIC = t-test
• Paired t-test: for repeated measurements (i.e. measurements carried out on the same subjects)
• Independent t-test: for independent measurements (i.e. measurements carried out on two different groups of subjects)

NON-PARAMETRIC
• Wilcoxon – repeated measures/paired t-test equivalent
• Mann-Whitney U – independent t-test equivalent
• The statistical test is used to determine whether the two levels of treatment differ significantly (p < .05), i.e. whether a difference this large would occur by chance fewer than 5 times in 100 if there were no real effect.
• The statistical test is always a test of the null hypothesis.
• All that statistics can do is reject or fail to reject the null hypothesis.
• Statistics cannot accept the research hypothesis!
• Only logical reasoning, good experimental design, and appropriate theorising can do so.
• Statistics can determine only whether the groups are different, not why they are different

• You the researcher say why!

Experimental Research Designs
• Within-Participants Design = repeated measures test on the same participants

• Allows you to control for inter-individual confounding variables

• If you use different groups there is a chance that some variable other than your IV distinguishes between your groups!

• In this design you have the same people in all conditions so there is less extraneous variation between conditions!

• Fewer participants too!!

• Between-Participants Design = different groups in
each condition of the IV
• Each group is less likely to get bored, tired etc
• Less susceptible to practice effects/results bias

• Needs more participants

• Need different participants in each group
• So lose some control over confounding influences!
Independent t-test
• The most frequently used t test determines whether two sample means differ reliably from each other.

• In this test the samples are independent of one another – also referred to as a between-groups comparison.

• E.g., male v female scores in anxiety (IV/DV?)

• E.g., football v boxing VO2max (IV/DV?)
Types of t test


• In an experiment of training intensity and distance run in 12 minutes the results are as follows:

– mean distance run in 12 min after 70% training intensity = 3004 m

– mean distance run in 12 min after 40% training intensity = 2456 m

– Can you identify the IV and DV??

• IV = training intensity (70% vs 40%)

• DV = distance run
• The question that statistics has to answer is:

“Is the difference in the two mean scores significant or is it one that could have occurred by chance given the inherent variability of groups produced by random sampling?”
Using SPSS to carry out the analysis gives the following result:
t (2.8) = 13.81, p <.03

• t is basically a ratio: the difference between the group means divided by a measure of the variability within the groups (the standard error of the difference)

• The larger the difference between the groups compared with the variability within the groups, the larger the t value
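The ratio can be sketched in a few lines of Python. This is a minimal illustration that assumes equal variances (the pooled version of the test); the individual distances are invented so that the group means match the 3004 m and 2456 m in the example, and they are not the data behind the SPSS result.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance independent t: difference in means / standard error."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)  # between-group difference
    # Within-group variability, pooled across both groups
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / se_diff, n_a + n_b - 2  # t value and degrees of freedom


# Hypothetical 12-minute run distances in metres (invented for illustration)
intensity_70 = [2950, 3010, 3060, 2990, 3010]  # mean = 3004
intensity_40 = [2400, 2480, 2500, 2430, 2470]  # mean = 2456

t, df = independent_t(intensity_70, intensity_40)
print(f"t({df}) = {t:.2f}")
```

If the scores within each group were more spread out, the same 548 m difference in means would give a smaller t.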

DV goes into the Test Variable(s) box in SPSS and IV goes into the Grouping Variable box!

Then you have to define your groups (i.e., tell SPSS who is who in what group)
What is the probability of obtaining that t value by chance?
• The larger t is, the more likely it is that there is a TRUE difference between the groups that is theoretically caused by our independent variable

• Each t value comes with its own associated probability level and this is where the p value comes from

• p = .03

• Yes – there is a significant difference in distance between the two groups

• The 70% intensity group ran reliably further than the 40% intensity group

• There is a significant difference between the two mean scores!


• All comparison-between-groups techniques assume that the variances of the groups are equivalent (homogeneity of variance).
– Although mild violations of this assumption do not present major problems, serious violations are more likely if group sizes are not approximately equal.

• Most computer programs allow unequal group sizes. However, the homogeneity assumption should be checked if group sizes are very different or when variances are very different (SPSS does this automatically with Levene’s test for equality of variances)
Independent T-test on SPSS

Group Statistics

       LEVEL    N     Mean     Std. Deviation   Std. Error Mean
CS1    senior   94    4.3616   .88860           .09165
       junior   101   4.1074   1.16502          .11592

Independent Samples Test (CS1)

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F       Sig.            t       df        Sig. (2-tailed)
Equal variances assumed       8.311   .004            1.704   193       .090
Equal variances not assumed                           1.720   185.961   .087

                              Mean         Std. Error   95% Confidence Interval of the Difference
                              Difference   Difference   Lower     Upper
Equal variances assumed       .2542        .14919       -.04009   .54843
Equal variances not assumed   .2542        .14778       -.03737   .54571

If Sig. for Levene's test is less than 0.05 we can say that there is a significant difference in the variance of the two sets of scores!!!

If this is the case we say equal variance is not assumed and use the bottom value here
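The two rows of the output can be reproduced from the Group Statistics alone. The sketch below is a hedged illustration, not SPSS's own code: the function name is invented, the top row uses the pooled variance, and the bottom row is assumed to be the Welch correction (its results match the printed values to rounding).

```python
import math


def t_from_summary(m1, s1, n1, m2, s2, n2, equal_variances=True):
    """Independent t from means, SDs and Ns. Returns (t, df, SE of difference)."""
    diff = m1 - m2
    if equal_variances:
        # Top row: pooled variance, df = n1 + n2 - 2
        pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Bottom row: separate variances with adjusted (Welch) df
        a, b = s1 ** 2 / n1, s2 ** 2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / se, df, se


# Values taken from the Group Statistics table: (mean, SD, N)
senior = (4.3616, 0.88860, 94)
junior = (4.1074, 1.16502, 101)
levene_sig = 0.004

# Levene's Sig. < .05 means equal variances are NOT assumed
assume_equal = levene_sig >= 0.05
t, df, se = t_from_summary(*senior, *junior, equal_variances=assume_equal)
print(f"t({df:.3f}) = {t:.3f}, SE difference = {se:.5f}")
```

Calling the function with `equal_variances=True` gives the top row instead (t of about 1.704 on 193 df).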
Dependent t Test – also called a repeated measures or
Paired t Test

• This means that the two groups of scores are related in some manner.
• One group of subjects is tested twice on the same variable, and the experimenter is interested in the change between the two tests
• Hence repeated measures design
There is no separate grouping variable in the data file, so both sets of scores go into the Paired Variables box in SPSS (i.e., nothing goes into a Grouping Variable box)
Example: Effects of visualisation on pain
– Condition 1 = imagine performing an exciting t-test whilst plunging hands into ice cold water
– Condition 2 = imagine being on a beach drinking beer whilst plunging hands into ice cold water

IV = Condition
DV = time hand immersed

• Similar formula to the independent t-test; however, it is a bit more sensitive as it takes into consideration that we are using the same participants in both conditions

• We couldn’t have everyone do C1 first as they would never return for C2

• It might also lead to order effects!

– Learning, practice etc

• Half do C1 first and half do C2 first, and then they swap (counterbalancing)

SPSS Output

Paired Samples Statistics

               Mean     N     Std. Deviation   Std. Error Mean
Pair 1   CS1   4.2219   269   1.11603          .06805
         CS2   4.3379   269   .98975           .06035

Paired Samples Correlations

                     N     Correlation   Sig.
Pair 1   CS1 & CS2   269   .523          .000

Paired Samples Test

Paired Differences (Pair 1, CS1 - CS2)

                                            95% Confidence Interval
                                            of the Difference
Mean     Std. Deviation   Std. Error Mean   Lower    Upper    t        df    Sig. (2-tailed)
-.1161   1.03399          .06304            -.2402   .0081    -1.841   268   .067

The difference between the means of C1 and C2
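Why the paired test is "a bit more sensitive" can be seen by rebuilding the paired t from the summary output: the correlation between the two sets of scores shrinks the standard deviation of the differences. This is a rough sketch using the rounded values printed by SPSS, so the result agrees with the output only to about two decimal places; the function name is invented.

```python
import math


def paired_t_from_summary(m1, s1, m2, s2, r, n):
    """Paired t from means, SDs, their correlation r and N pairs."""
    # SD of the difference scores: the correlation term reduces it
    sd_diff = math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)
    se_diff = sd_diff / math.sqrt(n)
    return (m1 - m2) / se_diff, n - 1, sd_diff


# Values from the Paired Samples Statistics and Correlations tables
t, df, sd_diff = paired_t_from_summary(4.2219, 1.11603, 4.3379, 0.98975,
                                       r=0.523, n=269)
print(f"t({df}) = {t:.2f}, SD of differences = {sd_diff:.3f}")

# Same numbers but pretending the scores were unrelated (r = 0)
t_unrelated, _, _ = paired_t_from_summary(4.2219, 1.11603, 4.3379, 0.98975,
                                          r=0.0, n=269)
print(f"with r = 0, t would shrink to {t_unrelated:.2f}")
```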
Issues of Significance

• Differences in pain between the two conditions were not statistically significant (p = 0.067)

• Remember p must be < 0.05 to be statistically significant

– There is no significant difference (p = 0.067)
– This only reflects a tendency
– Power issue??

• Tendency accepted as p < 0.1

Non-Parametric Difference Tests

• Wilcoxon – 2 groups – within groups/repeated measures – paired t-test equivalent

• Mann-Whitney U – 2 groups – between groups – independent t-test equivalent
Mann Whitney U

• Do males and females differ in the importance they place on body image?

• Hypothesis = males and females will differ in the importance they place on body image

• Imagine the data were not randomly sampled, not high level, and/or not normally distributed
2 output boxes appear:

Similar function to t in t-test
High = better

p value
> 0.05 = no significant difference
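To show where the Mann-Whitney statistic comes from, here is a minimal sketch that ranks all scores together and computes U from the rank sums. The ratings are invented, the helper names are hypothetical, and the code reports the smaller of the two U values, which is one common convention; it does not calculate the p value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def mann_whitney_u(group_a, group_b):
    """Return the smaller U for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])  # ranks belonging to group_a
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)  # U = 0 means the two groups do not overlap at all


# Hypothetical body image importance ratings (1-7 scale)
males = [3, 4, 2, 5, 4]
females = [6, 5, 7, 6, 7]
print(mann_whitney_u(males, females))
```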
Wilcoxon

• Differences between imagery rating scores from memory and after watching video playback

• Hypothesis = there will be a significant difference between imagery rating from memory and after watching the video of the performance

• IV = presence/absence of video (operationalised by asking subjects to rate likeness to actual performance)
• DV = 1-7 scale
2 tables appear:

Similar function to
t in t-test
High = better

This is the p value!

>0.05 = NOT significant
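The Wilcoxon statistic for repeated measures can be sketched in the same spirit: rank the absolute differences between the two conditions, then sum the ranks for positive and negative differences separately. The ratings and function names below are invented for illustration, pairs with no change are dropped (a common convention), and no p value is calculated.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def wilcoxon_signed_rank(first, second):
    """Return (smaller rank sum, number of non-zero differences)."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero changes
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(positive, negative), len(diffs)


# Hypothetical imagery ratings (1-7 scale) for the same six participants
from_memory = [3, 4, 3, 6, 2, 3]
after_video = [4, 6, 3, 5, 5, 7]
print(wilcoxon_signed_rank(from_memory, after_video))
```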
• Generally speaking parametric tests are

• However, they are not always possible!

– It depends on YOUR data
Meaningful versus statistical significance
• Meaningful significance cannot be determined by statistics.

• It is a decision made by the researcher.

• Statistical significance is not the same as meaningful significance

• A small effect of altering a surgical method may emerge as statistically significant but may be unimportant when we measure surgical survival
One tailed or two tailed tests
• This topic is concerned with directional or non-directional hypotheses.
• If the hypothesis is non-directional, e.g. performance of Group A will be different to Group B following an intervention, then we must choose a two-tailed statistical test.
• If a hypothesis is directional, e.g. Group A will score significantly higher than Group B following the intervention, then we must choose a one-tailed statistical test.

• A test of a directional hypothesis is more powerful.

One or two tailed
• This is concerned with directional vs. non-directional hypotheses.

• If the hypothesis is non-directional (e.g. performance of Group A will be different to Group B following an intervention)
• two-tailed statistical test.

• If a hypothesis is directional (e.g. Group A will score significantly higher than Group B following the intervention)
• one-tailed statistical test.

• In a directional hypothesis, not only do you say there will be a difference, but also the direction of that difference
• E.g. women have better ultra endurance than men
What are the implications of one and two tailed hypotheses?

• SPSS USUALLY assumes we conduct two tailed research so the p value it produces is for a two tailed test! So we do nothing!!!

• However, if our hypothesis is one-tailed we must change the p value SPSS has given us into a one tailed value, OR click on 1-tailed on SPSS if it has it!

This p value becomes 0.0335!!!

• We simply halve it!!!
Paired Samples Test

Paired Differences (Pair 1, CS1 - CS2)

                                            95% Confidence Interval
                                            of the Difference
Mean     Std. Deviation   Std. Error Mean   Lower    Upper    t        df    Sig. (2-tailed)
-.1161   1.03399          .06304            -.2402   .0081    -1.841   268   .067

Notice there was no significant difference and now there is!!!!
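The halving rule can be written as a small helper. This is a sketch for symmetric test statistics such as t, with an invented function name; it also makes explicit that halving only applies when the result is in the predicted direction (here that is assumed to be the case for the 0.067 example).

```python
def one_tailed_p(two_tailed_p, in_predicted_direction):
    """Convert a two-tailed p value (symmetric statistic such as t) to one-tailed."""
    half = two_tailed_p / 2
    # A result in the wrong direction can never count as support
    return half if in_predicted_direction else 1 - half


p_one = one_tailed_p(0.067, in_predicted_direction=True)
print(round(p_one, 4), p_one < 0.05)

# Same t value, but the difference went the opposite way to the prediction
p_wrong_way = one_tailed_p(0.067, in_predicted_direction=False)
print(round(p_wrong_way, 4), p_wrong_way < 0.05)
```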

‘Why not always conduct one-tailed tests if it is more likely to
demonstrate significance?’

• The answer lies in the fact that you have to declare what you are
going to do before conducting the study (hypothesis).

(Remember - your study should be rooted in theory: you must have an idea of what should happen)

• If we have conducted a one-tailed test and the result goes in the opposite direction to that predicted, no matter how extreme, then we cannot claim this as significant.
Why not always carry out two-tailed tests?

• For example, if our theory concerns the effects of stimulants on motor performance, then stimulants generally speed up motor reactions.

• In which case it makes no sense to predict that tasks performed with a stimulant will be performed significantly faster or slower than those performed without it.

• In this case the theory dictates a directional test and hence a one-tailed test of the hypothesis.

Think about your research first and select a

statistical test that fits your design/the
What if more than 2 groups?
• Often you will want to conduct a test to see if there are
differences between more than two groups/conditions

• VO2 max in football, rugby and hockey?

• Motivation in Year 1, Year 2, Year 3?


• More advanced stats - we will cover this next week!

• Check parametric assumptions to use the correct test

• Difference tests allow you to test the significance of the effect of the IV on the DV

• T-tests are used to test data from TWO groups

• Can run either a paired or an independent samples t-test

• For non-parametric data use either:

– Wilcoxon – paired groups
– Mann Whitney U – independent groups

• Can have 1- or 2-tailed significance, it depends on your hypothesis!
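The choice between the four tests covered here can be summarised as a simple lookup. This is only a memory aid with invented function and argument names; deciding whether the parametric assumptions are met still has to be done by checking the data.

```python
def choose_difference_test(parametric_assumptions_met, same_participants):
    """Pick a two-group difference test from the design and the data."""
    if parametric_assumptions_met:
        return "paired t-test" if same_participants else "independent t-test"
    return "Wilcoxon" if same_participants else "Mann-Whitney U"


# Two different groups, assumptions not met
print(choose_difference_test(parametric_assumptions_met=False,
                             same_participants=False))
# Same participants measured twice, assumptions met
print(choose_difference_test(parametric_assumptions_met=True,
                             same_participants=True))
```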

