Lecture 3 (Handout)


Descriptive Statistics

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts FACSM


J.Betts@bath.ac.uk
@DrBSteamjets
Lecture Outline:
•Statistics
•Variables
•Levels of Measurement
•Measures of Central Tendency
•Distribution
•Variability.
Statistics
• Descriptive
– Organising, summarising & describing data
• Correlational
– Relationships
• Inferential
– Generalising
– Significance
Measured Variables
Measurement:
“Assigning numbers to objects, events, or abstract concepts according to a known set of rules” (Stevens 1946)

This permits data to be categorised, quantified and/or analysed in order that meaningful conclusions can be drawn.
Measured Variables
The data gathered during research are known
as variables:
• Organismic
– Any physiological, psychological or performance
characteristics of an organism

• Environmental
– Any characteristics relating to the organism’s
environment.
Levels of Measurement
1. Nominal Scale (Lowest Level)
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale (Highest Level)

Nominal Scale
• A measure of identity or category
• Useful for quantifying qualitative data
• Provides no information regarding either order
or magnitude

Note: Categories must be both mutually exclusive and comprehensive.
Ordinal Scale
• A measure of order or rank
• Used to arrange data into series
• Provides no information regarding magnitude
Interval Scale
• A measure of order and quantity
• Difference between values can be calculated
• Cannot establish ‘x-fold’ increase

e.g. Temperature
Antarctica = -30 ºC (-22 ºF)
Bath = 10 ºC (50 ºF)
Sahara Desert = 58 ºC (136 ºF)
Difference, Antarctica to Bath = 40 ºC
Difference, Bath to Sahara Desert = 48 ºC
Kelvin???
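The ‘x-fold’ point can be checked numerically. The sketch below is illustrative only: it uses the Bath and Sahara Desert temperatures from the example above, and the function and variable names are made up for this handout. The ratio between the two temperatures changes with the unit on the interval scales (ºC, ºF), whereas kelvin has an absolute zero and so gives a meaningful ratio; the difference is the same physical quantity on either interval scale.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin (absolute zero = 0 K)."""
    return celsius + 273.15


# Temperatures taken from the example above
bath_c, sahara_c = 10.0, 58.0

# 'x-fold' comparisons: the answer depends on the unit for interval scales
ratio_celsius = sahara_c / bath_c
ratio_fahrenheit = celsius_to_fahrenheit(sahara_c) / celsius_to_fahrenheit(bath_c)
ratio_kelvin = celsius_to_kelvin(sahara_c) / celsius_to_kelvin(bath_c)

# Differences: the same physical gap once the unit size is accounted for
difference_c = sahara_c - bath_c
difference_f = celsius_to_fahrenheit(sahara_c) - celsius_to_fahrenheit(bath_c)

print(f"Ratio in Celsius:    {ratio_celsius:.2f}")
print(f"Ratio in Fahrenheit: {ratio_fahrenheit:.2f}")
print(f"Ratio in kelvin:     {ratio_kelvin:.2f}")
print(f"Difference: {difference_c:.0f} C = {difference_f:.1f} F")
```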
Ratio Scale
• Highest level of measurement
• An interval scale with an absolute zero point
• Subsumes all lower levels of measurement
Team      Time      Ordinal   Interval   Ratio
GB        38.07 s   Gold      38.07 s    100%
USA       38.08 s   Silver    +0.01 s    +0.03%
Nigeria   38.23 s   Bronze    +0.16 s    +0.42%
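As a rough sketch of how one ratio-scale measurement subsumes the lower levels, the relay times above can be turned into a rank (ordinal), a gap to the winner (interval) and a percentage gap (ratio). The dictionary and variable names are illustrative only.

```python
# Relay times in seconds, taken from the example above
times_s = {"GB": 38.07, "USA": 38.08, "Nigeria": 38.23}

winning_time = min(times_s.values())
ranked = sorted(times_s.items(), key=lambda item: item[1])

ranks = {}      # ordinal: finishing position
gaps_s = {}     # interval: difference from the winner in seconds
gaps_pct = {}   # ratio: difference as a percentage of the winning time

for position, (team, time_s) in enumerate(ranked, start=1):
    ranks[team] = position
    gaps_s[team] = time_s - winning_time
    gaps_pct[team] = gaps_s[team] / winning_time * 100
    print(f"{team}: rank {position}, +{gaps_s[team]:.2f} s, +{gaps_pct[team]:.2f}%")
```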
Quick Test
• Nominal, Ordinal, Interval or Ratio?

– Blood lactate concentration (mmol.l-1)


– Profile of Mood States (scale 1-7)
– Heart Rate (beats.min-1)
– Blood Group
– Bench Press 1RM (kg)
– Year of Birth (AD)
– Atmospheric Pressure (mmHg)
Système International (SI) Units
Seven ‘constant’* base units using the metric system
Variable              Unit       Symbol   Accepted Derivations
Distance              metre      m        ha for area; º for angle; l or L for volume#
Mass*                 kilogram   kg       t
Time                  second     s        min, h, d (not year)
Temperature           kelvin     K#
Amount of substance   mole       mol      l or L for volume#
Current               ampere     A#
Luminous intensity    candela    cd
Units always lower-case#, neither italicised nor pluralised (i.e. kg not KGS) and with space between value and unit (inc. % but exc. º).
Discrete and Continuous Variables
• Discrete Variables:
– can be described using a specific and distinct point
on a scale
– cannot be sub-divided any further
e.g.
Gender = Male or Female

RPE = 6-20?

Heart Rate.
Discrete and Continuous Variables
• Continuous Variables:
– Can theoretically take any value between two points
on a continuum
– Are dependent on the accuracy of measuring tools
e.g.
Time = yr wk d h min s ms µs ns ps…

IQ?
Indicators of Central Tendency
• Mode
– Most Frequently Occurring Score

• Median
– Middle Score

• Mean
– Arithmetic Average, etc.
Indicators of Central Tendency
Mode = 15 k.y-1

Annual
Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k

Advantages
•Quick and easy to compute
•Unaffected by extreme scores
•Can be used at any level of measurement.
Disadvantages
•Terminal Statistic
• A given sub-group could make
this measure unrepresentative.
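A minimal sketch of finding the mode by counting, using the salary data above (in thousands per year). The blood group list is hypothetical and is included only to illustrate that the mode also works on nominal data.

```python
from collections import Counter

# Annual salaries from the slide, in thousands per year
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

# most_common(1) returns the single (value, frequency) pair with the highest count
mode_value, mode_count = Counter(salaries_k).most_common(1)[0]
print(f"Mode = {mode_value}k (occurs {mode_count} times)")

# Hypothetical nominal data: the mode is still defined
blood_groups = ["O", "A", "O", "B", "AB", "O", "A"]
modal_group, modal_group_count = Counter(blood_groups).most_common(1)[0]
print(f"Modal blood group = {modal_group}")
```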
Indicators of Central Tendency
Median

Annual
Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k

50th Percentile: position in the ranked scores = $\frac{n + 1}{2}$
Indicators of Central Tendency
Median = 19.5 k.y-1

Annual
Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k

For an even number of scores take the mean of the middle two:
$\frac{19 + 20}{2} = 19.5$
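The position rule and the even-n case can be sketched as follows. This is an illustrative implementation rather than a prescribed method; the function name is made up for this handout.

```python
import math


def median_by_position(scores):
    """Median found from the (n + 1) / 2 position in the ranked scores."""
    ranked = sorted(scores)
    n = len(ranked)
    position = (n + 1) / 2  # 1-based position; ends in .5 when n is even
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    # For odd n, lower and upper are the same score
    return (lower + upper) / 2


# Annual salaries from the slide, in thousands per year
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

# n = 14, so the position is 7.5: halfway between the 7th (19) and 8th (20) scores
salary_median = median_by_position(salaries_k)
print(f"Median = {salary_median}k")
```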
Advantages
•Unaffected by extreme scores
•Can be used at all levels above nominal.
Disadvantages
•Only considers order; value is ignored.
Indicators of Central Tendency
Mean
Douglas Bag drove 100 miles from Bath to
London at 100 miles∙h-1 but was caught in
traffic on his return journey, so was limited
to 50 miles∙h-1.

What was his average speed for the entire journey?


Indicators of Central Tendency
– Arithmetic Mean
– Harmonic Mean
– Geometric Mean
also…
– f-mean
– Truncated mean
– Power mean
– Weighted arithmetic mean
– Chisini mean
– Identric mean, etc.
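The Douglas Bag journey shows why the choice of mean matters. As a sketch, assuming the same 100-mile distance in each direction as stated in the question: the average speed is total distance divided by total time, which matches the harmonic mean of the two speeds rather than their arithmetic mean.

```python
from statistics import harmonic_mean, mean

distance_each_way_miles = 100
speeds_mph = [100, 50]  # outbound and return speeds from the question

# Average speed = total distance / total time
total_distance = distance_each_way_miles * len(speeds_mph)
total_time_h = sum(distance_each_way_miles / speed for speed in speeds_mph)
average_speed = total_distance / total_time_h

arithmetic = mean(speeds_mph)
harmonic = harmonic_mean(speeds_mph)

print(f"Total time: {total_time_h:.0f} h for {total_distance} miles")
print(f"Average speed:   {average_speed:.1f} mph")
print(f"Harmonic mean:   {harmonic:.1f} mph")
print(f"Arithmetic mean: {arithmetic:.1f} mph")
```

The arithmetic mean overstates the average speed here because more time is spent travelling at the slower speed.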
Indicators of Central Tendency
Mean

Annual
Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k

$\bar{X} = \frac{\sum X}{n}$
Indicators of Central Tendency
Mean = 17.9 k.y-1

Annual
Salary: 10k 11k 11k 15k 15k 15k 19k 20k 21k 21k 22k 22k 24k 25k

$\bar{X} = \frac{(10+11+11+15+15+15+19+20+21+21+22+22+24+25)}{14} = \frac{251}{14} = 17.9$
Advantages
•Very sensitive measure
•Takes into account all the available information
•Can be combined with means of other groups to give the overall mean.
Disadvantages
•Very sensitive measure
•Can only be used on interval or ratio data
•Can only be used when scores are symmetrical above and below the mean ($\bar{X}$).
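To illustrate why "very sensitive measure" appears as both an advantage and a disadvantage, the sketch below recomputes the mean of the salary data and then replaces the top salary with a single extreme score. The extreme value (250k) is hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Annual salaries from the slides, in thousands per year
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

total = sum(salaries_k)
n = len(salaries_k)
salary_mean = total / n
print(f"Sum = {total}, n = {n}, mean = {salary_mean:.1f}k")

# Hypothetical: the highest earner's salary becomes 250k instead of 25k
with_outlier = salaries_k[:-1] + [250]

print(f"Mean with outlier:   {mean(with_outlier):.1f}k")
print(f"Median with outlier: {median(with_outlier):.1f}k")
```

The mean moves a long way in response to one score, while the median is unchanged.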
Distribution
• Often displayed graphically, where:

– X axis = measured variable


– Y axis = frequency.
Normal Distribution
[Figure: frequency distribution of Energy Intake (calories per day, X axis) against Number of People (Y axis)]


Para-Normal Distribution?
[Figure: frequency distribution of Energy Intake (calories per day, X axis) against Number of People (Y axis)]


Normal Distribution
AKA
– Bell Shaped
– Gaussian.
Carl Friedrich Gauss applied ND in 1809 to establish the diameter of lunar features…
…but first described mathematically by Abraham De Moivre in 1733…
…published 1924!
[Figure: normal curve of Energy Intake (calories per day, X axis) against Number of People (Y axis)]


Normal Distribution
Characteristics of ND Curve:
•Naturally Occurring
e.g.
Biological/Physiological
Anthropometric
Social/Economic
Psychological
Errors
Normal Distribution
Characteristics of ND Curve:
•Naturally Occurring
•Asymptotic (Theoretically)
•Symmetrical
AND…
Normal Distribution
[Figure: normal curve of Energy Intake (calories per day, X axis) against Number of People (Y axis), with the Mode, Median and Mean marked]


Normal Distribution
[Figure: normal curve of Energy Intake (calories per day, X axis) against Number of People (Y axis); 68.26% of scores lie between the two Points of Inflection]


Normal Distribution
Z = standard score for comparison: Raw score versus Group
Average = 3500, SD = 1000
[Figure: normal curve of Energy Intake (calories per day, X axis) against Number of People (Y axis); areas either side of the mean: 34.13% (0 to 1 SD), 13.59% (1 to 2 SD), 2.15% (2 to 3 SD)]
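The areas under the normal curve can be reproduced from the standard normal cumulative distribution. The sketch below uses only the standard library (`math.erf`); the helper names are made up for this handout, and small differences in the last digit from rounded textbook figures are to be expected.

```python
import math


def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(z_low: float, z_high: float) -> float:
    """Proportion of scores with a Z-score between z_low and z_high."""
    return normal_cdf(z_high) - normal_cdf(z_low)


# Bands on one side of the mean, in SD units
for z_low, z_high in [(0, 1), (1, 2), (2, 3)]:
    print(f"{z_low} to {z_high} SD: {area_between(z_low, z_high) * 100:.2f}%")

# Both sides of the mean within 1 SD
within_one_sd = area_between(-1, 1)
print(f"Within 1 SD of the mean: {within_one_sd * 100:.2f}%")
```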


Normal Distribution
Average = 3500, SD = 1000
So, if: Raw score = 4500, Z = +1
Study of SD size = ‘Kurtosis’
This one is Mesokurtic
[Figure: normal curve of Energy Intake (calories per day, X axis) against Number of People (Y axis); areas either side of the mean: 34.13%, 13.59%, 2.15%]


Normal Distribution
Average = 3500, SD = 500
So, if: Raw score = 4500, Z = +2
Leptokurtic
[Figure: leptokurtic curve of Energy Intake (calories per day, X axis) against Number of People (Y axis), with the central 68.26% marked]


Normal Distribution
Average = 3500, SD = 2000
So, if: Raw score = 4500, Z = +0.5
Platykurtic
[Figure: platykurtic curve of Energy Intake (calories per day, X axis) against Number of People (Y axis), with the central 68.26% marked]
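The three Z-scores above come from the same calculation: the distance of the raw score from the average, expressed in SD units. A minimal sketch (the function name is illustrative):

```python
def z_score(raw_score: float, average: float, sd: float) -> float:
    """Standard score: how many SDs the raw score lies above the average."""
    return (raw_score - average) / sd


raw_score = 4500   # calories per day
average = 3500     # calories per day

# The same raw score gives a different Z depending on the spread of the group
for sd in (1000, 500, 2000):
    print(f"SD = {sd}: Z = {z_score(raw_score, average, sd):+.1f}")
```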


Non-Normal Distribution
Negative Skew
[Figure: negatively skewed distribution of Energy Intake (calories per day, X axis) against Number of People (Y axis), with the Mode, Median and Mean marked (Mean < Median < Mode)]


Non-Normal Distribution
Positive Skew
[Figure: positively skewed distribution of Energy Intake (calories per day, X axis) against Number of People (Y axis), with the Mode, Median and Mean marked (Mode < Median < Mean)]


Quick Test
• Do you think that most people are of above/below ‘average’ intelligence? (IQ)
• Do you think that most people are above/below ‘average’ drivers? (crashes.y-1)
Why is Distribution Important?
• Determines which measure of central tendency to
use
• Determines which measure of variability to use
• Provides ‘Z-score’ for standardised comparisons
• Determines further statistical analysis
– Parametric (assumes ND, interval/ratio level of measurement and a random sample; more powerful)
– Non-parametric (simply calculated and distribution free; less powerful).
Objective assessment of Normality
• In reality, most variables follow a distribution
that is not entirely normal
• Therefore we need to establish what we
consider to be ‘normal’
• This is achieved through assessing the
difference between the mean and the median
[Figure: example of a Normal and a Non-normal distribution]
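The mean versus median comparison can be sketched as below, using the salary data and the raw scores that appear elsewhere in this lecture. How large a difference should count as ‘non-normal’ is not stated here, so the sketch only reports the size and direction of the gap; the function name is illustrative.

```python
from statistics import mean, median


def mean_median_gap(scores):
    """Mean minus median: positive suggests positive skew, negative suggests negative skew."""
    return mean(scores) - median(scores)


# Data sets taken from elsewhere in this lecture
salaries_k = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]
raw_scores = [8, 10, 15, 20, 55, 75, 120, 179]

for label, scores in [("Salaries (k)", salaries_k), ("Raw scores", raw_scores)]:
    print(
        f"{label}: mean = {mean(scores):.2f}, "
        f"median = {median(scores):.2f}, "
        f"gap = {mean_median_gap(scores):+.2f}"
    )
```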
Variability
• Standard Deviation (SD)
• Standard Error of the Mean (SEM)
• Range
• Inter Quartile Range (IQR)
• Normalised Confidence Intervals (nCI).
Hypothetical Investigation
“The effect of 1 week placebo supplementation on press-up performance”
•8 randomly sampled participants
•Performed as many press-ups as possible
•Supplemented with placebo tablets for 1 week
•Performed as many press-ups as possible
[Figure: Number of Press-Ups (Y axis) against Week 1 and Week 2 (X axis)]
Mean ± SD
[Figure: Number of Press-Ups for Week 1, Week 2 and Overall, shown as mean ± SD (34.13% of scores either side of the mean)]
SD represents the spread of our data around the mean
Note: Requires ND
Standard Error of the Mean (SEM)
But does the mean we calculated reflect the target population???
[Figure: distribution of Number of Press-ups (0 to 120) in the target population, with the central 68.26% marked]
Standard Error of the Mean (SEM)
Mean of Target Pop. (unknown)
SD of Target Pop. (unknown)
[Figure: distribution of Number of Press-ups in the target population, with the central 68.26% marked and unknown axis values]
Standard Error of the Mean (SEM)
$SEM = \frac{\text{SD of Target Pop.}}{\sqrt{n}}$
But this is unknown so we have to make an estimate using the SD for our sample
Standard Error of the Mean (SEM)

$SEM = \frac{\text{SD of our sample}}{\sqrt{n}}$
So the smaller our SD & the bigger our n, the smaller our SEM will be…
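A minimal sketch of the SEM calculation. The press-up scores are not listed in this handout, so the salary data from earlier in the lecture is used as a stand-in sample; `statistics.stdev` gives the sample SD (n - 1 denominator), and the function name is illustrative.

```python
import math
from statistics import stdev


def standard_error(sample_sd: float, n: int) -> float:
    """SEM estimated from the sample SD and the sample size."""
    return sample_sd / math.sqrt(n)


# Stand-in sample: annual salaries from earlier in the lecture (thousands per year)
sample = [10, 11, 11, 15, 15, 15, 19, 20, 21, 21, 22, 22, 24, 25]

sample_sd = stdev(sample)
sem = standard_error(sample_sd, len(sample))
print(f"SD = {sample_sd:.2f}, n = {len(sample)}, SEM = {sem:.2f}")

# Same SD with four times as many participants: the SEM halves
sem_larger_n = standard_error(sample_sd, 4 * len(sample))
print(f"SEM with n = {4 * len(sample)}: {sem_larger_n:.2f}")
```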
Mean ± SEM
[Figure: Number of Press-Ups for Week 1, Week 2 and Overall, shown as mean ± SEM]
We can be 68% certain that the mean of the target pop. lies within this distance of our mean
SEM represents how accurate our mean is, i.e. a measure of sampling error
Median & Range or IQR
• The mean ± SD or SEM cannot be used for non-normally distributed data
• Instead the median is often plotted along with either the range or the IQR
e.g.
Raw Data: 8, 10, 15, 20, 55, 75, 120, 179
Median (Range) = 37.5 (8-179)
effect of extreme scores?
IQR removes extreme scores


Difference between the 25th & 75th percentile
(n.b. median = the 50th percentile).
e.g.
Median (IQR) = 37.5 (13.75-108.75)
Refer to the web page for notes on calculation of median and IQR.
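Quartile values depend on the interpolation convention, and which convention the course notes use is not stated here. As a hedged sketch, Python's `statistics.quantiles` offers two common conventions; for the raw data above, the lower bound quoted (13.75) matches the "inclusive" method and the upper bound quoted (108.75) matches the "exclusive" method, so it is worth checking the web page notes before reproducing the figures.

```python
from statistics import median, quantiles

raw_scores = [8, 10, 15, 20, 55, 75, 120, 179]

score_median = median(raw_scores)
score_range = (min(raw_scores), max(raw_scores))

# Each call returns [25th percentile, 50th percentile, 75th percentile]
q_exclusive = quantiles(raw_scores, n=4, method="exclusive")
q_inclusive = quantiles(raw_scores, n=4, method="inclusive")

print(f"Median (Range) = {score_median} ({score_range[0]}-{score_range[1]})")
print(f"Exclusive method: 25th = {q_exclusive[0]}, 75th = {q_exclusive[2]}")
print(f"Inclusive method: 25th = {q_inclusive[0]}, 75th = {q_inclusive[2]}")
```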
[Figure: Number of Press-Ups (Y axis) against Week 1 and Week 2 (X axis), with SD, SEM, Range and IQR marked as measures of between-subject variance]
[Figure: Number of Press-Ups (Y axis) against Week 1 and Week 2 (X axis), illustrating within-subject variance]
Normalised Confidence Intervals

$nCI = \sqrt{\frac{MSE}{n}} \times [\text{criterion } t\,(df)]$
Put the data from each trial into SPSS (Week 1 and Week 2)…
Select ‘Analyse’, ‘GLM’, ‘Repeated Measures’…
Move variables from here… …to here, click ok
ANOVA Output
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                             Type III Sum of Squares   df      Mean Square   F        Sig.
WEEK          Sphericity Assumed   60.062                    1       60.062        49.830   .000
              Greenhouse-Geisser   60.062                    1.000   60.062        49.830   .000
              Huynh-Feldt          60.062                    1.000   60.062        49.830   .000
              Lower-bound          60.062                    1.000   60.062        49.830   .000
Error(WEEK)   Sphericity Assumed   8.438                     7       1.205
              Greenhouse-Geisser   8.438                     7.000   1.205
              Huynh-Feldt          8.438                     7.000   1.205
              Lower-bound          8.438                     7.000   1.205

$nCI = \sqrt{\frac{MSE}{n}} \times [\text{criterion } t\,(df)]$
MSE = 1.205, df = 7, n = 8:
$nCI = \sqrt{\frac{1.205}{8}} \times [\text{criterion } t\,(7)]$
t-Distribution Table (Google Search)
df = n-1

Degrees of Freedom   Critical t-ratio
1                    12.71
2                    4.30
3                    3.18
4                    2.78
5                    2.57
6                    2.45
7                    2.37
8                    2.31
$nCI = \sqrt{\frac{1.205}{8}} \times 2.37 = 0.92$
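The worked nCI value can be checked with a short sketch. The inputs are the figures from the ANOVA output and the t table in this lecture (error sum of squares 8.438 on 7 df, n = 8, critical t = 2.37); the function name is illustrative.

```python
import math


def normalised_ci(mse: float, n: int, critical_t: float) -> float:
    """Half-width of the normalised confidence interval: sqrt(MSE / n) * t."""
    return math.sqrt(mse / n) * critical_t


# Values from the ANOVA output: Error(WEEK), Sphericity Assumed
error_sum_of_squares = 8.438
error_df = 7
n_participants = 8

# Mean square error = error sum of squares / error degrees of freedom
mse = error_sum_of_squares / error_df
print(f"MSE = {mse:.3f}")

# Critical t for df = 7, read from the table in this lecture
critical_t = 2.37

nci = normalised_ci(1.205, n_participants, critical_t)
print(f"nCI = {nci:.2f}")
```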
Mean ± nCI
[Figure: Number of Press-Ups (0 to 140) for Week 1 and Week 2, shown as mean ± nCI]
Mean ± nCI
[Figure: Number of Press-Ups (50 to 70) for Week 1, Week 2 and Overall, shown as mean ± nCI]
nCI represent the difference between means
Selected Reading
• Hopkins W. G. (2000) A New View of Statistics [Online] Auckland: Internet Society for Sport Science. Available at: www.newstats.org [accessed October 24th 2006]

• Masson, M. E. J. and G. R. Loftus. Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology. 57:203-220, 2003.
Accuracy/Precision of Reporting
‘Troublesome decimals’ by Kordi et al. (2011) Scand. J. Med. Sci. Sport
-Use the maximum number of available d.p. throughout data analysis (i.e. rounding should only be applied to the final reported results)
-Means, medians, SD, range or CI should not generally be reported with a greater number of d.p. than were measured in the raw data

Exceptions:
-Mean can be +1 d.p. if >100 raw scores averaged
-SD can be +1 d.p. for every additional whole unit in the mean
-Median/Range necessarily +1 d.p. if averaging the middle 2 raw scores
-P-values should be reported to the first 1 or 2 non-zero digits
The Shooting Room

•Player one enters room & pair of dice are rolled
•If double six, he is shot
•If not, player one leaves and nine new players enter
•If double six, they are shot
•If not, they leave and ninety new players enter, etc.
How worried should you be if selected to play?
Bartha & Hitchcock (1999) The Shooting-Room Paradox
and Conditionalizing on Measurably Challenged Sets.
Synthese 118 (3): p. 403-437
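The tension in the question can be made concrete with two ways of counting. This sketch assumes fair dice and the group sizes described above (1, 9, 90, …), and it does not attempt to resolve the paradox discussed in the cited paper; the function names are illustrative.

```python
from fractions import Fraction

# View 1: chance that any one roll of two fair dice is a double six
p_double_six = Fraction(1, 6) * Fraction(1, 6)


def group_sizes(rounds: int) -> list:
    """Sizes of the groups that enter: 1, 9, 90, 900, ..."""
    sizes = [1]
    while len(sizes) < rounds:
        sizes.append(9 * 10 ** (len(sizes) - 1))
    return sizes


def fraction_shot(rounds: int) -> Fraction:
    """View 2: if the game ends in the given round, the share of all players who were shot."""
    sizes = group_sizes(rounds)
    return Fraction(sizes[-1], sum(sizes))


print(f"Chance of double six on a single roll: {p_double_six}")
for rounds in (2, 3, 4):
    print(f"Game ends in round {rounds}: {fraction_shot(rounds)} of all players are shot")
```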
J.Betts@bath.ac.uk
