Class 3: Univariate statistics - Measures of centre, variability and position
Important practicalities
• First seminar video
– Available after this lecture on Canvas – section ‘Panopto’
– Exercises explained step-by-step
– Made by the teaching assistants
– First seminar divided into three shorter videos
• Sample dataset
– Putting the data in a dataset in SPSS (= statistical software)
– Anonymizing the data by giving a study ID
Workshop Basic Math
• Voluntary workshop about basic mathematical skills
• Target groups
– The 33 students who failed the basic math test
– Students with lower test results (< 14) or who are insecure about their math skills
• No registration needed
TO RECAP
                 Natural    Specific numerical  Can add or   Can multiply  True
                 order in   distance between    subtract     or divide     zero
                 values     values              values       values        point
Categorical
1. Nominal
2. Ordinal       X
Metric
3. Interval      X          X                   X
4. Ratio         X          X                   X            X             X
                                 Categorical variables   Metric variables
                                 Nominal    Ordinal      Interval   Ratio
Absolute frequency               X          X            X          X
Relative frequency               X          X            X          X
Absolute cumulative frequency               X            X          X
Relative cumulative frequency               X            X          X
Frequency table                  X          X            X          X
Bar graph                        X          X            X          X
Pie chart                        X          X            X          X
Histogram                                                X          X
Stem-and-leaf plot                                       X          X
In this class:
• Measures of centre
– Mean
– Median
– Mode
• Measures of variability
– Range
– Variance and standard deviation
– Variation coefficient
• Measures of position
– Percentiles
– Interquartile range
– Boxplot, outliers
MEASURES OF CENTRE
Measures of centre: mean
• Mean = sum of observations divided by the number of observations
• Statistical notation for the mean of variable y: y̅
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
The summation sign ∑
Summation sign with values Summation sign with observations
Measures of centre: mean
• Variable yi: number of children
• Sample size n: 8
• y1=2; y2=3; y3=0; y4=4; y5=2; y6=2; y7=3; y8=1
Values   fi   pi      %
0        1    0,125   12,5
1        1    0,125   12,5
2        3    0,375   37,5
3        2    0,250   25,0
4        1    0,125   12,5
Total    8    1       100
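As a minimal sketch (Python is an assumption here; the course itself works in SPSS), the mean of the eight observations can be computed straight from its definition. The variable name `children` is illustrative.

```python
# Number of children for the n = 8 respondents (y1 ... y8 from the slide).
children = [2, 3, 0, 4, 2, 2, 3, 1]

n = len(children)        # sample size
total = sum(children)    # sum of all observations
mean = total / n         # mean = sum of observations / number of observations

print(total, n, mean)    # 17 8 2.125
```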
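Measures of centre: mean
• With absolute frequency tables: multiply each value by its absolute frequency fi, sum the products and divide by n
• Mean with observations: the index i refers to observations/subjects (i = 1, ..., n)
• Mean with frequency tables: the index i refers to values (i = 1 is the first value)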
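Measures of centre: mean
• Variable yi: number of children
Values   fi   pi      %
0        1    0,125   12,5
1        1    0,125   12,5
2        3    0,375   37,5
3        2    0,250   25,0
4        1    0,125   12,5
Total    8    1       100
With absolute frequencies (m = number of distinct values): $\bar{y} = \frac{1}{n}\sum_{i=1}^{m} f_i \times y_i$
With relative frequencies: $\bar{y} = \sum_{i=1}^{m} p_i \times y_i$
$\bar{y} = \frac{(1 \times 0) + (1 \times 1) + (3 \times 2) + (2 \times 3) + (1 \times 4)}{8} = 2{,}125 \approx 2{,}1$
$\bar{y} = (0{,}125 \times 0) + (0{,}125 \times 1) + (0{,}375 \times 2) + (0{,}250 \times 3) + (0{,}125 \times 4) = 2{,}125 \approx 2{,}1$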
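The two frequency-table formulas can be checked with a short Python sketch. The dictionary layout (value mapped to absolute frequency) and the variable names are assumptions made for illustration; they are not part of the slides.

```python
# Frequency table for "number of children": value -> absolute frequency f_i.
freq = {0: 1, 1: 1, 2: 3, 3: 2, 4: 1}

n = sum(freq.values())                                # sample size, n = 8

# Mean with absolute frequencies: (1/n) * sum of f_i * y_i
mean_abs = sum(f * y for y, f in freq.items()) / n

# Mean with relative frequencies p_i = f_i / n: sum of p_i * y_i
rel = {y: f / n for y, f in freq.items()}
mean_rel = sum(p * y for y, p in rel.items())

print(mean_abs, mean_rel)                             # 2.125 2.125
```

Both routes give the same result as averaging the eight raw observations, because the table only groups identical values.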
Measures of centre: mean
• Properties of the mean
– Only for metric variables
– Very sensitive to outliers
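A small Python sketch illustrates the sensitivity to outliers. The second list is hypothetical: it assumes a data-entry error in which the value 4 was recorded as 40.

```python
# Original observations: number of children for 8 respondents.
children = [2, 3, 0, 4, 2, 2, 3, 1]

# Hypothetical data-entry error (assumption for illustration): 4 recorded as 40.
with_outlier = [2, 3, 0, 40, 2, 2, 3, 1]

mean_original = sum(children) / len(children)
mean_outlier = sum(with_outlier) / len(with_outlier)

print(mean_original, mean_outlier)   # 2.125 6.625
```

A single extreme observation triples the mean here, although seven of the eight observations are unchanged.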
Measures of centre: median
• Median is the observation that falls in the middle of an ordered
sample
• Statistical notation: M
• First: order all observations from low to high (or vice versa)
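The ordering step can be sketched in Python as follows. The function name `median` is illustrative. For an even sample size the sketch assumes the common convention of averaging the two middle observations.

```python
def median(observations):
    """Middle value of the ordered sample (average of the two middle values if n is even)."""
    ordered = sorted(observations)          # order from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                          # odd n: one middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: two middle observations

children = [2, 3, 0, 4, 2, 2, 3, 1]         # ordered: 0 1 2 2 2 3 3 4
print(median(children))                     # 2.0
print(median([2, 3, 0, 4, 2, 2, 3]))        # 2
```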
Measures of centre: median
• Calculating the median from frequency tables
Even sample size (n = 10)
(n+1)/2 th observation = value of the 5,5th observation (halfway between the 5th and 6th ordered observations)
         Table A   Table B   Table C   Table D
Value    fi        fi        fi        fi
1        2         2         2         2
2        3         2         2         1
3        1         2         1         1
4        2         2         3         4
5        2         2         2         2
n        10        10        10        10
M        2,5       3         3,5       4
Measures of centre: median
• Calculating the median from frequency tables
Odd sample size (n = 11)
(n+1)/2 th observation = value of the 6th observation
         Table A   Table B   Table C   Table D
Value    fi        fi        fi        fi
1        2         2         2         2
2        3         2         4         2
3        1         2         2         3
4        3         2         1         2
5        2         3         2         2
n        11        11        11        11
M        3         3         2         3
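The (n+1)/2 rule for frequency tables can be sketched in Python. The function name `median_from_table` and the dictionary layout (value mapped to absolute frequency) are assumptions for illustration. The sketch walks through the cumulative frequencies to find the observation at a given position.

```python
def median_from_table(freq):
    """Median from a {value: absolute frequency} table using the (n + 1) / 2 position rule."""
    n = sum(freq.values())
    lower_pos = (n + 1) // 2     # e.g. 5 for n = 10, 6 for n = 11
    upper_pos = (n + 2) // 2     # e.g. 6 for n = 10, 6 for n = 11

    def value_at(position):
        # Walk through the ordered values until the cumulative frequency reaches the position.
        cumulative = 0
        for value in sorted(freq):
            cumulative += freq[value]
            if cumulative >= position:
                return value
        raise ValueError("position exceeds sample size")

    return (value_at(lower_pos) + value_at(upper_pos)) / 2

# Two of the tables from the slides: n = 10 (even) and n = 11 (odd).
print(median_from_table({1: 2, 2: 3, 3: 1, 4: 2, 5: 2}))   # 2.5
print(median_from_table({1: 2, 2: 4, 3: 2, 4: 1, 5: 2}))   # 2.0
```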
Measures of centre: median
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?
M = 5
(n+1)/2 th observation = (35855+1)/2 = 17928th observation
p = 0.50th observation
(n+1)/2 th observation = (40111+1)/2 = 20056th observation
p = 0.50th observation
M = 15
Measures of centre: median
Variable: To which religion or denomination do you
consider yourself as belonging to?
M = Jewish
Measures of centre: mode
Variable: On an average weekday, how much time, in total,
do you spend watching television?
Measures of centre: mode
• Properties of the mode
– For metric and categorical variables
– Less informative than mean or median
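The mode is the value with the highest frequency, so it can be found by counting. The Python sketch below uses an illustrative function name `modes`. The categorical answers are hypothetical and invented only to show that the same code works for category labels.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); works for numbers and for category labels."""
    counts = Counter(values)           # value -> absolute frequency
    highest = max(counts.values())     # largest absolute frequency
    return [value for value, f in counts.items() if f == highest]

children = [2, 3, 0, 4, 2, 2, 3, 1]
print(modes(children))                 # [2]

# Hypothetical categorical answers, invented for illustration only.
transport = ["bike", "car", "bike", "train", "bike"]
print(modes(transport))                # ['bike']
```

The function returns a list because a distribution can have more than one value with the highest frequency.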
MEASURES OF VARIABILITY
Measures of variability
• Variability describes the spread of the data
around a measure of centre
Measures of variability:
Range
• Range is the difference between the largest
and smallest values
• For metric variables (not for categorical
variables)
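A minimal Python sketch of the range, using the number-of-children observations from the earlier slides (the variable names are illustrative):

```python
children = [2, 3, 0, 4, 2, 2, 3, 1]

# Range = largest value - smallest value
value_range = max(children) - min(children)
print(value_range)   # 4
```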
Measures of variability: range
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?
Range = 10 – 0 = 10
Range = 114 – 14 = 100
…
Source: ESS – round 7
Measures of variability:
Variance and standard deviation
• Variance: mean squared distances from the sample mean y̅
• For metric variables (not for categorical variables)
• Statistical notation of the variance: S²
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$
Reading the variance formula $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$:
– $(y_i - \bar{y})$: deviation of observation yi from the sample mean y̅ (both positive and negative deviations)
– $(y_i - \bar{y})^2$: square of the deviations (otherwise the sum equals 0)
– $\sum_{i=1}^{n}$: sum over all deviations
– $\sum_{i=1}^{n}(y_i - \bar{y})^2$: the “sum of squares”
Measures of variability:
Variance and standard deviation
• Standard deviation: the square root of the variance
• Statistical notation of the standard deviation: S
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$
$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$
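Both quantities are available in Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n − 1 denominator as the formula for S². The sketch below assumes Python as the tool (the course itself uses SPSS) and reuses the number-of-children observations from the earlier slides.

```python
import statistics

children = [2, 3, 0, 4, 2, 2, 3, 1]

s2 = statistics.variance(children)   # sample variance, divides by n - 1
s = statistics.stdev(children)       # sample standard deviation = square root of s2

print(f"{s2:.6f} {s:.6f}")           # 1.553571 1.246423
```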
Measures of variability:
Variance and standard deviation
• Example: number of children
$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$
         yi       (yi - y̅)    (yi - y̅)²
         2        -0,125      0,015625
         3        0,875       0,765625
         0        -2,125      4,515625
         4        1,875       3,515625
         2        -0,125      0,015625
         2        -0,125      0,015625
         3        0,875       0,765625
         1        -1,125      1,265625
Sum      17       0           10,875
Mean     2,125
n        8
n-1      7
s²       1,553571
s        1,246423
The standard deviation in five steps:
STEP 1. Calculate the mean
STEP 2. Calculate the deviations from the sample mean by subtracting the mean from each observation
STEP 3. Square all the deviations and sum them = the sum of squares
STEP 4. Divide the sum of squares by n-1 to get the variance
STEP 5. Take the square root of the variance
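The five steps can be followed one by one in a short Python sketch. The variable names are illustrative, and the data are the eight number-of-children observations from the example.

```python
import math

children = [2, 3, 0, 4, 2, 2, 3, 1]
n = len(children)

# STEP 1. Calculate the mean.
mean = sum(children) / n                            # 2.125

# STEP 2. Deviation of each observation from the sample mean (y_i - mean).
deviations = [y - mean for y in children]           # these sum to 0

# STEP 3. Square the deviations and sum them: the sum of squares.
sum_of_squares = sum(d ** 2 for d in deviations)    # 10.875

# STEP 4. Divide the sum of squares by n - 1 to get the variance.
variance = sum_of_squares / (n - 1)

# STEP 5. Take the square root of the variance.
sd = math.sqrt(variance)

print(f"{variance:.6f} {sd:.6f}")                   # 1.553571 1.246423
```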
Measures of variability:
Variance and standard deviation
• Alternative way of calculating the variance and standard deviation: work with the absolute frequency per value
$S^2 = \frac{1}{n-1}\sum_{i=1}^{m}(y_i - \bar{y})^2 \times f_i$
$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{m}(y_i - \bar{y})^2 \times f_i}$
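A Python sketch of the frequency-based calculation, using the frequency table for the number of children. The dictionary layout (value mapped to absolute frequency) is an assumption for illustration. The result should match the calculation on the raw observations.

```python
import math

# Frequency table: value y_i -> absolute frequency f_i (number of children).
freq = {0: 1, 1: 1, 2: 3, 3: 2, 4: 1}

n = sum(freq.values())                                   # n = 8
mean = sum(f * y for y, f in freq.items()) / n           # 2.125

# Each squared deviation is weighted by how often the value occurs.
sum_of_squares = sum(f * (y - mean) ** 2 for y, f in freq.items())   # 10.875

variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(f"{variance:.6f} {sd:.6f}")                        # 1.553571 1.246423
```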
Measures of variability:
Variance and standard deviation
• Interpretation of standard deviation
– Standard/typical distance of the observations from the sample mean
– Represents the variability about the mean
– The larger the standard deviation s, the greater the variability
– The smaller the standard deviation s, the smaller the variability
The beauty of variability
Variable: In politics people sometimes talk of ‘left’ and ‘right’.
Where would you place yourself on a scale, where 0 means
the left and 10 means the right? – BELGIAN SAMPLE DATA
Source: ESS – BELGIUM
Measures of variability:
Variation coefficient
• Variation coefficient
– The ratio of the standard deviation S to the mean
= Relative standard deviation
– Statistical notation of the variation coefficient: V
– Only for metric variables (not for categorical)
– Often expressed as a percentage
– Used to compare the variability between groups or variables
$V = \frac{S}{\bar{y}}$
Measures of variability:
Variation coefficient
• Example: number of children
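The slide's own worked figures are not reproduced in this text, so the following Python sketch recomputes the variation coefficient from the number-of-children observations used earlier. Variable names are illustrative.

```python
import statistics

children = [2, 3, 0, 4, 2, 2, 3, 1]

mean = statistics.mean(children)     # 2.125
s = statistics.stdev(children)       # sample standard deviation (n - 1 denominator)

v = s / mean                         # variation coefficient V = S / mean
print(f"{v:.3f} = {v * 100:.1f} %")  # 0.587 = 58.7 %
```

The standard deviation is a little under 60 % of the mean, which can then be compared with V for another group or variable.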
MEASURES OF POSITION
Measures of position
• Points below (or above) which a certain percentage of the data fall
• Give insight into the centre and/or variability of the data
Measures of position:
Percentiles
• The pth percentile is the point such that p% of the
observations fall below or at that point and (100-p)% fall
above it
• Important percentiles
– Median = 50th percentile (p=50) = Q2
– Lower quartile = 25th percentile (p=25) = Q1
– Upper quartile = 75th percentile (p=75) = Q3
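One simple way to read a percentile from a frequency table is to take the smallest value whose cumulative percentage reaches p. The Python sketch below implements that reading; the function name `percentile_from_table` is an assumption for illustration. Several percentile conventions exist, and when the cumulative percentage equals p exactly, other conventions (including the (n+1)/2 rule used for the median) average with the next value, which this sketch does not do.

```python
def percentile_from_table(freq, p):
    """Smallest value whose cumulative percentage is at least p (0 < p <= 100)."""
    n = sum(freq.values())
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        # Integer form of cumulative / n >= p / 100, which avoids rounding issues.
        if cumulative * 100 >= p * n:
            return value
    raise ValueError("p must be between 0 and 100")

# Frequency table for the number of children: value -> absolute frequency.
freq = {0: 1, 1: 1, 2: 3, 3: 2, 4: 1}

q1 = percentile_from_table(freq, 25)
q2 = percentile_from_table(freq, 50)
q3 = percentile_from_table(freq, 75)
print(q1, q2, q3)   # 1 2 3
```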
Measures of position: percentiles
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?
Q1 = 4
Q2 = M = 5
Q3 = 7
Measures of position:
Interquartile range
• Interquartile range (IQR): difference between
the upper and lower quartiles
• For metric variables (not for categorical
variables)
• Measures the variability of the middle half of
the observations
• The larger the IQR, the greater the variability
• Also used to detect outliers (see the box plot examples)
Measures of position: IQR
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?
Q1 = 4
Q2 = M = 5
Q3 = 7
IQR = 7 – 4 = 3
Measures of position: Box plots
• Example: age of the respondent
[Box plot: age of the respondent, with outliers marked]
Maximum = 104 (outliers excluded)
Minimum = 14 (outliers excluded)
IQR = 64 – 34 = 30
1,5 times 30 = 45
Outliers
• Values > 109 (= 64+45)
• Values < -11 (= 34-45)
Source: ESS – round 7
Measures of position: Box plots
• Example: basic math scores
[Box plot: basic math scores, with outliers marked]
Maximum = 20 (outliers excluded)
Minimum = 6 (outliers excluded)
IQR = 18 – 13 = 5
1,5 times 5 = 7,5
Outliers
• Values > 25,5 (= 18+7,5)
• Values < 5,5 (= 13-7,5)
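The outlier limits in both box plot examples follow the same rule: 1,5 times the IQR below Q1 and above Q3. A minimal Python sketch, with illustrative function names, reproduces the limits from the quartiles given on the slides.

```python
def outlier_fences(q1, q3, factor=1.5):
    """Lower and upper limits beyond which a value counts as an outlier."""
    iqr = q3 - q1                          # interquartile range
    return q1 - factor * iqr, q3 + factor * iqr

def is_outlier(value, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper

print(outlier_fences(34, 64))   # age of the respondent: (-11.0, 109.0)
print(outlier_fences(13, 18))   # basic math scores: (5.5, 25.5)
print(is_outlier(114, 34, 64))  # True
```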
Measures of position: Box plots
• Example: number of household members
Exercises on class 3.
• Exercise 3c on p. 10
• Exercise 4c on p. 11
• Exercise 5c-d on p. 12
• Exercise 6e-g on p. 13-14
• Exercise 7c-d on p. 15
• Exercise 8c-e on p. 16
• Exercise 10a-f + h on p. 18
• Exercise 11a-f on p. 19
• Exercise 12a-b on p. 20
• Exercise 14a-b on p. 22
• Exercise 15a-e on p. 23
• Exercise 16a-c on p. 24
• Exercise 17a-d on p. 25
Next week
Univariate, descriptive statistics: distribution of the data
Contact:
pieter-paul.verhaeghe@vub.be