
Class 3.

Univariate, descriptive statistics:


Measures of centre, variability and
position

Prof. Pieter-Paul Verhaeghe

1
Important practicalities
• First seminar video
– Available after this lecture on Canvas – section ‘Panopto’
– Exercises explained step-by-step
– Made by the teaching assistants
– First seminar subdivided in three smaller videos

• First Q&A session with the teaching assistants


– On Canvas (section ‘BigBlueButton/Conferences’)
– Thursday morning the 20th of October
– Short sessions of 30 minutes
– Four groups – see Canvas (section ‘People’) to see your group
2
Basic Math Test
• Thank you for completing the test and survey!

• Population: students in the course (N=402)

• Sample: students who did the test (n=325)


– 269 students did the test this year
– An additional 56 students did the test only last year, but re-enrolled
– Not a random sample → possible bias?
– Response rate: 80,8%

• Sample → dataset
– Putting the data in a dataset in SPSS (= statistical software)
– Anonymizing the data by giving a study ID
4
5
Workshop Basic Math
• Voluntary workshop about basic mathematical skills

• Target groups
– 33 students who failed the basic math test
– Students with lower test results (< 14) or who are insecure about their math
skills

• Given by Anna Marconato from Study Guidance

• Thursday the 13th of October from 10h30 until 12h00

• Location: room G.1.53 (on campus)

• No registration needed
TO RECAP

7
                 Natural     Specific numerical   Can add or   Can multiply   True
                 order in    distance between     subtract     or divide      zero
                 values      values               values       values         point
Categorical
1. Nominal
2. Ordinal       X

Metric
3. Interval      X           X                    X
4. Ratio         X           X                    X            X              X

8
                                 Categorical variables    Metric variables
                                 Nominal     Ordinal      Interval    Ratio
Absolute frequency               X           X            X           X
Relative frequency               X           X            X           X
Absolute cumulative frequency                X            X           X
Relative cumulative frequency                X            X           X
Frequency table                  X           X            X           X
Bar graph                        X           X            X           X
Pie chart                        X           X            X           X
Histogram                                                 X           X
Stem-and-leaf plot                                        X           X

9
In this class:
• Measures of centre
– Mean
– Median
– Mode

• Measures of variability
– Range
– Variance and standard deviation
– Variation coefficient

• Measures of position
– Percentiles
– Interquartile distance
– Boxplot, outliers
10
MEASURES OF CENTRE

11
Measures of centre: mean
• Mean = sum of observations divided by the number of observations
• Statistical notation for the mean of variable y:

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

12
The summation sign ∑
• Summation sign with values
– m refers to values
– Example with absolute frequencies

• Summation sign with observations
– n refers to observations
– Example with the mean

13
Measures of centre: mean
• Variable yi: number of children
• Sample size n: 8
• y1=2; y2=3; y3=0; y4=4; y5=2; y6=2; y7=3; y8=1
Values fi pi %
0 1 0,125 12,5
1 1 0,125 12,5
2 3 0,375 37,5
3 2 0,250 25,0
4 1 0,125 12,5
Total 8 1 100
14
Measures of centre: mean
• Variable yi: number of children
• Sample size n: 8
• y1=2; y2=3; y3=0; y4=4; y5=2; y6=2; y7=3; y8=1
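
As a quick check of the definition, the mean of the eight observations above can be computed directly. This is only a minimal sketch in Python (the course itself works with SPSS); the variable names are illustrative assumptions, and Python prints a decimal point where the slides use a decimal comma.

```python
# Number of children for the n = 8 respondents listed above
observations = [2, 3, 0, 4, 2, 2, 3, 1]

n = len(observations)           # number of observations
mean = sum(observations) / n    # sum of observations divided by n

print(mean)  # 2.125
```

Rounded to one decimal this is 2,1.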

15
Measures of centre: mean
• With absolute frequency tables: \bar{y} = \frac{1}{n} \sum_{i=1}^{m} f_i \times y_i

• With relative frequency tables: \bar{y} = \sum_{i=1}^{m} p_i \times y_i
(could be less precise because of rounding)

16
Measures of centre: mean
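• Mean with observations: \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
– The index i refers to observations/subjects
– i = 1 is the first observation

• Mean with absolute frequencies: \bar{y} = \frac{1}{n} \sum_{i=1}^{m} f_i \times y_i
– The index i refers to values
– i = 1 is the first value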

17
Measures of centre: mean
• Variable yi: number of children

Values   fi   pi      %
0        1    0,125   12,5
1        1    0,125   12,5
2        3    0,375   37,5
3        2    0,250   25,0
4        1    0,125   12,5
Total    8    1       100

\bar{y} = \frac{1}{n} \sum_{i=1}^{m} f_i \times y_i = \frac{(1 \times 0) + (1 \times 1) + (3 \times 2) + (2 \times 3) + (1 \times 4)}{8} = \frac{17}{8} \approx 2,1

\bar{y} = \sum_{i=1}^{m} p_i \times y_i = (0,125 \times 0) + (0,125 \times 1) + (0,375 \times 2) + (0,250 \times 3) + (0,125 \times 4) \approx 2,1
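
The two frequency-table formulas can be checked against each other with a short sketch. The dictionary `freq` below is simply the frequency table for the number of children typed in by hand; its name and layout are assumptions made for illustration.

```python
# Frequency table for "number of children": value y_i -> absolute frequency f_i
freq = {0: 1, 1: 1, 2: 3, 3: 2, 4: 1}

n = sum(freq.values())  # sample size, 8

# Mean from absolute frequencies: (1/n) * sum of f_i * y_i
mean_abs = sum(f * y for y, f in freq.items()) / n

# Mean from relative frequencies: sum of p_i * y_i, with p_i = f_i / n
mean_rel = sum((f / n) * y for y, f in freq.items())

print(mean_abs, mean_rel)  # 2.125 2.125
```

Both routes give the same result here because the relative frequencies are not rounded; with rounded pi the second route can drift slightly.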

18
Measures of centre: mean
• Properties of the mean
– Only for metric variables
– Very sensitive to outliers

• Example of sensitivity to outliers


• 2, 1, 2, 3, 4 → Mean: 2,4
• 6, 1, 2, 3, 4 → Mean: 3,2
• 15, 1, 2, 3, 4 → Mean: 5,0

• Real world examples of sensitivity to outliers?
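
The three example samples can be reproduced with Python's standard `statistics` module. As an addition that is not on the slide, the sketch also prints the median (covered later in this class) to show how little it moves when only one value changes.

```python
from statistics import mean, median

# Only the first value differs between the three samples
samples = [
    [2, 1, 2, 3, 4],
    [6, 1, 2, 3, 4],
    [15, 1, 2, 3, 4],
]

for sample in samples:
    print(sample, "mean:", mean(sample), "median:", median(sample))
```

The mean climbs from 2,4 to 5,0, while the median only moves from 2 to 3.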


19
Measures of centre: median
• Median is the observation that falls in the middle of an
ordered sample
• Statistical notation: M
• First: order all observations from low to high (or vice
versa)
• n = odd → M = observation in the middle
e.g. 10, 10, 11, 12, 13 (n = 5; M=11)
• n = even → M = midpoint of the 2 middle observations
e.g. 10, 10, 11, 12, 12, 13 (n = 6; M=11,5)

20
Measures of centre: median
• Median is the observation that falls in the middle of an ordered
sample
• Statistical notation: M

• First: order all observations from low to high (or vice versa)

• n = odd → M = value of the observation in the middle
e.g. 10, 10, 11, 12, 13 (n = 5; M=11) → value of the (n+1)/2 = 3rd observation
• n = even → M = value of the midpoint of the 2 middle observations
e.g. 10, 10, 11, 12, 12, 13 (n = 6; M=11,5) → value of the (n+1)/2 = 3,5th observation
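
The (n+1)/2 rule can be sketched in a few lines. The function name `median_by_position` is a hypothetical helper, not something from the course material, and it assumes a non-empty list of numbers.

```python
def median_by_position(observations):
    """Median as the value of the (n+1)/2-th observation of the ordered sample."""
    ordered = sorted(observations)   # first: order all observations
    n = len(ordered)
    lower_pos = (n + 1) // 2         # (n+1)/2 rounded down (1-based position)
    upper_pos = (n + 2) // 2         # (n+1)/2 rounded up (1-based position)
    # For odd n both positions coincide; for even n this is the midpoint
    return (ordered[lower_pos - 1] + ordered[upper_pos - 1]) / 2


print(median_by_position([10, 10, 11, 12, 13]))      # 11.0
print(median_by_position([10, 10, 11, 12, 12, 13]))  # 11.5
```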

21
Measures of centre: median
• Calculating the median from frequency tables

– (Cumulative) absolute frequencies


M = value of (n+1)/2 th observation

– (Cumulative) relative frequencies


M = value of p=0,50 observation
Could be less precise because of rounding

22
Measures of centre: median
• Calculating the median from frequency tables

n = 10 → even sample size
(n+1)/2 th observation → value of the 5,5th observation

Value   fi     fi     fi     fi
1       2      2      2      2
2       3      2      2      1
3       1      2      1      1
4       2      2      3      4
5       2      2      2      2
n       10     10     10     10

M       2,5    ?      ?      ?
Measures of centre: median
• Calculating the median from frequency tables

n = 10 → even sample size
(n+1)/2 th observation → value of the 5,5th observation

Value   fi     fi     fi     fi
1       2      2      2      2
2       3      2      2      1
3       1      2      1      1
4       2      2      3      4
5       2      2      2      2
n       10     10     10     10

M       2,5    3      3,5    4
Measures of centre: median
• Calculating the median from frequency tables

n = 11 → odd sample size
(n+1)/2 th observation → value of the 6th observation

Value   fi     fi     fi     fi
1       2      2      2      2
2       3      2      4      2
3       1      2      2      3
4       3      2      1      2
5       2      3      2      2
n       11     11     11     11

M       3      ?      ?      ?
Measures of centre: median
• Calculating the median from frequency tables

n = 11 → odd sample size
(n+1)/2 th observation → value of the 6th observation

Value   fi     fi     fi     fi
1       2      2      2      2
2       3      2      4      2
3       1      2      2      3
4       3      2      1      2
5       2      3      2      2
n       11     11     11     11

M       3      3      2      3
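
The frequency-table exercises above (n = 10 and n = 11) can be checked with a small sketch that walks through the cumulative absolute frequencies. The helper `median_from_frequencies` and the dictionary layout (value → absolute frequency) are assumptions made for illustration.

```python
def median_from_frequencies(freq):
    """Median from a {value: absolute frequency} table via cumulative frequencies."""
    n = sum(freq.values())
    lower_pos = (n + 1) // 2   # (n+1)/2 rounded down
    upper_pos = (n + 2) // 2   # (n+1)/2 rounded up

    def value_at(position):
        # Return the value held by the observation at this 1-based position
        cumulative = 0
        for value in sorted(freq):
            cumulative += freq[value]
            if cumulative >= position:
                return value
        raise ValueError("position exceeds sample size")

    return (value_at(lower_pos) + value_at(upper_pos)) / 2


# First and third table of the n = 10 exercise
print(median_from_frequencies({1: 2, 2: 3, 3: 1, 4: 2, 5: 2}))  # 2.5
print(median_from_frequencies({1: 2, 2: 2, 3: 1, 4: 3, 5: 2}))  # 3.5
```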
Measures of centre: median
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Source: ESS – round 7 27


Measures of centre: median
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

M=5

(n+1)/2 observation
= (35855+1)/2
= 17928th observation

p = 0.50th observation

Source: ESS – round 7 28


Measures of centre: median
Variable: On an average weekday, how much time, in total,
do you spend watching television?

Source: ESS – round 7 29


Measures of centre: median
Variable: On an average weekday, how much time, in total,
do you spend watching television?
M = more than 1,5 hours, up to 2 hours

(n+1)/2 observation
= (40111+1)/2
= 20056th observation

p = 0.50th observation

Source: ESS – round 7 30


Measures of centre: median
• Properties of the median
– For metric and ordinal variables (not for nominal
variables)
– Not sensitive to outliers

31
M = 15

32
Measures of centre: median
Variable: To which religion or denomination do you
consider yourself as belonging to?

Source: ESS – round 7 33


Measures of centre: median
Variable: To which religion or denomination do you
consider yourself as belonging to?

M = Jewish → not meaningful: religion is a nominal variable, so its categories have no natural order

Source: ESS – round 7 34


Measures of centre: mode
• Mode is the value that occurs most frequently
= the value with the highest absolute or relative
frequency

• Consult the frequency table
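
Finding the mode amounts to building the frequency table and picking the value(s) with the highest frequency. A minimal sketch, reusing the number-of-children observations from the mean example; returning a list is a design choice made here so that ties (several modes) are not hidden.

```python
from collections import Counter

# Number of children, as in the mean example
observations = [2, 3, 0, 4, 2, 2, 3, 1]

counts = Counter(observations)   # value -> absolute frequency
highest = max(counts.values())   # highest absolute frequency
modes = sorted(v for v, f in counts.items() if f == highest)

print(modes)  # [2]
```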

35
Measures of centre: mode
Variable: On an average weekday, how much time, in total,
do you spend watching television?

Source: ESS – round 7 36


Measures of centre: mode
Variable: On an average weekday, how much time, in total,
do you spend watching television?

Source: ESS – round 7 37


Measures of centre: mode
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Source: ESS – round 7 38


Measures of centre: mode
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Source: ESS – round 7 39


Measures of centre: mode
Variable: To which religion or denomination do you
consider yourself as belonging to?

Source: ESS – round 7 40


Measures of centre: mode
Variable: To which religion or denomination do you
consider yourself as belonging to?

Source: ESS – round 7 41


42
Mode = 15

43
Measures of centre: mode
• Properties of the mode
– For metric and categorical variables
– Less informative than mean or median

44
MEASURES OF VARIABILITY

45
Measures of variability
• Variability describes the spread of the data
around a measure of centre

• The beauty of variability!!!

46
Measures of variability:
Range
• Range is the difference between the largest
and smallest values
• For metric variables (not for categorical
variables)

47
Measures of variability: range
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Range =
10 – 0 = 10

Source: ESS – round 7 48


Measures of variability: range
Variable: What’s your age?

Range =
114-14 = 100

49
Source: ESS – round 7
Measures of variability:
Variance and standard deviation
• Variance: mean squared distances from the sample mean y̅
• For metric variables (not for categorical variables)
• Statistical notation of the variance: S²

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2

Building up the formula:
– (y_i - \bar{y}): deviation of observation yi from sample mean y̅ (both positive and negative deviations)
– (y_i - \bar{y})^2: square of the deviations (otherwise the sum equals 0)
– \sum_{i=1}^{n} (y_i - \bar{y})^2: sum of all squared deviations = “sum of squares”
– \frac{1}{n-1}: division by n – 1 in order to approximately get the average of the squared deviations from the sample mean
Measures of variability:
Variance and standard deviation
• Populations: division by N

\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2

• Samples: division by n-1

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2

• This adaptation is called Bessel’s correction (see extra on Canvas)
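
The difference between dividing by N and by n-1 can be made concrete with a short sketch. It reuses the eight number-of-children observations and, purely for illustration, treats them once as a whole population and once as a sample. `pvariance` and `variance` are the population and sample variance functions of Python's standard `statistics` module.

```python
from statistics import pvariance, variance

y = [2, 3, 0, 4, 2, 2, 3, 1]
mean = sum(y) / len(y)

# Sum of squared deviations from the mean
sum_of_squares = sum((yi - mean) ** 2 for yi in y)

pop_var = sum_of_squares / len(y)            # division by N
sample_var = sum_of_squares / (len(y) - 1)   # division by n - 1 (Bessel's correction)

print(pop_var, pvariance(y))     # both divide by N
print(sample_var, variance(y))   # both divide by n - 1
```

The sample variance is always somewhat larger than the population formula applied to the same numbers, and the gap shrinks as n grows.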
Measures of variability:
Variance and standard deviation
• Variance: based on the squared deviations
→ Units of measurement: the squares of those of the original data
→ Difficult to interpret
→ Square root of the variance = standard deviation

• Statistical notation of standard deviation: S

S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}

57
Measures of variability:
Variance and standard deviation
• Example: number of children

S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}

         yi       (yi - y̅)    (yi - y̅)²
         2        -0,125      0,015625
         3         0,875      0,765625
         0        -2,125      4,515625
         4         1,875      3,515625
         2        -0,125      0,015625
         2        -0,125      0,015625
         3         0,875      0,765625
         1        -1,125      1,265625
Sum      17        0          10,875
Mean     2,125
n        8
n-1      7
s²       1,553571
s        1,246423

The standard deviation in five steps:
STEP 1. Calculate the mean
STEP 2. Calculate the deviations from the sample mean by subtracting the mean from the value of each observation
STEP 3. Square all the deviations and sum them = the sum of squares
STEP 4. Divide the sum of squares by n-1 to get the variance
STEP 5. Take the square root of the variance
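
The five steps can be followed one by one in code. This is a minimal sketch with illustrative variable names; it reproduces the numbers of the worked example, printed with a decimal point instead of a decimal comma.

```python
from math import sqrt

y = [2, 3, 0, 4, 2, 2, 3, 1]   # number of children
n = len(y)

mean = sum(y) / n                                  # STEP 1: the mean
deviations = [yi - mean for yi in y]               # STEP 2: deviations from the mean
sum_of_squares = sum(d ** 2 for d in deviations)   # STEP 3: square and sum
variance = sum_of_squares / (n - 1)                # STEP 4: divide by n - 1
s = sqrt(variance)                                 # STEP 5: square root

print(mean, sum_of_squares, round(variance, 6), round(s, 6))
# 2.125 10.875 1.553571 1.246423
```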

62
Measures of variability:
Variance and standard deviation
• Alternative way of calculating variance and standard deviation
→ work with the absolute frequency per value

S^2 = \frac{1}{n-1} \sum_{i=1}^{m} (y_i - \bar{y})^2 \times f_i

S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{m} (y_i - \bar{y})^2 \times f_i}
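
A sketch of the frequency-based route, assuming the frequency table for the number of children from the mean slides (value → absolute frequency). It should arrive at the same variance and standard deviation as the observation-based calculation.

```python
from math import sqrt

# value y_i -> absolute frequency f_i
freq = {0: 1, 1: 1, 2: 3, 3: 2, 4: 1}

n = sum(freq.values())
mean = sum(f * y for y, f in freq.items()) / n

# Each squared deviation is weighted by how often the value occurs
variance = sum(f * (y - mean) ** 2 for y, f in freq.items()) / (n - 1)
s = sqrt(variance)

print(round(variance, 6), round(s, 6))  # 1.553571 1.246423
```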
63
Measures of variability:
Variance and standard deviation
• Interpretation of standard deviation
– Standard/typical distance of the observations from the sample mean
– Represents the variability about the mean
– The larger the standard deviation s, the greater the variability
– The smaller the standard deviation s, the smaller the variability

64
The beauty of variability
Variable: In politics people sometimes talk of ‘left’ and ‘right’.
Where would you place yourself on a scale, where 0 means
the left and 10 means the right? – BELGIAN SAMPLE DATA

Year    2002     2010     2018
n       1633     1643     1690
y̅       4,83     4,98     4,96
M       5        5        5
Mode    5        5        5
S²      4,185    3,766    3,936
S       2,046    1,941    1,984

65
Source: ESS – BELGIUM
Measures of variability:
Variation coefficient
• Variation coefficient
– The ratio of the standard deviation S to the mean
= Relative standard deviation
– Statistical notation of the variation coefficient: V
– Only for metric variables (not for categorical)
– Often expressed as a percentage
– Used to compare the variability between groups or variables

V = \frac{S}{\bar{y}}
66
Measures of variability:
Variation coefficient
• Example: number of children
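
The worked numbers for this example did not survive in the text, so the sketch below recomputes the variation coefficient from the eight number-of-children observations used earlier in this class. That choice of data is an assumption. `stdev` from the standard `statistics` module divides by n-1, in line with the sample formula.

```python
from statistics import mean, stdev

y = [2, 3, 0, 4, 2, 2, 3, 1]   # number of children

v = stdev(y) / mean(y)          # V = S / mean

print(round(v, 3))   # 0.587
print(f"{v:.1%}")    # 58.7%
```

In words: the standard deviation is roughly 59% of the mean.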

67
68
MEASURES OF POSITION

69
Measures of position
• Point below (or above) which a certain percentage of the data falls.
• Gives insight into the centre and/or variability of the data

70
Measures of position:
Percentiles
• The pth percentile is the point such that p% of the
observations fall below or at that point and (100-p)% fall
above it

• For metric and ordinal variables (not for nominal


variables)

• Important percentiles
– Median = 50th percentile (p=50) = Q2
– Lower quartile = 25th percentile (p=25) = Q1
– Upper quartile = 75th percentile (p=75) = Q3
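
One way to turn the definition into code is to take the smallest observed value for which at least p% of the observations fall at or below it, which is what one reads off a cumulative relative frequency table. This is a sketch under that assumption. Statistical software (including SPSS) may interpolate between observations and give slightly different quartiles, and for even n this rule does not take the midpoint of the two middle observations. The helper name `percentile` and the integer-valued `p` are illustrative choices.

```python
def percentile(observations, p):
    """Smallest observed value with at least p% of observations at or below it."""
    ordered = sorted(observations)
    n = len(ordered)
    # Rank = p * n / 100 rounded up (integer arithmetic), and at least 1
    rank = max(1, -(-p * n // 100))
    return ordered[rank - 1]


y = [2, 3, 0, 4, 2, 2, 3, 1]   # number of children
q1, q2, q3 = (percentile(y, p) for p in (25, 50, 75))

print(q1, q2, q3)  # 1 2 3
```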
71
Measures of position: percentiles
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Source: ESS – round 7 72


Measures of position: percentiles
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Q1 = 4

Q2 = M = 5

Q3 = 7

Source: ESS – round 7 73


Measures of position:
Percentiles
• Median, lower and upper quartiles split the
data in four equal parts in the case of
continuous variables (not always with discrete
variables)

74
Measures of position:
Interquartile range
• Interquartile range (IQR): difference between
the upper and lower quartiles
• For metric variables (not for categorical
variables)
• Measures the variability of the middle half of
the observations
• The larger the IQR, the greater the variability
• Also used to detect outliers (see below)

75
Measures of position: IQR
Variable: In politics people sometimes talk of ‘left’ and
‘right’. Where would you place yourself on a scale, where 0
means the left and 10 means the right?

Q1 = 4

Q2 = M = 5

Q3 = 7

IQR = 7 – 4 = 3

Source: ESS – round 7 76


Measures of position:
Box plots
• Box plot: graphical summary based on five
numbers, which shows both the centre and
variability of the observations
– 100th percentile = maximum (except for outliers)
– 75th percentile = upper quartile = Q3
– 50th percentile = median = Q2
– 25th percentile = lower quartile = Q1
– 0th percentile = minimum (except for outliers)

77
Measures of position: Box plots
• Example: age of the respondent

Source: ESS – round 7 78


Measures of position: Box plots
• Example: age of the respondent

Source: ESS – round 7 79


Measures of position: Box plots
• Example: age of the respondent

Source: ESS – round 7 80


Measures of position:
Outliers
• Outliers: observations which fall more than
1.5 IQR above the upper quartile or more
than 1.5 IQR below the lower quartile

• They are separately marked in box plots

• Outliers are sometimes substantively interesting

81
Measures of position: Box plots
• Example: age of the respondent

Maximum = 104 (outliers excluded)
Minimum = 14 (outliers excluded)

IQR = 64 – 34 = 30
1,5 times 30 = 45
Outliers
• Values > 109 (= 64+45)
• Values < -11 (= 34-45)
Source: ESS – round 7 82
83
Measures of position: Box plots
• Example: basic math scores
Maximum = 20 (outliers excluded)
Minimum = 6 (outliers excluded)

IQR = 18 – 13 = 5
1,5 times 5 = 7,5
Outliers
• Values > 25,5 (= 18+7,5)
• Values < 5,5 (= 13-7,5)
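
The 1,5 × IQR rule used in the box plot examples can be written as a small helper. The function names `outlier_fences` and `is_outlier` are hypothetical; the quartiles are taken from the two examples (age: Q1 = 34, Q3 = 64; basic math scores: Q1 = 13, Q3 = 18).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


print(outlier_fences(34, 64))   # (-11.0, 109.0)  age of the respondent
print(outlier_fences(13, 18))   # (5.5, 25.5)     basic math scores
print(is_outlier(114, 34, 64))  # True: 114 lies above 109
```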
84
Measures of position: Box plots
• Example: number of household members

Source: ESS – round 7 85


Measures of position: Box plots
• Example: number of household members

Source: ESS – round 7 86


                            Categorical variables    Metric variables
                            Nominal     Ordinal      Interval    Ratio
Measures of centre
Mean                                                 X           X
Median                                  X            X           X
Mode                        X           X            X           X
Measures of variability
Range                                                X           X
Variance                                             X           X
Standard deviation                                   X           X
Variation coefficient                                X           X
Measures of position
Percentiles                             X            X           X
Interquartile distances                              X           X
Boxplot                                              X           X
Outliers                                             X           X

87
Exercises on class 3.
• Exercise 3c on p. 10
• Exercise 4c on p. 11
• Exercise 5c-d on p. 12
• Exercise 6e-g on p. 13-14
• Exercise 7c-d on p. 15
• Exercise 8c-e on p. 16
• Exercise 10a-f + h on p. 18
• Exercise 11a-f on p. 19
• Exercise 12a-b on p. 20
• Exercise 14a-b on p. 22
• Exercise 15a-e on p. 23
• Exercise 16a-c on p. 24
• Exercise 17a-d on p. 25
88
Next week
Univariate, descriptive statistics: distribution of the data

Contact:
pieter-paul.verhaeghe@vub.be

89
