Measures of Dispersion V4
STATISTICS IN EDUCATION
Lecture 4
Understanding Data Via Descriptive
Analysis
• Two sets of descriptive measures:
– Measures of central tendency: used to
report a single piece of information that
describes the most typical response to a
question
– Measures of variability: used to reveal the
typical difference between the values in a
set of values
Understanding Data Via Descriptive
Analysis
• Measures of Central Tendency:
– Mode: the value in a string of numbers that
occurs most often
– Median: the value whose occurrence lies in
the middle of a set of ordered values
– Mean: sometimes referred to as the
“arithmetic mean”; the average value
characterizing a set of numbers
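As a small illustration of the three measures, the sketch below uses Python's standard `statistics` module. The scores are hypothetical and chosen only so that the mode, median and mean are easy to check by hand.

```python
from statistics import mean, median, mode

# Hypothetical scores, used only to illustrate the three measures.
scores = [2, 3, 3, 5, 7]

score_mode = mode(scores)      # value that occurs most often
score_median = median(scores)  # middle value of the ordered data
score_mean = mean(scores)      # arithmetic mean: sum divided by count

print("Mode:", score_mode)
print("Median:", score_median)
print("Mean:", score_mean)
```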
Understanding Data Via Descriptive
Analysis
• Measures of Variability:
– Frequency distribution reveals the number (percent) of
occurrences of each number or set of numbers
– Range identifies the maximum and minimum values in a
set of numbers
– Standard deviation indicates the degree of variation in a
way that can be translated into a bell-shaped
curve distribution
Content
• Measures of dispersion
– Range
– Interquartile range
– Variance
– Standard deviation
Measures of dispersion
Student A 77 78 79 80 81 82 83
Student B 47 69 73 80 96 97 98
• Variability
– Tells us how far scores spread out
– Tells us the degree to which
scores deviate from the central
tendency
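The Student A and Student B scores show why a measure of spread is needed alongside a measure of centre. The sketch below is one possible way to check this in Python; the helper name `summarise` is an illustrative choice, not part of the lecture.

```python
from statistics import mean, median, stdev

student_a = [77, 78, 79, 80, 81, 82, 83]
student_b = [47, 69, 73, 80, 96, 97, 98]

def summarise(scores):
    """Return the mean, median, range and sample standard deviation."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
        "sd": stdev(scores),  # sample standard deviation (divides by N - 1)
    }

summary_a = summarise(student_a)
summary_b = summarise(student_b)
print("Student A:", summary_a)
print("Student B:", summary_b)
```

Both students have the same mean and median (80), but Student B's scores are far more spread out, which only the range and standard deviation reveal.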
How are these different?
(Figure: two distributions, each with Mean = 10)
Measure of Variability
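Variance: $s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$, where $\bar{X}$ is the mean
Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$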
The Range
• Quartiles: the three values that divide a data set arranged in order into four parts
containing the same number of data values
• First Quartile (Q1): the value below which 1/4 of the data lie
• Second Quartile (Q2) or median: the value below which 1/2 of the data lie
• Third Quartile (Q3): the value below which 3/4 of the data lie
• Example: 33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82
Q1 = 45, Q2 = 52, Q3 = 70
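The quartiles in this example can be reproduced with a short Python sketch. It assumes the median-of-halves convention, in which the median is left out of both halves when the number of values is odd; other conventions exist and can give slightly different quartiles. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of values, the median itself is
    excluded from both the lower and the upper half.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # values below the median position
    upper = ordered[n - half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 45 52 70
```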
Variability: IQR
• Interquartile Range
– = P75 – P25 or Q3 – Q1
– This helps to get a range that is not influenced by
the extreme high and low scores
– Where the range is the spread across 100% of the
scores, the IQR is the spread across the middle
50%
Interquartile Range
• Measure of the difference between the third quartile and the first
quartile in a data set:
• Interquartile range = third quartile – first quartile
= Q3 – Q1
• Example:
– Find the interquartile range of the following data set:
33, 50, 45, 47, 40, 52, 82, 66, 70, 76, 60
Arrange data in ascending order:
33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82
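Q1 = 45, Q3 = 70
Interquartile range = Q3 – Q1 = 70 – 45 = 25
• Example (quartiles that fall between two data values):
Q1 = (17 + 18)/2 = 17.5
Q3 = (22 + 24)/2 = 23
Interquartile range = Q3 – Q1
= 23 – 17.5
= 5.5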
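For the 11-value data set above (Q1 = 45, Q3 = 70), the interquartile range can be computed as in the following sketch. It assumes the median-of-halves convention for quartiles, and `interquartile_range` is an illustrative name.

```python
from statistics import median

def interquartile_range(data):
    """Return Q3 - Q1, with quartiles taken as medians of the two halves.

    Assumption: for an odd number of values, the overall median is
    excluded from both halves.
    """
    ordered = sorted(data)           # arrange the data in ascending order
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return q3 - q1

data = [33, 50, 45, 47, 40, 52, 82, 66, 70, 76, 60]
iqr = interquartile_range(data)
print(iqr)  # 25
```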
Variability: SIQR
• Semi-interquartile range
– =(P75 – P25)/2 or (Q3 – Q1)/2
– IQR/2
– This is half the spread of the middle 50% of the data
– The average distance of Q1 and Q3 from the
median
– Better for skewed data
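The claim that the SIQR is the average distance of Q1 and Q3 from the median can be checked in one line, writing Q2 for the median:

```latex
\text{SIQR}
  = \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2}
  = \frac{Q_3 - Q_1}{2}
  = \frac{\text{IQR}}{2}
```

For instance, with Q1 = 45, Q2 = 52 and Q3 = 70, the two distances are 18 and 7, and their average is 12.5, which is half of the IQR of 25.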
Variability: SIQR
• Semi-Interquartile range
(Figure: two distributions, each marked with Q1, Q2 and Q3)
Variance and standard deviation
• Variance = (sum of squares of the deviations of each data value from the mean) / (number of data values)
• For a sample:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$
• When calculated for the entire population:
$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
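The difference between the two divisors can be seen with Python's standard `statistics` module, where `variance` divides by N – 1 and `pvariance` divides by N. The data set is the one used in the worked example of this lecture; treating the same six values first as a sample and then as a population is done here only for comparison.

```python
from statistics import pvariance, variance

data = [26, 28, 30, 35, 38, 23]

sample_var = variance(data)       # sum of squared deviations / (N - 1)
population_var = pvariance(data)  # sum of squared deviations / N

print("Sample variance:", sample_var)
print("Population variance:", population_var)
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.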
Standard Deviation
• Variance is in squared units
• What about regular old units?
• Standard Deviation = Square root of the variance
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$
Standard Deviation
$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$
Variance and standard deviation of
population
Variance and standard deviation of
sample
Variance and standard deviation
Example:
Find the variance and standard deviation of the following
data set
26, 28, 30, 35, 38, 23
Solution:
$\bar{X} = \frac{\sum X}{N} = \frac{180}{6} = 30$

X        X – mean            (X – mean)²
26       26 – 30 = –4        16
28       28 – 30 = –2        4
30       30 – 30 = 0         0
35       35 – 30 = 5         25
38       38 – 30 = 8         64
23       23 – 30 = –7        49
Total    180                 158

$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{158}{6 - 1} = \frac{158}{5} = 31.6$
$s = \sqrt{31.6} = 5.62$
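The table-based calculation above can be followed step by step in code. This is a minimal sketch of the definitional method (deviations from the mean, squared and summed), not the only way to compute it; variable names are illustrative.

```python
from math import sqrt

data = [26, 28, 30, 35, 38, 23]
n = len(data)
mean = sum(data) / n                       # 180 / 6 = 30

deviations = [x - mean for x in data]      # X - mean
squared = [d ** 2 for d in deviations]     # (X - mean)^2

# Print one row per data value, mirroring the worked table.
for x, d, s in zip(data, deviations, squared):
    print(f"{x:>4} {d:>6.0f} {s:>6.0f}")

sum_of_squares = sum(squared)              # 158
variance = sum_of_squares / (n - 1)        # 158 / 5
sd = sqrt(variance)

print(f"variance = {variance:.1f}, sd = {sd:.2f}")  # variance = 31.6, sd = 5.62
```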
Variance and standard deviation
$s^2 = \frac{1}{N - 1}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$
• Ex.: Find the variance and standard deviation of the following
26, 28, 30, 35, 38, 23
• Solution:

X        X²
26       676
28       784
30       900
35       1225
38       1444
23       529
Total    180      5558

$s^2 = \frac{1}{6 - 1}\left[5558 - \frac{180^2}{6}\right] = \frac{158}{5} = 31.6$
$s = \sqrt{31.6} = 5.62$
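The computational formula gives the same answer as the definitional one because the sum of squared deviations can be expanded as follows, using $\bar{X} = \frac{\sum X}{N}$:

```latex
\sum (X - \bar{X})^2
  = \sum X^2 - 2\bar{X}\sum X + N\bar{X}^2
  = \sum X^2 - \frac{2(\sum X)^2}{N} + \frac{(\sum X)^2}{N}
  = \sum X^2 - \frac{(\sum X)^2}{N}
```

For the data above, this is 5558 – 180²/6 = 5558 – 5400 = 158, the same sum of squares obtained from the deviation table.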
Variance and standard deviation
• Self test:
– Find the variance and standard deviation of the following:
15, 28, 33, 47, 56
• Solution:
$s^2 = \frac{1}{N - 1}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$

X        X²
15       225
28       784
33       1089
47       2209
56       3136
Total    179      7443

$s^2 = \frac{1}{5 - 1}\left[7443 - \frac{179^2}{5}\right] = 258.7$
$s = \sqrt{258.7} = 16.08$
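The self-test answer can be checked with a short function that applies the computational formula directly. The name `shortcut_variance` is an illustrative choice.

```python
from math import sqrt

def shortcut_variance(data):
    """Sample variance via (sum of X^2 - (sum of X)^2 / N) / (N - 1)."""
    n = len(data)
    sum_x = sum(data)                   # sum of X
    sum_x2 = sum(x * x for x in data)   # sum of X squared
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [15, 28, 33, 47, 56]
var = shortcut_variance(data)
sd = sqrt(var)
print(round(var, 1), round(sd, 2))  # 258.7 16.08
```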
Variance and standard deviation
• Analyze
• Descriptive Statistics
• Descriptives
• Options
• Variance
• Std. Deviation
• Continue
• OK
Variance and standard deviation
• Self test:
– Calculate the variance and standard deviation of the following
data set using SPSS:
345, 232, 422, 341, 330, 472, 356, 436
• Solution:
s = 75.30 (2 d.p.)
s² = 5669.36 (2 d.p.)
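The SPSS result can be cross-checked outside SPSS. The sketch below uses Python's standard `statistics` module, whose `stdev` and `variance` functions use the sample (N – 1) formulas.

```python
from statistics import stdev, variance

data = [345, 232, 422, 341, 330, 472, 356, 436]

sd = stdev(data)      # sample standard deviation
var = variance(data)  # sample variance

print(f"s = {sd:.2f}, s^2 = {var:.2f}")  # s = 75.30, s^2 = 5669.36
```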
Variance and standard deviation
• Example:
• In a road survey, the speeds of 40 cars (in km/h) were
recorded as follows:
56 72 63 81 73 53 57 69 70 89
63 68 72 77 82 85 59 74 69 76
73 62 80 70 60 71 65 73 64 69
72 65 77 68 78 72 67 75 66 79
• Find the variance and standard deviation of the car
speed using SPSS
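The exercise asks for SPSS, but the output can be compared against an independent calculation. The following sketch enters the 40 speeds and computes the sample variance and standard deviation with Python's `statistics` module; it is offered only as a cross-check, not as a replacement for the SPSS steps.

```python
from statistics import mean, stdev, variance

speeds = [
    56, 72, 63, 81, 73, 53, 57, 69, 70, 89,
    63, 68, 72, 77, 82, 85, 59, 74, 69, 76,
    73, 62, 80, 70, 60, 71, 65, 73, 64, 69,
    72, 65, 77, 68, 78, 72, 67, 75, 66, 79,
]

n = len(speeds)               # should be 40, guarding against data-entry slips
speed_mean = mean(speeds)
speed_var = variance(speeds)  # sample variance (divides by N - 1)
speed_sd = stdev(speeds)      # sample standard deviation

print(f"N = {n}, mean = {speed_mean:.2f} km/h")
print(f"variance = {speed_var:.2f}, sd = {speed_sd:.2f} km/h")
```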
Range, IQR, Variance and standard deviation
• Question:
– Which of the measures of dispersion is suitable to
represent a data set?
Characteristics of range, inter quartile range,
variance and standard deviation
• Range
– Advantages:
• Easy to understand and calculate
• Suitable for providing a rough picture of dispersion in a
short time
– Disadvantages:
• Its calculation only involves values at both ends of the
data set, ignoring the remaining data
• Its value is influenced by the presence of extreme values
in a data set
• Thus, range may give a less than accurate picture of
dispersion for the data set
Characteristics of range, inter quartile range,
variance and standard deviation
• Range
– Example:
• Scores obtained by students in a Mathematics test:
• 52, 54, 56, 60, 61, 64, 65, 92
• Calculate the mean and range of marks
– Solution:
• Mean = 504/8 = 63
• Range = 92 – 52 = 40
Characteristics of range, inter quartile range,
variance and standard deviation
• Range
• Scores obtained by students in a Mathematics test:
• 52, 54, 56, 60, 61, 64, 65, 92
• Mean = 504/8 = 63
• Range = 92 – 52 = 40
– Question:
• Can the range give an accurate picture of the dispersion of
scores?
– Answer:
• No. The range of 40 suggests a large dispersion, but it is inflated by a single score.
• Most of the scores lie between 52 and 65, close to
the mean
• Only one score (92) is very high and lies far from the
mean
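The effect of the single extreme score can be made concrete by computing the range with and without it. Dropping the top score is done here purely to illustrate the sensitivity of the range; it is not a recommended way to treat data.

```python
scores = [52, 54, 56, 60, 61, 64, 65, 92]

mean_score = sum(scores) / len(scores)      # 504 / 8
full_range = max(scores) - min(scores)      # 92 - 52

# Remove the highest score to see how much it drives the range.
without_top = sorted(scores)[:-1]
trimmed_range = max(without_top) - min(without_top)  # 65 - 52

print(mean_score, full_range, trimmed_range)  # 63.0 40 13
```

One score out of eight accounts for 27 of the 40 points of range.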
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
– Advantages:
• Its value is not affected by extreme values in a data set,
as it is the range of the middle 50% of the data, between
the first quartile and the third quartile
• Thus, the inter quartile range can be used even if there
are extreme values in the data set
– Disadvantages:
• Does not measure the dispersion of each data value from the
mean of the data set
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
– Example:
• Two groups of teachers made the following donations
for Warriors Day:
• Group A:
• RM1, RM2, RM2, RM3, RM4, RM5, RM5, RM6, RM6,
RM6, RM8, RM30
• Group B:
• RM1, RM2, RM5, RM5, RM6, RM6, RM6, RM6, RM8,
RM8, RM10, RM15
• Find the mean, range and inter quartile range for
each group.
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
• Group A:
– RM1, RM2, RM2, RM3, RM4, RM5, RM5, RM6, RM6, RM6, RM8, RM30
• Group B:
– RM1, RM2, RM5, RM5, RM6, RM6, RM6, RM6, RM8, RM8, RM10, RM15
                        Group A                   Group B
Mean                    RM78/12 = RM6.50          RM78/12 = RM6.50
Range                   RM30 – RM1 = RM29         RM15 – RM1 = RM14
Interquartile range     RM6 – RM2.50 = RM3.50     RM8 – RM5 = RM3
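The comparison of the two groups can be reproduced with the sketch below. It assumes quartiles are taken as the medians of the lower and upper halves of the ordered data, which matches the values in the table; `summarise` is an illustrative name.

```python
from statistics import mean, median

def summarise(values):
    """Return (mean, range, interquartile range) for a list of amounts."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # median of the lower half
    q3 = median(ordered[len(ordered) - half:])   # median of the upper half
    return mean(ordered), ordered[-1] - ordered[0], q3 - q1

# Donations in RM
group_a = [1, 2, 2, 3, 4, 5, 5, 6, 6, 6, 8, 30]
group_b = [1, 2, 5, 5, 6, 6, 6, 6, 8, 8, 10, 15]

result_a = summarise(group_a)
result_b = summarise(group_b)
print("Group A:", result_a)  # (6.5, 29, 3.5)
print("Group B:", result_b)  # (6.5, 14, 3.0)
```

The means are identical and the ranges differ by a factor of about two because of the single RM30 donation, while the interquartile ranges are close, showing that the bulk of the donations in both groups is similarly spread.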
When to Use a Particular Statistic
Research Question
• How can quiz scores of students enrolled in an
introductory statistics class be summarised
using measures of central tendency and
measures of dispersion?
• Template:
How can [variable] be summarised using
measures of central tendency and measures of
dispersion?
SPSS Output
Summarising results
• As shown in Table 3.5, scores ranged from 9 to
20.
• The mean was 15.56, the approximate median
was 17.00 and the mode was 17.00.
• Thus, the scores tended to lump together at
the high end of the scale.
• A negatively skewed distribution is suggested
given that the mean was less than the median
and mode
Summarising results
• The range was 11, the interquartile range was
5.0, variance was 10.01 and standard deviation
was 3.16.
• For example, the middle 50% of the scores had a
range of 5 (interquartile range) indicating that
there was a reasonable spread of scores around
the median.
• Thus, despite a high "average" score, there were
some low-performing students as well.
• These results are consistent with those described
using the graphical representation.
Results in APA format
As shown in Table 3.5, scores ranged from 9 to 20. The
mean was 15.56, the approximate median was 17.00 and
the mode was 17.00. Thus, the scores tended to lump
together at the high end of the scale. A negatively
skewed distribution is suggested given that the mean
was less than the median and mode. The exclusive
range was 11, the interquartile range was 5.0, variance
was 10.01 and standard deviation was 3.16. For
example, the middle 50% of the scores had a range of
5 (interquartile range), indicating that there was a
reasonable spread of scores around the median. Thus,
despite a high "average" score, there were some
low-performing students as well. These results are
consistent with those described using the graphical
representation.