
PQX 7001

STATISTICS IN EDUCATION
Lecture 4
Understanding Data Via Descriptive
Analysis
• Two sets of descriptive measures:
– Measures of central tendency: used to
report a single piece of information that
describes the most typical response to a
question
– Measures of variability: used to reveal the
typical difference between the values in a
set of values
Understanding Data Via Descriptive
Analysis
• Measures of Central Tendency:
– Mode: the value in a string of numbers that
occurs most often
– Median: the value whose occurrence lies in
the middle of a set of ordered values
– Mean: sometimes referred to as the
“arithmetic mean”; the average value
characterizing a set of numbers
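As a minimal sketch of the three measures above, Python's standard `statistics` module can compute each one. The scores below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical quiz scores, chosen only for illustration
scores = [4, 7, 7, 8, 9, 10, 11]

mode = statistics.mode(scores)      # value that occurs most often
median = statistics.median(scores)  # middle value of the ordered scores
mean = statistics.mean(scores)      # arithmetic mean (sum / count)

print("mode:", mode, "median:", median, "mean:", mean)
```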

Understanding Data Via Descriptive
Analysis
• Measures of Variability:
– Frequency distribution reveals the number (percent) of
occurrences of each number or set of numbers
– Range identifies the maximum and minimum values in a
set of numbers
– Standard deviation indicates the degree of variation in a
way that can be translated into a bell-shaped
curve distribution

Content
• Measures of dispersion
– Range
– Inter quartile range
– Variance
– Standard deviation
Measures of dispersion

• Observe the June test report for two students in the following table:

Student A 77 78 79 80 81 82 83

Student B 47 69 73 80 96 97 98

• What are the similarities in achievement between student A and student B?
• What are the differences in achievement between student A and student B?
• Discuss their performance based on the report
Measures of dispersion

Student A 77 78 79 80 81 82 83

Student B 47 69 73 80 96 97 98

Measure of central tendency    Student A    Student B
Mode                           None         None
Median                         80           80
Mean                           80           80
Measures of dispersion

Student A 77 78 79 80 81 82 83

Student B 47 69 73 80 96 97 98

• What is the difference between the achievement of student A and student B?
• Student A:
– His marks range between 77 and 83
– There is only a small difference between the marks obtained
– There is only a small difference between the marks and the mean
– This student’s marks have a small dispersion
Measures of dispersion

Student A 77 78 79 80 81 82 83

Student B 47 69 73 80 96 97 98

• What is the difference in achievement between student A and student B?
• For student B:
– His marks range between 47 and 98
– There is a big difference between the test marks
– There is a big difference between the marks and the mean
– His marks have a large dispersion
Measures of dispersion

Student A 77 78 79 80 81 82 83

Student B 47 69 73 80 96 97 98

• Discuss the students’ performance based on this report
– Student A showed a more consistent performance compared to student B because his test scores have a smaller dispersion
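The comparison of the two students can be checked with a short sketch. The helper name `summarise` is an illustrative choice, and the standard deviation used here is the sample version (dividing by N − 1).

```python
import statistics

student_a = [77, 78, 79, 80, 81, 82, 83]
student_b = [47, 69, 73, 80, 96, 97, 98]

def summarise(marks):
    """Return (mean, median, range, sample standard deviation) for a list of marks."""
    return (
        statistics.mean(marks),
        statistics.median(marks),
        max(marks) - min(marks),
        statistics.stdev(marks),  # divides by N - 1
    )

summary_a = summarise(student_a)
summary_b = summarise(student_b)
print("A:", summary_a)
print("B:", summary_b)
```

Both students share the same mean and median, so only the last two entries (range and standard deviation) separate them.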
Measures of dispersion
• Question:
Can the measures of central tendency alone (mode,
median and mean) provide enough information to
give a good representation of how data is
distributed?
• Answer:
– No
– Therefore, we need other measures that can
provide additional information about the
dispersion of a data set. This measure is called
measure of dispersion
Measures of dispersion
• A measure that shows how far the values in a data
set differ from each other or from the central
position of the data set
• Common measures of dispersion:
– Range
– Interquartile range
– Variance
– Standard deviation
How do scores spread out?

• Variability
– Tells us how far scores spread out
– Tells us the degree to which scores deviate from the central tendency
How are these different?

[Figure: two distributions, each with Mean = 10]
Measure of Variability

Measure                       Definition                                                  Related to:
Range                         Largest − Smallest                                          Mode
Interquartile Range           $X_{75} - X_{25}$                                           Median
Semi-Interquartile Range      $(X_{75} - X_{25})/2$                                       Median
Average Absolute Deviation    $\frac{\sum_{i=1}^{N} \lvert X_i - \bar{X} \rvert}{N}$      Mean
Variance                      $\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$            Mean
Standard Deviation            $\sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$     Mean
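The average absolute deviation is the one measure in this table that is not worked through later, so here is a minimal sketch of it, applied to the two students' marks from the earlier slides. The function name is an illustrative choice.

```python
def average_absolute_deviation(values):
    """Mean of |X_i - mean|, dividing by N as in the table above."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

student_a = [77, 78, 79, 80, 81, 82, 83]
student_b = [47, 69, 73, 80, 96, 97, 98]

aad_a = average_absolute_deviation(student_a)
aad_b = average_absolute_deviation(student_b)
print(round(aad_a, 2), round(aad_b, 2))
```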
The Range

• The simplest measure of variability


– Range (R) = Xhighest – Xlowest
– Advantage – Easy to Calculate
– Disadvantages
• Like the median, it depends on only two scores, so it is unstable
{0, 8, 9, 9, 11, 53} Range = 53
{0, 8, 9, 9, 11, 11} Range = 11
• Does not reflect all scores
Range
• Measure of the difference between the largest value
and the smallest value in a data set:
• Range = largest value – smallest value
• Example:
– Find the range for the following data set:
9, 12, 6, 32, 15
• Solution:
– Range = largest value – smallest value
= 32 – 6
= 26
Range
• Self test:
Find the range for the following data set:
a) 108, 104, 109, 45, 106, 107, 110
b) 4.5, 3.2, 9.3, 4.2, 2.1
• Solution:
a) Range = largest value – smallest value
= 110 – 45
= 65
b) Range = largest value – smallest value
= 9.3 – 2.1
= 7.2
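The example and both self-test answers can be reproduced with a short sketch. The function name `value_range` is an illustrative choice, used to avoid clashing with Python's built-in `range`.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

example = value_range([9, 12, 6, 32, 15])
self_test_a = value_range([108, 104, 109, 45, 106, 107, 110])
self_test_b = value_range([4.5, 3.2, 9.3, 4.2, 2.1])

# Rounding guards against floating-point noise in the decimal data
print(example, self_test_a, round(self_test_b, 1))
```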
Interquartile Range

• Quartile: a value that divides an ordered data set into four parts containing the same number of data
• First Quartile (Q1): the value below which about 1/4 of the data lie
• Second Quartile (Q2) or median: the value below which about 1/2 of the data lie
• Third Quartile (Q3): the value below which about 3/4 of the data lie
• Example: 33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82

Q1 = 45, Q2 = 52, Q3 = 70
Variability: IQR
• Interquartile Range
– = P75 – P25 or Q3 – Q1
– This helps to get a range that is not influenced by
the extreme high and low scores
– Where the range is the spread across 100% of the
scores, the IQR is the spread across the middle
50%
Interquartile Range
• Measure of the difference between the third quartile and the first quartile in a data set:
• Interquartile range = third quartile – first quartile
= Q3 – Q1
• Example:
– Find the interquartile range of the following data set:
33, 50, 45, 47, 40, 52, 82, 66, 70, 76, 60
Arrange data in ascending order:

33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82

Q1=45 Q3=70

• Interquartile range = third quartile – first quartile


= 70 – 45
= 25
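The quartile values in this example are consistent with a "median of halves" rule: Q1 is the median of the values below the median position and Q3 is the median of the values above it. The sketch below assumes that rule, with the median itself left out of both halves when the number of values is odd. Other quartile conventions exist, and statistical software may report slightly different values.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: when the number of values is odd, the median itself is
    left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return (
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
    )

def interquartile_range(values):
    """Interquartile range = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [33, 50, 45, 47, 40, 52, 82, 66, 70, 76, 60]
print(quartiles(data))
print(interquartile_range(data))
```

The same function handles a data set with an even number of values, where each quartile falls halfway between two neighbouring values.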
Interquartile Range
• Self test:
– Find the inter quartile range for the following:
– a) 54, 42, 45, 76, 71, 62, 47
– b) 19, 14, 20, 27, 17, 24, 22, 18
• Solution:
– a) Arrange data in ascending order:

42, 45, 47, 54, 62, 71, 76


Q1=45 Q3=71

Inter quartile range = Q3 – Q1


= 71 – 45
= 26
Interquartile Range
• Solution:
– b) Arrange data in ascending order:

14, 17, 18, 19, 20, 22, 24, 27

Q1 = (17 + 18)/2 = 17.5
Q3 = (22 + 24)/2 = 23
Interquartile range = Q3 – Q1
= 23 – 17.5
= 5.5
Variability: SIQR

• Semi-interquartile range
– =(P75 – P25)/2 or (Q3 – Q1)/2
– IQR/2
– This is half the spread of the middle 50% of the data
– The average distance of Q1 and Q3 from the
median
– Better for skewed data
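The statement that the semi-interquartile range is the average distance of Q1 and Q3 from the median follows in one line, because the median $Q_2$ cancels:

```latex
\[
\frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2} = \frac{\mathrm{IQR}}{2}
\]
```

For the ordered data 33, 40, 45, 47, 50, 52, 60, 66, 70, 76, 82 used earlier (Q1 = 45, Q2 = 52, Q3 = 70), this gives ((70 – 52) + (52 – 45))/2 = (18 + 7)/2 = 12.5, which is half of the interquartile range of 25.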
Variability: SIQR

• Semi-Interquartile range
[Figure: two distributions, each marked with Q1, Q2 and Q3]
Variance and standard deviation

• Mean of the squares of the deviation of each data from the mean of a data set:

• Variance = (sum of squares of the deviations of each data from the mean) / (number of data)

or $\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$

• Standard deviation is the square root of the variance:

• Standard deviation, $\sigma = \sqrt{\text{variance}}$
Variance
• When calculated for a sample

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$

• When calculated for the entire population

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$
Standard Deviation
• Variance is in squared units
• What about regular old units
• Standard Deviation = Square root of the variance

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$$
Standard Deviation

• Uses measure of central tendency (i.e. mean)


• Uses all data points
• Has a special relationship with the normal
curve
• Can be used in further calculations
• Standard Deviation of Sample = SD or s
• Standard Deviation of Population = $\sigma$
Why N-1?
• When using a sample (which we always do) we want a statistic that is the best estimate of the parameter

$$E(s^2) = E\left(\frac{\sum (X_i - \bar{X})^2}{N - 1}\right) = \sigma^2$$
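Variance and standard deviation of population

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Variance and standard deviation of sample

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}, \qquad s = \sqrt{s^2}$$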
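One way to see why the sample formula divides by N − 1 is to enumerate every possible sample from a small population and average the two candidate estimates. The sketch below is an illustration under stated assumptions, not a proof: it treats the six values from the worked example in this lecture as a complete population, draws every ordered sample of size 3 with replacement, and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)

# Assumption: these six values are treated as the whole population
population = [26, 28, 30, 35, 38, 23]
pop_variance = sum_sq_dev(population) / len(population)  # divide by N

# Every possible ordered sample of size 3, drawn with replacement
n = 3
samples = list(product(population, repeat=n))

mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:    ", pop_variance)
print("average of SS / n:      ", mean_div_n)
print("average of SS / (n - 1):", mean_div_n_minus_1)
```

Averaged over all samples, dividing by n − 1 reproduces the population variance, while dividing by n falls short of it.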
Variance and standard deviation
Example:
Find the variance and standard deviation of the following
data set
26, 28, 30, 35, 38, 23
Solution:

$$\bar{X} = \frac{\sum X}{N} = \frac{180}{6} = 30$$

$X$       $X - \bar{X}$       $(X - \bar{X})^2$
26        26 – 30 = –4        16
28        28 – 30 = –2        4
30        30 – 30 = 0         0
35        35 – 30 = 5         25
38        38 – 30 = 8         64
23        23 – 30 = –7        49
Total     180                 158

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{158}{5} = 31.6$$

$$s = \sqrt{31.6} = 5.62$$
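The table of deviations can be reproduced step by step. This is a minimal sketch of the definitional formula that divides by N − 1, as in the solution above.

```python
import math

data = [26, 28, 30, 35, 38, 23]
n = len(data)
mean = sum(data) / n                      # 180 / 6

deviations = [x - mean for x in data]     # X - mean
squared = [d ** 2 for d in deviations]    # (X - mean)^2

# Print one row per value: X, deviation, squared deviation
for x, d, sq in zip(data, deviations, squared):
    print(f"{x:>3} {d:>6.1f} {sq:>6.1f}")

sum_of_squares = sum(squared)
variance = sum_of_squares / (n - 1)       # sample variance
std_dev = math.sqrt(variance)
print(sum_of_squares, variance, round(std_dev, 2))
```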
Variance and standard deviation

$$S^2 = \frac{1}{N - 1}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$$

• Ex.: Find the variance and standard deviation of the following
26, 28, 30, 35, 38, 23
• Solution:

X              X²
26             676
28             784
30             900
35             1225
38             1444
23             529
TOTAL = 180    TOTAL = 5558

$$S^2 = \frac{1}{6 - 1}\left[5558 - \frac{180^2}{6}\right] = 31.6$$

$$s = \sqrt{31.6} = 5.62$$
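The computational formula needs only the two column totals, which makes it easy to check in code. The sketch below uses the illustrative function name `shortcut_variance` and cross-checks the result against `statistics.variance` from the standard library, which also divides by N − 1.

```python
import math
import statistics

def shortcut_variance(values):
    """Sample variance via S^2 = [sum(X^2) - (sum X)^2 / N] / (N - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [26, 28, 30, 35, 38, 23]
s2 = shortcut_variance(data)
print(sum(data), sum(x * x for x in data), s2, round(math.sqrt(s2), 2))

# Cross-check against the standard library's sample variance
assert math.isclose(s2, statistics.variance(data))
```

The shortcut is convenient for hand calculation. With very large values it can lose precision in floating-point arithmetic because it subtracts two large, nearly equal totals, so the definitional formula is usually the safer choice in software.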
Variance and standard deviation
• Self test:
– Find the variance and standard deviation of the following:
15, 28, 33, 47, 56
• Solution:

X              X²
15             225
28             784
33             1089
47             2209
56             3136
Total = 179    Total = 7443

$$S^2 = \frac{1}{N - 1}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right] = \frac{1}{5 - 1}\left[7443 - \frac{179^2}{5}\right] = 258.7$$

$$S = \sqrt{258.7} = 16.08$$
Variance and standard deviation

• Calculating variance and standard deviation


using SPSS v 15 for Windows:
– Prepare an SPSS file
– Enter the data
– Click on the menu Analyze
– Select Descriptive Statistics
– In the Descriptive Statistics menu, select
Descriptives, Options, Variance, Std. Deviation,
Continue, OK
Variance and standard deviation

• Analyze
• Descriptive Statistics
• Descriptives
• Options
• Variance
• Std. Deviation
• Continue
• OK
Variance and standard deviation

• Self test:
– Calculate the variance and standard deviation of the following
data set using SPSS:
345, 232, 422, 341, 330, 472, 356, 436
• Solution:

$s^2$ = 5669.36 (2 d.p.)

$s$ = 75.30 (2 d.p.)
Variance and standard deviation

• Example:
• In a road survey, the speeds of 40 cars were recorded as
follows (km/h)

56 72 63 81 73 53 57 69 70 89
63 68 72 77 82 85 59 74 69 76
73 62 80 70 60 71 65 73 64 69
72 65 77 68 78 72 67 75 66 79
• Find the variance and standard deviation of the car
speed using SPSS
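For readers without SPSS, both SPSS exercises can be cross-checked with Python's standard `statistics` module. This is only a sketch for checking answers, not a substitute for the SPSS procedure. `statistics.variance` and `statistics.stdev` divide by N − 1, and the assumption here is that SPSS Descriptives reports the same sample statistics.

```python
import statistics

self_test = [345, 232, 422, 341, 330, 472, 356, 436]

speeds = [
    56, 72, 63, 81, 73, 53, 57, 69, 70, 89,
    63, 68, 72, 77, 82, 85, 59, 74, 69, 76,
    73, 62, 80, 70, 60, 71, 65, 73, 64, 69,
    72, 65, 77, 68, 78, 72, 67, 75, 66, 79,
]

for name, data in (("self test", self_test), ("car speeds", speeds)):
    variance = statistics.variance(data)  # sample variance (N - 1)
    std_dev = statistics.stdev(data)      # sample standard deviation
    print(f"{name}: n={len(data)}, variance={variance:.2f}, sd={std_dev:.2f}")
```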
Range, IQR, Variance and standard deviation
• Question:
– Which of the measures of dispersion is suitable to
represent a data set?
Characteristics of range, inter quartile range,
variance and standard deviation
• Range
– Advantages:
• Easy to understand and calculate
• Suitable for providing a rough picture of dispersion in a
short time
– Disadvantages:
• Its calculation only involves values at both ends of the
data set, ignoring the remaining data
• Its value is influenced by the presence of extreme values
in a data set
• Thus, range may give a less than accurate picture of
dispersion for the data set
Characteristics of range, inter quartile range,
variance and standard deviation
• Range
– Example:
• Scores obtained by students in a Mathematics test:
• 52, 54, 56, 60, 61, 64, 65, 92
• Calculate the mean and range of marks
– Solution:
• Mean = 504/8 = 63
• Range = 92 – 52 = 40
Characteristics of range, inter quartile range,
variance and standard deviation

• Range
• Scores obtained by students in a Mathematics test:
• 52, 54, 56, 60, 61, 64, 65, 92
• Mean = 504/8 = 63
• Range = 92 – 52 = 40
– Question:
• Can the range give an accurate picture of the dispersion of
scores?
– Answer:
• No. The range of 40 suggests a large dispersion, but this is misleading
• Most of the scores lie between 52 and 65, close to the mean
• Only one score (92) is very high and lies far from the mean
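The effect of the single extreme score can be made concrete by computing the range with and without it. A minimal sketch, using the illustrative function name `value_range`:

```python
scores = [52, 54, 56, 60, 61, 64, 65, 92]

def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

with_outlier = value_range(scores)
without_outlier = value_range([s for s in scores if s != 92])

print(with_outlier, without_outlier)
```

Removing one score out of eight cuts the range from 40 to 13, which shows how strongly the range depends on the extreme values.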
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
– Advantages:
• Its value is not affected by extreme values in a data set
as it is the range of the middle 50% of the data, between
the first quartile and the third quartile
• Thus, the inter quartile range can be used even if there
are extreme values in the data set
– Disadvantages:
• Does not measure dispersion of each data from the
mean of a data set
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
– Example:
• Two groups of teachers made the following donations
for Warriors Day:
• Group A:
• RM1, RM2, RM2, RM3, RM4, RM5, RM5, RM6, RM6,
RM6, RM8, RM30
• Group B:
• RM1, RM2, RM5, RM5, RM6, RM6, RM6, RM6, RM8,
RM8, RM10, RM15
• Find the mean, range and inter quartile range for
each group.
Characteristics of range, inter quartile range,
variance and standard deviation
• Inter quartile range
• Group A:
– RM1, RM2, RM2, RM3, RM4, RM5, RM5, RM6, RM6, RM6, RM8, RM30
• Group B:
– RM1, RM2, RM5, RM5, RM6, RM6, RM6, RM6, RM8, RM8, RM10, RM15

                        Group A                   Group B
Mean                    RM78/12 = RM6.50          RM78/12 = RM6.50
Range                   RM30 – RM1 = RM29         RM15 – RM1 = RM14
Inter quartile range    RM6 – RM2.50 = RM3.50     RM8 – RM5 = RM3

• Can the range give accurate information about dispersion of data?
– No, because it is influenced by extreme values
• Can the inter quartile range give accurate information about dispersion of data?
– Yes, because it is not influenced by extreme values
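The table for the two groups can be reproduced with a short sketch. The quartiles are computed with a median-of-halves rule (Q1 and Q3 are the medians of the lower and upper halves of the ordered data), which is an assumption chosen because it matches the values in the table. Other quartile conventions may give slightly different results.

```python
import statistics

def iqr(values):
    """Q3 - Q1 using the median-of-halves rule (median excluded when N is odd)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[len(data) - half:])
    return q3 - q1

# Donations in RM
group_a = [1, 2, 2, 3, 4, 5, 5, 6, 6, 6, 8, 30]
group_b = [1, 2, 5, 5, 6, 6, 6, 6, 8, 8, 10, 15]

for name, group in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(group)
    value_range = max(group) - min(group)
    print(f"Group {name}: mean={mean}, range={value_range}, IQR={iqr(group)}")
```

The single RM30 donation doubles the range of Group A relative to Group B, while the two interquartile ranges stay close to each other.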
Characteristics of range, inter quartile range,
variance and standard deviation
• Variance
– Advantages:
• More accurate compared to range and inter quartile
range because its calculation involves all values in the
data set
• Measures the spread of each value from the mean of
the data set
– Disadvantages:
• Calculation involves the square of deviations of each
value from the mean of the data set
• Thus, the unit for variance is not the same as the unit
for data
Characteristics of range, inter quartile range,
variance and standard deviation
• Standard deviation
– Advantages:
• More accurate compared to range and inter quartile
range – its calculation includes all data from the data
set
• Measures the dispersion of each value from the mean
of the data set
• Taking the square root reverses the squaring of deviations
used in the variance
• Thus, the unit for standard deviation is the same as the
unit for data
When to Use a Particular Statistic
Research Question
• How can quiz scores of students enrolled in an
introductory statistics class be summarised
using measures of central tendency?
Measures of dispersion?

• Template:
How can [variable] be summarised using
measures of central tendency? Measures of
dispersion?
SPSS Output
Summarising results
• As shown in Table 3.5, scores ranged from 9 to
20.
• The mean was 15.56, the approximate median
was 17.00 and the mode was 17.00.
• Thus, the scores tended to lump together at
the high end of the scale.
• A negatively skewed distribution is suggested
given that the mean was less than the median
and mode
Summarising results
• The range was 11, the interquartile range was
5.0, variance was 10.01 and standard deviation
was 3.16.
• For example, the middle 50% of the scores had a
range of 5 (interquartile range) indicating that
there was a reasonable spread of scores around
the median.
• Thus, despite a high “average” score, there were
some low performing students as well.
• These results are consistent with those described
using the graphical representation.
Results in APA format
As shown in Table 3.5, scores ranged from 9 to 20. The
mean was 15.56, the approximate median was 17.00 and
the mode was 17.00. Thus, the scores tended to lump
together at the high end of the scale. A negatively
skewed distribution is suggested given that the mean
was less than the median and mode. The exclusive
range was 11, the interquartile range was 5.0, variance
was 10.01 and standard deviation was 3.16. For
example, the middle 50% of the scores had a range of
5 (interquartile range), indicating that there was a
reasonable spread of scores around the median. Thus,
despite a high “average” score, there were some low
performing students as well. These results are
consistent with those described using the graphical
representation.
