Session 3
Summarization of Data
Contents
Introduction
3.1 Measures of Central Tendency
3.2 Measures of Dispersion
Summary
Learning Outcomes
Introduction
Once the data for a research project has been collected and summarized
using tables and diagrams, the next step is to measure the central tendency
and the dispersion of the data set. Measures of central tendency identify
where the majority of values are located in the distribution of the data
set, and measures of dispersion tell us how the data are spread around the
middle value of the data set.
3.1 Measures of Central Tendency
Mean
The arithmetic mean of a sample is the sum of the individual values in the
data set divided by the total number of values in the data set.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For example, if we have the weights of five women (in kg): 50, 50, 65, 79
and 75, then the mean weight of this sample is equal to
(50 + 50 + 65 + 79 + 75) / 5 = 319 / 5 = 63.8 kg.
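The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are not part of the lesson.

```python
# Weights of the five women (kg), taken from the example above.
weights_kg = [50, 50, 65, 79, 75]

# Arithmetic mean: sum of the values divided by the number of values.
mean_weight = sum(weights_kg) / len(weights_kg)

print(f"Sum = {sum(weights_kg)}, n = {len(weights_kg)}")
print(f"Mean weight = {mean_weight:.1f} kg")
```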
For grouped data, the mean can be calculated using the following steps.
Step 1: Find the midpoint of each interval (x)
Midpoint of interval = (Lower class limit + Upper class limit) / 2
Step 2: Multiply the frequency (f) of each interval by its midpoint (fx)
Step 3: Get the sum of all the frequencies (Σf) and the sum of all the fx
values (Σfx).
Step 4: Divide Σfx by Σf to get the mean.
For example, the following table shows the frequency distribution of the
diameters of 50 drug bottles (diameters have been measured to the nearest
millimeter). Find the mean diameter of the bottles in the sample.

Diameter (mm)   Frequency (f)   Midpoint (x)   fx
35-39            6              37              222
40-44           12              42              504
45-49           15              47              705
50-54           10              52              520
55-59            7              57              399
Total           50                             2350

Mean diameter = Σfx / Σf = 2350 / 50 = 47 mm
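The grouped-mean steps can be sketched in Python as follows. The class limits used here are an assumption: the first class (35-39, frequency 6) and the limits 50-54 and 55-59 are inferred from the printed midpoints and from the totals of 50 and 2350. The function name is illustrative only.

```python
# (lower limit, upper limit, frequency) for each class, in mm.
# Assumption: the 35-39 class and the limits 50-54 / 55-59 are inferred
# from the midpoints (37, 52, 57) and the printed totals (50 and 2350).
classes = [(35, 39, 6), (40, 44, 12), (45, 49, 15), (50, 54, 10), (55, 59, 7)]


def grouped_mean(classes):
    """Return (mean, sum of f, sum of fx) for a grouped frequency table."""
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2  # Step 1: midpoint of the interval
        total_f += f                    # Step 3: running sum of f
        total_fx += f * midpoint        # Steps 2-3: running sum of fx
    return total_fx / total_f, total_f, total_fx


mean_diameter, n, sum_fx = grouped_mean(classes)
print(f"n = {n}, sum of fx = {sum_fx:.0f}, mean = {mean_diameter:.1f} mm")
```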
Median
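For grouped data, the median value is calculated using the following
formula.

$$\text{Median} = L + \left(\frac{\frac{n}{2} - F}{f}\right) \times h$$

Where
L - Lower class boundary of the median class
F - Cumulative frequency of the class preceding the median class
f - Frequency of the median class
n - Total frequency
h - Class interval

When calculating the median for the diameters of the 50 drug bottles, first
we need to calculate cumulative frequencies as given in the following table.

Diameter (mm)   Frequency (f)   Cumulative frequency
35-39            6               6
40-44           12              18
45-49           15              33
50-54           10              43
55-59            7              50

When calculating the median we should use the actual limits (class
boundaries) of the class intervals (e.g. 45-49 is 44.5 to 49.5).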
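A minimal sketch of the grouped-median calculation, assuming the standard interpolation formula Median = L + ((n/2 - F) / f) × h and the class boundaries obtained by extending each class by 0.5 mm on either side. The first class (34.5-39.5, frequency 6) is inferred from the cumulative frequencies, and the function name is hypothetical.

```python
# (lower boundary, upper boundary, frequency) for each class, in mm.
# Assumption: the first class is inferred from the cumulative frequencies.
rows = [
    (34.5, 39.5, 6),
    (39.5, 44.5, 12),
    (44.5, 49.5, 15),
    (49.5, 54.5, 10),
    (54.5, 59.5, 7),
]


def grouped_median(rows):
    """Interpolate the median inside the class that holds the n/2-th value."""
    n = sum(f for _, _, f in rows)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower_b, upper_b, f in rows:
        if cumulative + f >= half:
            h = upper_b - lower_b  # class interval
            return lower_b + (half - cumulative) / f * h
        cumulative += f
    raise ValueError("frequency table is empty")


median_diameter = grouped_median(rows)
print(f"Median diameter = {median_diameter:.2f} mm")
```

With these figures the median class is 44.5 to 49.5, since the cumulative frequency first reaches n/2 = 25 there.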
Mode
The mode is the most frequently appearing value of a variable.
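As a small illustration, the mode of the five weights used earlier in this session can be found with Python's standard `statistics` module. `multimode` returns every value that shares the highest count, so it also copes with data sets that have more than one mode.

```python
from statistics import multimode

# Weights of the five women (kg) from the mean example.
weights_kg = [50, 50, 65, 79, 75]

# multimode returns a list of the most frequently occurring values.
modes = multimode(weights_kg)
print(f"Mode(s): {modes}")
```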
Activity 3.1
1. Find the mean, median and mode of the following data set.
96, 48, 27, 72, 39, 70, 7, 68, 99, 36, 95, 4, 6, 13, 34, 74, 65, 42, 28, 54, 69, 48
3.2 Measures of Dispersion
A measure of dispersion gives an idea of how well the mean represents the
data. If the dispersion of values in the data set is large, the mean may
not be a useful measure to represent the data set. This is because a large
dispersion indicates that there are large differences between individual
values.
Measures of dispersion include the range, quartiles, interquartile range,
percentiles, standard deviation, variance and coefficient of variation.
Range
The range is the difference between the largest and the smallest values in
the data set.
Quartiles
Quartiles divide a set of data into four equal parts. The values that divide
each part are called the first, second, and third quartiles; and they are
denoted by Q1, Q2, and Q3, respectively.
For ungrouped data, first arrange the data set in ascending order, for
example:
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32
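Textbooks differ on how quartile positions are located in ungrouped data. The sketch below assumes the common (n + 1)/4 position rule, which corresponds to the "exclusive" method of `statistics.quantiles` in Python's standard library; other conventions can give slightly different values.

```python
from statistics import quantiles

# The ordered data set from the example above (n = 11).
data = sorted([12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32])

# n=4 asks for the three cut points that split the data into four parts.
# Assumption: positions follow the (n + 1)/4 rule ("exclusive" method).
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

Under this rule the quartiles fall on the 3rd, 6th and 9th ordered values.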
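For grouped data, the first and third quartiles are calculated using the
following formulas.

$$Q_1 = L_1 + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) \times h$$

$$Q_3 = L_3 + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) \times h$$

Where
L1 - Lower class boundary of the Q1 class
L3 - Lower class boundary of the Q3 class
fQ1 - Frequency of the Q1 class
fQ3 - Frequency of the Q3 class
F - Cumulative frequency of the class preceding the Q1 or Q3 class
n - Total frequency
h - Class interval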
Time taken for a painkiller drug to relieve the pain of 50 cancer patients
is given in the table below.

Time taken   Frequency (f)   Class boundaries   Cumulative frequency
1-10          8              0.5-10.5            8
11-20        14              10.5-20.5          22
21-30        12              20.5-30.5          34
31-40         9              30.5-40.5          43
41-50         7              40.5-50.5          50
The interquartile range (IQR) is the interval between the values of the upper
and lower quartiles. The interquartile range is equal to Q3 minus Q1.
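The quartiles and interquartile range for the painkiller data can be sketched as below, assuming the usual interpolation formula Q = L + ((target - F) / f) × h with target n/4 for Q1 and 3n/4 for Q3. The function name is hypothetical, and the time unit is left out because the table does not state it.

```python
# (lower boundary, upper boundary, frequency) from the painkiller table.
rows = [
    (0.5, 10.5, 8),
    (10.5, 20.5, 14),
    (20.5, 30.5, 12),
    (30.5, 40.5, 9),
    (40.5, 50.5, 7),
]


def grouped_quantile(rows, fraction):
    """Interpolate the value below which `fraction` of the observations lie."""
    n = sum(f for _, _, f in rows)
    target = fraction * n  # n/4 for Q1, 3n/4 for Q3
    cumulative = 0         # F: cumulative frequency before the current class
    for lower_b, upper_b, f in rows:
        if cumulative + f >= target:
            h = upper_b - lower_b  # class interval
            return lower_b + (target - cumulative) / f * h
        cumulative += f
    raise ValueError("frequency table is empty")


q1 = grouped_quantile(rows, 0.25)
q3 = grouped_quantile(rows, 0.75)
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```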
Variance
The sample variance is the sum of the squared deviations of the observed
values from the average (mean) divided by one less than the number of
observations in the data set.
For example, for n observations x1, x2, x3, ..., xn with sample mean
$\bar{x}$, the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard deviation
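For example, consider a set of IQ scores: 96, 104, 126, 134 and 140.
The mean is (96 + 104 + 126 + 134 + 140) / 5 = 600 / 5 = 120. The squared
deviations from the mean are 576, 256, 36, 196 and 400, and their sum is
1464.
Divide this value by the number of scores minus one (because the scores
come from a sample, not a population, this minimizes bias), then take the
square root:
Sample variance = 1464 / 4 = 366
Sample standard deviation = √366 ≈ 19.13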
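The sample variance and standard deviation of the IQ scores can be worked through step by step in Python. This sketch follows the n - 1 definition given above and cross-checks the result against `statistics.stdev`, which also uses the n - 1 divisor; the variable names are illustrative.

```python
import math
import statistics

iq_scores = [96, 104, 126, 134, 140]
n = len(iq_scores)

# Step 1: sample mean.
mean_iq = sum(iq_scores) / n

# Step 2: sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean_iq) ** 2 for x in iq_scores)

# Step 3: divide by n - 1 (sample, not population), then take the root.
sample_variance = sum_sq_dev / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"Mean = {mean_iq}, sum of squared deviations = {sum_sq_dev}")
print(f"Sample variance = {sample_variance}, sample SD = {sample_sd:.2f}")
print(f"statistics.stdev gives {statistics.stdev(iq_scores):.2f}")
```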
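No. of patients attended   Frequency (f)
10-12                       4
13-15                      12
16-18                      20
19-21                      14
Total                      50

No. of patients attended   f    Midpoint (x)   fx    fx²
10-12                       4   11              44     484
13-15                      12   14             168    2352
16-18                      20   17             340    5780
19-21                      14   20             280    5600
Total                      50                  832   14216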
The mean number of pregnant mothers who attended a clinic on that day is
832 / 50 = 16.64. In other words, on average about 17 mothers attended each
clinic in the district on that day.
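$$\text{Variance} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{832^2}{50}}{49} \approx 7.58$$

$$\text{Standard deviation} = \sqrt{7.58} \approx 2.75$$

Thus, the standard deviation (denoted as SD) of the number of pregnant
mothers who attended well-women clinics on that weekday is 2.75.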
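The grouped calculation can be reproduced with the short sketch below. It assumes the computational form of the sample variance, (Σfx² - (Σfx)²/n) / (n - 1), which is consistent with the fx and fx² columns of the table and with the stated results of 16.64 and 2.75. Variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) from the clinic attendance table.
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

n = 0
sum_fx = 0.0
sum_fx2 = 0.0
for lower, upper, f in classes:
    x = (lower + upper) / 2  # midpoint of the class
    n += f
    sum_fx += f * x
    sum_fx2 += f * x ** 2

mean_attendance = sum_fx / n
# Sample variance in its computational form (divisor n - 1).
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sd = math.sqrt(variance)

print(f"n = {n}, sum fx = {sum_fx:.0f}, sum fx^2 = {sum_fx2:.0f}")
print(f"Mean = {mean_attendance:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```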
Coefficient of Variation
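The coefficient of variation (CV) expresses the standard deviation as a
percentage of the mean, so it can be used to compare the variability of
variables measured in different units. A large coefficient of variation in
a data series indicates that the group is more variable and less stable or
less uniform. A small coefficient of variation indicates that the group is
less variable and more stable or more uniform.
Formula for Coefficient of Variation (CV)

$$CV = \frac{SD}{\text{Mean}} \times 100\%$$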
Suppose the mean pulse rate (beats per minute) of a group of students was
60 and SD was 10. In the same group the mean and SD of the variable
height were 160 cm and 5 cm respectively. Which variable shows the
greater variation?
CV for pulse rate = (10 / 60) × 100% = 16.7%
CV for height = (5 / 160) × 100% = 3.1%
So the variable pulse rate has greater variability than the variable height
in this group of students.
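The comparison can be checked with a small helper, assuming the usual definition CV = (SD / mean) × 100%. The function name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Figures from the example: pulse rate (beats per minute) and height (cm).
cv_pulse = coefficient_of_variation(sd=10, mean=60)
cv_height = coefficient_of_variation(sd=5, mean=160)

print(f"CV for pulse rate = {cv_pulse:.1f}%")
print(f"CV for height = {cv_height:.1f}%")
print("More variable:", "pulse rate" if cv_pulse > cv_height else "height")
```

Because the CV has no unit, the two variables can be compared directly even though one is measured in beats per minute and the other in centimetres.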
Summary
• Once the data has been collected, and summarized using tables and
diagrams, the next step is to measure the central tendency and the
dispersion of the data set.
• Mean, Median and Mode are the common measures of central
tendency of a data set.
Learning Outcomes
At the end of the lesson you should be able to
• Explain and calculate various measures of central tendency.
• Describe and calculate various measures of dispersion.
Review Questions
The incubation periods of a random sample of 14 HIV infected individuals are given
below (in years):
12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5, 12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9
a. Calculate the sample mean.
b. Calculate the sample median.
c. Calculate the sample standard deviation.