BSU5335 – Unit I Session 03: Summarization of Data
Session 3
Summarization of Data
Contents
Introduction
3.1 Measures of Central Tendency
3.2 Measures of Dispersion
Summary
Learning Outcomes
Introduction
Once the data for a research project has been collected, and summarized by
using tables and diagrams, the next step is to measure the central tendency
and the dispersion of the data set. Measures of central tendency allow us to
identify where the majority of values are located in the distribution of the
data set, and measures of dispersion would tell us how the data are spread
around the middle value of the data set.
3.1 Measures of central tendency
Central tendency is the “middle” or “center” of a variable’s distribution. A
measure of central tendency gives a single score that best describes the
entire distribution of a quantitative data set. The mean, median and mode
are the common measures of central tendency of a data set.
18 Copyright © 2020, The Open University of Sri Lanka

Mean
The arithmetic mean of a sample is the sum of the individual values in the
data set divided by the total number of values in the data set.
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of all values and $n$ is the total number of
values in the data set.
For example, if we have weights of five women (in kg); 50, 50, 65, 79 and
75, then the mean weight of this sample is equal to (50+50+65+79+75) /5
= 319 / 5 = 63.8 kg.
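The calculation above can be checked with a short Python sketch. The variable names are illustrative only and are not part of the course material.

```python
# Weights (kg) of the five women from the example above.
weights = [50, 50, 65, 79, 75]

# Arithmetic mean: sum of the values divided by the number of values.
mean_weight = sum(weights) / len(weights)
print(round(mean_weight, 1))
```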
For grouped data, the mean can be calculated using the following steps.
Step 1: Find the midpoint of each interval (x)
Midpoint of interval = (Lower class limit + Upper class limit) / 2
Step 2: Multiply the frequency (f) of each interval by its mid-point (fx)
Step 3: Get the sum of all the frequencies (f) and the sum of all the fx.
Divide the ‘sum of fx’ by ‘sum of f’ to get the mean.
For example, the following table shows the frequency distribution of the
diameters of 50 drug bottles (measured to the nearest millimeter). Find the
mean diameter of the bottles in the sample.
Table 3.1: Frequency distribution of the diameters

Diameter (mm)   Frequency (f)   Midpoint (x)   fx
35-39            6              37              222
40-44           12              42              504
45-49           15              47              705
50-54           10              52              520
55-59            7              57              399
Total           50                             2350
Mean diameter = sum of fx / sum of f = 2350 / 50 = 47. The mean diameter of
the 50 drug bottles is therefore equal to 47 millimeters.
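The three steps for grouped data can be sketched in Python as follows. The function name `grouped_mean` and the tuple layout are illustrative assumptions, and the last two classes are entered as 50-54 and 55-59 because those are the limits implied by the tabulated midpoints 52 and 57.

```python
# (lower limit, upper limit, frequency) for each class in Table 3.1.
# Assumption: the last two classes are 50-54 and 55-59, as implied by
# the midpoints 52 and 57 given in the table.
classes = [(35, 39, 6), (40, 44, 12), (45, 49, 15), (50, 54, 10), (55, 59, 7)]

def grouped_mean(classes):
    # Step 3 denominators and numerators: sum of f and sum of f * midpoint.
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f

print(grouped_mean(classes))
```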
Median
The "median" is the "middle" value of a set of observations.

To find the median, first we arrange the observations in order from the
lowest to the highest value. If there is an odd number of observations, the
median is the middle value. If there is an even number of observations, the
median is the average of the two middle values. Thus, in the sample of the
weights of five women given above, the median weight would be 65 kg;
since 65kg is the middle value in that data set.
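So, for ungrouped data arranged in ascending order, the median is the value
at position (n + 1)/2, where n is the number of observations. When n is
even, this position falls halfway between the two middle values.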
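A minimal Python sketch of the odd and even cases is given below. The function name `median` is chosen here for illustration, and the four-value list is a made-up second case used only to show the even-count rule.

```python
def median(values):
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([50, 50, 65, 79, 75]))   # weights of the five women
print(median([50, 50, 65, 75]))       # illustrative even-sized data set
```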
For grouped data, the median value is calculated using the following
formula.
$$\text{Median} = L + \frac{\frac{n}{2} - F}{f} \times C$$
Where L = lower limit (lower class boundary) of the median class
n = total number of observations
F = cumulative frequency of the classes before the median class
C = the width of the median class interval
f = number of observations in the median class
When calculating the median for the diameters of the 50 drug bottles, first
we need to calculate cumulative frequencies as given in the following table.

Diameter (mm)   Frequency (f)   Cumulative frequency
35-39            6               6
40-44           12              18
45-49           15              33
50-54           10              43
55-59            7              50
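When calculating the median we should use the actual limits (class
boundaries) of the class intervals (e.g. 45-49 becomes 44.5 to 49.5).
Here n/2 = 50/2 = 25, so the median class is 45-49, the first class whose
cumulative frequency reaches 25. Thus L = 44.5, F = 18, f = 15 and C = 5:
$$\text{Median} = 44.5 + \frac{25 - 18}{15} \times 5 \approx 46.83$$
The median diameter of the 50 drug bottles is therefore approximately 46.83
millimeters.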
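The grouped-median interpolation can be sketched in Python as below. The function name `grouped_median` and the list `bottle_classes` are illustrative; the classes are entered with their class boundaries, assuming the last two classes are 50-54 and 55-59.

```python
def grouped_median(classes):
    """classes: ordered (lower_boundary, upper_boundary, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    cumulative = 0                    # F: frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:   # first class reaching n/2 is the median class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")

bottle_classes = [(34.5, 39.5, 6), (39.5, 44.5, 12), (44.5, 49.5, 15),
                  (49.5, 54.5, 10), (54.5, 59.5, 7)]
print(round(grouped_median(bottle_classes), 2))
```

With the tabulated frequencies (total 50), this evaluates to about 46.83 mm.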
Mode
The mode is the most frequently appearing value of a variable.

For example, BMI (kg/m²) was measured in a sample of 7 patients. The
values were 24.5, 23.5, 26.5, 29.5, 30.5, 26.5 and 22.5. In this data set the
mode is 26.5, the only value that appears more than once.
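As a quick check, the mode can be found with Python's standard `statistics` module. `multimode` (available from Python 3.8) returns every value that shares the highest frequency, which is useful when a data set has more than one mode.

```python
from statistics import multimode

# BMI values (kg/m^2) of the 7 patients from the example above.
bmi = [24.5, 23.5, 26.5, 29.5, 30.5, 26.5, 22.5]

modes = multimode(bmi)   # list of the most frequent value(s)
print(modes)
```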
Activity 3.1
1. Find the mean, median and mode of the following data set.
96, 48, 27, 72, 39, 70, 7, 68, 99, 36, 95, 4, 6, 13, 34, 74, 65, 42, 28, 54, 69, 48
2. Weights (in kg) of 80 children are given below

8.9 11.4 10.4 14.9 11.5 12 11 10.2
11.2 12.9 12.1 9.4 13.2 10.8 11.7 8.9
10.6 10.5 13.7 11.8 14.1 10.3 13.6 10.2
12.1 12.9 11.4 12.7 10.6 11.4 11.9 13.3
9.3 13.5 14.6 11.2 11.7 10.9 10.4 13.7
12 12.9 11.1 9.4 10.2 11.6 12.5 15.2
13.4 12.1 10.9 11.3 14.7 10.8 13.3 11.4
11.9 11.4 12.5 13 11.6 13.1 9.7 11.8
11.2 15.1 10.7 12.9 13.4 12.3 11 15.5
14.6 11.1 13.5 10.9 13.1 11.8 12.2 11.3
Calculate the mean and median of the above data set.
3.2 Measures of dispersion
A measure of dispersion (or spread or variation) of a data set is used to
describe how data are scattered around the central value of the data set. Thus,
it is usually used in conjunction with a measure of central tendency, such as
the mean or median, to provide an overall description of a set of data. There
are many reasons why measures of spread are important, but one of the main
reasons is their relationship with the measures of central tendency. For
example, a measure of dispersion gives us an idea of how well the mean
represents the data. If the dispersion of values in the data set is large,
the mean may not be a useful measure to represent the data set, because a
large dispersion indicates that there are large differences between
individual scores.
Measures of dispersion include range, quartiles, interquartile range,
percentiles, standard deviation, variance and coefficient of variation.
Range
The range is the simplest measure of variation. It is the difference between
the largest and the smallest values in a data set.
Range = Maximum value - Minimum value
Example: Calculate the range of the cholesterol levels (mg/dL) of the 8
patients given below:
204, 210, 215, 220, 225, 234, 238, 240
The range = the largest number – the smallest number
= 240 – 204 = 36 mg/dL
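The same calculation in Python, as a small illustrative sketch using the values listed above:

```python
# Cholesterol levels (mg/dL) listed in the example above.
cholesterol = [204, 210, 215, 220, 225, 234, 238, 240]

# Range = maximum value - minimum value.
data_range = max(cholesterol) - min(cholesterol)
print(data_range)
```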
Quartiles
Quartiles divide a set of data into four equal parts. The values that divide
each part are called the first, second, and third quartiles; and they are
denoted by Q1, Q2, and Q3, respectively.

For ungrouped data, first arrange the data set in ascending order, for example:
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32
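Several conventions exist for locating quartiles in ungrouped data. The sketch below assumes one common rule, in which the k-th quartile sits at position k(n + 1)/4 in the ordered data, with linear interpolation when the position is fractional. This rule and the function name `quartile` are assumptions made for illustration and may differ from the convention intended in the course.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the k(n + 1)/4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4        # 1-based position, may be fractional
    lower = int(pos)             # whole part of the position
    frac = pos - lower           # fractional part used for interpolation
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
print([quartile(data, k) for k in (1, 2, 3)])
```

For these 11 values the positions are 3, 6 and 9, so under this rule Q1, Q2 and Q3 are the 3rd, 6th and 9th ordered values (16, 22 and 28).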
For grouped data, Q1 and Q3 are obtained from the following equations.
$$Q_1 = L_1 + \frac{\frac{n}{4} - F}{f_{Q1}} \times h$$
$$Q_3 = L_3 + \frac{\frac{3n}{4} - F}{f_{Q3}} \times h$$
Where
L1 = lower class boundary of the Q1 class
L3 = lower class boundary of the Q3 class
fQ1 = frequency of the Q1 class
fQ3 = frequency of the Q3 class
F = cumulative frequency of the class preceding the Q1 or Q3 class
n = total frequency
h = class interval (class width)
The time taken for a painkiller drug to relieve the pain of 50 cancer patients
is given in the table below.
Table 3.2: Frequency distribution table of time taken to relieve pain in 50
cancer patients

Time taken to relieve pain (min)   Frequency   Class boundaries   Cumulative frequency
1-10                                8          0.5-10.5            8
11-20                              14          10.5-20.5          22
21-30                              12          20.5-30.5          34
31-40                               9          30.5-40.5          43
41-50                               7          40.5-50.5          50
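Q1 and Q3 can be calculated as follows.
Position of Q1 = n/4 = 50 / 4 = 12.5, therefore the Q1 class is the 2nd class
(11-20), the first class whose cumulative frequency reaches 12.5.
$$Q_1 = 10.5 + \frac{12.5 - 8}{14} \times 10 \approx 13.71 \text{ minutes}$$
Position of Q3 = 3n/4 = 150 / 4 = 37.5, therefore the Q3 class is the 4th class
(31-40).
$$Q_3 = 30.5 + \frac{37.5 - 34}{9} \times 10 \approx 34.39 \text{ minutes}$$
Inter Quartile Range
The interquartile range (IQR) is the interval between the values of the upper
and lower quartiles. The interquartile range is equal to Q3 minus Q1.
In the above example, IQR = Q3 - Q1 ≈ 34.39 - 13.71 ≈ 20.7 minutes.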
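The grouped-data quartiles and the interquartile range for Table 3.2 can be computed with the following Python sketch. The helper name `grouped_quantile` and the list `pain_relief` are illustrative, and the function assumes the classes are supplied in order with their class boundaries.

```python
def grouped_quantile(classes, fraction):
    """classes: ordered (lower_boundary, upper_boundary, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    target = fraction * n             # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                    # F: frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:  # first class reaching the target position
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")

pain_relief = [(0.5, 10.5, 8), (10.5, 20.5, 14), (20.5, 30.5, 12),
               (30.5, 40.5, 9), (40.5, 50.5, 7)]
q1 = grouped_quantile(pain_relief, 0.25)
q3 = grouped_quantile(pain_relief, 0.75)
iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```

Calling the same helper with a fraction of 0.5 would give the grouped median, since all three use the same interpolation.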
Variance
The sample variance is the sum of the squared deviations of the observed
values from the average (mean), divided by one less than the number of
observations in the data set.
For n observations x1, x2, x3, ..., xn with sample mean $\bar{x}$, the sample
variance is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Standard deviation
Standard deviation is a commonly used measure of spread or dispersion of a
set of data. It is calculated by taking the square root of the variance.
The sample standard deviation is equal to
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
For example, consider a set of IQ scores: 96, 104, 126, 134 and 140.
The mean of this data is (96+104+126+134+140)/5 = 120.
The deviations from the mean of each value are given by
96-120 = -24, 104-120 = -16, 126-120 = 6, 134-120 = 14, 140-120 = 20.
The sum of their squares is given by
(-24)² + (-16)² + 6² + 14² + 20² = 576 + 256 + 36 + 196 + 400 = 1464
Divide this value by the number of scores minus one (the data come from a
sample, not a population, and dividing by n - 1 reduces bias), then take the
square root:
$$s = \sqrt{\frac{1464}{4}} = \sqrt{366} \approx 19.13$$
So the standard deviation of the IQ scores in the sample is s ≈ 19.13.
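The IQ example can be reproduced step by step in Python. This is only a sketch with illustrative variable names; the standard library functions `statistics.variance` and `statistics.stdev` compute the same sample quantities directly.

```python
import math

iq = [96, 104, 126, 134, 140]

mean_iq = sum(iq) / len(iq)                          # 120.0
squared_deviations = [(x - mean_iq) ** 2 for x in iq]

# Sample variance: divide by n - 1, not n.
sample_variance = sum(squared_deviations) / (len(iq) - 1)
sample_sd = math.sqrt(sample_variance)
print(sample_variance, round(sample_sd, 2))
```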

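For grouped data the following formulae are used to calculate the variance
and standard deviation.
Sample variance
$$s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$$
Sample standard deviation
$$s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}}$$
where x is the midpoint of each class, f is its frequency and $n = \sum f$ is
the total frequency.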
Suppose the number of pregnant mothers who attended each of 50 well-women
clinics in a district on a weekday is summarized below. Find the variance and
standard deviation.

No. of patients attended   Frequency
10-12                       4
13-15                      12
16-18                      20
19-21                      14
Total                      50

No. of patients attended   f    Midpoint (x)   fx     fx²
10-12                       4   11              44     484
13-15                      12   14             168    2352
16-18                      20   17             340    5780
19-21                      14   20             280    5600
Total                      50                  832   14216
The mean number of pregnant mothers who attended a clinic on that day is
832 / 50 = 16.64. In other words, on average about 17 mothers attended each
clinic in the district on that day.
Variance
$$s^2 = \frac{14216 - \frac{832^2}{50}}{50 - 1} = \frac{371.52}{49} \approx 7.58$$
Standard deviation
$$s = \sqrt{7.58} \approx 2.75$$
Thus, the standard deviation (denoted as SD) of the number of pregnant
mothers who attended well-women clinics on that weekday is 2.75.
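A short Python sketch of the grouped-data calculation is given below, assuming the sample (n - 1) form of the variance, which is the form that reproduces the standard deviation of 2.75 quoted above. The variable names are illustrative.

```python
import math

# (midpoint x, frequency f) for each class in the clinic attendance table.
clinic_classes = [(11, 4), (14, 12), (17, 20), (20, 14)]

n = sum(f for _, f in clinic_classes)                  # total frequency
sum_fx = sum(f * x for x, f in clinic_classes)         # sum of f * x
sum_fx2 = sum(f * x ** 2 for x, f in clinic_classes)   # sum of f * x^2

grouped_mean_value = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sd = math.sqrt(variance)
print(grouped_mean_value, round(variance, 2), round(sd, 2))
```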
Coefficient of Variation
The coefficient of variation is the standard deviation expressed as a
percentage of the mean. If we wish to compare the variability of two or more
series of data, we can use the coefficient of variation. A higher coefficient
of variation indicates that the group is more variable and less stable or
less uniform. A smaller coefficient of variation indicates that the group is
less variable and more stable or more uniform.
Formula for the Coefficient of Variation (CV)
$$CV = \frac{s}{\bar{x}} \times 100\%$$
In other words, the coefficient of variation is defined as the ratio of the
standard deviation to the mean. The value of CV is calculated only for a
non-zero mean.
Example
Find the coefficient of variation for the sample given in the above example
of pregnant mothers attending well-women clinics.
$$CV = \frac{2.75}{16.64} \times 100\% \approx 16.5\%$$
Suppose the mean pulse rate (beats per minute) of a group of students was
60 and the SD was 10. In the same group the mean and SD of the variable
height were 160 cm and 5 cm respectively. Which variable shows the
greater variation?
CV for pulse rate = (10 / 60) × 100% ≈ 16.7%
CV for height = (5 / 160) × 100% ≈ 3.1%
So pulse rate shows greater variability than height in this group of
students.
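The comparison of pulse rate and height can be sketched in Python as below. The function name `coefficient_of_variation` is illustrative; it rejects a zero mean because the CV is only defined for a non-zero mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / mean * 100

cv_pulse = coefficient_of_variation(10, 60)    # pulse rate: SD 10, mean 60
cv_height = coefficient_of_variation(5, 160)   # height: SD 5 cm, mean 160 cm
print(cv_pulse > cv_height)
```

Because the CV has no units, it allows pulse rate (beats per minute) and height (cm) to be compared directly, which the two standard deviations alone would not.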
Summary
• Once the data has been collected, and summarized using tables and
diagrams, the next step is to measure the central tendency and the
dispersion of the data set.
• Mean, Median and Mode are the common measures of central
tendency of a data set.

• Measures of dispersion include range, quartiles, interquartile range,
percentiles, standard deviation, variance and coefficient of variation.
Learning Outcomes
At the end of the lesson you should be able to
• Explain and calculate various measures of central tendency.
• Describe and calculate various measures of dispersion.
Review Questions
The incubation periods of a random sample of 14 HIV infected individuals are given
below (in years):
12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5, 12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9
a. Calculate the sample mean.
b. Calculate the sample median.
c. Calculate the sample standard deviation.
