
BSU5335 – Unit I Session 03: Summarization of Data

Session 3

Summarization of Data

Contents
Introduction
3.1 Measures of Central Tendency
3.2 Measures of Dispersion
Summary
Learning Outcomes

Introduction
Once the data for a research project has been collected and summarized
using tables and diagrams, the next step is to measure the central tendency
and the dispersion of the data set. Measures of central tendency allow us to
identify where the majority of values are located in the distribution of the
data set, and measures of dispersion tell us how the data are spread
around the middle value of the data set.

3.1 Measures of central tendency

A measure of central tendency identifies the “middle” or “center” of a
variable’s distribution. It gives a single score that best describes the
entire distribution of a quantitative data set. Mean, median and mode are
the common measures of central tendency of a data set.

18 Copyright © 2020, The Open University of Sri Lanka



Mean

The arithmetic mean of a sample is the sum of the individual values in the
data set divided by the total number of values in the data set.

$$\bar{x} = \frac{\sum x_i}{n}$$

where $\sum x_i$ is the sum of all values and $n$ is the number of values in
the data set.

For example, if we have weights of five women (in kg); 50, 50, 65, 79 and
75, then the mean weight of this sample is equal to (50+50+65+79+75) /5
= 319 / 5 = 63.8 kg.
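
The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and `statistics.mean` from the standard library is used as a cross-check.

```python
from statistics import mean

# Weights (kg) of the five women from the example above.
weights = [50, 50, 65, 79, 75]

# Arithmetic mean: sum of the values divided by the number of values.
manual_mean = sum(weights) / len(weights)

# The standard library gives the same result.
library_mean = mean(weights)

print(manual_mean)   # 63.8
print(library_mean)  # 63.8
```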
For grouped data, the mean can be calculated using the following steps.
Step 1: Find the midpoint of each interval (x)
Midpoint of interval = (Lower class limit + Upper class limit) / 2
Step 2: Multiply the frequency (f) of each interval by its mid-point (fx)
Step 3: Get the sum of all the frequencies (f) and the sum of all the fx.
Divide the ‘sum of fx’ by ‘sum of f’ to get the mean.

For example, the following table shows the frequency distribution of the
diameters of 50 drug bottles (diameters have been measured to the nearest
millimeter). Find the mean diameter of the bottles in the sample.

Table 3.1: Frequency distribution of the diameters


Diameter (mm)   Frequency (f)   Midpoint (x)   fx
35-39           6               37             222
40-44           12              42             504
45-49           15              47             705
50-54           10              52             520
55-59           7               57             399
Total           50                             2350

Mean = (sum of fx) / (sum of f) = 2350 / 50 = 47

The mean diameter of the 50 drug bottles is equal to 47 millimeters.
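
The three steps for grouped data can be sketched in Python as follows. The function name `grouped_mean` and the tuple layout are illustrative choices, and the last two classes are assumed to be 50-54 and 55-59, which is what the midpoints 52 and 57 in Table 3.1 imply.

```python
# Each class is (lower class limit, upper class limit, frequency).
# Assumption: the last two classes are 50-54 and 55-59, matching the
# midpoints 52 and 57 given in Table 3.1.
classes = [(35, 39, 6), (40, 44, 12), (45, 49, 15), (50, 54, 10), (55, 59, 7)]


def grouped_mean(classes):
    """Mean of grouped data: (sum of f * midpoint) / (sum of f)."""
    total_f = 0
    total_fx = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # Step 1: midpoint of the interval
        total_fx += f * midpoint         # Step 2: frequency times midpoint
        total_f += f                     # Step 3: running sum of frequencies
    return total_fx / total_f


print(grouped_mean(classes))  # 47.0
```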

Median

The "median" is the "middle" value of a set of observations.


To find the median, first we arrange the observations in order from the
lowest to the highest value. If there is an odd number of observations, the
median is the middle value. If there is an even number of observations, the
median is the average of the two middle values. Thus, in the sample of the
weights of five women given above (50, 50, 65, 75 and 79 kg in order), the
median weight is 65 kg, since 65 kg is the middle value in that data set.

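So for ungrouped data arranged in ascending order, the median is the value at
position $(n + 1)/2$, where $n$ is the number of observations. When $n$ is
even, this position falls halfway between two observations and the median is
their average.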
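
The odd and even cases can be illustrated with a short Python sketch. The helper name `median_of` is illustrative, and the four-value list is a hypothetical example added only to show the even case.

```python
from statistics import median


def median_of(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


weights = [50, 50, 65, 79, 75]       # weights of the five women (kg)
print(median_of(weights))            # 65

even_example = [50, 50, 65, 75]      # hypothetical even-sized sample
print(median_of(even_example))       # 57.5

# statistics.median from the standard library agrees.
print(median(weights))               # 65
```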

For grouped data, the median value is calculated using the following
formula.

$$\text{Median} = L + \left(\frac{\frac{n}{2} - F}{f}\right) \times C$$

Where L = lower limit (lower class boundary) of the median class
n = total number of observations
F = cumulative number of observations in the classes before the median class
C = the class interval (width) of the median class
f = number of observations in the median class

When calculating the median for the diameters of the 50 drug bottles, first
we need to calculate cumulative frequencies as given in the following table.

Diameter (mm)   Frequency (f)   Cumulative frequency
35-39           6               6
40-44           12              18
45-49           15              33
50-54           10              43
55-59           7               50

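When calculating the median we should use the actual limits (class
boundaries) of the class intervals (e.g. 45-49 is 44.5 to 49.5).

Here n/2 = 50/2 = 25, so the median class is 45-49, the first class whose
cumulative frequency (33) reaches 25. With L = 44.5, F = 18, f = 15 and C = 5:

$$\text{Median} = 44.5 + \left(\frac{25 - 18}{15}\right) \times 5 \approx 46.83$$

The median diameter of the 50 drug bottles is approximately 46.83
millimeters.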
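
The grouped-median formula can be sketched in Python as below. The function name `grouped_median` is illustrative, and the classes are entered with their class boundaries (34.5-39.5, 39.5-44.5 and so on), assuming five classes of equal width 5. Using the frequencies listed in the cumulative frequency table (which total 50), the formula evaluates to about 46.83 mm.

```python
def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - F) / f) * C.

    classes: list of (lower boundary, upper boundary, frequency),
    in ascending order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            width = upper - lower  # C: class interval
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Diameters of the drug bottles, using class boundaries.
bottles = [
    (34.5, 39.5, 6),
    (39.5, 44.5, 12),
    (44.5, 49.5, 15),
    (49.5, 54.5, 10),
    (54.5, 59.5, 7),
]

print(round(grouped_median(bottles), 2))  # 46.83
```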

Mode
The mode is the most frequently appearing value of a variable.


For example, BMI (kg/m²) was measured in a sample of 7 patients. The
values were 24.5, 23.5, 26.5, 29.5, 30.5, 26.5 and 22.5. In this data set the
mode is 26.5, since it is the only value that appears more than once.
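
As a quick check, the mode of this sample can be found with Python's standard library. This is a small sketch; `multimode` is included because a data set can have more than one most frequent value.

```python
from statistics import mode, multimode

# BMI (kg/m²) of the 7 patients from the example above.
bmi = [24.5, 23.5, 26.5, 29.5, 30.5, 26.5, 22.5]

# mode() returns the most frequently occurring value.
print(mode(bmi))       # 26.5

# multimode() returns every value that shares the highest frequency.
print(multimode(bmi))  # [26.5]
```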

Activity 3.1

1. Find the mean, median and mode of the following data set.
96, 48, 27, 72, 39, 70, 7, 68, 99, 36, 95, 4, 6, 13, 34, 74, 65, 42, 28, 54, 69, 48

2. Weights (in kg) of 80 children are given below


8.9 11.4 10.4 14.9 11.5 12 11 10.2
11.2 12.9 12.1 9.4 13.2 10.8 11.7 8.9
10.6 10.5 13.7 11.8 14.1 10.3 13.6 10.2
12.1 12.9 11.4 12.7 10.6 11.4 11.9 13.3
9.3 13.5 14.6 11.2 11.7 10.9 10.4 13.7
12 12.9 11.1 9.4 10.2 11.6 12.5 15.2
13.4 12.1 10.9 11.3 14.7 10.8 13.3 11.4
11.9 11.4 12.5 13 11.6 13.1 9.7 11.8
11.2 15.1 10.7 12.9 13.4 12.3 11 15.5
14.6 11.1 13.5 10.9 13.1 11.8 12.2 11.3

Calculate mean and median of the above data set.

3.2 Measures of dispersion

A measure of dispersion (or spread or variation) of a data set is used to
describe how data are scattered around the central value of the data set. Thus,
it is usually used in conjunction with a measure of central tendency, such as
the mean or median, to provide an overall description of a set of data. There
are many reasons why the measures of the spread of data values are
important, but one of the main reasons is their relationship with the measures
of central tendency. For example, a measure of dispersion gives us an idea
of how well the mean represents the data. If the dispersion of values in the
data set is large, the mean may not be a good measure to represent the
data set, because a large dispersion indicates that there are large
differences between individual scores.
Measures of dispersion include range, quartiles, interquartile range,
percentiles, standard deviation, variance and coefficient of variation.

Range

The range is the simplest measure of variation. It is the difference between
the largest and the smallest observed values of a variable.

Range = Maximum value - Minimum value

Example: Calculate the range of the cholesterol levels (mg/dL) of the 8
patients given below:
204, 210, 215, 220, 225, 234, 238, 240

The range = the largest number - the smallest number
= 240 - 204 = 36 mg/dL
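
The same calculation in Python, as a minimal sketch using the values listed above (the variable names are illustrative):

```python
# Cholesterol levels (mg/dL) listed in the example above.
cholesterol = [204, 210, 215, 220, 225, 234, 238, 240]

# Range = maximum value - minimum value.
data_range = max(cholesterol) - min(cholesterol)

print(data_range)  # 36
```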

Quartiles

Quartiles divide a set of data into four equal parts. The values that divide
each part are called the first, second, and third quartiles; and they are
denoted by Q1, Q2, and Q3, respectively.


For ungrouped data, first arrange the data set in ascending order, for example:
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32
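
Several conventions exist for locating quartiles in ungrouped data, and the text does not state which one it uses. The sketch below assumes the common rule that the k-th quartile sits at position k(n + 1)/4 in the ordered data, interpolating linearly when the position is not a whole number. The helper `quartile_by_position` is illustrative; Python's `statistics.quantiles` with its default "exclusive" method follows the same rule.

```python
from statistics import quantiles

data = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]


def quartile_by_position(ordered, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k(n + 1)/4.

    Assumes the position falls inside the data (true for this example).
    """
    pos = k * (len(ordered) + 1) / 4
    index = int(pos) - 1          # 0-based index of the value at or below pos
    fraction = pos - int(pos)     # how far pos lies towards the next value
    if fraction == 0:
        return ordered[index]
    return ordered[index] + fraction * (ordered[index + 1] - ordered[index])


ordered = sorted(data)
q1, q2, q3 = (quartile_by_position(ordered, k) for k in (1, 2, 3))
print(q1, q2, q3)          # 16 22 28

# The standard library's default ("exclusive") method gives the same cut points.
print(quantiles(data, n=4))  # [16.0, 22.0, 28.0]
```

Under this convention the 11 values give Q1 as the 3rd value, Q2 (the median) as the 6th value and Q3 as the 9th value.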

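For grouped data, Q1 and Q3 can be calculated using the following equations.

$$Q_1 = L_1 + \left(\frac{\frac{n}{4} - F}{f_{Q1}}\right) \times h$$

$$Q_3 = L_3 + \left(\frac{\frac{3n}{4} - F}{f_{Q3}}\right) \times h$$

Where
L1 = Lower class boundary of the Q1 class
L3 = Lower class boundary of the Q3 class
fQ1 = Frequency of the Q1 class
fQ3 = Frequency of the Q3 class
F = Cumulative frequency of the class preceding the Q1 or Q3 class
n = total frequency
h = Class interval (width)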

Time taken for a painkiller drug to relieve pain of 50 cancer patients is given
in the table below.

Table 3.2: Frequency distribution table of time taken to relieve pain in 50
cancer patients

Time taken to relieve pain (min)   Frequency   Class boundaries   Cumulative frequency
1-10                               8           0.5-10.5           8
11-20                              14          10.5-20.5          22
21-30                              12          20.5-30.5          34
31-40                              9           30.5-40.5          43
41-50                              7           40.5-50.5          50

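Q1 and Q3 can be calculated as follows.

Position of Q1 = n/4 = 50 / 4 = 12.5, therefore the Q1 class is the 2nd class
(11-20), the first class whose cumulative frequency (22) reaches 12.5.

$$Q_1 = 10.5 + \left(\frac{12.5 - 8}{14}\right) \times 10 \approx 13.71 \text{ minutes}$$

Position of Q3 = 3n/4 = 150 / 4 = 37.5, therefore the Q3 class is the 4th class
(31-40), the first class whose cumulative frequency (43) reaches 37.5.

$$Q_3 = 30.5 + \left(\frac{37.5 - 34}{9}\right) \times 10 \approx 34.39 \text{ minutes}$$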
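
The quartile calculation for Table 3.2 can be sketched in Python as follows. The function name `grouped_quantile` is illustrative; it applies the usual grouped-data interpolation, lower boundary + ((target position - cumulative frequency before the class) / class frequency) × class width, with the target position n/4 for Q1 and 3n/4 for Q3.

```python
def grouped_quantile(classes, fraction):
    """Quantile of grouped data by interpolation within the target class.

    classes: list of (lower boundary, upper boundary, frequency), ascending.
    fraction: 0.25 for Q1, 0.5 for the median, 0.75 for Q3.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    target = fraction * n
    cumulative = 0  # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target class not found")


# Table 3.2: time taken to relieve pain (min), using class boundaries.
pain_relief = [
    (0.5, 10.5, 8),
    (10.5, 20.5, 14),
    (20.5, 30.5, 12),
    (30.5, 40.5, 9),
    (40.5, 50.5, 7),
]

q1 = grouped_quantile(pain_relief, 0.25)
q3 = grouped_quantile(pain_relief, 0.75)

print(round(q1, 2))       # 13.71
print(round(q3, 2))       # 34.39
print(round(q3 - q1, 2))  # 20.67 (difference between Q3 and Q1)
```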

Inter Quartile Range

The interquartile range (IQR) is the interval between the values of the upper
and lower quartiles. The interquartile range is equal to Q3 minus Q1.

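In the above example, the grouped-data quartile formulas give Q1 ≈ 13.71
minutes and Q3 ≈ 34.39 minutes, so IQR = Q3 - Q1 ≈ 34.39 - 13.71 ≈ 20.7 minutes.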

Variance

The sample variance is the sum of the squared deviations of the observed
values from the average (mean) divided by one less than the number of
observations in the data set.

For example, for n observations $x_1, x_2, x_3, \ldots, x_n$ with sample mean
$\bar{x}$, the sample variance is given by

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Standard deviation

Standard deviation is a commonly used measure of spread or dispersion of a
set of data. It is calculated by taking the square root of the variance.

Sample standard deviation is equal to

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

For example, consider a set of IQ scores: 96, 104, 126, 134 and 140.

The mean of this data is (96+104+126+134+140)/5 = 600/5 = 120.

The deviations from the mean of each value are given by
96-120 = -24, 104-120 = -16, 126-120 = 6, 134-120 = 14, 140-120 = 20.
The sum of their squares is given by

$$(-24)^2 + (-16)^2 + 6^2 + 14^2 + 20^2 = 576 + 256 + 36 + 196 + 400 = 1464$$

Divide this value by the number of scores minus one (the data come from a
sample, not a population, and dividing by n - 1 reduces bias), then take the
square root:

$$s = \sqrt{\frac{1464}{5 - 1}} = \sqrt{366} \approx 19.13$$

So the standard deviation of the IQ scores in the sample is s ≈ 19.13.
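
The steps of this example can be traced in Python. This is a minimal sketch with illustrative variable names; `statistics.stdev`, which also divides by n - 1, is used as a cross-check.

```python
from math import sqrt
from statistics import stdev

iq_scores = [96, 104, 126, 134, 140]

n = len(iq_scores)
mean_iq = sum(iq_scores) / n                      # 120.0

# Squared deviation of each score from the mean.
squared_deviations = [(x - mean_iq) ** 2 for x in iq_scores]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = sqrt(sample_variance)

print(sum(squared_deviations))  # 1464.0
print(sample_variance)          # 366.0
print(round(sample_sd, 2))      # 19.13

# The standard library's sample standard deviation agrees.
print(round(stdev(iq_scores), 2))  # 19.13
```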


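For grouped data the following formulae are used to calculate the variance and
standard deviation, where x is the class midpoint, f is the class frequency
and $n = \sum f$.

Sample variance

$$s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$$

Sample standard deviation

$$s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}}$$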

Suppose the number of pregnant mothers who attended 50 well-women
clinics in a district on a week day is summarized and given below. Find the
variance and standard deviation.

No. of patients attended   Frequency
10-12                      4
13-15                      12
16-18                      20
19-21                      14
Total                      50

No. of patients attended   f    Midpoint (x)   fx    fx²
10-12                      4    11             44    484
13-15                      12   14             168   2352
16-18                      20   17             340   5780
19-21                      14   20             280   5600
Total                      50                  832   14216

The Mean number of pregnant mothers who attended a clinic on that day is
832 / 50 = 16.64. In other words, on average 17 mothers attended each
clinic in the district on that day.
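Variance:

$$s^2 = \frac{14216 - \frac{832^2}{50}}{50 - 1} = \frac{14216 - 13844.48}{49} = \frac{371.52}{49} \approx 7.58$$

Standard deviation:

$$s = \sqrt{7.58} \approx 2.75$$

Thus, the standard deviation (denoted as SD) of the number of pregnant
mothers who attended well-women clinics on that week day is 2.75.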
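
The grouped calculation can be sketched in Python as below. The function name `grouped_variance` is illustrative, and it assumes the sample form of the variance, (Σfx² - (Σfx)²/n) / (n - 1) with n = Σf, which reproduces the standard deviation of 2.75 reported for this example.

```python
from math import sqrt

# Each class is (lower class limit, upper class limit, frequency).
clinics = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]


def grouped_variance(classes):
    """Sample variance of grouped data (assumed n - 1 divisor)."""
    n = 0
    sum_fx = 0
    sum_fx2 = 0
    for lower, upper, f in classes:
        x = (lower + upper) / 2   # class midpoint
        n += f
        sum_fx += f * x
        sum_fx2 += f * x ** 2
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


variance = grouped_variance(clinics)
sd = sqrt(variance)

print(round(variance, 2))  # 7.58
print(round(sd, 2))        # 2.75
```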

Coefficient of Variation

Coefficient of variation is the standard deviation expressed as a percentage
of the mean. If we wish to compare the variability of two or more series of
data, we can use the coefficient of variation. A higher coefficient of
variation in a data series indicates that the group is more variable and less
stable or less uniform. A small coefficient of variation indicates that
the group is less variable and more stable or more uniform.
Formula for Coefficient of Variation (CV)

$$CV = \frac{s}{\bar{x}} \times 100\%$$

In other words, the coefficient of variation is defined as the ratio of the
standard deviation to the mean. The value of CV is calculated only for a
non-zero mean.
Example
Find the coefficient of variation for the sample given in the above example
of pregnant mothers attending well-women clinics.

$$CV = \frac{2.75}{16.64} \times 100\% \approx 16.5\%$$

Suppose the mean pulse rate (beats per minute) of a group of students was
60 and the SD was 10. In the same group the mean and SD of the variable
height were 160 cm and 5 cm respectively. Which variable shows the
greater variation?
CV for pulse rate = (10 / 60) × 100% = 16.7%
CV for height = (5 / 160) × 100% = 3.1%
So the variable pulse rate has greater variability than the variable
height in this group of students.
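
The comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative; it rejects a zero mean because CV is defined only for a non-zero mean.

```python
def coefficient_of_variation(sd, mean):
    """CV: standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / mean * 100


cv_pulse = coefficient_of_variation(10, 60)    # pulse rate: SD 10, mean 60
cv_height = coefficient_of_variation(5, 160)   # height: SD 5 cm, mean 160 cm

print(round(cv_pulse, 1))   # 16.7
print(round(cv_height, 1))  # 3.1

# The variable with the larger CV is relatively more variable.
print(cv_pulse > cv_height)  # True
```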

Summary
• Once the data has been collected, and summarized using tables and
diagrams, the next step is to measure the central tendency and the
dispersion of the data set.
• Mean, Median and Mode are the common measures of central
tendency of a data set.

• Measures of dispersion include range, quartiles, interquartile range,
percentiles, standard deviation, variance and coefficient of variation.

Learning Outcomes
At the end of the lesson you should be able to
• Explain and calculate various measures of central tendency.
• Describe and calculate various measures of dispersion.

Review Questions

The incubation periods of a random sample of 14 HIV infected individuals are given
below (in years):
12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5, 12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9
a. Calculate the sample mean.
b. Calculate the sample median.
c. Calculate the sample standard deviation.
