4 Measures of Centrality: Mean, Median, Mode, Grouped Data


Central Tendency

4 Measures of Centrality
Mean, Median, Mode, Grouped Data
When presented with a data set, it helps to have a single statistic that captures a
fitting characteristic of the entire set. The most widely used such statistic is the
“average”, a snapshot of one important characteristic of the data set: its central
tendency. The most commonly used measures of central tendency are the mean, the
median, and the mode.

Measure of Central Tendency


A measure of central tendency is a number that gives a summary of the
characteristics of a given data set.

4.1 MEAN
The mean is sometimes called the arithmetic mean or arithmetic average. The sample mean
is denoted by $\bar{x}$ (read “x-bar”), whereas the population mean is denoted by the
Greek letter $\mu$ (mu).

Sample Mean

$$\bar{x} = \frac{\sum x_i}{n}$$

The symbol $n$ is the number of elements in the data set. The symbol $x_i$ represents
the value of the $i$-th observation. The value of the first data item is $x_1$, the
second $x_2$, the third $x_3$, and so on; the last data item is $x_n$.

Example 4.1
Obtain the mean of a sample of class sizes from three colleges, CBAM, LIMA,
and CIHTM: 30, 43, 45

Solution
$x_1 = 30$
$x_2 = 43$
$x_3 = 45$


To compute the sample mean we simply follow the formula.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{30 + 43 + 45}{3} = \frac{118}{3} = 39.33 \approx 39$$
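
The arithmetic in Example 4.1 can be checked with a few lines of Python. This is only a sketch; the variable names `class_sizes` and `x_bar` are our own and are not part of the text.

```python
from statistics import mean

# Class sizes from Example 4.1 (CBAM, LIMA, CIHTM)
class_sizes = [30, 43, 45]

# Sample mean: the sum of the values divided by n
x_bar = mean(class_sizes)

print(round(x_bar, 2))  # 39.33
```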

The formula for the population mean is the same as the one used for the sample mean,
but the symbols are different: $N$ is the number of elements in the population.

Population Mean

$$\mu = \frac{\sum x_i}{N}$$

On some occasions a data set contains unusually large or small values that can
significantly affect the value of the mean. In such a situation the computed mean may
not give an appropriate description of the data set. To correct this, the data set is
“trimmed”. These unusually large or small values are removed from the data set, and the
resulting average is called the trimmed mean. For example, suppose a student took 10
quizzes in Statistics 1, and his scores are as follows:

Example 4.2
Obtain the mean of the 10 quizzes, and then compare it with the mean obtained when the
data set is trimmed to 9 quizzes by removing the lowest score.

Quiz   Score      Quiz   Score
1      87         6      60
2      85         7      86
3      96         8      92
4      86         9      94
5      97         10     89

The mean (average score) of his quizzes is

$$\bar{x} = \frac{87 + 85 + 96 + 86 + 97 + 60 + 86 + 92 + 94 + 89}{10} = \frac{872}{10} = 87.2$$


It is a point of interest to look at the student’s 6th quiz, which is 60%. In LPU Batangas
that means the student’s raw score is actually zero for that quiz. It is a bit unusual that a
student who does well in many quizzes would score zero in one quiz. In fact, it is possible
that he did not take the quiz at all, and so his score appeared as 60 on the student record.
To remove this extreme value, we can choose to obtain the trimmed mean instead: remove
60 from the data set and obtain the average of the remaining nine data values.

$$\bar{x} = \frac{87 + 85 + 96 + 86 + 97 + 86 + 92 + 94 + 89}{9} = \frac{812}{9} = 90.22$$
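
The comparison in Example 4.2 can be sketched in Python as follows. The helper name `average` and the choice to drop exactly one value from the low end mirror this example only; they are assumptions of the sketch, not a general trimming rule.

```python
# Quiz scores from Example 4.2, in quiz order
scores = [87, 85, 96, 86, 97, 60, 86, 92, 94, 89]


def average(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


full_mean = average(scores)

# Trim the data set by dropping the single lowest score (the 60)
trimmed = sorted(scores)[1:]
trimmed_mean = average(trimmed)

print(round(full_mean, 2), round(trimmed_mean, 2))  # 87.2 90.22
```

Removing one extreme value out of ten moves the average by about three points, which is why the untrimmed mean may describe this student poorly.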

4.2 MEDIAN
The median is the value which coincides with the middle observation when the data items
are sorted from smallest to largest. If there is an odd number of data items, the
median falls on the observation in the middle. If there is an even number of data
items, there is no single middle value; instead the median is computed as the average
of the two middle values.

Median

If there is an odd number of data values arranged in ascending order, the median is
the value in the middle of the list.

If there is an even number of data values arranged in ascending order, the median is
the average of the two middle values in the list.

Example 4.3
Obtain the median of the list shown below.
19 17 39 34 26 48 58 74 26 47 96

Solution
Arrange the list in ascending order.
17 19 26 26 34 39 47 48 58 74 96
The observation in the middle (the 6th of the 11 values) is 39.
Median is 39.


Example 4.4
Obtain the median of the data values in the list:
1002 3989 5789 3876 2999 4888

Solution
Arrange the list in ascending order.

1002 2999 3876 3989 4888 5789

The list contains six data values. The two middle values are 3876
and 3989. The median, therefore, is

$$\text{Median} = \frac{3876 + 3989}{2} = 3932.5$$
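
The odd and even cases of the median rule can be sketched in one short Python function. The function name `median` is our own; the two calls reuse the lists from Examples 4.3 and 4.4.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single observation sits in the middle
        return ordered[mid]
    # Even count: average the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([19, 17, 39, 34, 26, 48, 58, 74, 26, 47, 96]))  # 39
print(median([1002, 3989, 5789, 3876, 2999, 4888]))          # 3932.5
```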

For many of the computations you will do in your profession, the mean is the measure of
central tendency you will use most often. But there are occasions when the median is the
preferred measure. This happens when the data set contains extreme values, that is,
observations which are unusually large or unusually small, because such values can
significantly affect the value of the mean.

4.3 MODE
Mode is defined in the following manner.

Mode
The mode is the data value which occurs with greatest frequency

Given a list of data values, the observation which occurs most often is the mode of the
data set. This measure of central tendency is most applicable to qualitative data.
Consider, for example, the survey about soft drinks presented in Table 4.1.

The mode of this data set is Coca-cola because it is the most preferred soft drink. For
this data set, looking for a median or a mean makes no sense. The mode is the
appropriate measure of central tendency in this situation.

Sometimes survey results reveal that two or more categories occur with the greatest
frequency. In that case the data set is said to be multimodal; it is bimodal if it
shows exactly two modes. When a data set is multimodal, it may not be appropriate
to choose the mode as a measure of central tendency.


Table 4.1 Popular Softdrinks in San Pascual

Softdrink    Frequency
Coca-cola    30
Pop Cola     10
Pepsi        15
Sprite       27
Sparkle      17
Root Beer    14
RC Cola      20

In the extreme case where each category appears only once, the data set is multimodal
by definition: every category is a mode. This tells us nothing about the central
tendency of the data set. Therefore, it is inappropriate to report a mode in such a
situation.
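
Finding the mode from a frequency table, including the multimodal case, can be sketched in Python. The dictionary `softdrinks` copies Table 4.1; the function name `modes` is our own.

```python
# Frequencies from Table 4.1
softdrinks = {
    "Coca-cola": 30,
    "Pop Cola": 10,
    "Pepsi": 15,
    "Sprite": 27,
    "Sparkle": 17,
    "Root Beer": 14,
    "RC Cola": 20,
}


def modes(frequencies):
    """Return every category that attains the greatest frequency."""
    top = max(frequencies.values())
    return [category for category, f in frequencies.items() if f == top]


print(modes(softdrinks))  # ['Coca-cola']
```

A result with more than one entry signals a multimodal data set; if every category comes back, the mode carries no information about central tendency.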

4.4 SAMPLE MEAN OF GROUPED DATA


To obtain the mean of individual data values, we simply obtain the sum of these data
values and divide it by n, the sample size. But we shall implement a different procedure
to obtain sample mean when the data presented to us is grouped data.

Grouped data refers to a data set which is already organized in frequency distribution
form. It can be either in the form of a frequency distribution table or graph.

Table 4.2 Frequency Distribution Quiz Scores

Score Interval    Frequency
10                1
7–9               17
4–6               15
1–3               2

Let us use Table 4.2 to illustrate the sample mean of grouped data.


Sample Mean of Grouped Data

$$\bar{x} = \frac{\sum f_i m_i}{n}$$

$f_i$ is the frequency of observations in class interval $i$
$m_i$ is the midpoint of class interval $i$
$f_i m_i$ is the product of $f_i$ and $m_i$
$n$ is the total number of data values

Table 4.3 Frequency Distribution Quiz Scores

Score Interval    Midpoint    Frequency    $f_i m_i$
10                10          1            10
7–9               8           17           136
4–6               5           15           75
1–3               2           2            4
Total                         n = 35       $\sum f_i m_i = 225$

Using the formula for grouped data we obtain the mean

$$\bar{x} = \frac{\sum f_i m_i}{n} = \frac{10 + 136 + 75 + 4}{35} = \frac{225}{35} = 6.43$$

The mean calculated from all the individual data values is not necessarily equal to
the mean of the grouped data. The difference between the mean of the ungrouped data
and the mean of the grouped data is called grouping error. The mean obtained from the
grouped data is an approximation of the true mean obtained from all the data values.
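
The grouped-mean computation of Table 4.3 can be sketched in Python. Each class is written as a hypothetical `((lower, upper), frequency)` pair, and the midpoint is taken as the average of the two class limits; both choices are assumptions of this sketch.

```python
# Classes from Table 4.2: ((lower limit, upper limit), frequency)
classes = [((10, 10), 1), ((7, 9), 17), ((4, 6), 15), ((1, 3), 2)]


def grouped_mean(classes):
    """Sum of f_i * m_i divided by n, with m_i the midpoint of each class."""
    n = sum(f for _, f in classes)
    total = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total / n


print(round(grouped_mean(classes), 2))  # 6.43
```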

4.5 SAMPLE MEDIAN OF GROUPED DATA


Before we introduce the formula for the median of grouped data, let us go back to the
cumulative frequency distribution. The “less than” cumulative frequency distribution
(<cf) is obtained by adding frequencies successively from the lowest to the highest
interval. The “more than” cumulative frequency distribution (>cf) is obtained by
adding the frequencies from the highest to the lowest interval.

Table 4.4 Cumulative Frequency Distribution Quiz Scores

Score Interval    Frequency    <cf    >cf
10                1            35     1
7–9               17           34     18
4–6               15           17     33
1–3               2            2      35

The formula to obtain the median of grouped data is as follows:

Median for Grouped Data (from top to bottom)

$$\text{Median} = U_b - \left( \frac{\frac{n}{2} - Cf_u}{f_m} \right) \times c$$

$U_b$ is the upper limit of the median class
$n$ is the number of elements in the data set
$Cf_u$ is the cumulative frequency of the class above the median class
$f_m$ is the frequency of the median class
$c$ is the length of each class interval

Example 4.5
Obtain the median of the data presented in Table 4.5. This table shows the results
of an examination in Tourism Practicum.


Table 4.5 Cumulative Frequency Distribution of Examination Scores in Tourism Practicum

Score      f     <cf    >cf
90 – 94    2     80     2
85 – 89    4     78     6
80 – 84    12    74     18
75 – 79    21    62     39
70 – 74    17    41     56
65 – 69    9     24     65
60 – 64    8     15     73
55 – 59    6     7      79
50 – 54    1     1      80
Total      80

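Solution
The median class is the class that contains the $n/2 = 40$th observation. Reading the
<cf column from the bottom, the cumulative frequency first reaches 40 in the class
70 – 74, where <cf = 41. The median class is therefore 70 – 74, the 5th of the nine
classes, and $f_m = 17$.

The upper limit $U_b$ of the median class is 74.5; this is obtained from
$(74 + 75)/2 = 74.5$.

The cumulative frequency above the median class is $Cf_u = 39$, the size of
the data set is $n = 80$, and the length of each class interval is $c = 5$.

$$\text{Median} = U_b - \left( \frac{\frac{n}{2} - Cf_u}{f_m} \right) \times c = 74.5 - \left( \frac{40 - 39}{17} \right) \times 5 = 74.21$$

If we are to count the classes from bottom to top, the alternative formula is presented as
follows.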


Median for Grouped Data (from bottom to top)

$$\text{Median} = L_b + \left( \frac{\frac{n}{2} - Cf_l}{f_m} \right) \times c$$

$L_b$ is the lower limit of the median class
$n$ is the number of elements in the data set
$Cf_l$ is the cumulative frequency of the class below the median class
$f_m$ is the frequency of the median class
$c$ is the length of each class interval

Example 4.6
Obtain the median of the data presented in Table 4.5 using the alternative
formula.

Solution

The median class falls at the 5th class 70 – 74, and $f_m = 17$.

The lower limit $L_b$ of the median class is 69.5; this is obtained from
$(69 + 70)/2 = 69.5$.

The cumulative frequency below the median class is $Cf_l = 24$, the size of
the data set is $n = 80$, and the length of each class interval is $c = 5$.

$$\text{Median} = L_b + \left( \frac{\frac{n}{2} - Cf_l}{f_m} \right) \times c = 69.5 + \left( \frac{40 - 24}{17} \right) \times 5 = 74.21$$
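
Both grouped-median formulas can be sketched in a single Python function and compared. The names `classes` and `grouped_median` are our own, and the sketch assumes integer class limits with a gap of 1 between neighbouring classes, so that each class boundary lies 0.5 beyond the class limit, as in the two solutions above.

```python
# (lower limit, upper limit, frequency) for each class in Table 4.5,
# listed from the lowest class to the highest.
classes = [
    (50, 54, 1), (55, 59, 6), (60, 64, 8), (65, 69, 9), (70, 74, 17),
    (75, 79, 21), (80, 84, 12), (85, 89, 4), (90, 94, 2),
]


def grouped_median(classes):
    """Return the grouped median computed from the bottom and from the top."""
    n = sum(f for _, _, f in classes)
    # Assumption: integer limits, so the class width is upper - lower + 1
    c = classes[0][1] - classes[0][0] + 1
    cf_below = 0  # Cf_l: cumulative frequency below the current class
    for lower, upper, f in classes:
        if cf_below + f >= n / 2:        # this class holds the n/2-th value
            cf_above = n - cf_below - f  # Cf_u
            from_bottom = (lower - 0.5) + (n / 2 - cf_below) / f * c
            from_top = (upper + 0.5) - (n / 2 - cf_above) / f * c
            return from_bottom, from_top
        cf_below += f
    raise ValueError("no median class found")


bottom, top = grouped_median(classes)
print(round(bottom, 2), round(top, 2))  # 74.21 74.21
```

The two results always agree, because $Cf_l + f_m + Cf_u = n$ and $U_b - L_b = c$; the two fractions in the formulas therefore add up to 1.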

4.6 WEIGHTED AVERAGE


A weighted average is similar to the arithmetic mean, but the data values carry
different “weights”. Some values carry more weight than others and therefore
influence the average more. One good example is the manner by which your average
grades are computed at the end of each semester.

Weighted Average

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

$w_i$ is the weight of data value $x_i$

Example 4.7
Felix Dinglasan received his grades at the end of first semester. Compute his
general weighted average for all his subjects.

Table 4.6 Grade Report 1st semester 2013 - 2014

Subject                    Units    Grade
Personality Development    3        5.00
College Algebra            3        1.00
College Physics 1          5        1.25
English 1                  3        3.00
Seamanship 1               3        2.00
Machine Shop 1             3        2.50

Solution

Let us use Table 4.6 and add another column for $w_i x_i$, where the weight $w_i$ is
the number of units and $x_i$ is the grade. See Table 4.7 below.

Our solution gives

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{46.75}{20} = 2.3375 \approx 2.34$$

Table 4.7 Grade Report 1st semester 2013 - 2014

Subject                    Units    Grade    $w_i x_i$
Personality Development    3        5.00     15.00
College Algebra            3        1.00     3.00
College Physics 1          5        1.25     6.25
English 1                  3        3.00     9.00
Seamanship 1               3        2.00     6.00
Machine Shop 1             3        2.50     7.50
Total                      $\sum w_i = 20$   $\sum w_i x_i = 46.75$

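
The general weighted average of Example 4.7 can be recomputed with a short Python sketch. The list name `grades` and its `(subject, units, grade)` layout are our own; the units serve as the weights.

```python
# (subject, units, grade) rows from Table 4.6; units are the weights w_i
grades = [
    ("Personality Development", 3, 5.00),
    ("College Algebra", 3, 1.00),
    ("College Physics 1", 5, 1.25),
    ("English 1", 3, 3.00),
    ("Seamanship 1", 3, 2.00),
    ("Machine Shop 1", 3, 2.50),
]

total_units = sum(units for _, units, _ in grades)           # sum of w_i
weighted_sum = sum(units * grade for _, units, grade in grades)  # sum of w_i * x_i
gwa = weighted_sum / total_units

print(total_units, weighted_sum, round(gwa, 2))  # 20 46.75 2.34
```

The 5-unit subject contributes 5 × 1.25 = 6.25 to the numerator, so it pulls the average toward its grade more strongly than any 3-unit subject does.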
4.7 PERCENTILES
A percentile is a measure that indicates the value below which a given percentage of the
observations in a group of observations fall. For example, the 58th percentile is the score
or value below which 58% of the observations may be found. If, after an examination,
the result says that Anthony Dela Cruz scored in the 98th percentile, that means 98% of
the examinees who took the exam got scores lower than Anthony’s.

Admission test scores are often reported in percentiles. A percentile tells the
performance of a student relative to the other examinees.

Percentiles

The pth percentile is a value such that at least p percent of the observations take on
this value or less, and at least ( 100 – p) percent of the observations take on this
value or more.

Procedure to obtain the pth Percentile


1) Arrange the data values in ascending order.

2) Compute the index $i$ as follows:

$$i = \left( \frac{p}{100} \right) n$$

where $p$ is the percentile of interest, and $n$ is the number of observations.

3) a) If $i$ is not an integer, round it up to the next integer greater than $i$.
The pth percentile is the data value which occupies that position.

b) If $i$ is an integer, the pth percentile is the average of the data values
which occupy the $i$ and $i+1$ positions.

Example 4.8
Obtain the 85th percentile for the test scores shown below:
65, 67, 72, 75, 75, 75, 75, 75, 79, 80, 80, 80, 83, 86, 88, 89,90

Solution
1) The data set is already sorted from lowest to highest.

2) There are 17 data values, so $n = 17$ and $p = 85$.

$$i = \left( \frac{p}{100} \right) n = \left( \frac{85}{100} \right) 17 = 14.45$$

Since the value has a decimal part, we round this up to 15.

What score then occupies the 85th percentile? Count the observations
from the lowest to the highest score. The 15th score, which is 88, occupies
the 85th percentile.

Example 4.9
Obtain the 50th percentile for the list of scores shown below.
60, 63, 68, 70, 75, 75, 75, 79,79, 80, 82,82, 82, 85, 86, 90, 90, 92, 94, 99


Solution
1) The data set is already sorted in ascending order.

2) There are 20 data values, so $n = 20$ and $p = 50$.

$$i = \left( \frac{p}{100} \right) n = \left( \frac{50}{100} \right) 20 = 10$$

The value of the index is 10. Obtain the observations which occupy
the 10th and 11th positions. These are 80 and 82. Their average is
(80 + 82)/2 = 81. Therefore, the score which occupies the 50th
percentile is 81.
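
The three-step percentile procedure can be sketched in Python. The function name `percentile` is our own, and the sketch assumes that $p$ is a whole number strictly between 0 and 100 so that the index can be tested with integer arithmetic.

```python
def percentile(values, p):
    """p-th percentile using the index rule i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # step 2: i = whole + remainder/100
    if remainder != 0:
        # step 3a: i is not an integer, round up to position whole + 1
        return ordered[whole]         # 0-based index of that position
    # step 3b: i is an integer, average positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


scores_a = [65, 67, 72, 75, 75, 75, 75, 75, 79, 80, 80, 80, 83, 86, 88, 89, 90]
scores_b = [60, 63, 68, 70, 75, 75, 75, 79, 79, 80,
            82, 82, 82, 85, 86, 90, 90, 92, 94, 99]

print(percentile(scores_a, 85))  # 88
print(percentile(scores_b, 50))  # 81.0
```

Other textbooks and software packages use different percentile rules (for example, linear interpolation), so their results can differ slightly from this procedure.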

4.8 QUARTILES AND HINGES


A data set can be divided into four parts, and each part contains one-fourth or 25% of all
observations. These divisions are referred to as the quartiles and each quartile is defined
as follows:

Q1 - the first quartile

Q2 - the second quartile ( also the median)

Q3 - the third quartile

The first, second, and third quartiles can be stated in terms of percentiles. $Q_1$ is the
25th percentile, $Q_2$ the 50th percentile, and $Q_3$ the 75th percentile. Therefore, the
rules to obtain $Q_1$, $Q_2$, and $Q_3$ are the same rules applied to compute the 25th,
50th, and 75th percentiles.

Example 4.10

Given the list of scores shown below, obtain $Q_1$, $Q_2$, and $Q_3$.

60, 63, 68, 70, 75, 75, 75, 79,79, 80, 82,82, 82, 85, 86, 90, 90, 92, 94, 99

Solution

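a) $Q_1$ is the same as the 25th percentile. Let us begin by obtaining the index $i$.

$$i = \left( \frac{25}{100} \right) 20 = 5$$

The values which occupy the 5th and 6th positions are 75 and 75. Their
average is 75. Therefore, $Q_1 = 75$.

b) $Q_2$ is the same as the 50th percentile.

$$i = \left( \frac{50}{100} \right) 20 = 10$$

The values which occupy the 10th and 11th positions are 80 and 82. Their
average is 81. Therefore, $Q_2 = 81$.

c) $Q_3$ is the same as the 75th percentile.

$$i = \left( \frac{75}{100} \right) 20 = 15$$

The values which occupy the 15th and 16th positions are 86 and 90. Their
average is 88. Therefore, $Q_3 = 88$.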

Hinges are another variation for computing quartiles. In this method the data set,
arranged in ascending order, is divided into four equal parts. The “first hinge” is the
lower hinge, and the “third hinge” is the upper hinge. In effect the lower hinge coincides
with $Q_1$, and the upper hinge coincides with $Q_3$.
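
One way to compute hinges can be sketched in Python. The text does not spell out a procedure, so this sketch assumes the common definition in which the lower hinge is the median of the lower half of the sorted data and the upper hinge is the median of the upper half, with the middle value counted in both halves when the number of observations is odd. The function name `hinges` is our own.

```python
from statistics import median


def hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: for odd n the middle value belongs to both halves
    half = (n + 1) // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)


scores = [60, 63, 68, 70, 75, 75, 75, 79, 79, 80,
          82, 82, 82, 85, 86, 90, 90, 92, 94, 99]

print(hinges(scores))  # (75.0, 88.0)
```

For the scores of Example 4.10 the sketch returns a lower hinge of 75 and an upper hinge of 88. For other data sets the hinges and the percentile-based quartiles can differ slightly.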


Name Date

Course-Section Score

Exercise 4.1 Measures of Centrality

1. Consider the list of numbers shown below:

20 18 17 19 20 22 24 18 26 18 21

a) Obtain the mean, median, and mode.

2. Consider the list of numbers shown below:

22 58 24 50 29 52 31 27 44 49 40 31 32 44

a) Compute $Q_1$.

b) Compute $Q_3$.

c) Compute the 90th percentile.

d) Compute the lower hinge.

e) Compute the upper hinge.

3. In a beauty contest, participants are rated on four categories: swimsuit, evening
gown, talent, and question and answer. These categories carry different
weights: swimsuit 20%, evening gown 20%, talent 30%, and question and
answer 30%. Contestants are rated from 1 to 10 for each category, 10 being the
best rating. Judge Sofia Vizco gave Ms. Chona Portina the following ratings:

Name of Contestant: Chona Portina

Swimsuit        9
Evening gown    9
Talent          10
Q&A             8

Judge: Mrs. Sofia Vizco

Use the weighted average to compute Ms. Chona Portina’s score from Judge Sofia
Vizco’s score card.


Name Date

Course-Section Score

Exercise 4.2 Measure of Central Tendency Grouped Data

1. Consider the grouped data on scores of 50 students in a qualifying exam.

Table 4.8 Raw Scores of 50 Students

Class Interval    f
100 - 104         5
105 - 109         5
110 - 114         13
115 - 119         10
120 - 124         6
125 - 129         6
130 - 134         5
Total             50

a) Obtain the mean of the grouped data. Use the modified table below to
organize your solution.

Table 4.9 Raw Scores of 50 Students

Class Interval    f     Midpoint $m_i$    $f_i m_i$
100 - 104         5
105 - 109         5
110 - 114         13
115 - 119         10
120 - 124         6
125 - 129         6
130 - 134         5
Total             50

b) Obtain the median of the grouped data in Table 4.8. Use the modified table
below to organize your solution.

Table 4.10 Raw Scores of 50 Students

Class Interval    f     <cf    >cf
100 - 104         5
105 - 109         5
110 - 114         13
115 - 119         10
120 - 124         6
125 - 129         6
130 - 134         5
Total             50
