4 Measures of Centrality
Mean, Median, Mode, Grouped Data
When presented with a data set, it is often most helpful to have a single statistic that
captures a fitting characteristic of the entire data set. Few statistics are as helpful
as the “average,” which gives a snapshot of an important characteristic of the data set:
its central tendency. The most commonly used measures of central tendency are the mean,
the median, and the mode.
4.1 MEAN
The mean is sometimes called the arithmetic mean or arithmetic average. The sample mean
is denoted by $\bar{x}$ (read as "x-bar"), whereas the population mean is denoted by the
Greek letter μ (read as "mu").
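Sample Mean
$$\bar{x} = \frac{\sum x_i}{n}$$
where $x_i$ is the value of the i-th observation and $n$ is the number of observations in
the sample. The value of the first data item is $x_1$, the second $x_2$, the third $x_3$,
and so on; the last data item is $x_n$.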
Example 4.1
Obtain the mean of a sample of class sizes from three colleges, CBAM, LIMA,
and CIHTM: 30, 43, 45
Solution
$x_1 = 30$
$x_2 = 43$
$x_3 = 45$
$$\bar{x} = \frac{\sum x_i}{n} = \frac{30 + 43 + 45}{3} = \frac{118}{3} = 39.33 \approx 39$$
The formula for the population mean is the same as the one for the sample mean, but the
symbols are different.
Population Mean
$$\mu = \frac{\sum x_i}{N}$$
where $N$ is the number of observations in the population.
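As a quick check of Example 4.1, the mean can be computed in a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are illustrative choices, not part of the text.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Class sizes from Example 4.1 (CBAM, LIMA, CIHTM)
class_sizes = [30, 43, 45]
x_bar = sample_mean(class_sizes)
print(round(x_bar, 2))  # 39.33
```

The same function applies to a population; only the symbols (μ and N) change, not the arithmetic.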
On some occasions a data set contains unusually large or small values that can
significantly affect the value of the mean. In such a situation the computed mean may
not give an appropriate description of the data set. To correct this, the data set is
“trimmed”. These unusually large or small values are removed from the data set, and the
resulting average is called the trimmed mean. For example, suppose a student took 10
quizzes in Statistics 1, and his scores are as follows:
Example 4.2
Obtain the mean of the 10 quizzes, and then compare it with the mean when the data set
is trimmed to 9 quizzes by removing the lowest score.
Quiz   Score     Quiz   Score
2      85        7      86
3      96        8      92
4      86        9      94
5      97        10     89
It is worth looking at the student’s 6th quiz, which is 60%. At LPU Batangas, that means
the student’s raw score for that quiz is actually zero. It is unusual for a student who
does well in many quizzes to score zero in one quiz. In fact, it is possible that he did
not take the quiz at all, so his score appeared as 60 on the student record. To remove
this extreme value, we can obtain the trimmed mean instead: remove 60 from the data set
and obtain the average of the remaining nine data values.
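The effect of trimming can be sketched in Python. The ten scores below are hypothetical values chosen only for illustration (they are not the student's record), and the helper name `trimmed_mean_drop_lowest` is likewise an assumption of this sketch.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def trimmed_mean_drop_lowest(values):
    """Mean after removing one occurrence of the lowest value."""
    if len(values) < 2:
        raise ValueError("need at least two values to trim one")
    trimmed = sorted(values)[1:]  # drop the single smallest observation
    return mean(trimmed)


# Hypothetical quiz scores (assumed for illustration only); 60 is the extreme value
scores = [88, 91, 79, 60, 95, 84, 90, 87, 93, 86]

full_mean = mean(scores)
trimmed = trimmed_mean_drop_lowest(scores)
print(round(full_mean, 2))  # 85.3
print(round(trimmed, 2))    # 88.11
```

Removing the single extreme score raises the average by almost three points, which is why the trimmed mean can describe such a data set better than the ordinary mean.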
4.2 MEDIAN
The median is the value that coincides with the middle observation when the data items
are sorted from smallest to largest. If there is an odd number of data items, the median
falls on the observation in the middle. If there is an even number of data items, there
is no single middle value. Instead, the median is computed as the average of the two
middle values.
Median
If there is an odd number of data values arranged in ascending order, the median is
the value in the middle of the list.
If there is an even number of data values arranged in ascending order, the median is
the average of the two middle values in the list.
Example 4.3
Obtain the median of the list shown below.
19 17 39 34 26 48 58 74 26 47 96
Solution
Arrange the list in ascending order.
17 19 26 26 34 [39] 47 48 58 74 96
The observation in the middle (the 6th of the 11 values) is 39.
Median is 39.
Example 4.4
Obtain the median of the data values in the list: 1002 3989 5789 3876 2999 4888
Solution
Arrange the list in ascending order.
1002 2999 3876 3989 4888 5789
The list contains six data values. The two middle values are 3876 and 3989. The median,
therefore, is
$$\text{Median} = \frac{3876 + 3989}{2} = 3932.5$$
For many of the computations you will do in your profession, the mean is the measure of
central tendency you will use most often. But there are occasions when the median is the
preferred measure. This happens when extreme values in the data set significantly affect
the value of the mean. Extreme values are observations that are unusually large or
unusually small.
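The two rules for the median (odd and even number of data values) can be sketched as one short Python function and checked against Examples 4.3 and 4.4. The function name `median` is an illustrative choice; Python's standard `statistics.median` would give the same results.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single observation in the middle
        return ordered[mid]
    # Even count: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([19, 17, 39, 34, 26, 48, 58, 74, 26, 47, 96]))  # 39
print(median([1002, 3989, 5789, 3876, 2999, 4888]))          # 3932.5
```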
4.3 MODE
Mode is defined in the following manner.
Mode
The mode is the data value which occurs with the greatest frequency.
Given a list of data values, the observation which occurs most often is the mode of the
data set. This measure of central tendency is most applicable to qualitative data.
Consider, for example, a survey about soft drinks presented in Table 1.1.
The mode of this data set is Coca-cola because it has the highest frequency and is
therefore the most preferred soft drink. For this data set, looking for a median or mean
makes no sense. The mode is the appropriate measure of central tendency in this situation.
Sometimes survey results reveal that two or more categories occur with the greatest
frequency. In that case, the data set is said to be multimodal; it is bimodal if it shows
exactly two modes. When a data set is multimodal, the mode may not be an appropriate
measure of central tendency.
Soft drink    Frequency
Coca-cola     30
Pop Cola      10
Pepsi         15
Sprite        27
Sparkle       17
Root Beer     14
RC Cola       20
In the extreme case where each category appears only once, the data set is by definition
multimodal: each category is a mode. This tells us nothing about the central tendency of
the data set, so it is inappropriate to report a mode in such a situation.
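Finding the mode from a frequency table amounts to picking every category that reaches the highest frequency. The sketch below uses the soft drink frequencies from the survey table; the function name `modes` and the second, all-ones data set are assumptions added for illustration.

```python
def modes(frequency):
    """Return every category whose frequency equals the highest frequency."""
    if not frequency:
        raise ValueError("no categories given")
    highest = max(frequency.values())
    return [category for category, count in frequency.items() if count == highest]


# Frequencies from the soft drink survey
soft_drinks = {
    "Coca-cola": 30, "Pop Cola": 10, "Pepsi": 15, "Sprite": 27,
    "Sparkle": 17, "Root Beer": 14, "RC Cola": 20,
}
print(modes(soft_drinks))  # ['Coca-cola']

# Hypothetical extreme case: every category appears exactly once
each_once = {"A": 1, "B": 1, "C": 1}
print(modes(each_once))    # ['A', 'B', 'C']
```

Returning a list rather than a single value makes the multimodal case visible: when the list is as long as the number of categories, the mode says nothing about central tendency.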
Grouped data refers to a data set that is already organized as a frequency distribution,
presented either as a frequency distribution table or as a graph.
Class interval    Frequency
10                1
7–9               17
4–6               15
1–3               2
Let us use Table 4.1 to illustrate the sample mean of grouped data.
$$\bar{x} = \frac{\sum f_i m_i}{n}$$
where
$f_i$ is the frequency of observations in class interval $i$
$m_i$ is the midpoint of class interval $i$
$f_i m_i$ is the product of $f_i$ and $m_i$
$n$ is the total number of data values
Class interval    m_i    f_i       f_i m_i
10                10     1         10
7–9               8      17        136
4–6               5      15        75
1–3               2      2         4
Total                    n = 35    ∑ f_i m_i = 225
The mean calculated from all the individual data values is not necessarily equal to the
mean calculated from the grouped data. The difference between the mean of the ungrouped
data and the mean of the grouped data is called grouping error. The mean obtained from
the grouped data is an approximation of the true mean obtained from all the data values.
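The grouped-mean formula can be sketched in Python using the frequencies of Table 4.1. The top class is read here as the single value 10 (so its midpoint is 10), which is consistent with the totals n = 35 and ∑ f_i m_i = 225; the tuple layout and the function name `grouped_mean` are assumptions of this sketch.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f_i * m_i divided by n.

    `classes` is a list of (lower limit, upper limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # The midpoint m_i is the average of the class limits
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Table 4.1 as (lower limit, upper limit, frequency)
table_4_1 = [(10, 10, 1), (7, 9, 17), (4, 6, 15), (1, 3, 2)]
x_bar_grouped = grouped_mean(table_4_1)
print(round(x_bar_grouped, 2))  # 6.43
```

Because every observation in a class is treated as if it sat at the class midpoint, this value is an approximation of the mean of the raw data.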
Class interval    Frequency    Cf from lowest class    Cf from highest class
10                1            35                      1
7–9               17           34                      18
4–6               15           17                      33
1–3               2            2                       35
$$\text{Median} = U_b - \left( \frac{\frac{n}{2} - Cf_u}{f_m} \right) \times c$$
Example 4.5
Obtain the median of the data presented in Table 4.2. This table shows the results
of an examination in Tourism Practicum.
Class interval    Frequency    Cf from lowest class    Cf from highest class
90 – 94           2            80                      2
85 – 89           4            78                      6
80 – 84           12           74                      18
75 – 79           21           62                      39
70 – 74           17           41                      56
65 – 69           9            24                      65
60 – 64           8            15                      73
55 – 59           6            7                       79
50 – 54           1            1                       80
Total             80
Solution
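Since n/2 = 80/2 = 40, the median class is the class that contains the 40th observation.
The cumulative frequencies show that this is the class 70 – 74 (the 5th of the nine
classes), and $f_m = 17$.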
$$\text{Median} = U_b - \left( \frac{\frac{n}{2} - Cf_u}{f_m} \right) \times c = 74.5 - \left( \frac{40 - 39}{17} \right) \times 5 = 74.21$$
If we count the classes from bottom to top instead, the alternative formula is as
follows.
$$\text{Median} = L_b + \left( \frac{\frac{n}{2} - Cf_l}{f_m} \right) \times c$$
Example 4.6
Obtain the median of the data presented in Table 4.2 using the alternative
formula.
Solution
$$\text{Median} = L_b + \left( \frac{\frac{n}{2} - Cf_l}{f_m} \right) \times c = 69.5 + \left( \frac{40 - 24}{17} \right) \times 5 = 74.21$$
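Both median formulas can be checked side by side with a short Python sketch on the Table 4.2 frequencies. Reading the worked examples, $U_b$ and $L_b$ are taken as the upper and lower boundaries of the median class, $Cf_u$ and $Cf_l$ as the cumulative frequencies of the classes above and below it, and $c$ as the class width. The half-unit boundaries (69.5, 74.5) and equal class widths are assumptions that match the examples; the function name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """Return the grouped median computed from below (L_b) and from above (U_b).

    `classes` is a list of (lower limit, upper limit, frequency) tuples,
    ordered from the lowest class to the highest, with integer class limits.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    width = classes[1][0] - classes[0][0]  # class width c, assuming equal widths
    cf_below = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if cf_below + f >= half:  # this class contains the (n/2)-th observation
            lb, ub = lower - 0.5, upper + 0.5  # class boundaries (assumed half-unit)
            cf_above = n - cf_below - f
            from_below = lb + (half - cf_below) / f * width
            from_above = ub - (half - cf_above) / f * width
            return from_below, from_above
        cf_below += f
    raise ValueError("no median class found; check the frequencies")


# Table 4.2, lowest class first
table_4_2 = [
    (50, 54, 1), (55, 59, 6), (60, 64, 8), (65, 69, 9), (70, 74, 17),
    (75, 79, 21), (80, 84, 12), (85, 89, 4), (90, 94, 2),
]
from_below, from_above = grouped_median(table_4_2)
print(round(from_below, 2), round(from_above, 2))  # 74.21 74.21
```

The two formulas agree because they measure the same point inside the median class, one from its lower boundary and the other from its upper boundary.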
A weighted average is similar to the arithmetic mean, except that the data values carry
different “weights.” Values with larger weights influence the average more than values
with smaller weights. One good example is the way your average grade is computed at the
end of each semester.
Weighted Average
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Example 4.7
Felix Dinglasan received his grades at the end of first semester. Compute his
general weighted average for all his subjects.
Subject         Units    Grade
English 1       3        3
Seamanship 1    3        2.0
Solution
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{47.75}{20} = 2.39$$
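The weighted-average formula can be sketched in Python as follows. The grades and unit loads below are hypothetical values chosen only to show the arithmetic (they are not Felix Dinglasan's record), and the function name `weighted_average` is an illustrative choice.

```python
def weighted_average(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical grades (x_i) and unit loads (w_i), assumed for illustration only
grades = [1.5, 2.0, 2.5]
units = [3, 3, 2]
gwa = weighted_average(grades, units)
print(gwa)  # 1.9375
```

A subject with more units pulls the result toward its grade; with equal weights the function reduces to the ordinary arithmetic mean.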
Admission test scores are often reported as percentiles. A percentile tells how a
student performed relative to the other examinees.
Percentiles
The pth percentile is a value such that at least p percent of the observations take on
this value or less, and at least ( 100 – p) percent of the observations take on this
value or more.
$$i = \left( \frac{p}{100} \right) n$$
where p is the percentile of interest, and n is the number of observations.
Example 4.8
Obtain the 85th percentile for the test scores shown below:
65, 67, 72, 75, 75, 75, 75, 75, 79, 80, 80, 80, 83, 86, 88, 89,90
Solution
1) The data set is already sorted from lowest to highest.
2) Compute the index i:
$$i = \left( \frac{p}{100} \right) n = \left( \frac{85}{100} \right) 17 = 14.45$$
Since the value has a decimal part, we round it up to 15.
3) What score then occupies the 85th percentile? Count the observations from the lowest
to the highest score. The 15th score, which is 88, occupies the 85th percentile.
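The index rule used in Example 4.8 can be sketched in Python. The rounding-up branch follows the example directly; the averaging branch for a whole-number index follows the way the text handles whole-number positions when it computes quartiles. The function name `percentile`, the restriction to whole-number p strictly between 0 and 100, and the integer arithmetic (used to avoid floating-point surprises when testing for a decimal part) are assumptions of this sketch.

```python
def percentile(values, p):
    """p-th percentile using the index rule i = (p / 100) * n.

    `p` is assumed to be a whole number strictly between 0 and 100.
    """
    if not values:
        raise ValueError("no data values given")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # i = whole + remainder / 100, computed with integers
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i has a decimal part: round up to position whole + 1 (zero-based index `whole`)
        return ordered[whole]
    # i is a whole number: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


# Test scores from Example 4.8
scores = [65, 67, 72, 75, 75, 75, 75, 75, 79, 80, 80, 80, 83, 86, 88, 89, 90]
print(percentile(scores, 85))  # 88
```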
Example 4.9
Obtain the 50th percentile for the list of scores shown below.
60, 63, 68, 70, 75, 75, 75, 79,79, 80, 82,82, 82, 85, 86, 90, 90, 92, 94, 99
Solution
1) The data set is already sorted in ascending order.
2) Compute the index i:
$$i = \left( \frac{50}{100} \right) 20 = 10$$
3) Since i is a whole number, the 50th percentile is the average of the values in the
10th and 11th positions, which are 80 and 82. The 50th percentile is 81.
The first, second, and third quartiles can be stated in terms of percentiles. Q1 is the
25th percentile, Q2 the 50th percentile, and Q3 the 75th percentile. Therefore, the rules
for obtaining Q1, Q2, and Q3 are the same rules used to compute the 25th, 50th, and 75th
percentiles.
Example 4.10
60, 63, 68, 70, 75, 75, 75, 79,79, 80, 82,82, 82, 85, 86, 90, 90, 92, 94, 99
Solution
a) For Q1,
$$i = \left( \frac{25}{100} \right) 20 = 5$$
The values which occupy the 5th and 6th positions are 75 and 75. Their
average is 75. Therefore, $Q_1 = 75$.
b) For Q2,
$$i = \left( \frac{50}{100} \right) 20 = 10$$
The values which occupy the 10th and 11th positions are 80 and 82. Their
average is 81. Therefore, $Q_2 = 81$.
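c) For Q3,
$$i = \left( \frac{75}{100} \right) 20 = 15$$
The values which occupy the 15th and 16th positions are 86 and 90. Their
average is 88. Therefore, $Q_3 = 88$.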
Hinges are another way of computing quartiles. In this method, the data set arranged in
ascending order is divided into four equal parts. The “first hinge” is the lower hinge,
and the “third hinge” is the upper hinge. In effect, the lower hinge coincides with Q1,
and the upper hinge coincides with Q3.
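One common way to compute hinges is to split the sorted data into a lower half and an upper half and take the median of each. The sketch below follows that approach. How to treat the middle value when the number of observations is odd is not stated in the text, so counting it in both halves (the Tukey convention) is an assumption here, as is the function name `hinges`.

```python
from statistics import median


def hinges(values):
    """Return (lower hinge, upper hinge) as the medians of the two halves."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    ordered = sorted(values)
    n = len(ordered)
    # For odd n the middle value is counted in both halves (assumed convention)
    half = (n + 1) // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)


# The 20 scores used in Example 4.10
scores_4_10 = [60, 63, 68, 70, 75, 75, 75, 79, 79, 80,
               82, 82, 82, 85, 86, 90, 90, 92, 94, 99]
print(hinges(scores_4_10))  # (75.0, 88.0)
```

For an even number of observations the halves do not overlap, so the hinges are simply the medians of the first ten and the last ten scores.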
Name Date
Course-Section Score
20 18 17 19 20 22 24 18 26 18 21
22 58 24 50 29 52 31 27 44 49 40 31 32 44
a) Compute Q1.
b) Compute Q3.
3. In a beauty contest, participants are rated on four categories: swimsuit, evening
gown, talent, and question and answer. These categories carry different weights:
swimsuit 20%, evening gown 20%, talent 30%, and question and answer 30%. Contestants
are rated from 1 to 10 for each category, 10 being the best rating. Judge Sofia Vizco
gave Ms. Chona Portina the following ratings:
Use the weighted average to compute Ms. Chona Portina’s score from Judge Sofia
Vizco’s score card.
105 - 109 5
110 - 114 13
115 - 119 10
120 - 124 6
125 - 129 6
130 - 134 5
Total 50
a) Obtain the mean of the grouped data. Use the modified table below to
organize your solution.
105 - 109 5
110 - 114 13
115 - 119 10
120 - 124 6
125 - 129 6
130 - 134 5
Total 50
b) Obtain the median of the grouped data in Table 4.3. Use the modified table
below to organize your solution.
105 - 109 5
110 - 114 13
115 - 119 10
120 - 124 6
125 - 129 6
130 - 134 5
Total 50