Statistical Analysis With Computer Applications: Dr. Angelica M. Aquino


Table of Contents

Module 4: Measures of Central Tendency

Introduction
Learning Outcomes
Lesson 1. Concepts and Computation for Ungrouped Data
Lesson 2. Concepts for Grouped Data
Lesson 3. Computation of the Mean for Ungrouped Data
Assessment Tasks
Summary

Module 5: Measures of Variability

Introduction
Learning Outcomes
Lesson 1. What is Variability?
Lesson 2. Common Measures of Dispersion or Variability of Scores
Lesson 3. Guiding Principle for Measures of Variability in the Interpretation of Data
Assessment Tasks
Summary
References

MODULE 4
MEASURES OF CENTRAL TENDENCY (UNGROUPED DATA)

Introduction

Once the data have been organized and presented in tables and graphs, the researcher
must be able to describe them in terms of a single value. This value, which gives a summary of the
characteristics of a given set of data, is called a measure of central tendency.

In this module we consider the three most important measures of central tendency: the
mean, the median, and the mode.

Learning Outcomes

At the end of this module, the students are expected to:


1. Differentiate grouped from ungrouped data.
2. Define mean, median, and mode
3. Calculate the mean, median, and mode for ungrouped data.
4. Apply the weighted arithmetic mean in a distribution with weighted scores.
5. Give the advantages of using mean, median, or mode as a measure of central
tendency.
6. Identify the most appropriate measure of central tendency in a certain
distribution.
7. Determine the quartile, decile, and percentile values in a distribution.

Lesson 1. Concepts and Computation for Ungrouped Data

What is Central Tendency?

According to Sanchez (n.d.), a measure of central tendency is simply the average or typical
value in a set of scores.

In addition, a measure of central tendency is a single value that attempts to describe a set
of data by identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location. They are also classed as summary
statistics (Sanchez, n.d.).

The mean (often called the average) is most likely the measure of central tendency that you
are most familiar with, but there are others, such as, the median and the mode (Sanchez, n.d.).

The "mean" is the "average" you're used to, where you add up all the numbers and then
divide by the number of numbers. The "median" is the "middle" value in the list of numbers. ... If no
number in the list is repeated, then there is no mode for the list (Sanchez, n.d.).

What is Summation Notation? (Tanbakuchi, 2009)

In Statistics, we often need to sum sets of numbers. It is necessary to work with sums of
numerical values, and to express these, we make use of standard notation.

Summation (Σ) just means to “add up.” For example, let’s say you had 5 items in a data
set: 1,2,5,7,9; you can think of these as x-values. If you were asked to add all of the items up in
summation notation, you would see:
Σ(x) which equals 1 + 2 + 5 + 7 + 9 = 24.

When using summation notation, X1 means “the first x-value”, X2 means “the second x-
value” and so on. For example, let’s say you had a list of weights: 100lb, 150lb, 153lb and 202lb.
The weights and their corresponding x-values are:
X1: 100lb
X2: 150lb
X3: 153lb
X4: 202lb

In the expression $\sum_{i=1}^{n} X_i$, the “i=1” at the base of Σ means “start at your first x-value”.
This would be X1 (100lb in this example). The “n” at the top of Σ means “end at n”. In statistics, n is
the number of items in the data set. So what this summation is asking you to do is “add up all of your
x-values from the first to the last.” For this set of data, that would be:
100 lb + 150 lb + 153 lb + 202 lb = 605 lb.

Note: If you see a number above Σ, instead of n, it means to add up to a certain point. For example,
a “3” above the Σ means to sum up to the third item (X3) in the set.
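
The index limits can be mirrored in a few lines of Python. This is only a sketch: the names `weights` and `summation` are illustrative and do not come from the text, and because Python lists start counting at 0, X1 is stored at `weights[0]`.

```python
# Weights from the example above (in lb); X1 is weights[0] because Python counts from 0.
weights = [100, 150, 153, 202]

def summation(values, start=1, end=None):
    """Add values from position `start` to position `end`, using the
    1-based positions of sigma notation (both ends included)."""
    if end is None:
        end = len(values)              # default upper limit is n
    return sum(values[start - 1:end])

total_all = summation(weights)                 # i = 1 to n
total_first_three = summation(weights, 1, 3)   # i = 1 to 3
print(total_all, total_first_three)            # 605 403
```

Changing `start` corresponds to starting the sum at a different point in the data set, which is the situation the Σ notation is designed to express.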

Why the difficult notation? Why not just say “add up”? There are cases when you might want
to start at a different point in the data set. Although you probably won’t come across these in
an elementary statistics class, if you go on to more advanced statistics (or calculus), you’ll come across
many different variations. So introducing the Σ notation gets you used to the format, much as
x and y are introduced very early on in basic math.

Summation notation is also a shorthand that helps to avoid long equations. For example,
take this lengthy expression, where a, b, and c are constants, and X and Y are random variables:
(aX1+bY1+c)+(aX2+bY2+c)+(aX3+bY3+c)+(aX4+bY4+c)+(aX5+bY5+c)
This can be written more succinctly in summation notation as:

$$\sum_{i=1}^{5} (aX_i + bY_i + c)$$

A More Complicated Example


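One of the most challenging formulas you’ll come across in elementary statistics that involves
summation notation is Pearson’s correlation coefficient:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

The Pearson correlation coefficient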

There are multiple summations in the formula and although it’s time consuming to solve, it is fairly
straightforward if you break it down into steps. Note that there are two summations of X in the
formula:
ΣX², which means to square the x-values and add them all up
and
(ΣX)², which means to add up all of the x-values and then square.
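
The difference between the two summations is easy to check numerically. The sketch below reuses the x-values 1, 2, 5, 7, 9 from the earlier example; the y-values are made up purely for illustration, and the computational form of Pearson's r (n·ΣXY − ΣX·ΣY over the square root of the two bracketed terms) is assumed.

```python
import math

x = [1, 2, 5, 7, 9]      # x-values from the earlier summation example
y = [2, 4, 5, 8, 10]     # hypothetical y-values, made up for illustration

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x_squared = sum(v ** 2 for v in x)    # ΣX²: square first, then add
square_of_sum_x = sum_x ** 2              # (ΣX)²: add first, then square
sum_y_squared = sum(v ** 2 for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Computational form of Pearson's r, built step by step from the sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x_squared - square_of_sum_x)
                        * (n * sum_y_squared - sum_y ** 2))
r = numerator / denominator

print(sum_x_squared, square_of_sum_x)   # 160 576
print(round(r, 4))
```

Computing each sum once and then combining them is the "break it down into steps" approach described above.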

Lesson 2. Concepts for Grouped Data (Frost, 2018)

A measure of central tendency is a summary statistic that represents the center point or
typical value of a dataset. These measures indicate where most values in a distribution fall and
are also referred to as the central location of a distribution. You can think of it as the tendency of
data to cluster around a middle value.
The most commonly used measure of central tendency is the arithmetic mean. It is called
the mean or the computed average.
It is defined to be the sum of the values of a group of items divided by the number of such
items.

Characteristics of the Mean

The mean is a reliable and stable measure to use when sample data are being
used to make inferences about populations. It is the point that balances all the values on either
side. The mean is sensitive to extreme values, whether high or low, which makes it an
inappropriate average to use when the distribution is highly skewed, because it loses its representative
quality. The mean cannot be computed when the distribution contains open-ended intervals, in the
absence of additional information.

Uses of the Mean

The mean is the most commonly used, most easily understood, most easily calculated, and most
generally recognized average. It is the best measure to use when the distribution is symmetrical. It is a
useful measure for inferential statistics. It is also used to obtain an average value of a series of values
after each item is weighted; this is referred to as the weighted average.

Central Tendency for Grouped Data

As mentioned by Frost (2018), the central tendency of a distribution represents one
characteristic of a distribution. Another aspect is the variability around that central value.

Data that are arranged in a frequency distribution are called grouped data. Observations
belonging to each class interval are represented by the class mark of the interval. As with
ungrouped data, the researcher must be able to describe grouped data in terms of a single value,
a measure of central tendency.

We shall consider in this lesson the three most important measures of central tendency for
grouped data: the mean, the median, and the mode.

The Mean for Grouped Data

Steps in Computing the Mean:


1. Calculate the midpoint or class mark of each class interval.
2. Multiply each class mark by its corresponding frequency.
3. Add the products obtained in Step 2.
4. Divide the sum by the total number of cases (n) to obtain the mean.
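
The four steps amount to a single formula. The symbols are chosen here for illustration: x_i is the class mark and f_i the frequency of the i-th class, k is the number of classes, and n is the total number of cases.

```latex
\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i}{n}
```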

Median from Grouped Data

Steps in Computing Median from Grouped data (Frost, 2018)


1. Construct the less-than cumulative frequency (<F) column in the table.
2. Divide n by 2 (n/2) and locate n/2 in the cumulative frequency column to determine the
median class.
3. Get the lower boundary of the median class.
4. From the computed n/2, subtract the <F of the class just before the median class.
5. Divide the difference by the frequency of the median class, then multiply the
quotient by the class size (i).
6. Add the value obtained in Step 5 to the lower boundary of the median class.
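
Taken together, the steps above can be written as one expression. The labels LB (lower boundary of the median class) and f_m (frequency of the median class) are introduced here for convenience; <F is the cumulative frequency before the median class and i is the class size, as in the steps.

```latex
\text{Median} = LB + \left( \frac{\frac{n}{2} - {<}F}{f_m} \right) \times i
```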

Mode for Grouped Data


Steps in Computing the Mode from Grouped data (Increasing Order) (Frost, 2018)

1. Determine the modal class. The modal class is the class with the highest frequency in the
distribution. In the example, this is 26 - 30, which has a frequency of 14.
2. Get the lower boundary of the modal class. Determine delta 1 and delta 2.
Delta 1 = the difference between the highest frequency and the frequency just above
it. In the example, 14 - 7 = 7.
Delta 2 = the difference between the highest frequency and the frequency just below it.
In the example, 14 - 8 = 6.
3. Divide delta 1 by the sum of delta 1 and delta 2. In the example, 7 / (7 + 6).
4. Multiply the result in Step 3 by the class size (i). In the example, i = 5.
5. Add the answer in Step 4 to the lower boundary of the modal class.
Example: taking 25.5 as the lower boundary of the class 26 - 30 (whole-number data),
Mode = 25.5 + [7 / (7 + 6)] × 5 ≈ 28.19

Mean, Median and Mode from Grouped Frequencies (Frost, 2018)

Explained with Three Examples

The Race and the Naughty Puppy

This starts with some raw data (not a grouped frequency yet)

Alex timed 21 people in the sprint race, to the nearest second:

59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58, 58, 62, 62, 68, 65, 56, 59, 68, 61, 67

To find the Mean Alex adds up all the numbers, then divides by how many numbers:

Mean = (59 + 65 + 61 + 62 + 53 + 55 + 60 + 70 + 64 + 56 + 58 + 58 + 62 + 62 + 68 + 65 + 56 + 59
+ 68 + 61 + 67) / 21
= 61.38095...

To find the Median Alex places the numbers in value order and finds the middle number.

In this case the median is the 11th number:

53, 55, 56, 56, 58, 58, 59, 59, 60, 61, 61, 62, 62, 62, 64, 65, 65, 67, 68, 68, 70

Median = 61

To find the Mode, or modal value, Alex places the numbers in value order then counts how many
of each number. The Mode is the number which appears most often (there can be more than one
mode):

53, 55, 56, 56, 58, 58, 59, 59, 60, 61, 61, 62, 62, 62, 64, 65, 65, 67, 68, 68, 70

62 appears three times, more often than the other values, so Mode = 62
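
The three results can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and `multimode` (available from Python 3.8) is used because it returns every value tied for most frequent.

```python
import statistics

# Alex's 21 sprint times, in seconds, as listed above.
times = [59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58,
         58, 62, 62, 68, 65, 56, 59, 68, 61, 67]

mean_time = statistics.mean(times)
median_time = statistics.median(times)     # sorts internally, then takes the middle value
modes = statistics.multimode(times)        # list of all most-frequent values

print(round(mean_time, 5), median_time, modes)   # 61.38095 61 [62]
```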

Grouped Frequency Table

Alex then makes a Grouped Frequency Table:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

So 2 runners took between 51 and 55 seconds, 7 took between 56 and 60 seconds, etc

The original data may get lost. Only the Grouped Frequency Table survived ...

... can we help Alex calculate the Mean, Median and Mode from just that table?

The answer is ... no we can't. Not accurately anyway. But, we can make estimates.

Estimating the Mean from Grouped Data

So all we have left is:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

The groups (51-55, 56-60, etc), also called class intervals, are of width 5

The midpoints are in the middle of each class: 53, 58, 63 and 68

We can estimate the Mean by using the midpoints.

So, how does this work?

Think about the 7 runners in the group 56 - 60: all we know is that they ran somewhere between 56
and 60 seconds:

- Maybe all seven of them did 56 seconds,
- Maybe all seven of them did 60 seconds,
- But it is more likely that there is a spread of numbers: some at 56, some at 57, etc.

So we take an average and assume that all seven of them took 58 seconds.

Let's now make the table using midpoints:

Midpoint Frequency

53 2

58 7

63 8

68 4

Our thinking is: "2 people took 53 sec, 7 people took 58 sec, 8 people took 63 sec and 4 took 68
sec". In other words we imagine the data looks like this:

53, 53, 58, 58, 58, 58, 58, 58, 58, 63, 63, 63, 63, 63, 63, 63, 63, 68, 68, 68, 68

Then we add them all up and divide by 21. The quick way to do it is to multiply each midpoint by
each frequency:

Midpoint (x)   Frequency (f)   Midpoint × Frequency (fx)

53             2               106
58             7               406
63             8               504
68             4               272

Totals:        21              1288

And then our estimate of the mean time to complete the race is:

Estimated Mean = 1288 / 21 = 61.333...

Very close to the exact answer we got earlier.
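
The midpoint method can be sketched in Python as follows. The structure `classes` and the other names are illustrative choices, not part of the original material.

```python
# (lower limit, upper limit, frequency) for each class in the grouped table
classes = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]

n = sum(frequencies)                                          # total number of runners
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of midpoint × frequency
estimated_mean = sum_fx / n

print(midpoints, n, sum_fx, round(estimated_mean, 3))
```

The estimate differs slightly from the exact mean because every runner in a class is treated as if they had run the midpoint time.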

Estimating the Median from Grouped Data

Let's look at our data again:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

The median is the middle value, which in our case is the 11th one, which is in the 61 - 65 group:

We can say "the median group is 61 - 65"

But if we want an estimated Median value we need to look more closely at the 61 - 65 group.

We call it "61 - 65", but it really includes values from 60.5 up to (but not including) 65.5.

Why? Well, the values are in whole seconds, so a real time of 60.5 is measured as 61. Likewise
65.4 is measured as 65.

At 60.5 we already have 9 runners, and by the next boundary at 65.5 we have 17 runners. By
drawing a straight line in between we can pick out where the median frequency of n/2 runners is:

And this handy formula does the calculation:

$$\text{Estimated Median} = L + \frac{(n/2) - B}{G} \times w$$

where:

- L is the lower class boundary of the group containing the median
- n is the total number of values
- B is the cumulative frequency of the groups before the median group
- G is the frequency of the median group
- w is the group width

For our example:

- L = 60.5
- n = 21
- B = 2 + 7 = 9
- G = 8
- w = 5

$$\text{Estimated Median} = 60.5 + \frac{(21/2) - 9}{8} \times 5$$
= 60.5 + 0.9375
= 61.4375
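
The same interpolation can be written as a short function. This is a sketch under stated assumptions: the function name and parameters are illustrative, the data are whole numbers (so each lower class boundary is the lower limit minus 0.5), and all classes share the same width.

```python
def estimated_median(lower_limits, frequencies, width):
    """Estimate the median by linear interpolation inside the median class.

    Assumes whole-number data, so each lower class boundary is the
    lower limit minus 0.5, and equal class widths.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("need at least one observation")
    half = n / 2
    cumulative_before = 0                      # B in the formula
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative_before + freq >= half:   # first class that reaches n/2
            boundary = lower - 0.5             # L in the formula
            return boundary + (half - cumulative_before) / freq * width
        cumulative_before += freq

median_estimate = estimated_median([51, 56, 61, 66], [2, 7, 8, 4], 5)
print(median_estimate)   # 61.4375
```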

Estimating the Mode from Grouped Data (Frost, 2018)

Again, looking at our data:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

We can easily find the modal group (the group with the highest frequency), which is 61 - 65

We can say "the modal group is 61 - 65"

But the actual Mode may not even be in that group! Or there may be more than one mode.
Without the raw data we don't really know.

But, we can estimate the Mode using the following formula:

$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:

- L is the lower class boundary of the modal group
- $f_{m-1}$ is the frequency of the group before the modal group
- $f_m$ is the frequency of the modal group
- $f_{m+1}$ is the frequency of the group after the modal group
- w is the group width

In this example:

- L = 60.5
- $f_{m-1}$ = 7
- $f_m$ = 8
- $f_{m+1}$ = 4
- w = 5

$$\text{Estimated Mode} = 60.5 + \frac{8 - 7}{(8 - 7) + (8 - 4)} \times 5$$
= 60.5 + (1/5) × 5
= 61.5

Our final result is:

- Estimated Mean: 61.333...
- Estimated Median: 61.4375
- Estimated Mode: 61.5

(Compare that with the true Mean, Median and Mode of 61.38..., 61 and 62 that we got at the very
start.)
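
The mode estimate can be sketched the same way. The function below is illustrative only: its name and parameters are not from the text, it assumes whole-number data (boundary = lower limit minus 0.5) and a single modal class, and it treats a missing neighbouring class as having frequency 0.

```python
def estimated_mode(lower_limits, frequencies, width):
    """Estimate the mode from the modal class and its two neighbours.

    Assumes whole-number data (boundary = lower limit - 0.5) and a
    single modal class; a missing neighbour is treated as frequency 0.
    """
    m = frequencies.index(max(frequencies))          # position of the modal class
    f_m = frequencies[m]
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    boundary = lower_limits[m] - 0.5
    return boundary + (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next)) * width

mode_estimate = estimated_mode([51, 56, 61, 66], [2, 7, 8, 4], 5)
print(mode_estimate)   # 61.5
```

If two classes tie for the highest frequency, `index` picks the first one, so the result should then be read with caution.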

Lesson 3. Computation of the Mean for Ungrouped Data
(Frost, 2018)

Let us look into some example problems to understand the above concept.
Question 1:
The marks obtained by 10 students in a test are 15, 75, 33, 67, 76, 54, 39, 12, 78, 11. Find the
arithmetic mean.
Solution:
Mean = Total marks of 10 students / 10
= (15 + 75 + 33 + 67 + 76 + 54 + 39 + 12 + 78 + 11) / 10
= 460 / 10
= 46

Question 2:
Find the mean of 2, 4, 6, 8, 10 , 12, 14, 16.
Solution:
Mean = Sum of given numbers / 8
= (2 + 4 + 6 + 8 + 10 + 12 + 14 + 16) / 8
= 72 / 8
= 9

Question 3:
John studies for 4 hours, 5 hours and 3 hours respectively on three consecutive days. How many
hours does he study daily on an average?
Solution:
The average study time of John
= Total number of study hours / Number of days for which he studied
= (4 + 5 + 3) / 3
= 12 / 3
= 4 hours
Thus, we can say that John studies for 4 hours daily on an average.

Question 4:
A batsman scored the following number of runs in six innings:
36, 35, 50, 46, 60, 55
Calculate the mean runs scored by him in an inning.
Solution:
To find the mean, we find the sum of all the observations and divide it by the number of
observations.
Mean = Total runs / Number of innings
= (36 + 35 + 50 + 46 + 60 + 55) / 6
= 282 / 6
= 47
Thus, the mean number of runs scored per innings is 47.

Question 5:
The ages in years of 10 teachers of a school are:
32, 41, 28, 54, 35, 26, 23, 33, 38, 40
What is the mean age of these teachers?
Solution:
Mean age of the teachers
= Sum of age of teachers / Number of teachers
= (23 + 26 + 28 + 32 + 33 + 35 + 38 + 40 + 41 + 54) /10
= 350 / 10
= 35 years

Question 6:
The following table shows the points each player scored in four games:

Player   Game 1   Game 2   Game 3         Game 4
A        14       16       10             10
B        0        8        6              4
C        8        11       Did not play   13

Now answer the following questions:


(i) Find the mean to determine A’s average number of points scored per game.
(ii) To find the mean number of points per game for C, would you divide the total points by 3 or by
4? Why?
(iii) B played in all the four games. How would you find the mean?
(iv) Who is the best performer?

Solution:
(i) Mean score of A = (14 + 16 + 10 + 10) / 4
= 50 / 4
= 12.5
Mean score of A per game is 12.5.

(ii) To find the mean number of points per game for C, we have to divide the total points by 3,
because C did not play in Game 3. The total number of games C played is 3.

(iii) Mean score of B = (0 + 8 + 6 + 4) / 4
= 18 / 4
= 4.5
Mean score of B per game is 4.5. B scored 0 in Game 1 but still played it, so we divide by 4.

(iv) To choose the best performer, we have to find the mean score of each player.
Mean score of C = (8 + 11 + 13) / 3
= 32 / 3
= 10.67 (rounded to two decimal places)
Mean score of A per game is 12.5.
Mean score of B per game is 4.5.
Mean score of C per game is 10.67.
Hence A is the best performer.
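
The distinction between a score of 0 and a game not played can be made explicit in code. In this sketch, `None` marks a game that was not played; the dictionary layout and the function name are illustrative choices.

```python
# Points per game; None marks a game the player did not play.
scores = {
    "A": [14, 16, 10, 10],
    "B": [0, 8, 6, 4],
    "C": [8, 11, None, 13],
}

def mean_per_game_played(points):
    """Average over the games actually played (None entries are skipped,
    but a score of 0 still counts as a game played)."""
    played = [p for p in points if p is not None]
    return sum(played) / len(played)

means = {player: mean_per_game_played(p) for player, p in scores.items()}
best = max(means, key=means.get)     # player with the highest mean
print(means, best)
```

Comparing the three means directly shows that A has the highest mean (12.5), ahead of C (about 10.67) and B (4.5).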

The Weighted Mean for Ungrouped Data

A weighted mean is a kind of average. Instead of each data point contributing equally
to the final mean, some data points contribute more “weight” than others. If all the weights
are equal, then the weighted mean equals the arithmetic mean (the regular “average” you’re
used to).
The weighted mean is calculated by multiplying the weight (or
probability) associated with a particular event or outcome by its associated quantitative
outcome and then summing all the products together. It is very useful when calculating a
theoretically expected outcome where each outcome has a different probability of occurring,
which is the key feature that distinguishes the weighted mean from the arithmetic mean.
The benefit of using a weighted average is that it allows the final average to
reflect the relative importance of each number that is being averaged.

Formula for Weighted Mean.

Figure 1. Weighted Mean


Taken from https://www.wallstreetmojo.com/weighted-mean-formula
To use the formula:
1. Multiply the numbers in your data set by the weights.
2. Add the numbers in Step 1 up. Set this number aside for a moment.
3. Add up all of the weights.
4. Divide the number you found in Step 2 by the number you found in Step 3.

Weighted mean is calculated by multiplying the weight with the quantitative outcome
associated with it and then adding all the products together. If all the weights are equal, then the
weighted mean and arithmetic mean will be the same.

$$\text{Weighted Mean} = \frac{\sum w_i x_i}{\sum w_i}$$

Where:
∑ denotes the sum
w is the weights and
x is the value
In cases where the sum of weights is 1,

$$\text{Weighted Mean} = \sum w_i x_i$$

Step 1: List the numbers and weights in tabular form. Presentation in tabular form is not
compulsory but makes the calculations easy.
Step 2: Multiply each number by the weight assigned to that number (w1 by x1, w2 by x2 and
so on).
Step 3: Add the numbers obtained in Step 2 (∑wixi).
Step 4: Find the sum of the weights (∑wi).
Step 5: Divide the total of the values obtained in Step 3 by the sum of the weights obtained in Step
4 (∑wixi/∑wi).
Note: If the sum of the weights is 1, then the total of the values obtained in Step 3 will be the
weighted mean.
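
Steps 2 to 5 can be sketched as a small Python function. The function name and the sample scores are hypothetical, made up here only to show the calculation.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)                      # Step 4
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    products = [w * x for x, w in zip(values, weights)]   # Step 2
    return sum(products) / total_weight              # Steps 3 and 5

# Hypothetical data, made up for illustration: three quiz scores.
scores = [80, 90, 70]
print(weighted_mean(scores, [1, 2, 3]))   # (80 + 180 + 210) / 6
print(weighted_mean(scores, [1, 1, 1]))   # equal weights give the ordinary mean, 80.0
```

The second call illustrates the point made above: with equal weights the weighted mean and the arithmetic mean coincide.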

Example #1
The following are 5 numbers and the weights assigned to each number. Calculate the weighted
mean of the above numbers.
Solution:

WM will be –

Example #2
The CEO of a company has decided that he will continue the business only if the return on capital
is more than the weighted average cost of capital. The company makes a return of 14% on its
capital. The capital consists of equity and debt in the proportion of 60% and 40% respectively. The
cost of equity is 15% and the cost of debt is 6%. Advise the CEO on whether the company should
continue with its business.
Solution:
Let us first present the given information in tabular form to understand the scenario.
We will use the following data for the calculation.

Source of capital   Weight   Cost
Equity              0.60     0.15
Debt                0.40     0.06

WM = 0.60*0.15 + 0.40*0.06
= 0.090 + 0.024
= 0.114, or 11.4%

Since the return on capital at 14% is more than the weighted average cost of capital of 11.4%, the
CEO should continue with his business.
Example #3
It is difficult to gauge the future economic scenario. The stock returns could get affected. The finance
advisor develops different business scenarios and expected stock returns for each scenario. This
would enable him to make a better investment decision. Calculate the weighted mean from
the data to help the Investment Advisor to showcase the expected stock returns to his clients.
Solution:
We will use the following data for the calculation.

WM = 0.20*0.25 + 0.30*(-0.10) + 0.50*0.05
= 0.050 - 0.030 + 0.025
= 0.045

The expected return for the stock is 4.5%.
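
When the weights are probabilities that sum to 1, the division step drops out and the weighted mean is simply the sum of the products. A minimal sketch using the figures from the calculation above (the variable names are illustrative):

```python
# Scenario probabilities (weights) and returns taken from the calculation above.
probabilities = [0.20, 0.30, 0.50]
returns = [0.25, -0.10, 0.05]

# The weights already sum to 1, so the sum of products is the weighted mean.
assert abs(sum(probabilities) - 1.0) < 1e-9
expected_return = sum(p * r for p, r in zip(probabilities, returns))
print(f"{expected_return:.1%}")   # 4.5%
```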

Example #4
Jay is a rice merchant who sells various types of rice in Maharashtra. Some rice grades are of higher
quality and are sold at a higher price. He wants you to calculate the weighted mean from the
following data:
Solution:
We will use the following data for the calculation.

Step 1: In Excel, there is a built-in function for calculating the products of the numbers and then their
sum, which is one of the steps in calculating the weighted mean. Select a blank cell and type the
formula =SUMPRODUCT(B2:B5, C2:C5), where the range B2:B5 represents the weights and the
range C2:C5 represents the numbers.

Step 2: Calculate the sum of the weights using the formula =SUM(B2:B5), where the range B2:B5
represents the weights.

Step 3: Calculate =C6/B6, where C6 is taken to hold the result of Step 1 and B6 the result of Step 2.

This gives the WM as Rs 51.36.

Lesson 5. The Median for Ungrouped Data (Frost, 2018)

Median is the "middle" or mid value of a data set.

To find the median, you have to first order the numbers from smallest to largest,
and then find the middle number.

Basic examples

- In a set of three numbers, the middle number is the number in 2nd position.
- So, in a set of 10, 20, 30, the middle number is 20.
- In a set of five numbers, the middle number is the number in 3rd position, since there are
two numbers before it and two numbers after it.
- So, in a set of 10, 20, 30, 40, 50, the middle number is 30.

Example with detailed explanation

Find the median of the following numbers :


10, 15, 50, 45, 60, 90, 25

Median is 45.
Explanation:
1. When you put the numbers in numerical order, it will look like this:
10, 15, 25, 45, 50, 60, 90
2. There are seven numbers in the series, and the middle number is the fourth from the
start. This divides the set into two equal parts: three on the left side and three on the right side of
the fourth number.
3. So, the median is 45.

Determination of the Median of Ungrouped Data


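The formulas for calculating the median of ungrouped data with n observations, arranged in
ascending order, are:

If n is odd: Median = value of the ((n + 1)/2)th observation
If n is even: Median = [value of the (n/2)th observation + value of the (n/2 + 1)th observation] / 2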

Ungrouped Data Median Example
Example 1: Median of coffee prices
There are 7 boxes of coffee, which are priced as below (in US dollars). Find the median.
65, 49, 87, 54, 90, 95, 70.

Median is 70

Explanation:

First of all, we need to arrange the data in ascending order. The arranged data is:

- 49
- 54
- 65
- 70
- 87
- 90
- 95

Number of observations = 7 (odd)

Since the number of observations is odd, we can find the position of the median using the following formula:

median position = (n + 1)/2, where n = number of observations

median position = (7 + 1)/2 = 4

The fourth value is 70, so the median is 70.

Example 2:

Median of sugar content

There are 10 packets of biscuits with sugar content (in grams) as shown below. Find the median.
20, 35, 18, 40, 34, 42, 25, 30, 29, 44

Median is 32
Explanation:
The given data needs to be arranged in ascending order as shown below:
- 18
- 20
- 25
- 29
- 30
- 34
- 35
- 40
- 42
- 44

Number of observations = 10 (even)

Since the number of observations is even, the median is the mean of the two middle values:
Median = (5th + 6th)/2
Median = (30 + 34)/2

Median = 32
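
Both cases, odd and even n, can be handled by one short function. This is a sketch with an illustrative function name; it uses the 1-based positions from the examples and converts them to Python's 0-based indexing.

```python
def median_by_position(values):
    """Median of ungrouped data using the (n + 1)/2 and n/2 position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2               # 1-based position of the middle value
        return ordered[position - 1]
    lower, upper = n // 2, n // 2 + 1         # the two middle positions
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

coffee_prices = [65, 49, 87, 54, 90, 95, 70]
sugar_grams = [20, 35, 18, 40, 34, 42, 25, 30, 29, 44]
print(median_by_position(coffee_prices))   # 70
print(median_by_position(sugar_grams))     # 32.0
```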

The Mode for Ungrouped Data

The most frequently occurring observation in a data set is called the mode (also known
as the modal value). It is a measure of the central tendency of the data values.

Example

Find the mode of 2, 5, 5, 8 and 9


In this case, the most frequently occurring value is 5. So, the mode is 5.

How to Find Mode of a Data (Mode of Ungrouped Data)

The mode of ungrouped data is the most frequent observation in the data. A data set can
have more than one mode.
A data distribution with one mode is called unimodal, whereas a distribution with more
than one mode is called multimodal (it can be bimodal, trimodal, etc.).

Example 1
Find mode of the record of marks shown below:
4, 6, 7, 7, 8, 9, 3, 9, 8, 6, 5, 5, 5, 4, 3, 8, 2, 7, 9, 2, 4, 8, 6, 9, 8
State whether the distribution is unimodal, bimodal or trimodal.

The mode is 8, and the data distribution is unimodal.

Explanation:

First we arrange the data in the ascending order as shown below:

2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9

Next we try to determine the most frequently occurring observation, which in this case is 8.

=> Mode = 8

Since there is only one most frequently occurring value (which is 8), the data distribution is unimodal.

Example 2

Find mode of the record of marks shown below:


1, 3, 5, 3, 2, 4, 4, 5, 6, 1, 3, 4, 2, 5, 1, 1, 3, 2, 6, 4
State whether the distribution is unimodal, bimodal or trimodal.

The modal values are 1, 3 and 4, and the data distribution is trimodal.

Explanation:

First we arrange the data in the ascending order as shown below:

1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6

Next we try to determine the most frequently occurring observation. Note that each of 1, 3 and 4
appears 4 times.

Therefore 1, 3, 4 are modes of the given set of data.

=> Modal values = 1, 3, 4

Since there are 3 modal values in the given data set, the data distribution is trimodal.
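
Counting by hand is error-prone for longer lists, so a frequency count is a useful check. The sketch below uses `collections.Counter` from the standard library; the function name `modes` and the `labels` lookup are illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)                 # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

marks_1 = [4, 6, 7, 7, 8, 9, 3, 9, 8, 6, 5, 5, 5, 4, 3, 8, 2, 7, 9, 2, 4, 8, 6, 9, 8]
marks_2 = [1, 3, 5, 3, 2, 4, 4, 5, 6, 1, 3, 4, 2, 5, 1, 1, 3, 2, 6, 4]

labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
for marks in (marks_1, marks_2):
    found = modes(marks)
    print(found, labels.get(len(found), "multimodal"))
```

Note that if every value occurs exactly once, this function returns all of them, whereas by the convention stated earlier such a list has no mode.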

AT#1
Mr. Ramos, an instructor at Laguna University, assigns Statistics practice problems to
be worked via the net. Students must use a password to access the problems and the
time of log-in and log-off are automatically recorded for the teacher. At the end of the
week, the teacher examines the amount of time each student spent working the
assigned problems. The data is provided below in minutes.

Data

15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39

1. Find the mean.

2. Find the median.

3. Find the mode.

Save your file as LN_PF_AT1.xls

Assessment Task 1

Question: What does this information tell you about students' length of time on the
computer solving Statistics problems?
Answers:

1. Mean
_____________________________________
2. Median
_____________________________________
3. Mode
_____________________________________
4. __________________________________________________________________
__________________________________________________________________
__________________________________________________________________

Assessment Task 2

A group committed to quality television has been concerned about a new talk show.
For two weeks, they decide to count the number of words that must be "bleeped" as
too obscene for television and the number of physical altercations. They hope that
after recording this data that they will be able to argue that the show is inappropriate
for television particularly during the day. The data for number of words censored is
provided below.

Data

342, 267, 321, 157, 33, 254, 166, 132, 289, 349

1. Find the mean
2. Find the median
3. Find the mode
4. What does this information tell you about the talk show?

Save your file as LN_PF_AT2.xls

Assessment Task 3

Nora wants to buy a new camera, and decides on the following rating system:

- Image Quality 50%
- Battery Life 30%
- Zoom Range 20%

The Sonu camera gets 8 (out of 10) for Image Quality, 6 for Battery Life and 7
for Zoom Range

The Conan camera gets 9 for Image Quality, 4 for Battery Life and 6 for Zoom
Range

Which camera is best?

Save your file as LN_PF_AT3.xls

Assessment Task 4

Ana usually eats lunch 7 times a week, but some weeks only gets 1, 2, or 5 lunches.

Ana had lunch:

- on 2 weeks: only one lunch for the whole week
- on 14 weeks: 2 lunches each week
- on 8 weeks: 5 lunches each week
- on 32 weeks: 7 lunches each week

What is the mean number of lunches Ana has each week?

Save your file as LN_PF_AT4.docx

Summary

In this module, we have learned that the measure of central tendency is a summary statistic
that represents the center point or typical value of a dataset. These measures indicate where most
values in a distribution fall and are also referred to as the central location of a distribution. You can
think of it as the tendency of data to cluster around a middle value. In statistics, the three most
common measures of central tendency are the mean, median, and mode. Each of these measures
calculates the location of the central point using a different method.

An average that uses the exact value of each entry is the mean (sometimes called the
arithmetic mean). To compute the mean, we add the values of all the entries and then divide by the
number of entries. The mean is the average usually used to compute a test average. The median
is the central value of an ordered distribution. The mode of a data set is the value that occurs most
frequently.

MODULE 5
MEASURES OF VARIABILITY

Introduction

Descriptive measures that are used to indicate the amount of variation in a data set are
called measures of variability, dispersion, or spread. When descriptive statistics are presented, there
is usually at least one measure of central tendency and at least one measure of variability reported.
The measures of dispersion to be discussed are the range, mean absolute deviation, quartile
deviation, interquartile range, variance, and standard deviation for the ungrouped data.

Learning Outcomes

At the end of this module, the students are expected to:


1. Solve for range, quartile deviation, mean absolute deviation, and standard deviation
for ungrouped data.
2. Solve for quartile, decile and percentile for ungrouped data.
3. Use calculator and MS Excel to facilitate faster and easier computation.

Lesson 1. What is Variability? (Frost, 2018)

A measure of variability is a summary statistic that represents the amount of dispersion in a
dataset. How spread out are the values? While a measure of central tendency describes the typical
value, measures of variability define how far away the data points tend to fall from the center. We
talk about variability in the context of a distribution of values. A low dispersion indicates that the data
points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall
further away.

Variability refers to how "spread out" a group of scores is. It is described as the extent of
“scattering” of individual items about the average or point of central location. Common measures
are the range, quartile deviation, mean absolute deviation, and standard deviation.
Lesson 2. Common Measures of Dispersion or Variability of
Scores (Frost, 2018)

The Range
The range is the simplest and the easiest of the measures of dispersion.
It simply measures the distance given by the highest score and the lowest score.
It is considered as the least satisfactory measure of dispersion because it does not tell
anything about the scores between these two extremes.

The Quartile Deviation/Semi-Interquartile Range


It is half the difference between P75 and P25 in the distribution.

The Mean Absolute Deviation


This is also called the Average Deviation. It is the mean of the absolute deviations of
the individual scores from the mean of all the scores. Because every score enters the
computation, it is affected by every individual score, and it gives equal weight to the
deviation of every observation.
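
A short Python sketch of the mean absolute deviation is given below. The function name and the scores are hypothetical; the computation follows the description above (absolute deviations from the mean, averaged over all scores).

```python
def mean_absolute_deviation(scores):
    """Return the mean of the absolute deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)

# Hypothetical scores: the mean is 6, the absolute deviations are 4, 2, 0, 2, 4.
scores = [2, 4, 6, 8, 10]
print(mean_absolute_deviation(scores))  # 2.4
```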
The Standard Deviation
It is a measure of dispersion that involves all the individual values in the distribution
rather than only the extreme scores.
It is important as a measure of heterogeneity or unevenness within a set of observations:
its value increases as the distribution of scores becomes more heterogeneous.
The Variance
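It is the average of the squared deviations from the mean (for a sample, the sum of the
squared deviations divided by n - 1). It is the square of the standard deviation.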
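
In symbols, the sample variance and sample standard deviation can be written as shown below. The notation is an assumption introduced here for illustration: x_i is each score, \bar{x} is the mean, and n is the number of scores.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```

For a whole population, the divisor n - 1 is replaced by n.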

Lesson 3. Guiding Principle for Measures of Variability in the
Interpretation of Data (Frost, 2018)

- The smaller the value of the measure, the more consistent, the more homogeneous, and
the less scattered are the observations in the set of data.

- If there is a large amount of variation, then on average, the data values will be far from
the mean. Hence, the SD will be large.

- If there is only a small amount of variation, then on average, the data values will be
close to the mean. Hence, the SD will be small.
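
The guiding principle can be illustrated with two hypothetical sets of scores that share the same mean but differ in spread. This sketch uses `statistics.stdev` from the Python standard library, which computes the sample standard deviation; the variable names and data are made up for illustration.

```python
import statistics

# Hypothetical scores with the same mean (50) but different variability.
consistent = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]

for label, scores in (("consistent", consistent), ("scattered", scattered)):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    print(label, mean, round(sd, 2))
```

The tightly clustered set has an SD of about 1.58, while the scattered set has an SD of about 31.62, even though both have a mean of 50.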

The Standard Deviation (Ungrouped Data)

The standard deviation is a statistic that measures the dispersion of a dataset relative to its
mean. It is calculated as the square root of the variance, which is obtained from the deviation of
each data point from the mean. If the data points are further from the mean, there is a higher
deviation within the data set; thus, the more spread out the data, the higher the standard deviation.

Steps in determining the standard deviation from ungrouped data:


1. Compute the mean.
2. Get the deviation of each score from the mean.
3. Square the deviations from the mean.
4. Get the sum of the squared deviations.
5. Divide the sum by the number of observations minus 1 (n - 1). The quotient is the sample variance.
6. Extract the square root of the quotient obtained in #5.
7. The result is the sample standard deviation.
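
The steps above can be sketched in Python as follows. The function name and the scores are hypothetical, and the comments map each line to the numbered step it carries out.

```python
import math

def sample_standard_deviation(scores):
    """Follow the listed steps for ungrouped data (divisor n - 1)."""
    n = len(scores)
    mean = sum(scores) / n                       # Step 1: compute the mean
    deviations = [x - mean for x in scores]      # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]       # Step 3: square the deviations
    total = sum(squared)                         # Step 4: sum of squared deviations
    variance = total / (n - 1)                   # Step 5: divide by n - 1
    return math.sqrt(variance)                   # Steps 6-7: square root is the SD

# Hypothetical scores: mean 5, sum of squared deviations 32.
hypothetical_scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_standard_deviation(hypothetical_scores)
print(round(sd, 3))  # 2.138
```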

Example Problem
You grow 20 crystals from a solution and measure the length of each crystal in millimeters. Here is
your data:
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Calculate the sample standard deviation of the length of the crystals.

Calculate the mean of the data. Add up all the numbers and divide by the total number of data
points.
(9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+4) / 20 = 140/20 = 7
Subtract the mean from each data point (or the other way around, if you prefer; you will be squaring
this number, so it does not matter if it is positive or negative), then square each difference.
(9 - 7)² = (2)² = 4
(2 - 7)² = (-5)² = 25
(5 - 7)² = (-2)² = 4
(4 - 7)² = (-3)² = 9
(12 - 7)² = (5)² = 25
(7 - 7)² = (0)² = 0
(8 - 7)² = (1)² = 1
(11 - 7)² = (4)² = 16
(9 - 7)² = (2)² = 4
(3 - 7)² = (-4)² = 16
(7 - 7)² = (0)² = 0
(4 - 7)² = (-3)² = 9
(12 - 7)² = (5)² = 25
(5 - 7)² = (-2)² = 4
(4 - 7)² = (-3)² = 9
(10 - 7)² = (3)² = 9
(9 - 7)² = (2)² = 4
(6 - 7)² = (-1)² = 1
(9 - 7)² = (2)² = 4
(4 - 7)² = (-3)² = 9

Calculate the mean of the squared differences. Because the 20 crystals are a sample, divide the
sum by n - 1 = 19.
(4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9) / 19 = 178/19 = 9.368
This value is the sample variance. The sample variance is 9.368.
The sample standard deviation is the square root of the sample variance. Use a calculator to obtain
this number.
(9.368)^(1/2) = 3.061
The sample standard deviation is 3.061.
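
The crystal example can be checked with Python's standard `statistics` module, as a quick sketch. The variable names are hypothetical. Dividing by n - 1 = 19, as done in the example, gives the sample standard deviation (`statistics.stdev`); the population standard deviation (`statistics.pstdev`) divides by n = 20 and is shown only for comparison.

```python
import statistics

crystal_lengths_mm = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
                      7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

mean_length = statistics.mean(crystal_lengths_mm)
sample_variance = statistics.variance(crystal_lengths_mm)  # 178 / 19
sample_sd = statistics.stdev(crystal_lengths_mm)           # divisor n - 1
population_sd = statistics.pstdev(crystal_lengths_mm)      # divisor n

print(mean_length)                 # 7
print(round(sample_variance, 3))   # 9.368
print(round(sample_sd, 3))         # 3.061
print(round(population_sd, 3))     # 2.983
```

MS Excel makes the same distinction: STDEV.S returns the sample standard deviation and STDEV.P returns the population standard deviation.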

Assessment Task 5

1. Find the mean absolute deviation of the following hourly rates of randomly selected
employees in Laguna: 106, 140, 175, 196, 238

2. The following are data on ATM transaction times in seconds. Find the
measures of central tendency and dispersion.
32 41 51 42 39 32
43 35 33 32 42 33
Find the following:
Range _________________________________
Quartile Deviation ______________________________
Mean Absolute Deviation ________________________
Standard Deviation ____________________________
3. From the following sales of ABC Merchandise, determine the:

a. Quartile deviation
b. Mean absolute deviation
c. Standard deviation
d. Variance
P20,000 P10,300 P16,300 P25,000 P18,400
P15,000 P11,600 P 9,600 P17,000 P15,900

Save your file as LN_PF_AT5.xls

Summary
In this module, a measure of variability was described as a summary statistic that
represents the amount of dispersion in a dataset, that is, how spread out the values are.
While a measure of central tendency describes the typical value, measures of
variability define how far away the data points tend to fall from the center. We talk about
variability in the context of a distribution of values. A low dispersion indicates that the data
points tend to be clustered tightly around the center. High dispersion signifies that they tend
to fall further away.

We have also learned that the standard deviation is a statistic that measures the
dispersion of a dataset relative to its mean. It is calculated as the square root of
the variance, which is obtained from the deviation of each data point from the mean.
If the data points are further from the mean, there is a higher deviation within the data
set; thus, the more spread out the data, the higher the standard deviation. The variance
is the square of the standard deviation.

References
Ali, Z. & Bhaskar, S. (2016). Basic Statistical Tools in Research and Data Analysis. Indian
Journal of Anaesthesia, 60(9), 662. DOI: 10.4103/0019-5049.190623

Comprehensive Medicinal Chemistry III. (2017). Statistical Analysis. Retrieved from
https://www.sciencedirect.com/topics/medicine-and-dentistry/statistical-analysis

Frost, J. (2018). Measures of Central Tendency: Mean, Median, and Mode. Retrieved from
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/

Schmuller, J. (2013). Statistical Analysis with Excel For Dummies (3rd ed.). John Wiley &
Sons, Inc., Hoboken, New Jersey.
