Module 3 Describing Data


SED Math 213 Elementary Statistics and Probability

Module 3: Describing Data

Introduction

How fast is your internet connection?

During the “new normal”, teachers and students alike need an internet connection to cope with the
new trend in the teaching and learning process.

To compare the internet speeds offered by different internet providers, statisticians use
descriptive statistics.

According to OOKLA’s June 2020 Speedtest Global Index report, the Philippines’ mobile connection has
an average download speed of 16.7 megabits per second (Mbps), placing the country 114th out of 134
countries. For fixed broadband, the country ranks 108th out of 174 countries with an average
download speed of 23.74 Mbps. To see how other countries compare to these averages, see the
Speedtest Global Index at https://www.speedtest.net/global-index.

Numerical measures such as average, position, and spread touch many things that the general public
uses on a daily basis.

This module will show you how to obtain and interpret descriptive statistics such as measures of
average, measures of variation, and measures of position.

Learning Outcomes

After completing this module, you should be able to:

• Summarize data using measures of central tendency such as mean, weighted mean, median, and
  mode;
• Describe data using measures of variation such as range, variance, standard deviation, and
  coefficient of variation;
• Identify the position of a data value in a data set using various measures of position such as
  standard scores, percentiles, and quartiles;
• Use the techniques of exploratory data analysis, including boxplots and five-number summaries,
  to discover various aspects of data; and
• Advocate the use of statistical data in making important decisions.

Learning Content

Suppose that you are browsing the net to look for information on internet providers in your place. To
make an informed decision regarding your planned subscription, you decide to collect as much
information as possible such as availability, monthly payment, internet speed, volume allowance, and
installation fee. What information is important in helping you make this decision?

Information from the net is sometimes presented to us in a frequency distribution. When we look at a
distribution of data, we should consider the three characteristics of the distribution: shape, center, and
spread.

Isabela State University Page 1



Graphs are useful for the visual description of a data set. However, they are not always the best
tool when you want to make inferences or decisions using information based on a sample. Hence, it
is better to use numerical measures such as center, relative position, and spread to describe the data.

3.1. Measures of Central Tendency

There is a natural tendency for data to group around a central point. Any data set can be characterized
by measuring its central tendency. For teachers, one of the most useful statistics is the center point of
the data. Knowing the center point answers questions such as, “what is the average score?” or “who
scored below average?”

Definition: The measure of central tendency is a single score/value that indicates the center of a
distribution or data set.

Measures of Central Tendency are descriptive measures that are used to indicate where the center, the
middle property, or the most typical value of a set of data lies. There are three fundamental statistics
that measure the central tendency of data: the mean, median, and mode. All three provide insights into
“the center” of a distribution of data points.

A. The Mean

The mean is the most commonly used measure of center. It is the measure of central tendency that you
are most familiar with.

Definition: The mean, or arithmetic mean, of a set of data is equal to the sum of all the values
divided by the total number of data values.

How to find the mean?

1. Compute $\sum x$; that is, add all the data values.
2. Divide the total by the number of data values; that is, n for a sample and N for a population.

Sample mean: $\bar{x} = \frac{\sum x}{n}$          Population mean: $\mu = \frac{\sum x}{N}$

The symbol $\bar{x}$, called “x bar”, is used to denote the mean of the sample and the symbol $\mu$,
called “mu”, is used to represent the mean of the population.

Properties of Mean
• A data set has one mean.
• It is easy to calculate, and all scores/values in a data set are included in computing the mean.
• The mean is the balance point in a distribution of any shape.
• The mean is only applicable to interval and ratio scales of data.
• The mean is sensitive to the size (or weight) of each score and is affected by outliers.

Example 1. The data in Table 1 represent the internet download speeds of different Internet Service
           Providers (ISPs) for a particular day.

           a. Find the mean download speed.
           b. Is the available information enough for you to decide on which ISP to subscribe to?

Table 1
ISP                      Download Speed
Smart Bro                9.67 Mbps
Converge ICT Solutions   20.92 Mbps
Globe Telecom            7.11 Mbps
PLDT                     18.93 Mbps
Sky Cable                9.64 Mbps
Source: www.broadbandspeedchecker.co.uk


Solution:

           a. Mean download speed

              $\bar{x} = \frac{\sum x}{n} = \frac{9.67 + 20.92 + 7.11 + 18.93 + 9.64}{5}$

              $\bar{x} = \frac{66.27}{5} = 13.25$ Mbps

           b. The given information is not enough since you have to consider other factors such as
              availability, upload speed, volume allowance, installation fee, and early termination fee
              to decide on your internet subscription.
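
The computation in part (a) can also be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` and the variable names are chosen here for illustration and are not part of the module.

```python
# Download speeds (in Mbps) of the five ISPs listed in Table 1
speeds = [9.67, 20.92, 7.11, 18.93, 9.64]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


mean_speed = mean(speeds)
print(f"{mean_speed:.2f} Mbps")  # 13.25 Mbps
```

The unrounded result is 13.254 Mbps; the solution above reports it to two decimal places.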

Example 2. A sample of n = 7 scores has a mean of 12. If a new person with a score of 10 is added to
the sample, what is the value for the new sample mean?

Solution:

           $\bar{x} = \frac{\sum x}{n}$; with n = 7 and $\bar{x}$ = 12, $\sum x = 7(12) = 84$.
           The new sample has n = 8 and $\sum x = 84 + 10 = 94$. Thus, $\bar{x} = \frac{94}{8} = 11.75$.

Note: Some people refer to the mean as the “average”. In fact, there are many kinds of average;
the mean is just one of them.

Weighted Mean

As a student, are you aware of how your instructors compute your final grade? Do you know how course
units affect the computation of your “General Weighted Average (GWA)” at the end of the semester?

The weighted mean is typically used when computing your academic performance. You can find your
GWA in the Certificate of Grades (CoG) issued by the Office of the Registrar at the end of every
semester. If you did poorly in courses with more units, then your GWA will be greatly
affected.

Definition: A weighted mean is the average of entries that carry varying weights in a given data
            set. It is found by multiplying each value by its corresponding weight, adding the
            products, and dividing the total by the sum of all the weights.

How to find the weighted mean?

1. Compute $\sum (w \cdot x)$ by multiplying each data value by its corresponding weight and then
   adding the products.
2. Divide the total by the sum of all the weights.

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Example 3. In Ms. A’s Statistics class, your grade is determined from the following sources: 25%
each from prelim, midterm, and final examinations; 10% from your quizzes; 10% from
your problem sets; and 10% from your participation. Your scores are 85 (prelim), 89
(midterm), 82 (finals), 90 (quizzes), 95 (problem sets), and 98 (participation). What is
your grade in Statistics?


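Solution:
Source          Weight (w)   Score (x)   w · x
Prelim          0.25         85          21.25
Midterm         0.25         89          22.25
Final           0.25         82          20.5
Quizzes         0.10         90          9.0
Problem Sets    0.10         95          9.5
Participation   0.10         98          9.8
                ∑w = 1.05                ∑(w · x) = 92.3

The weights as stated add up to 1.05 rather than 1.00, so the total must be divided by ∑w:

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w} = \frac{92.3}{1.05} \approx 87.9$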
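
The two steps for the weighted mean can be sketched in Python as follows. The function name `weighted_mean` is an illustrative choice, and the grades and units in the usage lines are hypothetical values made up for this sketch; they do not come from the module.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical grades and course units, for illustration only
grades = [1.5, 2.0, 2.5]
units = [3, 3, 2]

gwa = weighted_mean(grades, units)
print(gwa)  # 1.9375
```

Because the function always divides by the sum of the weights, it works whether the weights are percentages, proportions, or course units.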

Mean for Grouped Data

When data are presented in a frequency distribution, the raw data are often not available.
In that case, the mean is computed from the class midpoints instead of the raw data.

Definition: The mean of a frequency distribution for a sample is approximated by
            $\bar{x} = \frac{\sum (f \cdot x)}{n}$,
            where x and f are the midpoint and frequency of each class in the data set and
            $n = \sum f$.

How to find the mean for grouped data?

1. Find the midpoint of each class.
2. For each class, multiply the class midpoint by the class frequency.
3. Compute $\sum (f \cdot x)$.
4. Divide the sum by the sum of the frequencies (n).

Example 4. The frequency distribution shows the salaries (in thousand pesos) for a specific year of
           the faculty in a state university. Find the mean.

Table 2
Salary (in thousand pesos)   21 - 30   31 - 40   41 - 50   51 - 60   61 - 70   71 - 80   81 - 90   91 - 100
Frequency                    10        15        12        6         3         2         1         1

Solution:

Salary (in thousand pesos)   Frequency (f)   Midpoint (x)   f · x
21 - 30                      10              25.5           255
31 - 40                      15              35.5           532.5
41 - 50                      12              45.5           546
51 - 60                      6               55.5           333
61 - 70                      3               65.5           196.5
71 - 80                      2               75.5           151
81 - 90                      1               85.5           85.5
91 - 100                     1               95.5           95.5
                             ∑f = 50                        ∑(f · x) = 2,195

$\bar{x} = \frac{\sum (f \cdot x)}{n} = \frac{2{,}195}{50} = 43.9$, that is, ₱43.9 thousand.
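
The grouped-data steps can be sketched in Python as shown below. The class limits and frequencies are those used in the solution table; the function name `grouped_mean` and the tuple layout are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) of each salary class, in thousand pesos
classes = [
    (21, 30, 10), (31, 40, 15), (41, 50, 12), (51, 60, 6),
    (61, 70, 3), (71, 80, 2), (81, 90, 1), (91, 100, 1),
]


def grouped_mean(classes):
    """Approximate the mean from class midpoints and class frequencies."""
    n = sum(f for _, _, f in classes)                          # n = sum of f
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)    # sum of f * x
    return total / n


salary_mean = grouped_mean(classes)
print(salary_mean)  # 43.9
```

The result is only an approximation of the true mean, because every salary in a class is treated as if it were equal to the class midpoint.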


B. The Median

The word “median” is synonymous with the word “middle”. The median is the middle value in a
distribution: half of the values in the set fall at or above the median and the other half fall at
or below it. The median is generally referred to as a positional average.

Definition: The median of the data set is the value that lies in the middle of the set when data is
arranged in ascending or descending order.

How to find the median?

1. Arrange the data in ascending/descending order.
2. Determine the number of observations, n.
3. Find the value in the middle of the data set. The position of the median is given by
   $\frac{n + 1}{2}$. If n is odd, the median is exactly the middle value in the set. If n is even,
   the median is the average of the two middle values in the data set.

The median of the sample is sometimes denoted by $\tilde{x}$, called “x-tilde”, or by M, MD, or Med;
there is no commonly accepted notation, and there is no special symbol for the median of the population.

Properties of Median
• The median is unique.
• It can be used for ordinal, interval, and ratio scales of data.
• The median is less affected by outliers and is most appropriate for a skewed data set.

Note: An outlier is a value that is either much higher or much lower than the median.

Example 5. Consider the salaries of faculty at a state university below:

Faculty   1     2     3     4     5     6     7     8     9
Salary    38k   29k   24k   46k   32k   75k   96k   66k   85k

           a. Find the median salary of the faculty.
           b. Why is the median a more appropriate measure to use than the mean?

Solution:

           a. Arrange the values in ascending order: 24k 29k 32k 38k 46k 66k 75k 85k 96k
              Find n: n = 9
              Determine the median: Since n = 9, the median is the $\frac{n + 1}{2}$th value.
              $\frac{9 + 1}{2} = 5$; thus, the median is the 5th value: $\tilde{x}$ = ₱46k.

           b. The mean of the data set is about 54.6k. However, looking at the raw data, five of the
              nine faculty members have salaries below that amount, because the few high salaries pull
              the mean upward. Therefore, the median is the more appropriate measure of central
              tendency for this data set.

Example 6. The data show the number of tablet sales in millions of units for a 6-year period. Find the
median of the data.
12.6 124.5 72.4 108.2 159.8 63.4

Solution:

           Arrange the values in ascending order: 12.6 63.4 72.4 108.2 124.5 159.8
           Find n: n = 6
           Determine the median: Since n = 6, the median is at position $\frac{6 + 1}{2} = 3.5$, that
           is, the average of the two middle values (the 3rd and 4th). So, the median is
           $\frac{72.4 + 108.2}{2} = 90.3$. The median of tablet sales is 90.3M.
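
The procedure for the median (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The helper name `median_by_position` is an illustrative choice; the standard library function `statistics.median` gives the same results and is used here as a cross-check.

```python
from statistics import median

salaries = [38, 29, 24, 46, 32, 75, 96, 66, 85]          # Example 5, in thousand pesos
tablet_sales = [12.6, 124.5, 72.4, 108.2, 159.8, 63.4]   # Example 6, in millions of units


def median_by_position(values):
    """Sort the data, then pick the middle value (odd n) or
    average the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position(salaries))                 # 46
print(round(median_by_position(tablet_sales), 1))   # 90.3
print(median(salaries))                             # 46
```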

C. The Mode

The mode is the value in the data set that appears with the highest frequency. It is considered an
inspection average. The set can be unimodal, bimodal, or multimodal. If the data set values have the
same frequency, then the set has no mode.

Definition: The mode of a data set is a value that occurs most frequently.

Sometimes mode is considered as the most popular option. In dealing with categorical data, mode is
usually used to find out the most common category.

How to find the mode?


1. Arrange the data in order.
2. Determine the value that occurs with the greatest frequency. A data set that has a single value
that appears most frequently is considered unimodal. If the data has two values with the same
highest frequency, both values are considered mode, and the data set is bimodal. If the data
set has more than two modes, then the data set is said to be multimodal. If no value is
repeated, the data set has no mode.

Properties of Mode
• The mode is the easiest average to find.
• It can be used for nominal, ordinal, interval, and ratio scales of data.
• The mode is not affected by the size (or weight) of scores or by outliers.

Example 7. The mobile data download speed (in Mbps) for a particular day is listed. Find the mode.
5.5 6.2 4.5 5.1 7.4 6.2 3.5 6.2

Solution:

The mode is 6.2 Mbps because it is the download speed occurring most often (3 times)

Example 8. Following is a list of the manufacturers of all cars available on the Grab Transport App
           on a particular day. Which car manufacturer is the mode?

           Honda       Toyota      Toyota      Ford     Chevrolet
           Nissan      Chevrolet   Toyota      Toyota   Ford
           Dodge       Honda       Chevrolet   Toyota   Chevrolet
           Chevrolet   Ford        Toyota      Toyota   Nissan

Solution: The most frequent category is “Toyota,” which appears seven times. Therefore,
          the mode is “Toyota”.
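
Counting frequencies by hand becomes tedious for longer lists. The sketch below reproduces Examples 7 and 8 with Python's standard library, assuming Python 3.8 or later for `statistics.multimode`; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Example 7: mobile data download speeds in Mbps
speeds = [5.5, 6.2, 4.5, 5.1, 7.4, 6.2, 3.5, 6.2]

# Example 8: manufacturers of the available cars
cars = [
    "Honda", "Toyota", "Toyota", "Ford", "Chevrolet",
    "Nissan", "Chevrolet", "Toyota", "Toyota", "Ford",
    "Dodge", "Honda", "Chevrolet", "Toyota", "Chevrolet",
    "Chevrolet", "Ford", "Toyota", "Toyota", "Nissan",
]

# multimode returns every value tied for the highest frequency,
# so it also handles bimodal and multimodal data sets
print(multimode(speeds))              # [6.2]
print(multimode(cars))                # ['Toyota']

# Counter shows how often the most common category occurs
print(Counter(cars).most_common(1))   # [('Toyota', 7)]
```

Note that when no value repeats, `multimode` returns all the values, whereas this module describes such a data set as having no mode.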

Note: Not all measures of central tendency can be used for all scales of measurement. For
      nominal data (such as sex or race), the mode is the only valid measure. For ordinal data
      (such as salary categories), only the median and mode can be used. For interval and ratio data
      without outliers, the mean is used; for interval and ratio data with outliers, the median
      is the most appropriate.


STATISTech

Technology in Statistics: Step by Step Guide in Computing Measures of Central Tendency

A. Using MS Excel

Open an Excel worksheet and enter the label of the data in A1. Starting at A2, enter the data up to
An, where n is the row of the last data value. In a blank cell, key in =average(A2:An) to compute
the mean of the data set. In another blank cell, key in =median(A2:An) to compute the median of the
data set. To compute the mode, type =mode(A2:An) in another blank cell.

B. Using SPSS


3.2. Measures of Variability

When instructors return your exam papers after checking and recording the scores, you often discuss
and compare your scores with friends and classmates. Did you notice the difference in your scores
from one test to another?

The word “vary” is synonymous with the word “differ”. In statistics, variability refers to how spread
out or dispersed the values in a distribution are. Statisticians use measures of variation in
addition to measures of central tendency to describe a data set accurately.

Definition: The measure of variability is a number that represents a set of data based on how the
            values differ or vary from each other.

The measure of variability, also known as a measure of dispersion or spread, tells how varied the
values in a data set are or how much distance to expect between a given score/value and the mean. To
show the variability or spread of the values in a data set, three measures are commonly used: range,
variance, and standard deviation.

A. The Range

One way to describe the difference between the values in a data set is to compare the highest and the
lowest value. The range is the difference between the highest and the lowest value in a data set and is
considered the simplest measure of spread.

Definition: The range (R) is the mathematical difference between the highest and the lowest value
in a set of data.

How to find the range?


1. Determine the highest and lowest value.
2. Subtract the lowest value from the highest value.
R = HV - LV

Properties of Range
• The range is easy to compute.
• It is very sensitive to extreme values.
• It does not truly reflect the differences among all of the data values in a set.

Example 9. The data show the 2019 Metro Manila Film Festival (MMFF) earnings (in millions of
pesos) in ticket sales from five out of eight entries. Find the range.
320 72 90 412 18

Solution: HV – LV = 412 – 18 = ₱ 394 M

Example 10. The following are the scores of 10 students in a 100-item Statistics exam: 82 77 90
84 68 88 62 94 71 67. Find the range.

Solution: HV – LV = 94 – 62 = 32
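
As a quick check of Examples 9 and 10, the formula R = HV - LV can be sketched in Python. The function name `value_range` is an illustrative choice (it avoids shadowing Python's built-in `range`).

```python
def value_range(values):
    """Range: highest value (HV) minus lowest value (LV)."""
    return max(values) - min(values)


mmff_earnings = [320, 72, 90, 412, 18]                    # Example 9, in millions of pesos
exam_scores = [82, 77, 90, 84, 68, 88, 62, 94, 71, 67]    # Example 10

print(value_range(mmff_earnings))  # 394
print(value_range(exam_scores))    # 32
```

Only two of the data values enter the computation, which is why a single extreme value can change the range drastically.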

B. The Variance

In a statistics class, some students compute the variance first so that, to get the standard
deviation, they simply take the square root of the variance. Because of this, some students
assume that the only use of variance is to make the computation of standard deviation faster.


What is variance?

Unlike the range, variance combines all values in a data set to produce a good measure of variability.
It is a measure that describes the spread of the data values from the mean and how each value relates
to each other. The difference between each value in a data set and the mean is called deviation.

In any data set, the sum of the positive and negative deviations from the mean is always equal to zero.
To get around this “balancing act” of positive and negative deviations, each deviation from the mean
is squared before the deviations are combined; the result is the variance.
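
The claim that the deviations always add up to zero follows directly from the definition of the mean. A short derivation, written for a sample of n values:

$$\sum (x - \bar{x}) = \sum x - n\bar{x} = n\bar{x} - n\bar{x} = 0, \quad \text{since } \bar{x} = \frac{\sum x}{n} \text{ implies } \sum x = n\bar{x}.$$

For instance, the earnings 320, 72, 90, 412, and 18 have a mean of 182.4, and their deviations 137.6, -110.4, -92.4, 229.6, and -164.4 add up to 0.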

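Definition: The variance of a set of data is the mean of the squared deviations of the values
            from the mean.

How to find the variance?

Step                                            For Sample (s²)                              For Population (σ²)
1. Find the mean of the values in a data set.   $\bar{x} = \frac{\sum x}{n}$                 $\mu = \frac{\sum x}{N}$
2. Subtract the mean from each value.           $(x - \bar{x})$                              $(x - \mu)$
3. Square each difference found in step 2.      $(x - \bar{x})^2$                            $(x - \mu)^2$
4. Find the sum of the squares in step 3.       $\sum (x - \bar{x})^2$                       $\sum (x - \mu)^2$
5. Divide the sum in step 4 by n - 1 for a      $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$   $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
   sample or by N for a population.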

Properties of Variance
• The variance is always non-negative.
• It is sensitive to extreme values.
• The units of variance are the squared units of the original data values.

Example 11. Using the data in example #9, find the variance.
320 72 90 412 18

Solution:
           The calculations are shown in the table below.

           Step 1. Find the mean: $\bar{x} = \frac{\sum x}{n} = \frac{912}{5} = 182.4$ M

           Step 2. Subtract $\bar{x}$ from each value, as shown in the 2nd column.

           Step 3. Square the deviations from the mean, as shown in the 3rd column.

           x      $x - \bar{x}$               $(x - \bar{x})^2$
           320    (320 - 182.4) = 137.6       18,933.76
           72     (72 - 182.4) = -110.4       12,188.16
           90     (90 - 182.4) = -92.4        8,537.76
           412    (412 - 182.4) = 229.6       52,716.16
           18     (18 - 182.4) = -164.4       27,027.36
                                              $\sum (x - \bar{x})^2$ = 119,403.2

           Step 4. Find $\sum (x - \bar{x})^2$ = 119,403.2


           Step 5. Divide the sum obtained in step 4 by n - 1 to obtain the sample variance:

                   $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{119,403.2}{5 - 1} = 29,850.8$

           Since variance is measured in squared units, the value 29,850.8 cannot be directly
           related to the values in the data set. Hence, calculating the standard deviation is
           necessary.
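
The variance steps can be sketched in Python as shown below. The worked value 29,850.8 corresponds to dividing the sum of squares 119,403.2 by n - 1 = 4, which is also what the standard library function `statistics.variance` does; `statistics.pvariance` divides by N instead. The helper name `sample_variance` is an illustrative choice.

```python
from statistics import pvariance, variance

earnings = [320, 72, 90, 412, 18]  # MMFF earnings, in millions of pesos


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                  # step 1
    squared_devs = [(x - mean) ** 2 for x in values]        # steps 2 and 3
    return sum(squared_devs) / (n - 1)                      # steps 4 and 5


print(round(sample_variance(earnings), 1))   # 29850.8
print(round(variance(earnings), 1))          # 29850.8  (sample, divides by n - 1)
print(round(pvariance(earnings), 2))         # 23880.64 (population, divides by N)
```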

Note: The units of variance are squared, which makes variance difficult to interpret.
      However, variance is important for conducting statistical inference. When comparing means,
      the hypothesis test to be used depends on whether the variances can be assumed equal or not.
      To check for homogeneity of variances, you can use tests such as F-test, Bartlett’s test,
      Levene’s test, and Tukey test.

C. The Standard Deviation

In most situations, it is better to use a measure of dispersion that has the same units as the data.
Taking the square root of the variance gives the standard deviation, which returns the measure of
dispersion to the original units.

Standard deviation describes how far, on average, each observation is from the typical data value.
It is based on the deviations about the mean.

Definition: The standard deviation is the positive square root of the variance. It represents the
            average deviation of a data value from the mean.

In statistical practice, standard deviation is the measure of spread commonly used in conjunction with
the mean. Accordingly, it measures spread around the mean. The more widely spread the values are,
the larger the standard deviation is.

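How to find the standard deviation?

Step                                            For Sample (s)                                    For Population (σ)
1. Find the mean of the values in a data set.   $\bar{x} = \frac{\sum x}{n}$                      $\mu = \frac{\sum x}{N}$
2. Subtract the mean from each value.           $(x - \bar{x})$                                   $(x - \mu)$
3. Square each difference found in step 2.      $(x - \bar{x})^2$                                 $(x - \mu)^2$
4. Find the sum of the squares in step 3.       $\sum (x - \bar{x})^2$                            $\sum (x - \mu)^2$
5. Divide the sum in step 4 by n - 1 for a      $\frac{\sum (x - \bar{x})^2}{n - 1}$              $\frac{\sum (x - \mu)^2}{N}$
   sample or by N for a population.
6. Take the square root of the answer in        $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$   $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
   step 5.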

Properties of Standard Deviation

• It is the most widely used measure of variation.
• The standard deviation is never negative.
• It is sensitive to extreme values.
• For two sets of data with the same mean, the greater the spread, the greater the standard
  deviation.
• If all the values are the same in the data set, the standard deviation is zero.


Example 12. Using the answer in example #11: a) find the standard deviation, and b) interpret the
            result in the context of the data.

Solution:
            a. The computed variance is 29,850.8. To get the standard deviation, take the square
               root of the variance, that is, $s = \sqrt{29,850.8} = 172.77$ M.

            b. The standard deviation s = 172.77M is considered large, which means that there
               is a large difference in the individual earnings of the five entries to the MMFF.

Example 13. The different ISPs’ internet download speeds in Mbps for a particular day (presented in
example 1) were: 9.67, 20.92, 7.11, 18.93, and 9.64. Find the standard deviation of the
internet download speeds and interpret the result in the context of the problem.

Solution:

            Step 1. Find the mean: $\bar{x} = \frac{\sum x}{n} = \frac{66.27}{5} = 13.25$ Mbps

            Step 2. Subtract $\bar{x}$ from each value, as shown in the 2nd column.

            Step 3. Square the deviations from the mean, as shown in the 3rd column.

            x       $x - \bar{x}$               $(x - \bar{x})^2$
            9.67    (9.67 - 13.25) = -3.58      12.8164
            20.92   (20.92 - 13.25) = 7.67      58.8289
            7.11    (7.11 - 13.25) = -6.14      37.6996
            18.93   (18.93 - 13.25) = 5.68      32.2624
            9.64    (9.64 - 13.25) = -3.61      13.0321
                                                $\sum (x - \bar{x})^2$ = 154.6394

            Step 4. Find $\sum (x - \bar{x})^2$ = 154.6394

            Step 5. Divide the sum obtained in step 4 by n - 1 to obtain the sample variance:

                    $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{154.6394}{5 - 1} = 38.65985$

            Step 6. Take the square root of the variance in step 5:

                    $s = \sqrt{38.65985} = 6.22$ Mbps

            The ISPs’ download speeds differ from the mean by about 6.22 Mbps on average. So, if you
            open the Facebook App on different phones subscribed to the mentioned ISPs, the amount of
            time it takes to load the feed would be different. Some will load faster or slower than
            the others depending on the ISP’s download speed.
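
The six steps can be sketched in Python as follows, with `statistics.stdev` from the standard library as a cross-check (it also divides by n - 1). The helper name `sample_std` is an illustrative choice. The code uses the unrounded mean 13.254, so intermediate values differ very slightly from the hand computation, but the result still rounds to 6.22 Mbps.

```python
from math import sqrt
from statistics import stdev

speeds = [9.67, 20.92, 7.11, 18.93, 9.64]   # Example 13, in Mbps
earnings = [320, 72, 90, 412, 18]           # Example 12, in millions of pesos


def sample_std(values):
    """Square root of the sample variance (sum of squares over n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sqrt(sum_of_squares / (n - 1))


print(round(sample_std(speeds), 2))     # 6.22
print(round(sample_std(earnings), 2))   # 172.77
print(round(stdev(speeds), 2))          # 6.22
```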

In general, a large standard deviation shows that the data values are far from the mean, and a small
standard deviation indicates that the data values are clustered around the mean. However, it can be
difficult to judge how big a standard deviation has to be before the data are considered
widely spread.

To make sense of the value of the standard deviation, there are two approaches for interpreting
it: the empirical rule and Chebyshev’s theorem.


1. The Empirical Rule

The empirical (68-95-99.7) rule states that for data sets having a distribution that is
approximately bell-shaped (normal), the following properties apply:
   a) about 68% of all values fall within 1 standard deviation of the mean;
   b) about 95% of all values fall within 2 standard deviations of the mean; and
   c) about 99.7% of all values fall within 3 standard deviations of the mean.

(Figure: the empirical rule. Copyright © 2017 Pearson Education, Inc.)

Example 14. A class of 200 students took an exam. The scores had a sample mean $\bar{x}$ = 65 and a
            sample standard deviation s = 10. The distribution is normal. Find: a) the interval that
            is likely to contain approximately 68% of the scores, and b) the approximate percentage
            of the scores that were between 45 and 85.

Solution:
            a. Using the Empirical Rule, approximately 68% of the data set will be between
               $\bar{x} - s$ and $\bar{x} + s$.
               $\bar{x} - s = 65 - 10 = 55$          $\bar{x} + s = 65 + 10 = 75$
               It is likely that approximately 68% of the scores were between 55 and 75.

            b. The value 45 is two standard deviations below the mean:
               2 standard deviations = 2s = 2(10) = 20
               $\bar{x} - 2s = 65 - 20 = 45$
               and the value 85 is two standard deviations above the mean:
               $\bar{x} + 2s = 65 + 20 = 85$.
               Therefore, it is likely that approximately 95% of the scores were between 45 and 85.
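
The intervals in Example 14 can be generated with a short Python sketch. The function name `empirical_intervals` and the dictionary layout are assumptions made for this illustration; the percentages are the approximate figures of the empirical rule and apply only to bell-shaped data.

```python
def empirical_intervals(mean, sd):
    """Intervals mean ± k·sd for k = 1, 2, 3, paired with the approximate
    percentage of values the empirical rule places inside each one."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}


intervals = empirical_intervals(65, 10)
for k, (low, high, pct) in intervals.items():
    print(f"within {k} sd: {low} to {high} (about {pct}% of scores)")
# within 1 sd: 55 to 75 (about 68.0% of scores)
# within 2 sd: 45 to 85 (about 95.0% of scores)
# within 3 sd: 35 to 95 (about 99.7% of scores)
```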

The empirical rule applies only to data sets with bell-shaped distributions. It gives an approximation
to the proportion of data that will be within one, two, or three standard deviations of the mean. To
interpret standard deviation for any data set, use Chebyshev’s theorem.

2. Chebyshev’s Theorem

Chebyshev’s Theorem states that the proportion of any set of data lying within K standard
deviations of the mean is always at least $1 - \frac{1}{K^2}$, where K is any number greater than 1.
For K = 2 and K = 3, we get the following statements:
   a) at least $\frac{3}{4}$ (or 75%) of all values lie within two standard deviations of the mean; and
   b) at least $\frac{8}{9}$ (or 89%) of all values lie within three standard deviations of the mean.

Example 15. As part of a study on IQ as a predictor of job performance, the data show that IQ scores
            have a mean of 100 and a standard deviation of 15. What information does
            Chebyshev’s inequality provide about these data?

Solution: Compute: $\bar{x} - 2s = 100 - 2(15) = 70$          $\bar{x} + 2s = 100 + 2(15) = 130$
                   $\bar{x} - 3s = 100 - 3(15) = 55$          $\bar{x} + 3s = 100 + 3(15) = 145$

          Conclude: At least 75% of the respondents had IQ scores between 70 and 130.
                    At least 89% of the respondents had IQ scores between 55 and 145.
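
Chebyshev’s bound and the intervals of Example 15 can be sketched in Python as shown below. The function names `chebyshev_bound` and `chebyshev_interval` are illustrative choices.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations
    of the mean, valid for any data set when k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval from mean - k·sd to mean + k·sd."""
    return (mean - k * sd, mean + k * sd)


for k in (2, 3):
    low, high = chebyshev_interval(100, 15, k)
    print(f"at least {chebyshev_bound(k):.0%} of IQ scores lie between {low} and {high}")
# at least 75% of IQ scores lie between 70 and 130
# at least 89% of IQ scores lie between 55 and 145
```

Unlike the empirical rule, these are guaranteed lower bounds rather than approximations, which is why they are smaller than 95% and 99.7%.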


When comparing variation in samples with approximately the same mean, it is a good practice to
compare the two standard deviations. However, when comparing variation in samples with different
means or with different units, it is better to use coefficient of variation.

Coefficient of Variation

The coefficient of variation (CV) tells how large the standard deviation is relative to the mean. It can
be used to compare the spread of data sets whose values have different units.

Definition: The coefficient of variation (CV) of a data set describes the standard deviation as a
            percent of the mean.

            Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$          Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$

Based on the definition, the numerator and denominator have the same units, so CV itself has no
units. Thus, you can directly compare variability of two different populations or samples.

Example 16. In ABC Auto Shop, the mean of the number of sales of cars over a 3-month period is 87,
            and the standard deviation is 5. The mean of the sales commissions is ₱50,225 and the
            standard deviation is ₱7,730. Compare the variations between car sales and sales
            commissions.

Solution: Sales:       $CV = \frac{s}{\bar{x}} \cdot 100\% = \frac{5}{87} \cdot 100\% = 5.7\%$

          Commissions: $CV = \frac{s}{\bar{x}} \cdot 100\% = \frac{7,730}{50,225} \cdot 100\% = 15.4\%$

          Since the CV is larger for commissions, the commissions are more variable than the
          sales.
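
The comparison in Example 16 can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percent of the mean."""
    return sd / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commission = coefficient_of_variation(7730, 50225)

print(round(cv_sales, 1))        # 5.7
print(round(cv_commission, 1))   # 15.4
```

The two standard deviations (5 cars and ₱7,730) cannot be compared directly because they have different units; the two CV values can, since both are unit-free percentages.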

Note: Standard deviation is usually preferred over variance because it is directly interpretable.
However, coefficient of variation is useful when comparing variation in population or
samples with different means or with different units.


STATISTech

Technology in Statistics: Step by Step Guide in Computing Measures of Variability

A. Using MS Excel

Open excel worksheet and enter the label of the data in A1. Starting at A2 enter the data up to An.
In a blank cell, key in . . .

B. Using SPSS


3.3. Measures of Position

You often compare yourself to others, whether in test scores, height, weight, or daily allowance.
When you score 30 out of 50 in an exam, you also want to know how your score compares
to the scores of your classmates. Sometimes you need to know the position of one observation relative
to others in a data set.

There are three ways to locate the relative position of a data value in a data set: z-scores,
percentiles, and quartiles.

A. The Standard Score (Z – Score)

Suppose you also want to know how your score of 30 compared to the scores of your classmates on a
50-item exam. The mean and the standard deviation of the scores can be used to compute the z-score,
which measures the distance between a particular score and the mean, measured in units of standard
deviation.

Definition: The standard score or z-score is a measure of relative position defined as

            $z = \frac{x - \mu}{\sigma}$ for a population and $z = \frac{x - \bar{x}}{s}$ for a sample.

The z-score tells how many standard deviations a data value is above or below the mean in a data set.
The z-score can be positive, negative, or zero. When z is positive, the corresponding x-value is greater
than the mean. When z is negative, the corresponding x-value is less than the mean. For z = 0, the
corresponding data value is the same as the mean.

Example 17. In a Statistics class, Jen scored 30 pts in a 50-pt examination. She wants to know how
            her score of 30 compares to the scores of her classmates. The mean and the standard
            deviation of the exam scores are 25 and 4, respectively. Calculate Jen’s z-score and
            interpret the result.

Solution: $z = \frac{x - \bar{x}}{s} = \frac{30 - 25}{4} = 1.25$

          Jen’s score is 1.25 standard deviations higher than the mean (30 = 25 + 1.25(4)).

Example 18. The mean speed of cars along a stretch of highway is 56 kph with a standard deviation
            of 4 kph. The MMDA personnel measure the speed of three cars travelling along this
            stretch of highway on a particular day as 62 kph, 47 kph, and 56 kph. Find the z-score
            that corresponds to each speed and interpret the result.

Solution: For 62 kph: $z = \frac{x - \bar{x}}{s} = \frac{62 - 56}{4} = 1.5$
          For 47 kph: $z = \frac{x - \bar{x}}{s} = \frac{47 - 56}{4} = -2.25$
          For 56 kph: $z = \frac{x - \bar{x}}{s} = \frac{56 - 56}{4} = 0$

          The speed of 62 kph is 1.5 standard deviations above the mean; the speed of 47 kph is
          2.25 standard deviations below the mean; and the speed of 56 kph is equal to the
          mean. The car travelling at 47 kph is said to be travelling slowly because its speed
          corresponds to z = -2.25, more than two standard deviations below the mean.
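
The z-scores in Examples 17 and 18 can be checked with a short Python sketch. The function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Example 17: Jen's exam score
jen_z = z_score(30, 25, 4)
print(jen_z)  # 1.25

# Example 18: speeds of three cars, with mean 56 kph and standard deviation 4 kph
car_z = [z_score(speed, 56, 4) for speed in (62, 47, 56)]
print(car_z)  # [1.5, -2.25, 0.0]
```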

A z-score is a measure of position, because it describes the location of a data value relative to the
mean. Using the Empirical Rule, you can easily interpret z-scores for data set that is normal (bell-
shaped). For skewed distributions, it is difficult to interpret the z-scores, so it is better to use other
measures of position.


B. Quartiles

When analyzing data sets, it is sometimes helpful to divide the set into subgroups.

C. Percentiles

C. Quartiles

4.4: Box plots

A. Five number summary


B. Box plots

5. Teaching and Learning Activities

Try These:

1. The following are Jay’s grades last semester. Compute his GWA if: a. the PE grade is included in
   the computation; b. the PE grade is not included in the computation.

   Courses   Math   English   Soc. Sci.   Filipino   Major 1   Major 2   PE    Science
   Units     3      3         3           3          3         3         2     3
   Grade     2.5    2.0       1.75        1.75       2.25      2.0       1.5   2.75

2.

Recommended Learning Materials and Resources

The following videos on YouTube and other online sources will supplement your learning on
describing data.

a. Measures of Central Tendency


https://numberbender.com/lessons/view/1198/2.1-Measure-of-Center-and-Spread

b. Measures of Variability
https://numberbender.com/lessons/view/1198/2.1-Measure-of-Center-and-Spread

c. Measures Of Position
https://www.youtube.com/watch?v=CiZCtar7iI8


d. Exploratory Data Analysis


https://www.youtube.com/watch?v=zHcQPKP6NpM

Flexible Teaching Learning Modality (FTLM) adopted

Online (Synchronous)
• Zoom, Edmodo, Facebook Messenger

Remote (Asynchronous)
• Module 3, Problem Sets, PowerPoint Lessons, Consultations through GC

Assessment Task

Problem Set # 3

1. Consider the following data obtained from a random sample of 50 credit card accounts. Identify
   all appropriate measures of central tendency that can be used to summarize the data.
a. outstanding balance on each account
b. type of credit card (e.g., MasterCard, Visa, American Express, etc.)
c. amount due on next payment

References

Bluman, Allan G. (2017). Elementary Statistics: A Step by Step Approach. McGraw-Hill Education.

Larson, Ron & Farber, Betsy. (2014). Elementary Statistics: Picturing the World. 6th Edition. Pearson
Education, Inc.

Navidi, William C. & Monk, Barry J. (2019). Elementary Statistics. 3rd Edition. McGraw-Hill
Education.

Triola, Mario F. (2017). Elementary Statistics. 13th Edition. Pearson Education, Inc.

Speedtest Global Index. Retrieved from https://www.speedtest.net/global-index.
