Handouts For Stt101 BBLL (Chapters 1-4) Descriptive Statistics


1.) Definitions:
a. Data (Datum): Items in a record or report are facts expressed in numbers or described by
their quality or kind. These facts are called data. The major concern of Statistics is about
data and how to deal with it.
Ex. 1. Color of the eyes  2. Class size  3. Scores  4. Height
b. Population, Sample: A population is the collection of all the units from which data are to be
collected. A subset, or a representative part, of the population is called a sample.
c. Census, Sampling: A census is the process of gathering information from all the units
in the population; sampling is the process of obtaining data from only a part of the
population. The information derived from the sample is used to make
generalizations about the whole population. Errors are unavoidable when these
generalizations are made. The role of Statistics is to provide procedures that
minimize these errors.
d. Statistics (Singular and plural sense):

In singular sense: Statistics is a branch of Science that deals with the development
of methods for a more effective way of collecting, organizing, presenting, and analyzing
data. In plural sense Statistics can mean the data itself or some numerical computations
derived from the data.
2.) Two Major Areas in Statistics:

a. Descriptive Statistics – deals largely with summary calculations, graphical displays and
describing important features of a set of data. It does not attempt to draw
conclusions/insights about anything that pertains to more than the data themselves.
b. Inductive / Inferential Statistics – is concerned with making generalizations from
information gathered from a small group of observations (sample) to a bigger group of
observations (population). It offers a large number of analytical tools
that allow the investigator to better understand the population from
which the sample data were gathered, based only on the information contained in the
sample.
3.) Two Types of Data:
a. Quantitative data – are data that can be expressed in numbers. These are things or
information that can be measured, like a person’s height or a student’s score in a quiz.
b. Qualitative data – are facts for which no numerical measure exists and are usually
expressed in categories or kind like the color of the skin – black, white or brown; a
person’s sex – male or female; a person’s personality – extrovert or introvert.
4.) Types of Variables:
Variables are the characteristics or properties measured from objects, persons or things.
a. Discrete variables are counted, so they take whole-number values.
Example: Number of passers or failures in the LET
b. Continuous variables are measured in some unit of measurement and may take
decimal values. Example: weights, heights, ages
5.) Properties of Numbers :
Another way of looking at data is on the way they are measured. Measurements are always
associated with numbers that have some interesting properties or characteristics.
a. Identity – it enables a person to distinguish one number from the other. They are
identified by their shapes or the way they are written. This is the simplest property of
numbers.
b. Order – it refers to the way the numbers are arranged in a sequence. It is an established
convention that 1 comes before 2, 2 comes before 3, and so on. We also say that “7 is
greater than 6” or “3 is less than 5”.
c. Additivity / equality of scale – the property that allows us to add numbers. When we say
“3g + 5g = 8g”, we are confident that we are correct
because of equality of scale. This means that in the sequence 1g, 2g, 3g, … the distance
from 1g to 2g is the same as the distance from 2g to 3g, which is the same as the distance from
3g to 4g, and so on. Certainly, 2 flowers + 5 peanuts is not equal to 7 apples because the
scales used are not equal.
Daniel R. Sanson’s Property
d. “Absolute zero” – a value of zero means that the object has none of the characteristic being
measured. In temperature, the characteristic being measured is ‘amount of heat’. A temperature
of zero degrees Celsius does not mean that the object has no heat at all. If
a student receives a score of zero in an intelligence exam, it does not necessarily mean
that the student has ‘no intelligence’ at all. By contrast, a length of zero meters means the
object has ‘no length’, and an object with a weight of zero pounds is weightless.
6.) Four Types of measurement:
a. Nominal – measurements possess only the property of identity. Ex. Color and sex.
b. Ordinal – measurements possess the properties of identity and order but do not have the
equality of scale property. Ordinal measurements are usually associated with ranks.
When students are ranked according to class performance, an order of 1st, 2nd, 3rd, … can
be established. But these numbers cannot be added because the distance from 1st to 2nd
may not be the same as the distance from 2nd to 3rd, and so on. Ex. Social classes, military
ranks, honor roll, and taste preferences.
c. Interval – measurements possess the properties of identity, order, and equality of scale.
Temperature and Intelligence Scores are examples of interval measurement. The numbers
associated with these variables have identity and order and they can be added in the
usual manner. However they do not have the property of absolute zero.
d. Ratio - measurements possess all the properties of identity, order, equality of scale and
absolute zero. This is the highest form of measurement. Ex. Length, weight.
A large number of statistical analysis tools are available for each type of measurement. It is
important that the statistical user have a good understanding of the type of data that is to be processed in
order that the chosen statistical tool is used properly.
7.) Four Methods of Collecting Data:

a. By Interview
b. By Questionnaires
c. By Direct Observation
d. By utilizing Existing Records – published or unpublished, primary (first-hand and not
subjected to any transcription or condensation) or secondary (transcribed or
compiled from original sources)
8.) Sampling techniques. Doing a census, that is, studying the entire population, is often
not feasible because of limited resources like money and time. Researchers therefore often resort
to sample surveys. To make reliable inferences about the population from which the
sample was taken, one should select a sample that represents the population well, that
is, an unbiased sample. There are several methods of obtaining a sample from the population:
Methods of Probability Sampling
a. Simple Random Sampling
b. Systematic Random Sampling
c. Stratified Sampling
d. Cluster Sampling
Methods of Non-Probability Sampling
e. Convenience (Accidental) Sampling
f. Purposive (Judgment) Sampling
g. Quota Sampling
h. Network (Snowballing) Sampling
9.) Presentation of Data: Data may be presented by

a. Tabulating – is the process of tallying scores in a statistical table called a talligram, a
contraction of tally and diagram. The columns of this table correspond to the units digit and
the rows to the tens digit.
Example: Math 1 Prelim exam result.(raw scores).
86 74 66 70 75 56 69 70 73 66 74 81
60 76 80 81 61 67 63 68 73 63 75 71
58 72 83 69 79 67 68 64 69 73 69 78
88 62 76 72 65 66 70 73 61 78 84 77

Table 1. (Rows: tens digit. Columns: units digit. T = total.)
Tens digit     0     1     2     3     4     5     6     7     8     9     T
8              1    11     -     1     1     -     1     -     1     -     7
7            111     1    11  1111    11    11    11     1    11     1    20
6              1    11     1    11     1     1   111    11    11  1111    19
5              -     -     -     -     -     -     1     -     1     -     2
T              5     5     3     7     4     3     7     3     6     5    48
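The tallying above can be sketched in Python. This is only an illustration, not part of the original handout: the function names `talligram` and `print_talligram` are made up here, and the sketch assumes two-digit whole-number scores so that each score splits cleanly into a tens digit and a units digit.

```python
from collections import Counter

# Math 1 prelim exam raw scores from the example above.
scores = [
    86, 74, 66, 70, 75, 56, 69, 70, 73, 66, 74, 81,
    60, 76, 80, 81, 61, 67, 63, 68, 73, 63, 75, 71,
    58, 72, 83, 69, 79, 67, 68, 64, 69, 73, 69, 78,
    88, 62, 76, 72, 65, 66, 70, 73, 61, 78, 84, 77,
]


def talligram(data):
    """Count the scores by (tens digit, units digit). Hypothetical helper."""
    return Counter(divmod(score, 10) for score in data)


def print_talligram(data):
    """Print the tally marks row by row, highest tens digit first."""
    counts = talligram(data)
    tens_digits = sorted({tens for tens, _ in counts}, reverse=True)
    print("Tens" + "".join(f"{units:>6}" for units in range(10)) + "     T")
    for tens in tens_digits:
        row = [counts.get((tens, units), 0) for units in range(10)]
        marks = "".join(f"{'1' * c:>6}" for c in row)
        print(f"{tens:<4}{marks}{sum(row):>6}")
    totals = [sum(counts.get((t, u), 0) for t in tens_digits) for u in range(10)]
    print("T   " + "".join(f"{c:>6}" for c in totals) + f"{sum(totals):>6}")


print_talligram(scores)
```

The row and column totals printed by the sketch can be checked against the T row and T column of Table 1.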
Other examples of tabular presentation

Frequency and percent table (One-way table).
Ex.
            Frequency      %
Male               25     38
Female             40     62
Total              65    100
Cross-tabulation (Two-way table)

            Chinese     Filipino       T (%)
Male              8           19     27 (42)
Female            5           33     38 (58)
T (%)       13 (20)      52 (80)    65 (100)
b. Ordering – The data are arranged in descending (highest to lowest) or ascending (lowest to
highest) order, writing each score as many times as it occurs. An ordered arrangement of
scores is a prerequisite to ranking the scores.
c. Ranking – is assigning a position or rank to an observation, score, or individual in
relation to the others in the group according to some characteristic such as magnitude,
quality, worth, chronology or importance. It is usually indicated by a number; thus, the
highest score may be given a rank of 1, the second a rank of 2, and so on. In the case of
chronological ranking, the item occurring first is ranked 1, the second is ranked 2, and so
on.
Ranking a small number of scores:
Procedure:
1. Arrange the scores in descending order (from highest to lowest). Write each score
as many times as it occurs in one column. This is the first column.
2. Number the scores consecutively from 1 to n, where n equals the number of
scores. This is the second column.
3. In the third column, write the rank of each score.
a. The rank of a score occurring once is the same as its consecutive number.
b. To find the rank of a score occurring two or more times, add the first and the
last consecutive numbers of the score and divide the sum by two. The result is
the rank.
Ex. Find the ranks of the following scores in English 1 prelim exam. What is the rank of
61?
Scores: 72 45 61 69 45 61 37 69 45 88 41
Answer:
Scores    Consecutive #s    Rank
88        1                 1
72        2                 2
69        3                 3.5
69        4                 3.5
61        5                 5.5
61        6                 5.5
45        7                 8
45        8                 8
45        9                 8
41        10                10
37        11                11
The rank of 61 is 5.5.
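The ranking procedure for a small number of scores can be sketched as follows. The function name `rank_scores` is hypothetical; the tie rule is the one stated in step 3b (average of the first and last consecutive numbers of the tied score).

```python
def rank_scores(scores):
    """Return (score, consecutive number, rank) rows, highest score first.

    Tied scores share the average of the first and last consecutive
    numbers that they occupy.
    """
    ordered = sorted(scores, reverse=True)   # step 1: descending order
    rows = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the last position holding the same score.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank = ((i + 1) + (j + 1)) / 2       # step 3: average of first and last
        for k in range(i, j + 1):
            rows.append((ordered[k], k + 1, rank))   # step 2: k + 1 is the consecutive number
        i = j + 1
    return rows


english1 = [72, 45, 61, 69, 45, 61, 37, 69, 45, 88, 41]
for score, number, rank in rank_scores(english1):
    print(score, number, rank)
```

Ranks are returned as decimals (for example 1.0 and 5.5) so that tied and untied scores are reported in the same form.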

For large number of scores:
Procedure:
1. Tabulate the scores in a talligram.
2. Use the talligram as an aid in arranging the scores in descending order in a vertical
column writing a score only once even if it occurs two or more times. This is the first
column.
3. Number the scores consecutively. The consecutive numbers of a score must be equal
to the number of times it occurs. This is the second column.
4. Assign the ranks on the third column by following rule 3 for small number of scores.
d. Grouping Scores into a Class Frequency Distribution:

A class frequency distribution is produced by placing scores in scaled groups called classes.
A class is a group of a specified number of consecutive single scores or measures. The specified
number of consecutive scores that a class contains is called the class width. The lowest
score of the class is called the “lower limit” and the highest score of the class is called the “upper limit”.
Example: The class 36 – 41 has a lower limit of 36 and an upper limit of 41. The lower class
boundary is 35.5 and the upper class boundary is 41.5. The class width is 6 because there are six
consecutive single scores contained in the class: 36, 37, 38, 39, 40, and 41.
Ex:The following are test scores in Math 31. Construct the class frequency distribution
table.
86 74 66 70 75 57 69 70 73 66 60 81
90 62 76 72 61 58 63 68 73 63 75 71
63 66 74 73 78 61 78 72 67 83 59 67
68 64 59 73 69 76 80 81 79 84 77 68
Steps:
1. Find the range R. Range = Highest score – Lowest score. Ex. R = 90 – 57 = 33
2. Determine/estimate the number of intervals/classes, k.
Formula: k = √n. Ex. k = √48 ≈ 6.9, rounded to 7.
3. Find the class width (c), or the width of the interval. Divide the range by the tentative number
of classes and raise the quotient to the next higher integer if there is any fractional part.
Ex. c = R/k = 33 / 7 ≈ 4.7, raised to 5.
4. Find the lowest limit of the classes. This is the number equal to or next lower than the lowest
score. ( The lowest score must be contained in the lowest class while the highest score must be
contained in the highest class.)
5. Find the lower and upper limits of the classes.
6. Tally the scores.
7. Write the frequencies, class boundaries, cumulative frequencies.
Table 2.
------------------------------------------------------------------------------
Class       Tally           f     Class          Mid pts          Cum Freq
Interval                          Boundaries     (Class marks)    (less than)
87 – 91     1               1     86.5 – 91.5    89               48
82 – 86     111             3     81.5 – 86.5    84               47
77 – 81     1111111         7     76.5 – 81.5    79               44
72 – 76     111111111111    12    71.5 – 76.5    74               37
67 – 71     1111111111      10    66.5 – 71.5    69               25
62 – 66     11111111        8     61.5 – 66.5    64               15
57 – 61     1111111         7     56.5 – 61.5    59               7
n = 48
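Steps 1 to 7 can be sketched in Python as below. This is an illustration under stated assumptions, not the handout's own procedure in code: the function name and dictionary keys are invented, √n is rounded to the nearest integer (as in the example, 6.9 to 7), and the lowest class is assumed to start exactly at the lowest score, which is one of the choices step 4 allows.

```python
import math

# Math 31 test scores from the example above.
math31_scores = [
    86, 74, 66, 70, 75, 57, 69, 70, 73, 66, 60, 81,
    90, 62, 76, 72, 61, 58, 63, 68, 73, 63, 75, 71,
    63, 66, 74, 73, 78, 61, 78, 72, 67, 83, 59, 67,
    68, 64, 59, 73, 69, 76, 80, 81, 79, 84, 77, 68,
]


def class_frequency_distribution(scores):
    """Return one dict per class, lowest class first (hypothetical helper)."""
    lowest, highest = min(scores), max(scores)
    score_range = highest - lowest                    # step 1: R
    k = round(math.sqrt(len(scores)))                 # step 2: k = sqrt(n), rounded
    width = max(1, math.ceil(score_range / k))        # step 3: raise to next integer
    lower = lowest                                    # step 4: assumed starting limit
    rows, cumulative = [], 0
    while lower <= highest:                           # step 5: limits of each class
        upper = lower + width - 1
        frequency = sum(lower <= s <= upper for s in scores)   # step 6: tally
        cumulative += frequency                       # step 7: cumulative frequency
        rows.append({
            "limits": (lower, upper),
            "f": frequency,
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "cum_f": cumulative,
        })
        lower += width
    return rows


# Print highest class first, matching the layout of Table 2.
for row in reversed(class_frequency_distribution(math31_scores)):
    print(row["limits"], row["f"], row["boundaries"], row["midpoint"], row["cum_f"])
```

With the Math 31 scores this reproduces the seven classes of Table 2, from 57 – 61 up to 87 – 91.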
USES of a Class Frequency Distribution (CFD):
1. It shows whether the distribution is normal or skewed. It also indicates the relative difficulty of
the test from which the scores are taken.
If most of the scores are massed at the middle portion of the frequency table, the distribution
is normal and the test is of moderate difficulty.
If most of the scores are gathered at the upper portion of the distribution, the distribution is
skewed to the left or skewed negatively. The test is relatively easy for the students.
If the majority of the scores are clustered at the lower part of the frequency table, the
distribution is skewed to the right or skewed positively and the test is relatively difficult.

2. It facilitates the computation of statistical measures such as the median, mean, quartiles,
percentiles, standard deviation, etc.
3. Grouping also minimizes space.
e.) Graphing:
Graphical Presentation of Class Frequency Distributions:
1. Frequency polygon – It is a linear graph representing the frequencies of the midpoints of the
classes in a class frequency distribution and which forms a polygon when its ends are joined with the
baseline.
Construction Procedure:
On the XY plane, mark the midpoints of the classes on the X axis and the class frequencies on the
Y axis. Plot each class midpoint against its corresponding frequency and connect the
plotted points. To close the curve, connect the end points to the baseline at the two
extended midpoints, one below and one above the distribution.
Uses of the frequency polygon.

It displays the distribution of the scores. If it is more or less symmetrical with the highest point
approximately in the middle, the distribution is more or less regular or normal and the test is of
moderate difficulty. If the graph is asymmetrical and it is higher at the left than at the right, the
distribution is said to be skewed to the right and the test is relatively difficult. If the graph is higher
at the right than at the left side, the distribution is said to be skewed to the left and the test is
relatively easy.
2. Bar Chart
The bar chart is a graph consisting of vertical bars or rectangles placed side by side,
representing the frequencies of classes in a class frequency distribution. The width of each bar
is the width of the interval marked by the class limits on the horizontal axis. The height of each bar
corresponds to the class frequency, measured along the vertical axis.
3. Histogram
The histogram is a graph that closely resembles the bar chart. The bar chart uses the
class limits for the horizontal axis while the histogram employs the class boundaries. Using the class
boundaries eliminates the spaces between the rectangles, giving the graph a solid appearance. Usually, but not
necessarily, the shared sides of adjacent bars are omitted so that the graph also looks like a
polygon.
4. Pie Chart and Pictograph

Categorical variables are often described graphically by using a pie chart, a circle which is divided
into pie-shaped sectors. The angle of each sector is proportional to the frequency or percentage of
its category, and it is advisable to convert the frequency table into percentages first.
The pictograph uses pictures, usually to represent the size of a certain population.
Both give a more dramatic and livelier presentation of the distribution of data.
Descriptive Measures:
To investigate a set of data, it is useful to define measures that describe its important features.
We have Measures of Central Tendency and other locations, and Measures of Variability.
1.) Measures of Central Tendency

Central tendency is the tendency of observations or cases or scores to cluster about a point. A
measure of central tendency is a statistic calculated from a set of observations or scores and
designed to typify or represent the whole set. It is either an average (mean), a midpoint (median),
or the most frequent score in a distribution of scores (mode). The most common central measures are
the mean, the median and the mode.
A.) The mean, median, and mode of UNGROUPED data (raw or real data/scores):
a. The Mean – It is the most popular and the most reliable measure of central tendency. It is the average
of a set of scores or observations.

The mean is the sum of a set of scores (or observations) divided by the total number of scores in
the set. The formula is:
\[ \bar{x} = \frac{\sum x}{n} \]
where: x are the scores,
n is the number of scores, and
∑ is the symbol for summation
The Weighted Mean is the average computed for k quantities x1, x2, …, xk when more
significance is attached to some scores than to others. Weights w1, w2, …, wk are assigned to the k
quantities respectively. These weights measure the relative importance of the individual scores.
The formula is:
\[ \bar{x} = \frac{\sum x_i w_i}{\sum w_i} \]
where: xi are the scores,
wi are the weights of each score, and
∑ is the symbol for summation
Ex: 1.) Find the mean of the following test scores in Math 1.
71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:
The mean is
\[ \bar{x} = \frac{\sum x}{n} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} \approx 47.36 \]
Ex: 2.) When Nikka Sanchez was in her fourth year high school her final grades in Math 4,
English 4, Filipino 4, Physics, Chemistry, Journalism, World History, and Research were
78, 89, 90, 79, 83, 93, 89, and 95 respectively. If each subject had the equivalent units of
2, 1, 1, 1.5, 1.5, 1, 1, and 1.5 respectively what was her weighted average grade when she
graduated?
Solution:
\[ \bar{x} = \frac{78(2) + 89(1) + 90(1) + 79(1.5) + 83(1.5) + 93(1) + 89(1) + 95(1.5)}{2 + 1 + 1 + 1.5 + 1.5 + 1 + 1 + 1.5} = \frac{902.5}{10.5} \approx 85.95 \]
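Both examples can be checked with a few lines of Python. The function names `mean` and `weighted_mean` are chosen here for illustration; they simply apply the two formulas given above.

```python
def mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of score times weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Example 1: Math 1 test scores.
math1 = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(round(mean(math1), 2))                    # 47.36

# Example 2: final grades weighted by the units of each subject.
grades = [78, 89, 90, 79, 83, 93, 89, 95]
units = [2, 1, 1, 1.5, 1.5, 1, 1, 1.5]
print(round(weighted_mean(grades, units), 2))   # 85.95
```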
b. The Median ( x̃ ) is a point on a scale which divides the scale into two equal parts. A scale is a
succession of numbers, steps, classes, degrees, gradations, or categories with a fixed interval. The
median is the middle value of a set of observations arranged in increasing or decreasing order of
magnitude. It is the middle score or value when the number of observations is odd, or the arithmetic
mean of the two middle values when the number of observations is even. It is the value such that half of
the observations fall above it and the other half fall below it.
Formula:
\[ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left( x_{n/2} + x_{n/2+1} \right) & \text{if } n \text{ is even} \end{cases} \]
Ex: 1.) Find the median of the following test scores in Math 1.
25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68
Solution: First arrange the data from highest to lowest (or vice versa):

71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Since n = 14 (which is even), we have two middle scores.
Hence, the median is
\[ \tilde{x} = \frac{1}{2}\left( x_{14/2} + x_{14/2+1} \right) = \frac{1}{2}( x_7 + x_8 ) = \frac{1}{2}(52 + 45) = 48.5 \]
Ex 2.) Find the median of the following set of scores in ED 103PRT.

24, 15, 13, 23, 27, 18, 16
Solution: Arranging the data from highest to lowest, we have 27, 24, 23, 18, 16, 15, 13.
Since n = 7 (which is odd), we have a single middle score. Hence
\[ \tilde{x} = x_{(n+1)/2} = x_{(7+1)/2} = x_4 = 18 \]
c. The Mode ( x̂ ). It is the most frequently occurring score in a set of data, that is, the score with the highest
frequency. A set of scores can have one mode (unimodal), two modes (bimodal), three modes
(trimodal), or more, or no mode at all. The mode is the poorest measure of central tendency.
Ex. The mode of example 1 is 38. → x̂ = 38
Example 2 has no mode.
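The median and mode examples can be reproduced with the sketch below. The function names are illustrative. One assumption is made explicit: following Example 2, a data set in which every score occurs only once is reported as having no mode (an empty list).

```python
from collections import Counter


def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Return every score with the highest frequency.

    Assumption: if no score occurs more than once, there is no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


math1 = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
ed103 = [24, 15, 13, 23, 27, 18, 16]

print(median(math1), modes(math1))   # 48.5 [38]
print(median(ed103), modes(ed103))   # 18 []
```

Returning a list lets the same function report bimodal or trimodal data without any change.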
B.) The mean, median, and mode of GROUPED data (class frequency distribution):
a.) The Mean : The class mark or midpoint method
Formula:
\[ \bar{x} = \frac{\sum f_i x_i}{n} \]
where fi is the class frequency of the ith class interval,
xi is the class mark or the midpoint of the ith class interval, and
n is the total number of scores
Ex. Table 2.
---------------------------------------------------------------------------------
Class       fi    xi                         fixi
Interval          (Mid pts/class marks)
87 – 91     1     89                         89
82 – 86     3     84                         252
77 – 81     7     79                         553
72 – 76     12    74                         888
67 – 71     10    69                         690
62 – 66     8     64                         512
57 – 61     7     59                         413
_______________________________________________________
n = 48                                       ∑ fixi = 3397
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{3397}{48} \approx 70.77 \]
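The class mark method can be sketched in Python as follows. The representation of each class as a `(lower limit, upper limit, frequency)` tuple and the name `grouped_mean` are choices made for this illustration only.

```python
# Table 2 as (lower limit, upper limit, frequency), lowest class first.
table2 = [
    (57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
    (77, 81, 7), (82, 86, 3), (87, 91, 1),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of f_i * x_i divided by n."""
    n = sum(f for _, _, f in classes)
    # The class mark x_i is the midpoint of the lower and upper limits.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


print(round(grouped_mean(table2), 2))   # 70.77
```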
b.) The Median.

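Formula:
\[ \tilde{x} = L_m + \left( \frac{n/2 - cf_<}{f} \right) c \]
where Lm = lower class boundary of the median class
n = total frequency or total number of observations
cf< = cumulative frequency equal to or next lower than n/2, that is, the cumulative
frequency of the class just below the median class
c = class width (size of the class interval)
f = frequency of the median class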

Example.
Table 2.
------------------------------------------------------------------------------
Class       Tally           f     Class          Mid pts          cf<
Interval                          Boundaries     (Class marks)    (cum freq less than)
87 – 91     1               1     86.5 – 91.5    89               48
82 – 86     111             3     81.5 – 86.5    84               47
77 – 81     1111111         7     76.5 – 81.5    79               44
72 – 76     111111111111    12    71.5 – 76.5    74               37
67 – 71     1111111111      10    66.5 – 71.5    69               25
62 – 66     11111111        8     61.5 – 66.5    64               15
57 – 61     1111111         7     56.5 – 61.5    59               7
______________________________________________________________________________
n = 48
Solution:
n/2 = 48/2 = 24 → median class is (67 – 71)
cf< = 15, the cum freq next lower than 24
Lm = 66.5
f = 10
c = 5
\[ \tilde{x} = L_m + \left( \frac{n/2 - cf_<}{f} \right) c = 66.5 + \left( \frac{24 - 15}{10} \right) 5 = 66.5 + \frac{45}{10} = 66.5 + 4.5 = 71 \]
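The grouped median computation can be sketched as below. The tuple layout `(lower limit, upper limit, frequency)` and the function name `grouped_median` are assumptions of this sketch; class boundaries are taken half a unit below the lower limit, as in Table 2.

```python
# Table 2 as (lower limit, upper limit, frequency), lowest class first.
table2 = [
    (57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
    (77, 81, 7), (82, 86, 3), (87, 91, 1),
]


def grouped_median(classes):
    """Median of grouped data: Lm + ((n/2 - cf<) / f) * c."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    target = n / 2
    cumulative_below = 0                      # cf<: cumulative frequency below the class
    for lower, upper, f in classes:
        # The median class is the first one whose cumulative frequency reaches n/2.
        if f > 0 and cumulative_below + f >= target:
            boundary = lower - 0.5            # Lm: lower class boundary
            width = upper - lower + 1         # c: class width
            return boundary + (target - cumulative_below) / f * width
        cumulative_below += f
    raise ValueError("median class not found")


print(grouped_median(table2))   # 71.0
```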
c.) The Mode.
Crude Mode: \( \hat{x} = L_m + \frac{c}{2} \), where c is the class width and Lm is the
lower class boundary of the modal class
Refined Mode: \( \hat{x} = 3\tilde{x} - 2\bar{x} \)
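As a worked illustration that is not in the original handout, both formulas can be applied to Table 2. The modal class is 72 – 76 (the class with the highest frequency, 12), so Lm = 71.5 and c = 5; the median 71 and the mean 70.77 are the grouped values computed earlier.

```latex
\[ \hat{x}_{\text{crude}} = L_m + \frac{c}{2} = 71.5 + \frac{5}{2} = 74 \]
\[ \hat{x}_{\text{refined}} = 3\tilde{x} - 2\bar{x} = 3(71) - 2(70.77) = 213 - 141.54 = 71.46 \]
```

The crude mode is simply the midpoint of the modal class, which is why it equals the class mark 74.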
2.) Other Measures of location:
Other measures of location that describe or locate the non-central position of a set of data are
referred to as quantiles or fractiles . Most common fractiles are known as percentiles, deciles, and
quartiles.
Percentiles are values that divide an ordered set of observations into 100 equal parts, denoted by
P1, P2, …, P99, such that 1% of the data falls below P1, 2% of the data falls below P2, …, and 99% of the
data falls below P99.
Deciles are values that divide an ordered set of observations into 10 equal parts, denoted by D1,
D2, …, D9, such that 10% of the data falls below D1, 20% of the data falls below D2, …, and 90% of the
data falls below D9.
Quartiles are values that divide an ordered set of observations into 4 equal parts, denoted by Q1,
Q2, and Q3, such that 25% of the data falls below Q1, 50% of the data falls below Q2, and 75% of the
data falls below Q3.
A.) From Ungrouped Data:

To solve for percentiles, deciles or quartiles from ungrouped data follow the following procedure:
1. Arrange the data in increasing order of magnitude (ascending order).
2. Solve for the value of L, where
L = mn / 100 for percentiles,
L = mn / 10 for deciles,
L = mn / 4 for quartiles,
m is the location of the percentile, decile, or quartile, and n is the number of observations.
3. If L is an integer, the desired fractile is the average of the Lth and the (L+1)th observations.
If L is fractional, round L up to the next higher integer to find the required location.
The fractile is the value in that location.
Examples:
1.) Find P63, D8, and Q1 in the following set of score data in Bio 1.
95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78
Solution:
Data arranged in ascending order: 34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12
a.) P63: L = 63(12) / 100 = 7.56 → 8.
This means that the 8th value in the set of data is the 63rd percentile. Therefore, P63 = 76. This
means that 63% of the data falls below 76.
b.) D8: L = 8(12) / 10 = 9.6 → 10.
This means that the 10th value in the data is the 8th decile. Therefore, D8 = 87, which means
that 80% of the data falls below 87.
c.) Q1: L = 1(12) / 4 = 3
This means that the 1st quartile is the average of the 3rd and the 4th values in the data. Hence,
Q1 = (45 + 56) / 2 = 50.5. This further means that 25% of the data falls below 50.5.
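The three-step procedure for ungrouped data can be sketched in Python. The function name `fractile` and its `parts` parameter (100 for percentiles, 10 for deciles, 4 for quartiles) are invented for this illustration. Note that this follows the handout's rule; statistical software often uses other interpolation rules and may give slightly different values.

```python
def fractile(data, m, parts):
    """m-th fractile of ungrouped data split into `parts` equal parts."""
    if not 0 < m < parts:
        raise ValueError("m must lie strictly between 0 and parts")
    ordered = sorted(data)                    # step 1: ascending order
    product = m * len(ordered)                # step 2: L = mn / parts
    if product % parts == 0:
        # Step 3, L is an integer: average the L-th and (L+1)-th observations.
        location = product // parts
        return (ordered[location - 1] + ordered[location]) / 2
    # Step 3, L is fractional: round up to the next higher integer.
    location = product // parts + 1
    return ordered[location - 1]


bio1 = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
print(fractile(bio1, 63, 100))   # P63 = 76
print(fractile(bio1, 8, 10))     # D8 = 87
print(fractile(bio1, 1, 4))      # Q1 = 50.5
```

Working with the integer product mn, rather than the decimal value of L, avoids rounding problems when testing whether L is a whole number.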
B.) From Grouped Data:
a.) Percentile: The computing formula is
\[ P_m = L_m + \left( \frac{mn/100 - cf_<}{f_m} \right) c \]
b.) Decile: The computing formula is
\[ D_m = L_m + \left( \frac{mn/10 - cf_<}{f_m} \right) c \]
c.) Quartile: The computing formula is
\[ Q_m = L_m + \left( \frac{mn/4 - cf_<}{f_m} \right) c \]
Examples:
1.) Using the data given in Table 2, compute for the following:
a.) P43
b.) D9
c.)Q3
Table 2.
----------------------------------------------------------------------------
Class       f     Class          Cum Freq
                  Boundaries     (cf<)
87 – 91     1     86.5 – 91.5    48
82 – 86     3     81.5 – 86.5    47
77 – 81     7     76.5 – 81.5    44
72 – 76     12    71.5 – 76.5    37
67 – 71     10    66.5 – 71.5    25
62 – 66     8     61.5 – 66.5    15
57 – 61     7     56.5 – 61.5    7
____________________________________________________
n = 48
a.) P43: mn/100 = 43(48)/100 = 20.64 → class (67 – 71)
cf< = 15
Lm = 66.5
fm = 10
c = 5
Hence,
\[ P_{43} = 66.5 + \left( \frac{20.64 - 15}{10} \right) 5 = 69.32 \]
This means that 43% of the data falls below 69.32.
b.) D9: mn/10 = 9(48)/10 = 43.2 → class (77 – 81)
cf< = 37
Lm = 76.5
fm = 7
c = 5
Hence,
\[ D_9 = 76.5 + \left( \frac{43.2 - 37}{7} \right) 5 \approx 80.93 \]
This means that 90% of the data falls below 80.93.
c.) Q3: mn/4 = 3(48)/4 = 36 → class (72 – 76)
cf< = 25
Lm = 71.5
fm = 12
c = 5
Hence,
\[ Q_3 = 71.5 + \left( \frac{36 - 25}{12} \right) 5 \approx 76.08 \]
This means that 75% of the data falls below 76.08.
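The three grouped-data computations follow one pattern, which the sketch below captures in a single function. The name `grouped_fractile`, the `parts` parameter (100, 10 or 4) and the tuple layout of the table are assumptions of this illustration.

```python
# Table 2 as (lower limit, upper limit, frequency), lowest class first.
table2 = [
    (57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
    (77, 81, 7), (82, 86, 3), (87, 91, 1),
]


def grouped_fractile(classes, m, parts):
    """Lm + ((mn/parts - cf<) / fm) * c for the class containing the fractile."""
    if not 0 < m < parts:
        raise ValueError("m must lie strictly between 0 and parts")
    n = sum(f for _, _, f in classes)
    target = m * n / parts
    cumulative_below = 0                      # cf< of the class being examined
    for lower, upper, f in classes:
        # The fractile class is the first whose cumulative frequency reaches the target.
        if f > 0 and cumulative_below + f >= target:
            boundary = lower - 0.5            # Lm: lower class boundary
            width = upper - lower + 1         # c: class width
            return boundary + (target - cumulative_below) / f * width
        cumulative_below += f
    raise ValueError("the distribution has no observations")


print(round(grouped_fractile(table2, 43, 100), 2))   # P43 = 69.32
print(round(grouped_fractile(table2, 9, 10), 2))     # D9 = 80.93
print(round(grouped_fractile(table2, 3, 4), 2))      # Q3 = 76.08
```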
3.) Measures of Variability / Dispersion / Spread
The measures of central tendency characterize only the location around which a given set of data
clusters. To describe adequately how the data cluster around or scatter away from the central
point, we need another kind of measure that also characterizes a given set of data: the
measures of variability or dispersion or spread.
Example 1. The same group of 8 students took their final exams in English 1 and Math 1. Their
scores and the mean scores are
English 1 : 75, 77, 80, 80, 81, 82, 83, 84    x̄ = 80.25
Math 1 : 60, 65, 76, 82, 83, 85, 95, 96    x̄ = 80.25
The two sets of data have the same mean, 80.25, but they are not identical. Scores in
English 1 cluster close to the mean while scores in Math 1 are more dispersed about the mean. The
measures used to describe this variation are the range, the variance, the standard deviation, the quartile
deviation, and the coefficient of variation. The quartile deviation and coefficient of variation are not
discussed in this material. Our discussion is limited to the three common measures
of variability (range, variance and standard deviation) of ungrouped data.
1.) The range is the easiest to compute but it is the poorest measure of dispersion, since it uses
only the two extreme scores. The larger the range, the more dispersed the data.
Range = Highest Score – Lowest Score. → R = Hs – Ls
From Ex. 1: Eng 1: R = 84 – 75 = 9
Math 1: R = 96 – 60 = 36
In our example, the test results in Math 1 are more variable than those in English 1. The dispersion
of scores in Math 1 is wider than that in English 1.
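A short Python check of Example 1 shows that the two sets share the same mean but differ in range. The function names are chosen for this illustration.

```python
def mean(scores):
    return sum(scores) / len(scores)


def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


english1 = [75, 77, 80, 80, 81, 82, 83, 84]
math1 = [60, 65, 76, 82, 83, 85, 95, 96]

print(mean(english1), score_range(english1))   # 80.25 9
print(mean(math1), score_range(math1))         # 80.25 36
```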
2.) Another measure of variability is the variance. It is never negative. A large variance
corresponds to a highly dispersed set of values. It makes use of all
observations in the data set. Its unit of measure is the square of the unit of measure of the given
set of values.
Variance, s². Formula:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)} \]
Example 2. The number of hours spent studying per day by ten students (out of 40 in Bio 1)
was recorded as follows: 5, 8, 4, 2, 2, 2, 2, 5, 3, and 4. Find the variance.
Solution:
                                                  Total
xi       5    8    4    2    2    2    2    5    3    4     37
xi²     25   64   16    4    4    4    4   25    9   16    171
So we have:
∑ xi = 37 and ∑ xi² = 171
\[ s^2 = \frac{10(171) - (37)^2}{10(9)} = \frac{341}{90} \approx 3.79 \text{ sq. hrs.} \]
3.) Standard Deviation, s. Formula: \( s = \sqrt{s^2} \)
The standard deviation is the positive square root of the variance.
Hence, the standard deviation of example 2 above is
\[ s = \sqrt{s^2} = \sqrt{3.79} \approx 1.95 \text{ hrs.} \]
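Example 2 can be verified with the sketch below, which uses the computing form of the variance formula (the one based on ∑x and ∑x²). The function name `sample_variance` is illustrative.

```python
import math


def sample_variance(values):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(values)
    total = sum(values)                        # sum of x_i
    total_of_squares = sum(x * x for x in values)   # sum of x_i squared
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]
variance = sample_variance(hours)
standard_deviation = math.sqrt(variance)

print(round(variance, 2))             # 3.79 (square hours)
print(round(standard_deviation, 2))   # 1.95 (hours)
```

The divisor n − 1 makes this the sample variance, which fits the example because the ten students are a sample from the class of 40.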
