Handouts For Stt101 BBLL (Chapters 1-4) Descriptive Statistics
c. Census, sampling: A census gathers information from all the units in the population; sampling
obtains data from only a part of the population (a sample). The information derived from the sample is
used to make generalizations about the whole population. Errors are unavoidable when these
generalizations are made, and the role of statistics is to provide procedures that minimize these errors.
e.) Graphing:
Graphical Presentation of Class Frequency Distributions:
1. Frequency polygon – A line graph of the class frequencies plotted against the class midpoints of a
class frequency distribution. It forms a polygon when its two ends are joined to the baseline.
Construction Procedure:
On the XY plane, mark the class midpoints on the X axis and the class frequencies on the Y axis. Plot the
point for each class midpoint and its corresponding frequency, and connect consecutive points with
straight lines. To close the polygon, connect each end to the baseline at the midpoint of an extra
(zero-frequency) class placed just below and just above the distribution.
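The construction procedure can be sketched in Python as a list of polygon vertices, using the class frequency distribution of Table 2 (further down in this handout) as input. This is a minimal sketch, assuming equal class widths and integer class limits (so the class size is upper limit − lower limit + 1); the function names are hypothetical.

```python
# Sketch: vertices of a frequency polygon from a class frequency distribution.
# Each class is (lower limit, upper limit, frequency); equal widths and integer
# limits are assumed, so the class size is upper - lower + 1.

def class_midpoint(lower, upper):
    """Class mark: the average of the two class limits."""
    return (lower + upper) / 2

def frequency_polygon_points(classes):
    """Return (midpoint, frequency) vertices, closed at both ends with zero frequency."""
    classes = sorted(classes)                   # ascending by lower limit
    width = classes[0][1] - classes[0][0] + 1   # class size (assumes integer limits)
    points = [(class_midpoint(lo, hi), f) for lo, hi, f in classes]
    first_mid, last_mid = points[0][0], points[-1][0]
    # Extra zero-frequency midpoints one class below and above bring the ends to the baseline.
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

table2 = [(87, 91, 1), (82, 86, 3), (77, 81, 7), (72, 76, 12),
          (67, 71, 10), (62, 66, 8), (57, 61, 7)]
pts = frequency_polygon_points(table2)
print(pts)
```

The first and last vertices, (54.0, 0) and (94.0, 0), are the two extended midpoints on the baseline.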
2. Bar Chart
The bar chart is a graph consisting of bars (rectangles) drawn vertically side by side, each representing
the frequency of a class in a class frequency distribution. The width of each bar is the width of the interval
defined by the class limits on the horizontal axis, and the height of each bar, read on the vertical axis, is the
class frequency.
3. Histogram
The histogram closely resembles the bar chart. The bar chart uses the class limits on the horizontal axis,
while the histogram uses the class boundaries. Using the class boundaries eliminates the gaps between the
rectangles, giving the graph a solid appearance. Usually, but not necessarily, the shared sides of adjacent
bars are omitted so that the graph also looks like a polygon.
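The difference between class limits and class boundaries can be sketched as follows. This assumes integer-valued data recorded to the nearest unit, so each boundary lies half a unit beyond its class limit; the function name is hypothetical.

```python
# Sketch: convert class limits to class boundaries for a histogram.
# Assumes data recorded to the nearest `unit` (default 1), so each boundary
# lies half a unit below the lower limit or above the upper limit.

def class_boundaries(lower_limit, upper_limit, unit=1):
    half = unit / 2
    return (lower_limit - half, upper_limit + half)

limits = [(57, 61), (62, 66), (67, 71)]
boundaries = [class_boundaries(lo, hi) for lo, hi in limits]
print(boundaries)
```

Adjacent classes now share an edge (for example 61.5), which is why the histogram's bars touch while the bar chart's do not.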
Descriptive Measures:
To investigate a set of data, it is useful to define measures that describe its important features.
We have measures of central tendency (and other measures of location) and measures of variability.
A.) The mean, median, and mode of UNGROUPED data (raw or real data/scores):
a. The Mean – It is the most popular and the most reliable measure of central tendency. It is the arithmetic
average of a set of scores or observations: the sum of the values divided by the number of values.
The Weighted Mean is the average of k quantities x1, x2, …, xk when more significance is attached to
some scores than to others. Weights w1, w2, …, wk are assigned to the k quantities, respectively; these
weights measure the relative importance of the individual scores.
The formula is:
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where: x_i are the scores,
w_i are the weights of the scores,
∑ is the symbol for summation.
Ex: 1.) Find the mean of the following test scores in Math 1.
71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:
The mean is
$$\bar{x} = \frac{\sum x}{n} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} = 47.36$$
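The same computation as a short Python sketch: the mean is the sum of the scores divided by their count. The helper name `mean` is illustrative (Python's standard `statistics.mean` would give the same value).

```python
# Arithmetic mean of ungrouped data: sum of the scores divided by their count.
scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]

def mean(values):
    return sum(values) / len(values)

x_bar = mean(scores)
print(sum(scores), len(scores), round(x_bar, 2))  # 663 14 47.36
```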
Ex: 2.) When Nikka Sanchez was in her fourth year of high school, her final grades in Math 4,
English 4, Filipino 4, Physics, Chemistry, Journalism, World History, and Research were
78, 89, 90, 79, 83, 93, 89, and 95, respectively. If the subjects carried 2, 1, 1, 1.5, 1.5, 1, 1, and
1.5 units, respectively, what was her weighted average grade when she graduated?
Solution:
$$\bar{x} = \frac{78(2) + 89(1) + 90(1) + 79(1.5) + 83(1.5) + 93(1) + 89(1) + 95(1.5)}{2 + 1 + 1 + 1.5 + 1.5 + 1 + 1 + 1.5} = \frac{902.5}{10.5} = 85.95$$
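A minimal Python sketch of the weighted mean, using the grades and units of Ex. 2: multiply each grade by its weight, add the products, and divide by the sum of the weights. The function name `weighted_mean` is hypothetical.

```python
# Weighted mean: sum(w_i * x_i) / sum(w_i)
grades = [78, 89, 90, 79, 83, 93, 89, 95]
units = [2, 1, 1, 1.5, 1.5, 1, 1, 1.5]

def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    weighted_sum = sum(w * x for x, w in zip(values, weights))
    return weighted_sum / sum(weights)

print(round(weighted_mean(grades, units), 2))  # 85.95
```

Subjects with 1.5 or 2 units pull the average toward their grades more strongly than the 1-unit subjects do.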
b. The Median ( 𝑥̃ ) is a point in a scale which divides the scale into two equal parts. A scale is a
succession of numbers, steps, classes, degrees, gradations, or categories with a fixed interval. The
median is just the middle value of a set of observations arranged in an increasing or decreasing order of
magnitude. It is the middle score or value when the number of observations is odd, or the arithmetic
mean of the two middle values when the number of observations is even. It is the value such that half of
the observations fall above it and the other half fall below it.
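Formula:
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$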
Ex: 1.) Find the median of the following test scores in Math 1.
25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68
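The median formula can be sketched in Python and applied to the Math 1 scores of Ex. 1. This is an illustrative helper (the standard `statistics.median` follows the same rule); list indices start at 0, so position (n+1)/2 in the formula is index n // 2 here.

```python
def median(values):
    xs = sorted(values)          # arrange in increasing order
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]           # x_((n+1)/2) in 1-based positions
    # even n: average of x_(n/2) and x_(n/2 + 1)
    return (xs[mid - 1] + xs[mid]) / 2

scores = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
print(median(scores))  # 48.5
```

With n = 14 (even), the 7th and 8th ordered scores are 45 and 52, so the median is (45 + 52) / 2 = 48.5.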
c. The Mode ( x̂ ). It is the most frequently occurring score in a set of data, that is, the score with the
highest frequency. A data set can have one mode (unimodal), two modes (bimodal), three modes
(trimodal), or more, or no mode at all. The mode is the poorest measure of central tendency.
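A sketch of finding every mode with `collections.Counter`, applied to the Math 1 test scores used earlier. It assumes one common convention: a data set in which every value occurs exactly once has no mode. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                # every value occurs once: no mode (assumed convention)
    return sorted(v for v, c in counts.items() if c == highest)

scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(modes(scores))  # [38]
```

Returning a list lets the same function report unimodal, bimodal, or trimodal data.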
B.) The mean, median, and mode of GROUPED data (class frequency distribution):
Formula:
$$\bar{x} = \frac{\sum f_i x_i}{n}$$
where f_i is the class frequency of the ith class interval,
x_i is the class mark or the midpoint of the ith class interval,
n is the total frequency.
Ex. Table 2.
---------------------------------------------------------------------------------
Class Interval    fi    xi (Midpoint/Class Mark)    fixi
87 – 91            1    89                            89
82 – 86            3    84                           252
77 – 81            7    79                           553
72 – 76           12    74                           888
67 – 71           10    69                           690
62 – 66            8    64                           512
57 – 61            7    59                           413
_______________________________________________________
               n = 48                       ∑ fixi = 3397
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{3397}{48} = 70.77$$
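The grouped-data mean can be sketched in Python with Table 2 stored as (lower limit, upper limit, frequency) rows. The class mark is computed as the average of the class limits; the function name is hypothetical.

```python
# Table 2 as (lower limit, upper limit, frequency) rows
table2 = [(87, 91, 1), (82, 86, 3), (77, 81, 7), (72, 76, 12),
          (67, 71, 10), (62, 66, 8), (57, 61, 7)]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)                      # total frequency
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f_i * x_i
    return total / n

print(round(grouped_mean(table2), 2))  # 70.77
```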
The median of grouped data is
$$\tilde{x} = L_m + \left(\frac{n/2 - cf_<}{f}\right) c$$
where L_m = lower class boundary of the median class
n = total frequency or total number of observations
cf< = cumulative frequency equal to or next lower than n/2
c = class interval (class size)
f = frequency of the median class
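A minimal sketch of the grouped median formula applied to Table 2. It assumes equal-width classes with integer limits, so the lower class boundary is the lower limit minus 0.5 and the class size is upper limit − lower limit + 1; the function name is hypothetical.

```python
table2 = [(87, 91, 1), (82, 86, 3), (77, 81, 7), (72, 76, 12),
          (67, 71, 10), (62, 66, 8), (57, 61, 7)]

def grouped_median(classes, unit=1):
    """Lm + ((n/2 - cf<) / f) * c, assuming equal-width classes."""
    rows = sorted(classes)                   # lowest class first
    n = sum(f for _, _, f in rows)
    half = n / 2
    cumulative = 0                           # less-than cumulative frequency so far
    for lo, hi, f in rows:
        if cumulative + f >= half:           # first class whose cumulative frequency reaches n/2
            lm = lo - unit / 2               # lower class boundary of the median class
            c = hi - lo + unit               # class interval (size)
            return lm + (half - cumulative) * c / f
        cumulative += f
    raise ValueError("empty distribution")

print(grouped_median(table2))  # 71.0
```

Here n/2 = 24; the cumulative frequencies from the lowest class are 7, 15, 25, so the median class is 67 – 71 with Lm = 66.5, cf< = 15, f = 10, and c = 5, giving 66.5 + (9/10)(5) = 71.0.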
3. If L is an integer, the desired fractile is the average of the Lth and (L+1)th observations.
If L is not an integer, round L up to the next higher integer to find the required location;
the fractile is the value in that location.
Examples:
1.) Find P63, D8, and Q1 in the following set of score data in Bio 1.
95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78
Solution:
Data arranged in ascending order: 34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12
Daniel R. Sanson’s Property 8
a.) P63: L = 63(12) / 100 = 7.56, which rounds up to 8.
The 8th value in the ordered data is therefore the 63rd percentile: P63 = 76. This means that about
63% of the scores fall below 76.
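The location rule can be sketched in Python for all three fractiles of the Bio 1 example. It assumes the location is L = k·n/100 for percentiles (as in the P63 solution), k·n/10 for deciles, and k·n/4 for quartiles; integer arithmetic decides whether L is a whole number. The function name and the fallback when L = n are hypothetical choices.

```python
def fractile(values, k, parts):
    """k-th fractile; parts = 100 (percentile), 10 (decile) or 4 (quartile)."""
    xs = sorted(values)
    n = len(xs)
    numerator = k * n                    # L = k * n / parts
    if numerator % parts == 0:           # L is an integer
        loc = numerator // parts
        if loc >= n:                     # no (L+1)th value: use the last one (assumption)
            return xs[-1]
        return (xs[loc - 1] + xs[loc]) / 2   # average of Lth and (L+1)th values
    loc = numerator // parts + 1         # round L up to the next higher integer
    return xs[loc - 1]

bio = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
print(fractile(bio, 63, 100), fractile(bio, 8, 10), fractile(bio, 1, 4))  # 76 87 50.5
```

For D8, L = 8(12)/10 = 9.6 rounds up to 10 (the value 87); for Q1, L = 1(12)/4 = 3 is an integer, so Q1 = (45 + 56)/2 = 50.5.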
Example 1. The same group of 8 students took their final exams in English 1 and Math 1. Their
scores and mean scores are:
English 1 : 75, 77, 80, 80, 81, 82, 83, 84     𝑥̅ = 80.25
Math 1    : 60, 65, 76, 82, 83, 85, 95, 96     𝑥̅ = 80.25
The two data sets have the same mean, 80.25, but they are not identical. Scores in English 1 cluster
close to the mean, while scores in Math 1 are more dispersed about their mean. The measures used to
describe this variation are the range, the variance, the standard deviation, the quartile deviation, and the
coefficient of variation. The quartile deviation and the coefficient of variation are not discussed in this
material; the discussion is limited to the three common measures of variability (range, variance, and
standard deviation) of ungrouped data.
1.) The range is the difference between the highest and the lowest values. It is the easiest to compute
but the poorest measure of dispersion. The larger the range, the more dispersed the data.
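A short sketch computing the range (highest minus lowest value) for the two score sets of Example 1; the function name `data_range` is hypothetical.

```python
def data_range(values):
    return max(values) - min(values)

english = [75, 77, 80, 80, 81, 82, 83, 84]
math_1 = [60, 65, 76, 82, 83, 85, 95, 96]
print(data_range(english), data_range(math_1))  # 9 36
```

Although both sets have the same mean, the Math 1 range (36) is four times the English 1 range (9), matching the observation that Math 1 scores are more dispersed.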
2.) Another measure of variability is the variance. It is always non-negative, and a large variance
corresponds to a highly dispersed set of values. It makes use of all observations in the data set. Its unit
of measure is the square of the unit of measure of the given set of values.
Variance, s². Formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$
Example 2. The numbers of hours per day that ten students (out of 40 in Bio 1) spent studying were
recorded as follows: 5, 8, 4, 2, 2, 2, 2, 5, 3, and 4. Find the variance.
Solution:
                                                      Total
xi      5    8    4    2    2    2    2    5    3    4     37
xi²    25   64   16    4    4    4    4   25    9   16    171
So we have:
∑ xi = 37 and ∑ xi² = 171
$$s^2 = \frac{10(171) - (37)^2}{10(9)} = \frac{341}{90} = 3.79 \text{ square hours}$$
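Both variance formulas can be checked against each other in a short Python sketch for the study-hours data of Example 2, with Python's `statistics.variance` (which also divides by n − 1) as a cross-check. The two function names are hypothetical.

```python
import statistics

hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]

def variance_definition(xs):
    """s^2 = sum((x - x_bar)^2) / (n - 1)"""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def variance_computational(xs):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

print(round(variance_definition(hours), 2),
      round(variance_computational(hours), 2),
      round(statistics.variance(hours), 2))  # 3.79 3.79 3.79
```

The computational form avoids subtracting the mean from every value, which is why it is convenient with a table of xi and xi² as above.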
3.) Standard Deviation, s. Formula: $s = \sqrt{s^2}$
The standard deviation is the positive square root of the variance.
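As an illustration of s = √s², the sketch below computes the standard deviations of the English 1 and Math 1 scores of Example 1 and compares them with Python's built-in `statistics.stdev`. The helper name `sample_sd` is hypothetical.

```python
import math
import statistics

english = [75, 77, 80, 80, 81, 82, 83, 84]
math_1 = [60, 65, 76, 82, 83, 85, 95, 96]

def sample_sd(xs):
    # s = sqrt(s^2), where s^2 divides by n - 1
    return math.sqrt(statistics.variance(xs))

for name, xs in (("English 1", english), ("Math 1", math_1)):
    print(name, statistics.mean(xs), round(sample_sd(xs), 2))
# English 1 80.25 3.01
# Math 1 80.25 12.87
```

Equal means but standard deviations of about 3.01 and 12.87 confirm that the Math 1 scores are far more spread out.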
Hence, the standard deviation of example 2 above is