MODULE 0 Review On Statistics
Instruction: Read and understand the concepts in this short review then answer the exercises. Show
your solution.
1.) Definitions:
a. Data (Datum): The items in a record or report are facts, expressed in numbers or described by
their quality or kind. These facts are called data. Statistics is chiefly concerned with data and
how to deal with it.
Ex. 1. Color of the eyes
    2. Class size
    3. Scores
    4. Height
b. Population, Sample: A population is the collection of all the units from which data are to be
collected. A subset, or representative part, of the population is called a sample.
c. Census, Sampling: A census is the process in which information is gathered from all the units
in the population; sampling is the process in which only a part of the population is used to
obtain data. The information derived from the sample is used to make generalizations about
the whole population. Errors are unavoidable when these generalizations are made. The role
of statistics is to provide procedures that minimize these errors.
Class frequency distribution is the process of placing scores in scaled groups called classes.
A class is a group of a specified number of consecutive single scores or measures. The specified
number of consecutive scores that a class contains is called the class width. The lowest number
in the class is called the “lower limit” and the highest number in the class is called the “upper limit”.
Example: The class 36 – 41 has a lower limit of 36 and an upper limit of 41. The lower class
boundary is 35.5 and the upper class boundary is 41.5, half a unit beyond each limit. The class
width is 6 because there are six consecutive single scores contained in the class: 36, 37, 38, 39,
40, and 41.
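The example above can be checked with a short sketch. It assumes the scores are whole numbers, so each boundary lies half a unit beyond the corresponding limit; the function name is only illustrative.

```python
def class_boundaries_and_width(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary, width) for whole-number scores."""
    lower_boundary = lower_limit - 0.5
    upper_boundary = upper_limit + 0.5
    # The width equals the count of consecutive scores: upper - lower + 1.
    width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, width


result = class_boundaries_and_width(36, 41)
print(result)  # (35.5, 41.5, 6.0)
```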
Ex: The following are test scores in Math 31. Construct the class frequency distribution
table.
86 74 66 70 75 57 69 70 73 66 60 81
90 62 76 72 61 58 63 68 73 63 75 71
63 66 74 73 78 61 78 72 67 83 59 67
68 64 59 73 69 76 80 81 79 84 77 68
Steps:
1. Find the range R. Range = Highest score – Lowest score. Ex. R = 90 – 57 = 33
2. Determine (estimate) the number of intervals or classes, k.
   Formula: k = √n   Ex. k = √48 ≈ 6.9, rounded to 7.
3. Find the class width (c), the width of the interval. Divide the range by the tentative number
   of classes and raise the quotient to the next higher integer if there is any fractional part.
   Ex. c = R/k = 33 / 7 ≈ 4.7, raised to 5.
4. Find the lowest limit of the classes. This is the number equal to or next lower than the lowest
   score. (The lowest score must be contained in the lowest class while the highest score must
   be contained in the highest class.)
5. Find the lower and upper limits of the classes.
6. Tally the scores.
7. Write the frequencies, class boundaries, and cumulative frequencies.
Table 2.
--------------------------------------------------------------------------------
Class      Tally           f     Class          Midpoints        Cum. Freq.
Interval                         Boundaries     (Class marks)    (less than)
--------------------------------------------------------------------------------
87 – 91    |               1     86.5 – 91.5    89               48
82 – 86    |||             3     81.5 – 86.5    84               47
77 – 81    |||||||         7     76.5 – 81.5    79               44
72 – 76    ||||||||||||    12    71.5 – 76.5    74               37
67 – 71    ||||||||||      10    66.5 – 71.5    69               25
62 – 66    ||||||||        8     61.5 – 66.5    64               15
57 – 61    |||||||         7     56.5 – 61.5    59               7
--------------------------------------------------------------------------------
                           n = 48
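The seven steps can be sketched in code as a cross-check of Table 2. This is a minimal sketch under two assumptions not stated in the steps: k is obtained by rounding √n to the nearest integer, and the lowest limit is set equal to the lowest score.

```python
import math

scores = [
    86, 74, 66, 70, 75, 57, 69, 70, 73, 66, 60, 81,
    90, 62, 76, 72, 61, 58, 63, 68, 73, 63, 75, 71,
    63, 66, 74, 73, 78, 61, 78, 72, 67, 83, 59, 67,
    68, 64, 59, 73, 69, 76, 80, 81, 79, 84, 77, 68,
]

n = len(scores)
score_range = max(scores) - min(scores)   # Step 1: R = 90 - 57 = 33
k = round(math.sqrt(n))                   # Step 2: sqrt(48) is about 6.9, rounded to 7 (assumption)
width = math.ceil(score_range / k)        # Step 3: 33 / 7 is about 4.7, raised to 5
lowest_limit = min(scores)                # Step 4: assumption, start at the lowest score

# Steps 5 to 7: limits, frequencies, boundaries, midpoints, cumulative frequencies.
table = []
cumulative = 0
for i in range(k):
    lower = lowest_limit + i * width
    upper = lower + width - 1
    f = sum(lower <= s <= upper for s in scores)
    cumulative += f
    table.append((lower, upper, f, lower - 0.5, upper + 0.5, (lower + upper) / 2, cumulative))

# Print the highest class first, as in Table 2.
for row in reversed(table):
    print("%d - %d  f=%2d  %.1f - %.1f  mid=%.0f  cf=%d" % row)
```

The rows are built from the lowest class upward so that the cumulative frequency can be accumulated in one pass.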
USES of a Class Frequency Distribution (CFD):
1. It shows whether the distribution is normal or skewed. It also indicates the relative difficulty of
the test from which the scores are taken.
If most of the scores are massed at the middle portion of the frequency table, the distribution
is normal and the test is of moderate difficulty.
If most of the scores are gathered at the upper portion of the distribution, the distribution is
skewed to the left or skewed negatively. The test is relatively easy for the students.
If the majority of the scores are clustered at the lower part of the frequency table, the
distribution is skewed to the right or skewed positively and the test is relatively difficult.
2. It facilitates the computation of statistical measures such as the median, mean, quartiles,
percentiles, standard deviation, etc.
3. Grouping also minimizes space.
Construction Procedure:
On the XY plane, plot the midpoints of the classes on the X axis and the class
frequencies on the Y axis. Locate the point where each class midpoint meets its corresponding
frequency, and connect these points in order. To close the curve, connect the end points to the
baseline at the two extended midpoints, one below and one above the distribution.
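As a sketch of the construction procedure, the points of the closed curve (commonly called a frequency polygon) can be listed from the midpoints and frequencies of Table 2. It assumes the two extended midpoints lie one class width (5) below the lowest midpoint and one class width above the highest.

```python
# Class marks and frequencies from Table 2, lowest class first.
midpoints = [59, 64, 69, 74, 79, 84, 89]
frequencies = [7, 8, 10, 12, 7, 3, 1]
width = 5  # class width from the worked example

# Close the curve with zero-frequency points one class width beyond each end (assumption).
vertices = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in vertices:
    print(x, y)
```

Connecting these points in order with straight segments gives the closed graph.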
2. Bar Chart
The bar chart is a graph consisting of vertical bars or rectangles placed side by side,
representing the frequencies of the classes in a class frequency distribution. The width of each bar
is the width of the interval marked by the class limits on the horizontal axis. The height of each
bar, measured along the vertical axis, represents the class frequency.
3. Histogram
The histogram closely resembles the bar chart. The bar chart uses the class limits on the
horizontal axis while the histogram uses the class boundaries. Using the class boundaries
eliminates the spaces between the rectangles, giving the graph a solid appearance. Usually, but not
necessarily, the shared sides of adjacent bars are omitted so that the graph also looks like a
polygon.
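The difference between the two graphs can be illustrated numerically. The sketch below uses the class limits of Table 2 and assumes whole-number scores, so that each boundary is half a unit beyond the limit; the helper name `gaps` is hypothetical.

```python
# Class limits from Table 2, lowest class first.
limits = [(57, 61), (62, 66), (67, 71), (72, 76), (77, 81), (82, 86), (87, 91)]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]


def gaps(intervals):
    """Space between the end of each interval and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(intervals, intervals[1:])]


limit_gaps = gaps(limits)          # bar chart: a gap of one unit between bars
boundary_gaps = gaps(boundaries)   # histogram: no gap, the bars touch

print(limit_gaps)
print(boundary_gaps)
```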
Descriptive Measures:
To investigate a set of data, it is useful to define measures that describe its important features.
We have Measures of Central Tendency and other locations, and Measures of Variability.
A.) The mean, median, and mode of UNGROUPED data (raw or real data/scores):
a. The Mean – It is the most popular and the most reliable measure of central tendency. It is the average
of a set of scores or observations.
The mean is the sum of a set of scores (or observations) divided by the total number of scores in
the set. The formula is:
$\bar{x} = \frac{\sum x}{n}$
where: x are the scores,
       n is the number of scores, and
       ∑ is the symbol for summation
The Weighted Mean is the average computed for k quantities x1, x2, …, xk when more
significance is attached to some scores than to others. Weights w1, w2, …, wk are assigned to the k
quantities respectively. These weights measure the relative importance of the individual scores.
The formula is:
$\bar{x} = \frac{\sum x_i w_i}{\sum w_i}$
where: x_i are the scores,
       w_i are the weights of each score, and
       ∑ is the symbol for summation
Ex: 1.) Find the mean of the following test scores in Math 1.
71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:
The mean is
$\bar{x} = \frac{\sum x}{n} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} = 47.36$
Ex: 2.) When Nikka Sanchez was in her fourth year high school her final grades in Math 4,
English 4, Filipino 4, Physics, Chemistry, Journalism, World History, and Research were
78, 89, 90, 79, 83, 93, 89, and 95 respectively. If each subject had the equivalent units of
2, 1, 1, 1.5, 1.5, 1, 1, and 1.5 respectively what was her weighted average grade when she
graduated?
Solution:
$\bar{x} = \frac{78(2) + 89(1) + 90(1) + 79(1.5) + 83(1.5) + 93(1) + 89(1) + 95(1.5)}{2 + 1 + 1 + 1.5 + 1.5 + 1 + 1 + 1.5} = \frac{902.5}{10.5} = 85.95$
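Both worked examples can be reproduced with a few lines of code. This is only a sketch for checking the arithmetic; the variable names are illustrative.

```python
# Ex. 1: mean of the Math 1 test scores.
scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
mean = sum(scores) / len(scores)

# Ex. 2: weighted mean of the final grades, using the subject units as weights.
grades = [78, 89, 90, 79, 83, 93, 89, 95]
units = [2, 1, 1, 1.5, 1.5, 1, 1, 1.5]
weighted_mean = sum(g * u for g, u in zip(grades, units)) / sum(units)

print(round(mean, 2), round(weighted_mean, 2))  # 47.36 85.95
```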
b. The Median ($\tilde{x}$) is a point in a scale which divides the scale into two equal parts. A scale is a
succession of numbers, steps, classes, degrees, gradations, or categories with a fixed interval. The
median is just the middle value of a set of observations arranged in an increasing or decreasing order of
magnitude. It is the middle score or value when the number of observations is odd, or the arithmetic
mean of the two middle values when the number of observations is even. It is the value such that half of
the observations fall above it and the other half fall below it.
Formula:
$\tilde{x} = x_{(n+1)/2}$ if n is odd
$\tilde{x} = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if n is even
Ex: 1.) Find the median of the following test scores in Math 1.
25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68
For a set of n = 7 scores (n is odd), there is a single middle score. Hence
$\tilde{x} = x_{(n+1)/2} = x_{(7+1)/2} = x_4$, which is 18 in that set.
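The median formula can be applied to the 14 Math 1 scores listed in Ex. 1 with the sketch below. Since n = 14 is even, the result is the average of the 7th and 8th ordered scores. The function is an illustrative implementation of the formula, not a library routine.

```python
def median(values):
    """Median by the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]               # x_{(n+1)/2}, shifted to 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of x_{n/2} and x_{n/2+1}


scores = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
print(sorted(scores))
print(median(scores))  # 48.5
```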
c. The Mode ($\hat{x}$). It is the most frequently occurring score in a set of data or the score with the highest
frequency. A set of score data can have one mode (unimodal), two modes (bimodal), three modes
(trimodal), or more, or no mode at all. The mode is the poorest measure of central tendency.
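A short sketch for finding the mode or modes of a set of scores follows, applied to the Math 1 scores used in the mean example. It assumes that a set in which every score occurs only once is treated as having no mode; the function name is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all scores that share the highest frequency (empty list if no score repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no repeated score means no mode
    return sorted(v for v, c in counts.items() if c == highest)


scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(modes(scores))  # [38]
```

The score 38 occurs three times, more often than any other score, so this set is unimodal.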
B.) The mean, median, and mode of GROUPED data (class frequency distribution):
Formula:
$\bar{x} = \frac{\sum f_i x_i}{n}$
where f_i is the class frequency of the ith class interval
      x_i is the class mark or the midpoint of the ith class interval
Ex. Table 2.
---------------------------------------------------------------------------------
Class        f_i     x_i                         f_i x_i
Interval             (Midpoints/class marks)
---------------------------------------------------------------------------------
87 – 91      1       89                          89
82 – 86      3       84                          252
77 – 81      7       79                          553
72 – 76      12      74                          888
67 – 71      10      69                          690
62 – 66      8       64                          512
57 – 61      7       59                          413
---------------------------------------------------------------------------------
             n = 48                              ∑ f_i x_i = 3397
$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{3397}{48} = 70.77$
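The grouped mean can be checked directly from the frequencies and class marks of Table 2. The sketch below simply follows the formula; the variable names are illustrative.

```python
# Class marks and frequencies from Table 2, highest class first.
midpoints = [89, 84, 79, 74, 69, 64, 59]
frequencies = [1, 3, 7, 12, 10, 8, 7]

n = sum(frequencies)
total = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f_i * x_i
grouped_mean = total / n

print(n, total, round(grouped_mean, 2))  # 48 3397 70.77
```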
b.) The Median. The solution uses the class frequency distribution in Table 2, whose two lowest classes are:
Class Interval    f    Class Boundaries    Midpoint    Cum. Freq.
62 – 66           8    61.5 – 66.5         64          15
57 – 61           7    56.5 – 61.5         59          7
n = 48
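Solution:
n/2 = 48/2 = 24 → median class is (67 – 71)
cf< = 15, the cumulative frequency of the class just below the median class (the largest cumulative frequency lower than 24)
Lm = 66.5, the lower class boundary of the median class
f = 10, the frequency of the median class
c = 5, the class width
$\tilde{x} = L_m + \left(\frac{n/2 - cf_{<}}{f}\right)c = 66.5 + \left(\frac{24 - 15}{10}\right)5 = 66.5 + \frac{45}{10} = 66.5 + 4.5 = 71$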
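The same computation can be sketched in code. The function locates the median class as the first class whose cumulative frequency reaches n/2, which is how the solution above identifies the class 67 – 71. It assumes whole-number scores, so the lower boundary is the lower limit minus 0.5.

```python
# (lower limit, upper limit, frequency) from Table 2, lowest class first.
classes = [(57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
           (77, 81, 7), (82, 86, 3), (87, 91, 1)]


def grouped_median(class_rows):
    """Median of grouped data: Lm + ((n/2 - cf<) / f) * c."""
    n = sum(f for _, _, f in class_rows)
    half = n / 2
    cum_below = 0  # cf<, cumulative frequency below the current class
    for lower, upper, f in class_rows:
        if cum_below + f >= half:
            lower_boundary = lower - 0.5   # Lm (assumes whole-number scores)
            width = upper - lower + 1      # c
            return lower_boundary + (half - cum_below) / f * width
        cum_below += f


print(grouped_median(classes))  # 71.0
```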
c.) The Mode.
Crude Mode: $\hat{x} = L_m + \frac{c}{2} = 71.5 + \frac{5}{2} = 74$, where c is the class interval (width) and L_m is the
lower class boundary of the modal class
Refined Mode: $\hat{x} = 3\tilde{x} - 2\bar{x} = 3(71) - 2(70.77) = 71.46$
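The two mode estimates can be verified with the values already obtained for Table 2. The sketch takes the modal class to be 72 – 76 (the class with the highest frequency, 12) and reuses the grouped median 71 and grouped mean 70.77 from the earlier solutions.

```python
# Modal class 72 - 76 has the highest frequency in Table 2.
modal_lower_boundary = 71.5
width = 5

crude_mode = modal_lower_boundary + width / 2   # midpoint of the modal class

median = 71      # grouped median from the solution above
mean = 70.77     # grouped mean from the solution above
refined_mode = 3 * median - 2 * mean

print(crude_mode, round(refined_mode, 2))  # 74.0 71.46
```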
3. If L is an integer, the desired fractile is the average of the Lth and the (L+1)th observations.
If L is fractional, round L up to the next higher integer to find the required location;
the fractile is the value at that location.
Examples:
1.) Find P63, D8, and Q1 in the following set of score data in Bio 1.
95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78
Solution:
Data arranged in ascending order: 34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12
c.) Q1: L = 1(12) / 4 = 3
Since L is an integer, the 1st quartile is the average of the 3rd and the 4th values in the ordered data. Hence,
Q1 = (45 + 56) / 2 = 50.5. This further means that 25% of the data fall below 50.5.
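The location rule can be sketched as a single function for quartiles, deciles, and percentiles. It assumes the location is L = k·n / parts, with parts equal to 4, 10, or 100, which generalizes the Q1 computation L = 1(12)/4 shown above. The function name and signature are illustrative.

```python
import math


def fractile(values, k, parts):
    """k-th fractile when the ordered data are divided into `parts` equal parts."""
    if not 0 < k < parts:
        raise ValueError("k must lie strictly between 0 and parts")
    ordered = sorted(values)
    location = k * len(ordered) / parts   # assumption: L = k * n / parts
    if location == int(location):
        i = int(location)
        # Integer L: average of the L-th and (L+1)-th observations.
        return (ordered[i - 1] + ordered[i]) / 2
    # Fractional L: go to the next higher integer location.
    return ordered[math.ceil(location) - 1]


scores = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
q1 = fractile(scores, 1, 4)
p63 = fractile(scores, 63, 100)
d8 = fractile(scores, 8, 10)
print(q1)  # 50.5
print(p63, d8)
```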
The measures of central tendency characterize only the location around which a given set of data
clusters. To describe adequately how the data cluster around or scatter away from
the central point, we need other important measures that also characterize a given set of data.
These are called measures of variability, dispersion, or spread.
Example 1. The same group of 8 students took their final exams in English 1 and Math 1. Their
scores and the mean scores are
English 1 : 75, 77, 80, 80, 81, 82, 83, 84     $\bar{x}$ = 80.25
Math 1    : 60, 65, 76, 82, 83, 85, 95, 96     $\bar{x}$ = 80.25
The two sets of data have the same mean, 80.25, but they are not identical. The scores in
English 1 cluster close to the mean, while the scores in Math 1 are more dispersed about the mean. The
measures used to describe this variation are the range, the variance, the standard deviation, the quartile
deviation, and the coefficient of variation. The quartile deviation and the coefficient of variation are
not discussed in this material. Our discussion is limited to the three common measures
of variability (range, variance, and standard deviation) of ungrouped data.
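The contrast between the two score sets can be made concrete with a short sketch that computes the three measures named above. It uses Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) divisor.

```python
import statistics

english = [75, 77, 80, 80, 81, 82, 83, 84]
math_scores = [60, 65, 76, 82, 83, 85, 95, 96]

for name, data in (("English 1", english), ("Math 1", math_scores)):
    data_range = max(data) - min(data)
    variance = statistics.variance(data)   # sample variance, divisor n - 1
    std_dev = statistics.stdev(data)       # square root of the sample variance
    print(name, statistics.mean(data), data_range, round(variance, 2), round(std_dev, 2))
```

Both sets have the same mean, but the range is 9 for English 1 and 36 for Math 1, and the variance and standard deviation are likewise larger for Math 1.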
1.) The range is the easiest to compute but it is the poorest measure of dispersion. The larger the
range, the more dispersed is the data.
2.) Another measure of variability is the variance. It is never negative. A large variance
corresponds to a highly dispersed set of values. It makes use of all
observations in the data set. Its unit of measure is the square of the unit of measure of the given
set of values.
Variance, s². Formula:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
or
$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$
Example 2. The number of hours spent by ten students (out of 40 in Bio 1) in studying per day
were recorded as follows: 5, 8, 4, 2, 2, 2, 2, 5, 3, and 4. Find the variance.
Solution:
                                               Total
x_i      5    8    4    2   2   2   2   5    3   4     37
x_i^2    25   64   16   4   4   4   4   25   9   16    171
So we have: $\sum x_i = 37$ and $\sum x_i^2 = 171$
$s^2 = \frac{10(171) - (37)^2}{10(9)} = 3.79$ sq. hrs.
$s = \sqrt{s^2} = \sqrt{3.79} = 1.95$ hrs.
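The arithmetic in Example 2 can be verified with the computational formula for the variance. This is a minimal sketch with illustrative variable names; the standard deviation is taken as the square root of the variance.

```python
import math

hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]

n = len(hours)
sum_x = sum(hours)                     # sum of x_i
sum_x2 = sum(x * x for x in hours)     # sum of x_i squared

variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(sum_x, sum_x2, round(variance, 2), round(std_dev, 2))  # 37 171 3.79 1.95
```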