MODULE 0 CPE105 Review On Statistics
1.) Definitions:
a. Data (Datum): Items in a record or report are facts expressed in numbers or described
by their quality or kind. These facts are called data. The major concern of Statistics is
about data and how to deal with it.
Ex. 1. Color of the eyes
2. Class size
3. Scores
4. Height
b. Population, Sample: A population is the collection of all the units from which data are to
be collected. A subset or representative part of the population is called a sample.
c. Census, Sampling: A census is the process in which information is gathered from all the
units in the population; sampling is the process in which only a part of the population is
used to obtain data. The information derived from the sample is used to make
generalizations about the whole population. Errors are unavoidable when these
generalizations are made. The role of statistics is to provide procedures that
minimize these errors.
Variables are the characteristics or properties measured from objects, persons or
things.
a. Discrete variables can be counted and thus assume whole-number values.
Example: Number of passers or failures in the LET
b. Continuous variables are measured using some unit of measurement and may
take decimal values. Example: weights, heights, ages
5.) Properties of Numbers :
Another way of looking at data is on the way they are measured. Measurements are
always associated with numbers that have some interesting properties or characteristics.
a. Identity – it enables a person to distinguish one number from the other. They are
identified by their shapes or the way they are written. This is the simplest property of
numbers.
b. Order – it refers to the way the numbers are arranged in a sequence. It is an
established convention that 1 comes before 2, 2 comes before 3, and so on. We also
say that “7 is greater than 6” or “3 is less than 5”.
c. Additivity / equality of scale – the property that allows us to add numbers. When we
say “3g + 5g = 8g”, we are confident that we are correct because the scale is equal.
This means that in the sequence 1g, 2g, 3g, … the distance from 1g to 2g is the same as
the distance from 2g to 3g, which is the same as the distance from 3g to 4g, and so on.
Certainly, 2 flowers + 5 peanuts is not equal to 7 apples because the scales used are
not equal.
d. “Absolute zero” – a value of zero means the object has none of the characteristic that
is being measured. In temperature, the characteristic that is being measured is ‘amount
of heat’. A temperature of zero degrees Celsius does not mean that the object does not
have any amount of heat. If a student received a score of zero in an intelligence exam,
it does not necessarily mean that the student has ‘no intelligence’ at all. A length of
zero meters absolutely means the object has ‘no length’. An object with a weight of zero
pounds is certainly weightless.
6.) Four Types of measurement:
a. Nominal – measurements possess only the property of identity. Ex. Color and sex.
b. Ordinal – measurements possess the properties of identity and order but do not have
the equality of scale property. Ordinal measurements are usually associated with
ranks. When students are ranked according to class performance, an order of 1st, 2nd,
3rd, … can be established. But these numbers cannot be added because the distance
from 1st to 2nd may not be the same as the distance from 2nd to 3rd, and so on. Ex.
Social classes, military ranks, honor roll, and taste preferences.
c. Interval – measurements possess the properties of identity, order, and equality of
scale. Temperature and Intelligence Scores are examples of interval measurement.
The numbers associated with these variables have identity and order and they can be
added in the usual manner. However they do not have the property of absolute zero.
d. Ratio - measurements possess all the properties of identity, order, equality of scale
and absolute zero. This is the highest form of measurement. Ex. Length, weight.
A large number of statistical analysis tools are available for each type of measurement. It is
important that the statistical user have a good understanding of the type of data that is to be
processed in order that the chosen statistical tool is used properly.
Table 1.
Tens     Units digit
digit    0     1     2     3     4     5     6     7     8     9       T
8        1     11    -     1     1     -     1     -     1     -       7
7        111   1     11    1111  11    11    11    1     11    1       20
6        1     11    1     11    1     1     111   11    11    1111    19
5        -     -     -     -     -     -     1     -     1     -       2
T        5     5     3     7     4     3     7     3     6     5       48
Scores: 72 45 61 69 45 61 37 69 45 88 41
Answer:
Scores    Consecutive #s    Rank
88        1                 1
72        2                 2
69        3                 3.5
69        4                 3.5
61        5                 5.5
61        6                 5.5
45        7                 8
45        8                 8
45        9                 8
41        10                10
37        11                11
The rank of 61 is 5.5.
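The ranking table above can be reproduced with a short sketch. It assumes, as the table suggests, that tied scores share the average of the consecutive numbers they occupy; the function name `rank_descending` is only illustrative.

```python
from collections import Counter

def rank_descending(scores):
    """Rank scores from highest to lowest; tied scores share the
    average of the consecutive numbers they occupy."""
    ordered = sorted(scores, reverse=True)
    counts = Counter(ordered)
    ranks = {}
    position = 1  # next consecutive number to hand out
    for score in sorted(counts, reverse=True):
        k = counts[score]
        # The score occupies numbers position .. position + k - 1;
        # the average of those numbers is position + (k - 1) / 2.
        ranks[score] = position + (k - 1) / 2
        position += k
    return [(s, ranks[s]) for s in ordered]

scores = [72, 45, 61, 69, 45, 61, 37, 69, 45, 88, 41]
for score, rank in rank_descending(scores):
    print(score, rank)
```

For the eleven scores above, 61 receives rank 5.5 and 45 receives rank 8, matching the table.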
For a large number of scores:
Procedure:
1. Tabulate the scores in a talligram.
2. Use the talligram as an aid in arranging the scores in descending order in a vertical
column, writing a score only once even if it occurs two or more times. This is the
first column.
3. Number the scores consecutively. Each score receives as many consecutive numbers
as the number of times it occurs. This is the second column.
4. Assign the ranks in the third column by following rule 3 for a small number of
scores.
Construction Procedure:
On the XY plane, plot the midpoints of the classes on the X axis and the class
frequencies on the Y axis. Locate and connect the points where each class midpoint meets
its corresponding frequency. To close the curve, connect the end points to the
baseline at the two extended midpoints, one below and one above the distribution.
2. Bar Chart
The bar chart is a graph consisting of vertical bars or rectangles placed side by side,
representing the frequencies of the classes in a class frequency distribution. The width of each
bar is the width of the interval given by the class limits on the horizontal axis. The height
of each bar represents the class frequency, measured along the vertical axis.
3. Histogram
The histogram is a graph that closely resembles the bar chart. The bar chart uses
the class limits for the horizontal axis while the histogram uses the class boundaries. Using the
class boundaries eliminates the spaces between the rectangles, giving the graph a solid appearance.
Usually, but not necessarily, the shared sides of adjacent bars are omitted so that the graph also
looks like a polygon.
Descriptive Measures:
To investigate a set of data, it is useful to define measures that describe its important features.
We have Measures of Central Tendency and other locations, and Measures of Variability.
A.) The mean, median, and mode of UNGROUPED data (raw or real data/scores):
a. The Mean – It is the most popular and the most reliable measure of central tendency. It is the
average of a set of scores or observations.
The mean is the sum of a set of scores (or observations) divided by the total number of scores
in the set. The formula is:
\[ \bar{x} = \frac{\sum x}{n} \]
where: x are the scores,
n is the number of scores, and
∑ is the symbol for summation
The Weighted Mean is the average computed for k quantities x1, x2, …, xk when more
significance is attached to some scores than to others. Weights w1, w2, …, wk are assigned to the k
quantities respectively; these weights measure the relative importance of the individual
scores. The formula is:
\[ \bar{x} = \frac{\sum x_i w_i}{\sum w_i} \]
where: x_i are the scores,
w_i are the weights of each score, and
∑ is the symbol for summation
Ex: 1.) Find the mean of the following test scores in Math 1.
71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:
The mean is
\[ \bar{x} = \frac{\sum x}{n} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} = 47.36 \]
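As a quick check of the arithmetic, the same computation can be sketched in Python. The variable names are illustrative only.

```python
# Test scores in Math 1 from the example above.
scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]

total = sum(scores)   # sum of the scores
n = len(scores)       # number of scores
mean = total / n

print(total, n, round(mean, 2))  # 663 14 47.36
```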
Ex: 2.) When Nikka Sanchez was in her fourth year high school her final grades in Math 4,
English 4, Filipino 4, Physics, Chemistry, Journalism, World History, and Research
were 78, 89, 90, 79, 83, 93, 89, and 95 respectively. If each subject had the equivalent
units of 2, 1, 1, 1.5, 1.5, 1, 1, and 1.5 respectively what was her weighted average
grade when she graduated?
Solution:
\[ \bar{x} = \frac{78(2) + 89(1) + 90(1) + 79(1.5) + 83(1.5) + 93(1) + 89(1) + 95(1.5)}{2 + 1 + 1 + 1.5 + 1.5 + 1 + 1 + 1.5} = \frac{902.5}{10.5} = 85.95 \]
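The weighted average grade can be verified with a minimal sketch of the weighted mean formula. The helper name `weighted_mean` is an assumption for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Final grades and their unit equivalents, in the order given in the example.
grades = [78, 89, 90, 79, 83, 93, 89, 95]
units = [2, 1, 1, 1.5, 1.5, 1, 1, 1.5]

print(round(weighted_mean(grades, units), 2))  # 85.95
```

With all weights equal, the same function reduces to the ordinary mean.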
b. The Median (\(\tilde{x}\)) is a point in a scale which divides the scale into two equal parts. A scale is a
succession of numbers, steps, classes, degrees, gradations, or categories with a fixed interval. The
median is just the middle value of a set of observations arranged in an increasing or decreasing order
of magnitude. It is the middle score or value when the number of observations is odd, or the
arithmetic mean of the two middle values when the number of observations is even. It is the value
such that half of the observations fall above it and the other half fall below it.
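Formula:
\[ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left( x_{n/2} + x_{n/2+1} \right) & \text{if } n \text{ is even} \end{cases} \]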
Ex: 1.) Find the median of the following test scores in Math 1.
25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68
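The median formula can be applied to these scores with a short sketch. The function name `median` is illustrative; note that Python lists are indexed from 0, so the position x_k in the formula corresponds to index k - 1.

```python
def median(scores):
    """Middle value of the ordered scores, or the mean of the two
    middle values when the number of scores is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # x_{(n+1)/2}
    return (ordered[mid - 1] + ordered[mid]) / 2  # (x_{n/2} + x_{n/2+1}) / 2

scores = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
print(median(scores))  # 48.5
```

Here n = 14 is even, so the median is the mean of the 7th and 8th ordered scores, 45 and 52.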
c. The Mode (\(\hat{x}\)) is the most frequently occurring score in a set of data, or the score with the
highest frequency. A set of score data can have one mode (unimodal), two modes (bimodal), three
modes (trimodal), or more, or no mode at all. The mode is the poorest measure of central tendency.
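A minimal sketch for finding the mode or modes of ungrouped scores follows. It assumes the common convention that a set has no mode when every distinct score occurs equally often; the text does not spell this rule out, and the function name `modes` is illustrative.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s) in ascending order.
    Assumption: if all distinct scores are equally frequent,
    the set is treated as having no mode (empty list)."""
    counts = Counter(scores)
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(s for s, c in counts.items() if c == highest)

scores = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
print(modes(scores))            # [38]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([1, 2, 3]))         # []      (no mode)
```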
B.) The mean, median, and mode of GROUPED data (class frequency distribution):
Formula:
\[ \bar{x} = \frac{\sum f_i x_i}{n} \]
where f_i is the class frequency of the ith class interval and
x_i is the class mark or the midpoint of the ith class interval
Ex. Table 2.
---------------------------------------------------------------------------------
Class Interval    fi     xi (midpoints/class marks)    fixi
87 – 91           1      89                            89
82 – 86           3      84                            252
77 – 81           7      79                            553
72 – 76           12     74                            888
67 – 71           10     69                            690
62 – 66           8      64                            512
57 – 61           7      59                            413
---------------------------------------------------------------------------------
                  n = 48                               ∑ fixi = 3397
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{3397}{48} = 70.77 \]
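The grouped mean of Table 2 can be recomputed with a short sketch. The tuple layout (lower limit, upper limit, frequency) is an illustrative choice; each class mark is taken as the midpoint of the two class limits.

```python
# (lower class limit, upper class limit, class frequency) from Table 2.
classes = [
    (87, 91, 1),
    (82, 86, 3),
    (77, 81, 7),
    (72, 76, 12),
    (67, 71, 10),
    (62, 66, 8),
    (57, 61, 7),
]

n = sum(f for _, _, f in classes)
# Class mark x_i = midpoint of the class limits; accumulate f_i * x_i.
sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 2))  # 48 3397.0 70.77
```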
c.) The Mode.
Crude Mode: \[ \hat{x} = L_m + \frac{c}{2} \]
where c is the class interval and L_m is the lower class boundary of the modal class
Refined Mode: \[ \hat{x} = 3\tilde{x} - 2\bar{x} \]
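As an illustration, the crude mode formula can be applied to the class frequency distribution of Table 2. This sketch makes two assumptions that the text does not state: the lower class boundary lies 0.5 below the lower class limit (integer-valued scores), and c is the distance between successive lower class limits.

```python
# (lower class limit, upper class limit, class frequency) from Table 2.
classes = [
    (87, 91, 1), (82, 86, 3), (77, 81, 7), (72, 76, 12),
    (67, 71, 10), (62, 66, 8), (57, 61, 7),
]

# Modal class: the class with the highest frequency.
low, high, freq = max(classes, key=lambda cls: cls[2])

lower_boundary = low - 0.5   # assumption: boundary is 0.5 below the limit
c = high - low + 1           # assumption: class interval size (here 5)
crude_mode = lower_boundary + c / 2

print((low, high), crude_mode)  # (72, 76) 74.0
```

Under these assumptions the crude mode is simply the class mark of the modal class.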
3. If L is an integer, the desired fractile is the average of the Lth and the (L+1)th
observation.
If L is fractional, round L up to the next higher integer to find the required location.
The fractile is the value in that location.
Examples:
1.) Find P63, D8, and Q1 in the following set of score data in Bio 1.
95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78
Solution:
Data arranged in ascending order: 34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12
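The location rule can be sketched in code. The earlier steps that define L are not shown in this section, so this sketch assumes the common choice L = n × k / parts (for example, P63 uses k = 63 and parts = 100); treat that definition and the function name `fractile` as assumptions.

```python
def fractile(data, k, parts):
    """k-th fractile out of `parts` equal divisions
    (parts = 100 for percentiles, 10 for deciles, 4 for quartiles).
    Assumption: the location is L = n * k / parts, with 1 <= k < parts."""
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % parts == 0:
        # L is an integer: average the L-th and (L+1)-th observations.
        L = n * k // parts
        return (ordered[L - 1] + ordered[L]) / 2
    # L is fractional: round up to the next higher integer location.
    L = (n * k + parts - 1) // parts
    return ordered[L - 1]

scores = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
print(fractile(scores, 63, 100))  # P63: L = 7.56 -> 8th value
print(fractile(scores, 8, 10))    # D8:  L = 9.6  -> 10th value
print(fractile(scores, 1, 4))     # Q1:  L = 3    -> mean of 3rd and 4th values
```

Integer arithmetic is used for the location so that values such as 12 × 0.63 are not affected by floating-point rounding.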
Example 1. The same group of 8 students took their final exams in English 1 and Math 1.
Their scores and the mean scores are
English 1 : 75, 77, 80, 80, 81, 82, 83, 84    \(\bar{x}\) = 80.25
Math 1 : 60, 65, 76, 82, 83, 85, 95, 96    \(\bar{x}\) = 80.25
The two sets of data have the same mean, 80.25, but they are not identical. Scores in
English 1 cluster close to the mean while scores in Math 1 are more dispersed about the mean. The
measures used to describe this variation are the range, the variance, the standard deviation, the
quartile deviation, and the coefficient of variation. The quartile deviation and the coefficient of
variation are not discussed in this material. Our discussion is limited to the three common
measures of variability (range, variance, and standard deviation) of ungrouped data.
1.) The range is the easiest to compute but it is the poorest measure of dispersion. The larger
the range, the more dispersed is the data.
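To make the comparison concrete, the range of each set in Example 1 can be computed with a short sketch. It uses the standard definition, highest value minus lowest value, which the text above does not spell out.

```python
english = [75, 77, 80, 80, 81, 82, 83, 84]
math_scores = [60, 65, 76, 82, 83, 85, 95, 96]

def value_range(scores):
    """Range = highest score minus lowest score (standard definition)."""
    return max(scores) - min(scores)

print(value_range(english))      # 9
print(value_range(math_scores))  # 36
```

Both sets have the same mean, but the larger range for Math 1 reflects its wider spread.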
2.) Another measure of variability is the variance. It is always non-negative. A large
variance corresponds to a highly dispersed set of values. It makes
use of all observations in the data set. Its unit of measure is the square of the unit of measure
of the given set of values.
Variance, s^2. Formula:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \quad \text{or} \quad s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)} \]
Example 2. The number of hours spent by ten students (out of 40 in Bio 1) in studying per
day were recorded as follows: 5, 8, 4, 2, 2, 2, 2, 5, 3, and 4. Find the variance.
Solution:
x_i      5    8    4    2    2    2    2    5    3    4     Total: 37
x_i^2    25   64   16   4    4    4    4    25   9    16    Total: 171
So we have:
\[ \sum x_i = 37 \quad \text{and} \quad \sum x_i^2 = 171 \]
\[ s^2 = \frac{10(171) - (37)^2}{10(9)} = 3.79 \text{ sq. hrs.} \]
\[ s = \sqrt{s^2} = \sqrt{3.79} = 1.95 \text{ hrs.} \]
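The solution can be checked with a minimal sketch that evaluates both forms of the variance formula and then takes the square root for the standard deviation. The variable names are illustrative.

```python
import math

hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]
n = len(hours)

sum_x = sum(hours)                  # 37
sum_x2 = sum(x * x for x in hours)  # 171

# Computational form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Definitional form: sum of squared deviations from the mean over (n - 1)
mean = sum_x / n
variance_def = sum((x - mean) ** 2 for x in hours) / (n - 1)

std_dev = math.sqrt(variance)

print(round(variance, 2), round(variance_def, 2), round(std_dev, 2))
# 3.79 3.79 1.95
```

Both forms give the same variance, as expected, since they are algebraically equivalent.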