Module 1
The development of statistics can be traced back to ancient times.
People compiled statistical data on all sorts of things, such as agricultural
crops, taxes, athletic events, and commerce and trade. Over time,
statistical work has had a marked and widening influence on the activities of mankind,
from describing the important features of data to analyzing them.
Definition of Terms
Statistics is the plural form of the word statistic. A statistic has much the same meaning as
the Latin word datum, which means a fact or a piece of information. The plural of datum
is data.
Statistics can refer to the mere tabulation of numerical information, as in reports of
stock market transactions, or to the body of techniques used in processing or
analyzing data.
The broader meaning of statistics, which will be used in this module, is the science of
collecting, organizing, presenting, analyzing, and interpreting numerical
data.
Data are the raw material with which the statistician works. Data can be obtained
through surveys, experiments, numerical records, and other modes of research.
The term statistician is also used in several ways. It can refer to a person who simply collects
information or to one who prepares analyses or interpretations. It may also mean a
scholar who develops the mathematical theory on which the science of statistics is
based.
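Statistics has two branches: descriptive statistics, which collects, organizes, presents, and analyzes data in order to describe them, and inferential statistics, which uses data from a sample to generalize, test hypotheses, and make predictions about a larger population.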
[Figure: STATISTICS. Descriptive statistics: collect, organize, present, analyze. Inferential statistics: generalize, hypothesis testing, make predictions.]
Examples
Descriptive Statistics
a. According to the Census Bureau, 20% of all Filipino workers get to work via carpool.
c. Cigarettes were associated with 29% of the 4,470 civilian fire deaths in 1989 (The
Book of Odds, Plume, 1991).
Inferential Statistics
a. The National Eye Institute has halted a clinical trial on a type of eye surgery, calling it
ineffective and possibly harmful to a person’s vision.
b. “Allergy therapy may make bees go away.” (April 1995)
c. The Gallup Poll says 1 out of 10 Filipinos is a member of a health club or fitness
center.
d. Drinking decaffeinated coffee can raise cholesterol levels by 7%. (Philippine Heart
Association)
Let’s Try
For each statement, decide whether descriptive or inferential statistics were used.
1. A recent study showed that eating garlic can lower blood pressure.
4. Last year’s total attendance at Ateneo de Manila’s basketball games was 8,345.
A variable is a characteristic that can take two or more values and that varies across
individuals.
[Figure: Variable, divided into quantitative and qualitative; quantitative variables are further divided into discrete and continuous.]
1. Discrete variable – a variable that can assume only a countable number of values
2. Continuous variable – a variable that can assume any numerical value over
an interval or intervals
Examples: height, weight, temperature, time
Levels / Scales of Measurement of Data
A scale or level of measurement relates to the rules used to assign scores and
is an indicator of the kind of information that the scores provide.
The scale to which a measurement belongs is important in determining
appropriate methods for describing and analyzing the data.
1. Nominal data are used only to name or label categories; the categories have no order.
2. Ordinal data connote ranking or inequalities: one category is higher than
another.
3. Interval data indicate an actual amount, and equal units of measurement
separate adjacent scores; that is, the intervals between scores are equal.
4. Ratio data are similar to interval data, but they have an absolute zero, so multiples are
meaningful.
Let’s Try
Determine the level of measurement of the following variables.
1. License plate numbers
2. First 10 students ranked in a class
3. Coded responses (strongly agree, agree, disagree, strongly disagree)
4. Percentage of the students taking science courses (72%) and non-science
courses (28%) in a certain school
5. Species of grass that people have in their yard
Measures of Central Tendency
The organization of data may be done by using tables, while the summary of data
may be displayed by means of graphs and charts. Another method of summarizing data
is to compute numbers, such as an average, that describe a set of data. Numbers that are
used to describe sets of data are called descriptive measures. The most important
descriptive measures are the measures of central tendency and the measures of
dispersion or variation.
Descriptive measures that indicate where the center or the most typical value of
a set of data lies are called measures of central tendency, often more simply referred to as
averages. A measure of central location, or central tendency, is a single number
that represents the typical score of the data. In this section, we will discuss the three
most important measures of central tendency: the mean, the median, and the mode.
Central Tendency
The mean is the average value of all the data in the set.
The median is the value that has exactly half the data above it and half below it.
The mode is the value that occurs most frequently in the set.
The Mean
The mean is the average of the scores – the mathematical center of a
distribution. It is used with symmetrical, unimodal distributions of interval or ratio
scores.
The most commonly used measure of central tendency is the mean. When
people speak of an average, it is usually the mean they are referring to.
The mean of ungrouped data is defined as the sum of all the scores or data
divided by the number of scores in the data.
In particular, the mean, denoted by $\bar{x}$, of the scores $x_1, x_2, \ldots, x_n$ is given by
the formula
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Example 1. Find the mean of 87, 78, 84, 84, and 80.
$$\bar{x} = \frac{87 + 78 + 84 + 84 + 80}{5} = \frac{413}{5} = 82.6$$
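The computation in Example 1 can be checked with a short Python sketch. The function name `arithmetic_mean` is illustrative only; it simply divides the sum of the scores by their number, which is the same thing Python's standard `statistics.mean` does.

```python
from statistics import mean


def arithmetic_mean(scores):
    """Sum of all the scores divided by the number of scores."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


scores = [87, 78, 84, 84, 80]  # the scores used in the worked solution of Example 1
print(arithmetic_mean(scores))  # 82.6
print(mean(scores))             # standard-library equivalent
```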
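In general, the weighted arithmetic mean (weighted mean) of a group of
numbers or scores $x_1, x_2, \ldots, x_n$ that occur $w_1, w_2, \ldots, w_n$ times (or carry weights $w_1, w_2, \ldots, w_n$),
respectively, is
$$\bar{x} = \frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$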
Example 2. Compute the weighted arithmetic mean (weighted mean) of the numbers 12,
15, 16, 12, 15, 18, 18, 20, 12, and 18. (12 occurs 3 times, 15 occurs twice, 18
occurs 3 times, and 16 and 20 occur once each.)
$$\bar{x} = \frac{3(12) + 2(15) + 1(16) + 3(18) + 1(20)}{3 + 2 + 1 + 3 + 1} = \frac{156}{10} = 15.6$$
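A minimal Python sketch of the weighted mean applied to Example 2, assuming each distinct value is paired with the number of times it occurs. The function name `weighted_mean` is hypothetical; it multiplies each value by its weight, adds the products, and divides by the total weight.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 2: each distinct value and the number of times it occurs
values = [12, 15, 16, 18, 20]
weights = [3, 2, 1, 3, 1]
print(weighted_mean(values, weights))  # 15.6
```

Because the weights here are frequencies, the result equals the ordinary mean of all ten numbers listed in Example 2.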
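Example 4. A student obtained a mid-term grade of 79. What should he get in the last
grading period, which has a weight of 2, to have a final grade of 85? (The mid-term grade has a weight of 1.)
$$\frac{1(79) + 2x}{1 + 2} = 85 \quad\Rightarrow\quad 2x = 3(85) - 79 = 176 \quad\Rightarrow\quad x = 88$$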
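Example 4 rearranges the weighted-mean formula to solve for the unknown grade. The Python sketch below does the same algebra, assuming (as the 3 in the denominator implies) that the mid-term grade has a weight of 1; the function name `required_grade` is hypothetical.

```python
def required_grade(target, known_grade, known_weight, remaining_weight):
    """Grade needed in the remaining period so that the weighted mean equals target.

    Solves (known_weight*known_grade + remaining_weight*x)
           / (known_weight + remaining_weight) = target   for x.
    """
    if remaining_weight == 0:
        raise ValueError("remaining_weight must be nonzero")
    total_weight = known_weight + remaining_weight
    return (target * total_weight - known_weight * known_grade) / remaining_weight


# Example 4: mid-term grade 79 (weight 1), last grading period weight 2, target 85
print(required_grade(85, 79, 1, 2))  # 88.0
```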
The Mean for Grouped Data
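The arithmetic mean for grouped data is given by
$$\bar{x} = \frac{\sum fx}{n}$$
where $x$ is the midpoint of each class interval, $f$ is the class frequency, and $n = \sum f$ is the total number of scores.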
Example. Find the mean of the following distribution of scores.
Scores f
65-69 2
70-74 8
75-79 10
80-84 9
85-89 7
90-94 2
95-99 2
Solution.
1. Compute the midpoint x of each interval and the product fx, as shown in the following table:
Scores f x fx
65-69 2 67 134
70-74 8 72 576
75-79 10 77 770
80-84 9 82 738
85-89 7 87 609
90-94 2 92 184
95-99 2 97 194
Total n = 40 Σfx = 3205
2. Divide Σfx by n:
$$\bar{x} = \frac{\sum fx}{n} = \frac{3205}{40} = 80.125 \approx 80.13$$
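The two steps above (midpoints, then Σfx divided by n) can be sketched in Python. The function name `grouped_mean` and the (lower limit, upper limit, frequency) layout are assumptions made for illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by n = sum of f.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Frequency distribution from the example; the last class is read as 95-99,
# consistent with its midpoint of 97.
scores = [
    (65, 69, 2), (70, 74, 8), (75, 79, 10), (80, 84, 9),
    (85, 89, 7), (90, 94, 2), (95, 99, 2),
]
print(grouped_mean(scores))  # 80.125
```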
Median
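The median of a set of data is the value that divides the bottom 50% of the data from
the top 50%. To find it, arrange the data in order; if the number of values is odd, the median is the middle value, and if it is even, the median is the mean of the two middle values.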
Example. Find the median of each set of data.
Solution.
a. 75, 78, 79, 80, 84, 85, 90 (median = 80)
b. 9, 10, 10, 11, 12, 15, 15 (median = 11)
c. 9, 10, 10, 11, 12, 13, 15, 15 (median = (11 + 12)/2 = 11.5)
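The odd/even rule for ungrouped data translates directly into a short Python sketch. The function name `median` is illustrative; Python's standard `statistics.median` follows the same rule.

```python
def median(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not data:
        raise ValueError("need at least one value")
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([75, 78, 79, 80, 84, 85, 90]))     # 80
print(median([9, 10, 10, 11, 12, 15, 15]))      # 11
print(median([9, 10, 10, 11, 12, 13, 15, 15]))  # 11.5
```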
Example. Find the median of the following distribution of scores.
Scores f
65-69 2
70-74 8
75-79 10
80-84 9
85-89 7
90-94 2
95-99 2
Solution.
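1. Determine the median class using n/2.
n/2 = 40/2 = 20; the 20th score falls in the 3rd class, 75-79, so this is the median class.
Scores Boundaries f CF<
65-69 64.5 – 69.5 2 2
70-74 69.5 – 74.5 8 10
75-79 74.5 – 79.5 10 20 (median class)
80-84 79.5 – 84.5 9 29
85-89 84.5 – 89.5 7 36
90-94 89.5 – 94.5 2 38
95-99 94.5 – 99.5 2 40
2. Apply the formula
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - cf_{<}}{f_m}\right)c$$
where $L_m$ is the lower boundary of the median class, $cf_{<}$ is the less-than cumulative frequency of the class just before the median class, $f_m$ is the frequency of the median class, and $c$ is the class width.
$$\tilde{x} = 74.5 + \left(\frac{\frac{40}{2} - 10}{10}\right)5 = 74.5 + 5 = 79.5$$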
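The median-class search and the interpolation formula can be sketched in Python as follows. The function name `grouped_median` and the (lower boundary, frequency) input layout are assumptions; the loop accumulates the less-than cumulative frequency until it reaches n/2, then interpolates inside that class.

```python
def grouped_median(classes, class_width):
    """Median of grouped data, interpolated within the median class.

    `classes` holds (lower_boundary, frequency) pairs in ascending order.
    Implements median = L_m + ((n/2 - cf) / f_m) * c, where cf is the
    less-than cumulative frequency of the classes below the median class.
    """
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cf = 0  # cumulative frequency of the classes already passed
    for lower_boundary, f in classes:
        if cf + f >= half:  # the (n/2)-th score falls in this class
            return lower_boundary + (half - cf) / f * class_width
        cf += f
    raise ValueError("median class not found")


# Lower class boundaries (lower limit - 0.5) and frequencies from the example;
# the last class is read as 95-99, consistent with its midpoint of 97.
distribution = [(64.5, 2), (69.5, 8), (74.5, 10), (79.5, 9),
                (84.5, 7), (89.5, 2), (94.5, 2)]
print(grouped_median(distribution, 5))  # 79.5
```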
The Mode
The mode is the most frequently occurring score in ungrouped data. It can be
used even with scores from a nominal variable.
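Finding the mode of ungrouped data amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` and returns every value tied for the highest count, so a bimodal set yields two modes; the function name `modes` is hypothetical, and the grass-species labels are hypothetical nominal data. Note that when every value occurs only once this sketch returns all of them, whereas some texts say such data have no mode.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often (more than one if the data are multimodal)."""
    counts = Counter(data)
    if not counts:
        raise ValueError("need at least one value")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Set c from the median examples: 10 and 15 each occur twice
print(modes([9, 10, 10, 11, 12, 13, 15, 15]))  # [10, 15]
# Hypothetical nominal data (species of grass): the mode still applies
print(modes(["zoysia", "bermuda", "zoysia"]))  # ['zoysia']
```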
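For grouped data, the mode is estimated by
$$\hat{x} = L_{mo} + \left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)c$$
Where:
$L_{mo}$ = lower boundary of the modal class (the class with the highest frequency)
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class just before the modal class
$f_2$ = frequency of the class just after the modal class
$c$ = class width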
Example. Find the mode of the following distribution of scores.
Scores f
65-69 2
70-74 8
75-79 10
80-84 9
85-89 7
90-94 2
95-99 2
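The modal class is 75-79 ($f_{mo} = 10$), so $L_{mo} = 74.5$, $f_1 = 8$, $f_2 = 9$, and $c = 5$.
$$\hat{x} = 74.5 + \left(\frac{10 - 8}{2(10) - 8 - 9}\right)5 = 74.5 + \frac{2}{3}(5) \approx 77.83$$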
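The grouped-mode formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical, and two choices are assumptions not stated in the module: if several classes tie for the highest frequency the first one is used, and a missing neighboring class at either end of the distribution is given a frequency of 0.

```python
def grouped_mode(classes, class_width):
    """Mode of grouped data: L_mo + (f_mo - f1) / (2*f_mo - f1 - f2) * c.

    `classes` holds (lower_boundary, frequency) pairs in ascending order.
    Assumption: f1 or f2 is taken as 0 when the modal class is the first or last class.
    """
    if not classes:
        raise ValueError("empty distribution")
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    f_mo = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    denominator = 2 * f_mo - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined: both neighboring classes tie with the modal class")
    return classes[k][0] + (f_mo - f1) / denominator * class_width


# Lower class boundaries and frequencies from the example (last class read as 95-99)
distribution = [(64.5, 2), (69.5, 8), (74.5, 10), (79.5, 9),
                (84.5, 7), (89.5, 2), (94.5, 2)]
print(round(grouped_mode(distribution, 5), 2))  # 77.83
```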
Remember:
Use the mean:
1. when the scores in a distribution are more or less symmetrically grouped about a
central point;
2. when the research problem requires a measure of central tendency that will also
form the basis of other statistics, such as measures of dispersion or variation;
3. when the research problem requires combining the mean with the means of
other groups measured on the same variable.
Use the median:
1. when the research problem calls for knowledge of the exact midpoint of a
distribution;
2. when extreme scores would distort the mean, as in our hypothetical example of annual
marriage ceremonies (the mean reflects extreme values; the median does not);
3. when dealing with “oddly shaped” distributions, for example, those in which a high
proportion of extremely high scores occurs together with a low proportion of extremely
low ones.
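The point that the mean reflects extreme values while the median does not can be seen with Python's standard `statistics` module. The sketch below starts from set a of the median examples and then appends one hypothetical extreme score of 300, which is invented purely for illustration and is not data from the module.

```python
from statistics import mean, median

scores = [75, 78, 79, 80, 84, 85, 90]  # set a from the median examples
with_outlier = scores + [300]          # hypothetical extreme score, for illustration only

# Without the extreme score: mean about 81.57, median 80
print(round(mean(scores), 2), median(scores))
# With it: the mean jumps to 108.875, while the median only moves to 82.0
print(mean(with_outlier), median(with_outlier))
```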