
Module 1

Review of Basic Concepts of Probability and Statistics

At the end of this lesson the learner should be able to:

1. discuss the background and the development of statistics;


2. identify the fields of contributions of statistics;
3. define and differentiate the two branches of statistics; and
4. differentiate population from sample.

Statistical information and its development can be traced back to ancient times.
People compiled statistical data on all sorts of things, such as agricultural
crops, taxes, athletic events, and commerce and trade. Over time,
statistical work has continued to have a marked influence on human activities,
widening in scope from describing important features of data to analyzing them.

• Statistics is the science of conducting studies to collect, organize, summarize,
analyze, and draw conclusions from data.
• Statistics is used in almost all fields of human endeavor. In public health, for
example, a doctor may be interested in the number of residents in an area who
contract a new strain of flu virus within a certain number of months.

Definition of Terms
• Statistics is the plural form of the word statistic. A statistic has much the same
meaning as the Latin word datum, which means a fact or piece of information. The
plural of datum is data.
• Statistics can refer to the mere tabulation of numeric information, as in reports of
stock market transactions, or to the body of techniques used in processing or
analyzing data.
• The broader meaning of statistics, which will be used here, is the science of
collecting, organizing, presenting, analyzing, and interpreting numerical
data.
• Data are the raw material with which the statistician works. Data can be obtained
through surveys, experiments, numerical records, and other modes of research.
• Statistician is also used in several ways. It can mean a person who simply collects
information or one who prepares analyses or interpretations. It may also mean a
scholar who develops the mathematical theory on which the science of statistics is
based.
• Statistics has two branches: descriptive statistics and inferential statistics.

STATISTICS
   Descriptive Statistics: Collect, Organize, Present, Analyze
   Inferential Statistics: Generalize, Hypothesis Testing, Make Predictions

• Descriptive statistics is concerned with collecting, organizing, presenting, and
analyzing numerical data. Masses of unorganized numerical data are of little
value unless statistical techniques are available to organize this type of data into
a meaningful form.

• Inferential statistics is a set of techniques that allows us to use samples to make
generalizations about the populations from which the samples were drawn. It is,
therefore, important that the sample accurately represents the population.
Descriptive Statistics versus Inferential Statistics

Descriptive Statistics
1. Describes the target population.
2. Organizes, analyzes, and presents data in a meaningful manner.
3. Final results are shown in the form of charts, tables, and graphs.
4. Describes data that are already known.
5. Statistical tools: measures of central tendency (mean, median, mode); measures of
variability (range, standard deviation, coefficient of variation).

Inferential Statistics
1. Makes inferences from the sample and generalizes them to the population.
2. Compares, tests, and predicts future outcomes.
3. Final results are probability scores.
4. Tries to draw conclusions about the population that go beyond the data available.
5. Statistical tools: hypothesis tests, analysis of variance, regression analysis, and
correlation analysis.

Examples

Descriptive Statistics

a. According to the Census Bureau, 20% of all Filipino workers get to work via carpool.

b. According to the Court Administration of the Philippines, 14% of trial-ready civil
actions and equity cases in Metro Manila during 1993 were decided in less than six
months (May 14, 1995).

c. Cigarettes were associated with 29% of the 4,470 civilian fire deaths in 1989 (The
Book of Odds, Plume, 1991).

Inferential Statistics
a. The National Eye Institute has halted a clinical trial on a type of eye surgery, calling it
ineffective and possibly harmful to a person's vision.
b. “Allergy therapy may make bees go away.” (April 1995)
c. The Gallup Poll says 1 out of 10 Filipinos is a member of a health club or fitness
center.
d. Drinking decaffeinated coffee can raise cholesterol levels by 7%. (Philippine Heart
Association)
Let’s Try

For each statement, decide whether descriptive or inferential statistics were used.
1. A recent study showed that eating garlic can lower blood pressure.

2. The average number of students in a class at the University of the Philippines is
22.6.

3. It is predicted that the average number of automobiles each household owns
will increase next year.

4. Last year's total attendance at Ateneo de Manila's basketball games was 8,345.

5. The chance that a person will be robbed in a certain city is 15%.

Uses and Importance of Statistics and Statistical Analysis

• Education. Teaching-learning process, measurement and evaluation,
educational studies, enrollment, management and finance.
• Engineering. Design and testing of performance, quality control.
• Business. A car dealer may look at past sales records for a specific month to
decide what types of automobiles, and how many of each type, to order for that
month of next year.
• Agriculture. Varieties of plants to grow and the best combinations of
fertilizers, pesticides, and planting densities.
• Automatic Data Processing. Construction, operation, and use of high-speed
computing and data processing equipment.
• Biology. Research and experimentation on the life processes of plants and animals to
promote growth or prolong life.
• Business. Production, distribution, and sale of merchandise; auditing and
accounting procedures.
• Demography. Composition, distribution, and growth of human populations; birth,
death, and migration rates; socioeconomic standing of the population.
• Economics. Production, resources, trade, labor force, consumers' and
producers' responses to products and price changes, advertising systems, and
distribution.
• Entertainment. The number of local and foreign movies shown in a certain
area. The income generated by local producers.
• Environmental Studies. The increase in birth defects and death tolls near
nuclear power plants.
• Fisheries. Number of fish of a given species in the fishing grounds and the level
of quotas imposed on fishermen to maintain fish stocks.
• Government. Taxes and wages, material resources, movement of population,
elections.
• Health. Public health programs, hospitalization, problems of medical care,
occurrence and cost of diseases, accidents, and handicaps.
• Insurance. Mortality, morbidity, and accident rates of the population; rates of
premiums for property and insurance programs.
• Manufacturing. What to do to improve the quality of a product and
when to stop a manufacturing process and reset the machine. The ordering of
materials and delivery of processed goods.
• Medicine. Causes, diagnosis, treatment, and prevention of communicable and
non-communicable diseases.
• Psychology. Intelligence tests, aptitudes, personality traits and attitudes, creation
of scales and measuring instruments.
• Research and Statistics. Probability, statistical theories and methods.
• Social Sciences. Social systems and social welfare, behavior patterns of
groups of people.
• Sports. Points made out of so many attempts from the field or from the foul line,
such as in basketball, football, etc.
Variables and Data

• A variable is a characteristic that takes two or more values and varies across
individuals.

Variable
   Quantitative: Discrete, Continuous
   Qualitative

• Two types of variables

1. Qualitative variables - represent differences in quality, character, or kind but
not in amount
Examples: sex, birthplace or geographic location, marital status, eye color

2. Quantitative variables - numerical in nature and can be ordered or ranked
Examples: age, height, weight, body temperature, test scores

• Classification of quantitative variables

1. Discrete variable – variable whose values can be counted using integral
values
Examples: number of enrollees, drop-outs, deaths, number of calls received
by an operator

2. Continuous variable – variable that can assume any numerical value over
an interval or intervals
Examples: height, weight, temperature, time
Levels / Scales of Measurement of Data
• A scale or level of measurement relates to the rules used to assign scores and
is an indicator of the kind of information that the scores provide.
• The scale to which a measurement belongs is important in determining
appropriate methods for data description and analysis.

Four Levels / Scales of Measurement

1. Nominal data use numbers only to identify the name of, or membership in,
a group or category.

2. Ordinal data connote ranking or inequalities. One category is higher than
another.

3. Interval data indicate an actual amount, and there is an equal unit of measurement
separating each score, that is, equal intervals.

4. Ratio data are similar to interval data but have an absolute zero, so multiples are
meaningful.

Let's Try
Determine the level of measurement of the following variables.
1. License plate numbers
2. First 10 students ranked in a class
3. Coded responses (strongly agree, agree, disagree, strongly disagree)
4. Percentage of the students taking science courses (72%) and non-science
courses (28%) in a certain school
5. Species of grass that people have in their yard
Measures of Central Tendency

At the end of this lesson the learner should be able to:


1. Define mean, median, and mode.
2. Compute the mean, median, and mode for ungrouped and grouped data.
3. Compute the percentile of the given data.
4. Compare and explain the appropriate uses of measures of Central Tendency.

Data may be organized using tables, while summaries of data
may be displayed by means of graphs and charts. Another method of summarizing data
is to compute numbers, such as an average, that describe a set of data. Numbers that are
used to describe sets of data are called descriptive measures. The most important
descriptive measures are the measures of central tendency and the measures of
dispersion or variation.
Descriptive measures that indicate where the center or the most typical value of
a set of data lies are called measures of central tendency, often more simply referred to as
averages. A measure of central location, or central tendency, is a single number
that represents the typical score of the data. In this section, we will discuss the three
most important measures of central tendency: the mean, the median, and the mode.

Measures of Central Tendency

Central Tendency
   Mathematical Average: Mean (Weighted, Harmonic, Geometric)
   Positional Average: Median, Mode

• The mean is the average value of all the data in the set.
• The median is the value that has exactly half the data above it and half below it.
• The mode is the value that occurs most frequently in the set.
The Mean
• The mean is the average of the scores – the mathematical center of a
distribution. It is used with symmetrical, unimodal distributions of interval or ratio
scores.
• The most commonly used measure of central tendency is the mean. When
people speak of taking an average, it is usually the mean that they are referring to.

The Mean for Ungrouped Data

• The mean of ungrouped data is defined as the sum of all the scores or data
divided by the number of scores in the data.
• In particular, the mean of the scores x_1, x_2, …, x_n, denoted by \(\bar{x}\), is given by
the formula

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

Example 1. Find the mean of 87, 78, 84, 84, and 80.

\[ \bar{x} = \frac{87 + 78 + 84 + 84 + 80}{5} = \frac{413}{5} = 82.6 \]
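The definition above can be checked with a few lines of Python. This is only a sketch: the list name `scores` is illustrative, and the values are the five scores that appear in the worked computation of Example 1.

```python
from statistics import mean

# Scores as they appear in the worked computation of Example 1.
scores = [87, 78, 84, 84, 80]

# Definition: sum of all the scores divided by the number of scores.
mean_by_definition = sum(scores) / len(scores)

# The standard library's statistics.mean gives the same value.
mean_by_library = mean(scores)

print(mean_by_definition, mean_by_library)
```

Both values agree with the 82.6 obtained by hand.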
• In general, the weighted arithmetic mean (weighted mean) of a group of
numbers or scores x_1, x_2, …, x_n that occur w_1, w_2, …, w_n times,
respectively, is

\[ \bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} \]

Example 2. Compute the weighted arithmetic mean (weighted mean) of the numbers 12,
15, 16, 12, 15, 18, 18, 20, 12, and 18. (12 occurs 3 times, 15 occurs twice, 18
occurs 3 times, 16 once, and 20 once.)

\[ \bar{x} = \frac{12(3) + 15(2) + 18(3) + 16 + 20}{3 + 2 + 3 + 1 + 1} = \frac{156}{10} = 15.6 \]

Or

\[ \bar{x} = \frac{12 + 15 + 16 + 12 + 15 + 18 + 18 + 20 + 12 + 18}{10} = 15.6 \]
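The equivalence of the two computations in Example 2 can be sketched in Python. The helper name `weighted_mean` is hypothetical; it simply divides the sum of weight times value by the sum of the weights, with the weights taken to be the number of times each value occurs.

```python
from collections import Counter


def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


data = [12, 15, 16, 12, 15, 18, 18, 20, 12, 18]

# Count how many times each distinct value occurs; the counts act as weights.
counts = Counter(data)
result = weighted_mean(list(counts.keys()), list(counts.values()))

# The plain mean of the raw list must agree with the weighted mean.
plain = sum(data) / len(data)
print(result, plain)
```

Both routes give 15.6, since weighting each distinct value by its count is the same as adding every score individually.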
Example 3. The class standing of a student is 84, while the preliminary examination is
79. Compute the preliminary grade if the weight of the class standing is 2 and the
weight of the preliminary examination is 1.

Solution. Let x be the preliminary grade of the student. Then

\[ x = \tfrac{2}{3}(\text{class standing}) + \tfrac{1}{3}(\text{prelim exam}) = \tfrac{2}{3}(84) + \tfrac{1}{3}(79) \approx 82.33 \]

Example 4. A student obtained a mid-term grade of 79. What should he get in the last
grading period, which has a weight of 2, to have a final grade of 85?

Solution. Let x = last grading period grade. Then

\[ \tfrac{2}{3}x + \tfrac{1}{3}(79) = 85 \]

Solving for x gives

\[ 2x = 3(85) - 79 = 176 \]

\[ x = 88 \]
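The algebra in Example 4 can be sketched as a small function that solves the weighted-mean equation for the unknown grade. The name `required_grade` and its parameters are hypothetical, and the mid-term weight of 1 is inferred from the 1/3 coefficient in the solution.

```python
def required_grade(target, known_grade, known_weight, unknown_weight):
    """Solve (known_weight*known_grade + unknown_weight*x) / total = target for x."""
    total_weight = known_weight + unknown_weight
    return (target * total_weight - known_weight * known_grade) / unknown_weight


needed = required_grade(target=85, known_grade=79, known_weight=1, unknown_weight=2)

# Substituting back into the weighted mean should recover the target grade.
final = (1 * 79 + 2 * needed) / 3
print(needed, final)
```

Substituting the result back into the weighted mean returns 85, which confirms x = 88.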
The Mean for Grouped Data
The arithmetic mean (weighted mean) for grouped data is given by

\[ \bar{x} = \frac{\sum fx}{n} \]

Where: f = frequency in each class
x = midpoint of each class = (lower limit + upper limit) / 2
n = total number of scores or observations

• Grouped data are commonly considered in terms of classes or intervals.

Example 5. Determine the mean of the scores of 40 students organized into the
frequency distribution as follows:

Scores    f
65-69     2
70-74     8
75-79    10
80-84     9
85-89     7
90-94     2
95-99     2

Solution.
1. Compute the midpoint x of each interval and the product fx, as indicated in the following table:

Scores    f     x     fx
65-69     2    67    134
70-74     8    72    576
75-79    10    77    770
80-84     9    82    738
85-89     7    87    609
90-94     2    92    184
95-99     2    97    194
       n = 40     Σfx = 3205

2. Use the formula to solve for the mean.

\[ \bar{x} = \frac{\sum fx}{n} = \frac{3205}{40} \approx 80.13 \]
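The two steps of Example 5 can be sketched in Python using the midpoints and frequencies from the table. The function name `grouped_mean` is illustrative, not part of the module.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f * x) / n, where n = sum(f)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / n


# Class midpoints x and frequencies f from the table in Example 5.
midpoints = [67, 72, 77, 82, 87, 92, 97]
frequencies = [2, 8, 10, 9, 7, 2, 2]

n = sum(frequencies)
total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
x_bar = grouped_mean(midpoints, frequencies)
print(n, total_fx, x_bar)
```

The unrounded value is 80.125, which the solution reports as 80.13.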
Median
• The median of a data set is the number that divides the bottom 50% of the data from
the top 50%.

Median of ungrouped data

Steps in determining the median of ungrouped data
1. Arrange the data from smallest to largest (ascending order) or
largest to smallest (descending order).
2. Determine the middle value in the ordered list. The median divides the
data into two equal parts – the bottom 50% and the top 50%.
Note:
1. If the number of data values is odd, then the median is the data value exactly
in the middle of the ordered list.
2. If the number of data values is even, then the median is the average of the
two middle values in the ordered list.

Example 1. Determine the median of the following data.

a. 78, 85, 84, 90, 75, 79, 80
b. 10, 9, 12, 15, 10, 15, 11
c. 10, 9, 12, 15, 10, 15, 11, 13

Solution.
a. 75, 78, 79, 80, 84, 85, 90 (median = 80)
b. 9, 10, 10, 11, 12, 15, 15 (median = 11)
c. 9, 10, 10, 11, 12, 13, 15, 15 (median = (11 + 12) / 2 = 11.5)
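The steps and the odd/even rule for ungrouped data can be sketched as follows. The function name `median_by_steps` is hypothetical; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_steps(data):
    """Sort the data, then take the middle value (odd n) or the
    average of the two middle values (even n)."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


set_a = [78, 85, 84, 90, 75, 79, 80]
set_b = [10, 9, 12, 15, 10, 15, 11]
set_c = [10, 9, 12, 15, 10, 15, 11, 13]

for data in (set_a, set_b, set_c):
    print(sorted(data), median_by_steps(data), median(data))
```

For the three data sets this gives 80, 11, and 11.5, in agreement with the solution.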

Median of grouped data

Formula

\[ \tilde{x} = L_b + \left( \frac{\frac{n}{2} - CF_<}{f_m} \right) c \]

Where: L_b = lower boundary of the class containing the median
n = total number of scores / observations
CF< = cumulative frequency of all the classes preceding
the class containing the median
f_m = frequency of the class containing the median
c = width of the class = (upper limit – lower limit) + 1
Example 2. Compute the median of the scores of 40 students organized into the
frequency distribution as follows:

Scores    f
65-69     2
70-74     8
75-79    10
80-84     9
85-89     7
90-94     2
95-99     2
Solution.
1. Determine the median class (n/2).
40 / 2 = 20; the 20th score falls in the 3rd class, with interval 75 – 79.

Scores    Boundaries      f    CF
65-69     64.5 – 69.5     2     2
70-74     69.5 – 74.5     8    10   CF<
75-79     74.5 – 79.5    10    20   median class
80-84     79.5 – 84.5     9    29
85-89     84.5 – 89.5     7    36
90-94     89.5 – 94.5     2    38
95-99     94.5 – 99.5     2    40

From the table: L_b = 74.5
n = 40
CF< = 10
f_m = 10
c = (69 – 65) + 1 = 5

Using the formula:

\[ \tilde{x} = 74.5 + \left( \frac{\frac{40}{2} - 10}{10} \right)(5) = 74.5 + 5 = 79.5 \]
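The grouped-median procedure can be sketched in Python. The function `grouped_median` and its parameters are hypothetical; it assumes equal-width classes, so each lower boundary is the first lower boundary (64.5) plus a multiple of the class width (5).

```python
def grouped_median(first_lower_boundary, width, frequencies):
    """Median of grouped data with equal-width classes:
    L_b + ((n/2 - CF_before) / f_m) * c."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative_before = 0
    for index, f in enumerate(frequencies):
        # The median class is the first class whose cumulative
        # frequency reaches n/2.
        if cumulative_before + f >= half:
            lower_boundary = first_lower_boundary + index * width
            return lower_boundary + (half - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("median class not found")


frequencies = [2, 8, 10, 9, 7, 2, 2]
x_tilde = grouped_median(first_lower_boundary=64.5, width=5, frequencies=frequencies)
print(x_tilde)
```

With the frequencies from Example 2, the median class is the third one (lower boundary 74.5, preceding cumulative frequency 10), and the result is 79.5.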
The Mode

Mode of ungrouped data

• The mode is the most frequently occurring score in the ungrouped data. It is
the measure of central tendency used with scores from a nominal variable.

Example 1. Determine the mode of the following data.

a. 75, 78, 80, 85, 80, 81, 81, 80
Answer: The mode is 80. The data set is unimodal because there is only one mode.

b. 75, 78, 80, 85, 81, 81, 79, 85
Answer: The modes are 81 and 85. The data set is bimodal because there are two modes.

c. 75, 78, 80, 83, 85, 79
Answer: There is no mode, because every score occurs only once.
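The three cases in Example 1 can be sketched with `collections.Counter`. The function name `modes` is hypothetical, and it follows this module's convention that a data set in which every score occurs only once has no mode. Note that the standard library's `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of most frequent values, or an empty
    list when every value occurs only once (no mode)."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


set_a = [75, 78, 80, 85, 80, 81, 81, 80]
set_b = [75, 78, 80, 85, 81, 81, 79, 85]
set_c = [75, 78, 80, 83, 85, 79]

for data in (set_a, set_b, set_c):
    print(modes(data))
```

The lengths of the returned lists (1, 2, and 0) correspond to the unimodal, bimodal, and no-mode cases.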

Mode of grouped data

Formula

\[ \hat{x} = L_{mo} + \left( \frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2} \right) c \]

Where:
L_mo = lower boundary of the modal class
f_mo = frequency of the modal class
f_1 = frequency of the class preceding the modal class
f_2 = frequency of the class after the modal class
c = width of the class


Example 2. Compute the mode of the scores of 40 students organized into the
frequency distribution as follows:

Scores      f
65 - 69     2
70 - 74     8
75 - 79    10
80 - 84     9
85 - 89     7
90 - 94     2
95 - 99     2

Solution.

Scores      Boundaries      Frequency
65 - 69     64.5 – 69.5      2
70 - 74     69.5 – 74.5      8
75 - 79     74.5 – 79.5     10   (modal class)
80 - 84     79.5 – 84.5      9
85 - 89     84.5 – 89.5      7
90 - 94     89.5 – 94.5      2
95 - 99     94.5 – 99.5      2

\[ \hat{x} = 74.5 + \left( \frac{10 - 8}{2(10) - 8 - 9} \right)(5) = 74.5 + \frac{2}{3}(5) \]

\[ \hat{x} \approx 77.83 \]
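The grouped-mode formula can be sketched as a small Python function. The name `grouped_mode` and its parameter names are illustrative; the inputs are the modal-class values read from the table (lower boundary 74.5, frequencies 10, 8, and 9, class width 5).

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Mode of grouped data:
    L_mo + ((f_mo - f_1) / (2*f_mo - f_1 - f_2)) * c."""
    denominator = 2 * f_modal - f_before - f_after
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f_mo = f_1 + f_2")
    return lower_boundary + (f_modal - f_before) / denominator * width


x_hat = grouped_mode(lower_boundary=74.5, f_modal=10, f_before=8, f_after=9, width=5)
print(round(x_hat, 2))
```

Rounded to two decimal places, the result agrees with the 77.83 computed above.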
Remember:

Use the Mean:

1. when the scores in a distribution are more or less symmetrically grouped about a
central point;
2. when the research problem requires a measure of central tendency that will also
form the basis of other statistics, such as measures of variability;
3. when the research problem requires the combination of the mean with the means of
other groups measured on the same variable.

Use the Median:

1. when the research problem calls for knowledge of the exact midpoint of a
distribution;
2. when extreme scores distort the mean, as in our hypothetical example of annual
marriage ceremonies. The mean reflects extreme values; the median does not;
3. when dealing with "oddly-shaped" distributions, for example, those in which a high
proportion of extremely high scores occur as well as a low proportion of extremely
low ones.

Use the Mode:

1. when all that is required is a quick and approximate way of determining central
tendency;
2. when, in referring to what is "average", the word is used in the sense of the "typical" or
the "most usual".
