FUNDAMENTALS OF STATISTICS
Table of Contents
INTRODUCTION
History
Chapter I Nature of Statistics
Definition of Statistics and Key Terms
Basic Terms
Summation Notation
Sampling
Chapter II Descriptive Statistics
Display Data
Tabular Presentation
Graphical Presentation
Measures of Central Tendency
Skewness
Measures of Variation/Spread
Chapter III Probability
Techniques of Counting
Probability of an Event
Special Discrete Probability Distribution
Chapter IV Normal Distribution and the Central Limit Theorem
Normal Distribution
Standard Normal Distribution
Application of the Normal Distribution
Central Limit Theorem
Chapter V Confidence Interval
Point Estimates and Confidence Intervals
Confidence Interval for Population Mean with Known Std dev for Large Samples
Confidence Interval for Population Mean with Unknown Std dev for Small Samples
Confidence Interval for Population Proportion
Finite Population Correction Factor
Choosing Appropriate Sample Size
Batangas State University
FUNDAMENTALS OF STATISTICS
INTRODUCTION
You are probably asking yourself the question, "When and where will I use
statistics?" If you read any newspaper, watch television, or use the Internet, you
will see statistical information. There are statistics about crime, sports, education,
politics, and real estate. Typically, when you read a newspaper article or watch a
television news program, you are given sample information. With this information,
you may make a decision about the correctness of a statement, claim, or "fact."
Statistical methods can help you make the "best educated guess."
Included in this chapter are the basic ideas and words of probability and
statistics. You will soon understand that statistics and probability work together.
You will also learn how data are gathered and how "good" data can be
distinguished from "bad." (Introductory Business Statistics, page 5)
History of Statistics
Some computations of odds for games of chance were already made in
antiquity. Beginning around the 1200s increasingly elaborate results based on the
combinatorial enumeration of possibilities were obtained by mystics and
mathematicians, with systematically correct methods being developed in the mid-
1600s and early 1700s. The idea of making inferences from sampled data arose in
the mid-1600s in connection with estimating populations and developing
precursors of life insurance. The method of averaging to correct for what were
assumed to be random errors of observation began to be used, primarily in
astronomy, in the mid-1700s, while least squares fitting and the notion of
probability distributions became established around 1800. Probabilistic models
based on random variations between individuals began to be used in biology in the
mid-1800s, and many of the classical methods now used for statistical analysis
were developed in the late 1800s and early 1900s in the context of agricultural
research.
In physics fundamentally probabilistic models were central to the
introduction of statistical mechanics in the late 1800s and quantum mechanics in
the early 1900s. Beginning as early as the 1700s, the foundations of statistical
analysis have been vigorously debated, with a succession of fairly specific
approaches being claimed as the only ones capable of drawing unbiased
conclusions from data. The practical use of statistical analysis began to increase
rapidly in the 1960s and 1970s, particularly among biological and social scientists,
as computers became more widespread. All too often, however, inadequate
amounts of data have ended up being subjected to elaborate statistical analyses
whose results are then blindly assumed to represent definitive scientific
conclusions. In the 1980s, at least in some fields, traditional statistical analysis
began to become less popular, being replaced by more direct examination of data
presented graphically by computer. In addition, in the 1990s, particularly in the
context of consumer electronics devices, there has been an increasing emphasis on
using statistical analysis to make decisions from data, and methods such as fuzzy
logic and neural networks have become popular. (Stephen Wolfram, A New Kind of
Science (Wolfram Media, 2002), page 1082. © 2002, Stephen Wolfram, LLC)
Chapter 1
Nature of Statistics
The word statistics is derived from the Latin word status (meaning state). Early uses of
statistics involved compilations of data and graphs describing various aspects of a state or
country. This chapter introduces students to Statistics, which is more than the simple
collection, tabulation, and summarizing of data. It will allow students to learn how to
develop general and meaningful conclusions that go beyond the original data.
Learning Objectives
The aim of this section is for students to explain the basic terminology used in Statistics,
describe data and variables based on types and levels of measurement, demonstrate the
different sampling methods, and evaluate expressions written in summation notation.
At the end of this section, students should be able to discuss and give examples of the
basic terminology in Statistics and solve problems by applying the concepts learned in this
section.
A. Division of Statistics
Note: If our inferences from the sample to the population are to be valid we
must obtain samples that are representative of the population. All too often we are
tempted to choose a sample by selecting the most convenient members of the
population. Such a procedure may lead to erroneous inferences concerning the
population.
Definition 1.5: A parameter is any numerical value describing a characteristic of
a population.
Practice Exercises
A. Write Q (qualitative), IQ (quantitative), N (nominal), O (ordinal), I (interval), or
R (ratio) to indicate the type and level of measurement of each of the following
characteristics of Batangas State University employees.
__________1. position
__________8. gender
__________9. religion
__________10. height
__________13. age
__________14. weight
b. Insurance companies are interested in the average health costs each year for their
clients, so that they can determine the costs of health insurance.
c. A marketing company is interested in the proportion of people that will buy a particular
product. Define the following in terms of the study. Give examples where appropriate.
Activity I
Movie Survey
Ask five classmates from a different class how many Netflix series they saw last month.
1. Record the data
2. In class, randomly pick one person. On the class list, mark that person's name. Move
down four people's names on the class list. Mark that person's name. Continue doing this
until you have marked 12 people's names. You may need to go back to the start of the list.
For each marked name record below the five data values. You now have a total of 60 data
values.
3. For each name marked, record the data:
The expression in front of the equals sign in what follows is summation notation;
the expression that follows gives the meaning of the expression in "longhand"
notation.
$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \dots + x_N$$
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up
all the numbers." In the example set of five numbers ($x_1 = 5$, $x_2 = 7$, $x_3 = 7$, $x_4 = 6$,
$x_5 = 8$):
$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \dots + x_N = 5 + 7 + 7 + 6 + 8 = 33$$
The "i=1" in the bottom of the summation notation tells where to begin the
sequence of summation. If the expression were written with "i=3", the summation
would start with the third number in the set. For example:
$$\sum_{i=3}^{N} x_i = x_3 + x_4 + \dots + x_N$$
In the example set of numbers, this would give the following result:
$$\sum_{i=3}^{N} x_i = x_3 + x_4 + \dots + x_N = 7 + 6 + 8 = 21$$
The "N" in the upper part of the summation notation tells where to end the
sequence of summation. If there were only three scores then the summation and
example would be:
$$\sum_{i=1}^{3} x_i = x_1 + x_2 + x_3 = 5 + 7 + 7 = 19$$
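The role of the lower and upper limits can be checked with a short Python sketch. The list name `x` and the helper `summation` are illustrative names chosen here, not part of the module. Python lists are indexed from 0, so the helper shifts the 1-based index i used in the notation.

```python
# Example scores from the text: x_1 = 5, x_2 = 7, x_3 = 7, x_4 = 6, x_5 = 8
x = [5, 7, 7, 6, 8]

def summation(values, start, end):
    """Return the sum of values[i] for i = start..end, using 1-based indices."""
    # Python slices are 0-based and exclude the end position,
    # so index i in the notation corresponds to values[i - 1].
    return sum(values[start - 1:end])

N = len(x)
total = summation(x, 1, N)        # sum from i = 1 to N
from_third = summation(x, 3, N)   # sum from i = 3 to N
first_three = summation(x, 1, 3)  # sum from i = 1 to 3

print(total, from_third, first_three)  # 33 21 19
```

Changing only `start` or `end` reproduces the effect of changing the number below or above the summation sign.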
X Y
5 6
7 7
7 8
6 7
8 8
The preceding sum may be most easily computed by creating a third column on the
data table below
         X     Y     X*Y
         5     6     30
         7     7     49
         7     8     56
         6     7     42
         8     8     64
Total    33    36    241
$$\sum_{i=1}^{N} (x_i * y_i) = 30 + 49 + 56 + 42 + 64 = 241$$
Note that a change in the position of the parentheses dramatically changes the results:
$$\left(\sum_{i=1}^{N} x_i\right) * \left(\sum_{i=1}^{N} y_i\right) = 33 * 36 = 1188$$
A similar kind of differentiation is made between $\sum x^2$ and $(\sum x)^2$. In the former the sum
would be 223, while the latter would be $33^2$ or 1089.
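The effect of the parentheses is easy to verify numerically. The following is a minimal Python sketch using the X and Y columns of the table; the variable names are chosen here for illustration only.

```python
# X and Y columns from the data table in the text
x = [5, 7, 7, 6, 8]
y = [6, 7, 8, 7, 8]

# Sum of the products: multiply each pair first, then add
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))

# Product of the sums: add each column first, then multiply
product_of_sums = sum(x) * sum(y)

# Sum of the squares versus the square of the sum
sum_of_squares = sum(xi ** 2 for xi in x)
square_of_sum = sum(x) ** 2

print(sum_of_products, product_of_sums)  # 241 1188
print(sum_of_squares, square_of_sum)     # 223 1089
```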
Three exceptions to the general rule provide the foundation for some simplification and
statistical properties to be discussed later. The three exceptions are:
1. When the expression being summed contains a "+" or "-" at the highest
level, then the summation sign may be taken inside the parentheses. The rule
may be more concisely written:
$$\sum_{i=1}^{N} (x_i + y_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i$$
         X     Y     X+Y    X-Y
         5     6     11     -1
         7     7     14     0
         7     8     15     -1
         6     7     13     -1
         8     8     16     0
Total    33    36    69     -3
Note that the sum of the X+Y column (69) is equal to the sum of X (33) plus the sum of Y
(36). Similar results hold for the X-Y column.
2. The sum of a constant times a variable is equal to the constant times the
sum of the variable.
A constant is a value that does not change with the different values for the counter
variable, "i", such as numbers. If every score is multiplied by the same number and then
summed, it would be equal to the sum of the original scores times the constant. Constants
are usually identified in the statement of a problem, often represented by the letters "c" or
"k". If c is a constant, then, as before, this exception to the rule may be written in algebraic
form:
$$\sum_{i=1}^{N} c * x_i = c * \sum_{i=1}^{N} x_i$$
For example, suppose that the constant was equal to 5. Using the example data produces
the result:
         X     c*X
         5     25
         7     35
         7     35
         6     30
         8     40
Total    33    165
Note that c * 33 = 165, the same as the sum of the second column.
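3. The sum of a constant is equal to N times the constant:
$$\sum_{i=1}^{N} c = N * c$$
For example, if c = 8 and N = 5:
$$\sum_{i=1}^{5} c = 8 + 8 + 8 + 8 + 8 = 5 * 8 = 40$$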
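The three exceptions can be confirmed on the example data with a few lines of Python. This is only a numerical check on one data set, not a proof; the variable names are illustrative.

```python
x = [5, 7, 7, 6, 8]
y = [6, 7, 8, 7, 8]
N = len(x)

# Exception 1: the summation sign distributes over "+" and "-"
sum_of_sums = sum(xi + yi for xi, yi in zip(x, y))
sum_of_diffs = sum(xi - yi for xi, yi in zip(x, y))

# Exception 2: a constant factor can be moved outside the sum
c = 5
sum_c_times_x = sum(c * xi for xi in x)

# Exception 3: adding a constant N times gives N times the constant
k = 8
sum_of_constant = sum(k for _ in range(N))

print(sum_of_sums, sum(x) + sum(y))   # 69 69
print(sum_of_diffs, sum(x) - sum(y))  # -3 -3
print(sum_c_times_x, c * sum(x))      # 165 165
print(sum_of_constant, N * k)         # 40 40
```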
1. The expression to the right of the summation sign may be simplified using any of the
algebraic rewriting rules.
2. The entire expression including the summation sign may be treated as a phrase in the
language.
3. The summation sign is NOT a variable, and may not be treated as one (cancelled for
example.)
4. The three exceptions to the general rule may be used whenever applicable.
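Example 1:
$$
\begin{aligned}
\frac{\sum (x + y) + \sum (x - y)}{\sum x}
&= \frac{\sum x + \sum y + \sum x - \sum y}{\sum x} \\
&= \frac{2 \sum x}{\sum x} \\
&= 2
\end{aligned}
$$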
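Example 2:
$$
\begin{aligned}
\frac{\sum (x^2 + 2xy + y^2) - \sum (x^2 - 2xy + y^2)}{8 * \sum xy}
&= \frac{\sum x^2 + 2\sum xy + \sum y^2 - \sum x^2 + 2\sum xy - \sum y^2}{8 * \sum xy} \\
&= \frac{2\sum xy + 2\sum xy}{8 * \sum xy} \\
&= \frac{2\,(2\sum xy)}{8 * \sum xy} \\
&= \frac{4\sum xy}{8 * \sum xy} \\
&= \frac{1}{2}
\end{aligned}
$$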
Practice Exercises
Summation Notation
Problems
Data
i xi
1 1
2 2
3 3
4 4
1. Find
2. Find
Data
i xi
1 -1
2 3
3 7
4. Find
5. Find
Data
i xi yi
1 10 0
2 8 3
3 6 6
4 4 9
5 2 12
6. Find
7. Find
8. Find
9. Find
1.3 Sampling
When you conduct quantitative research, it is very important that your sample is
representative of the population that you are studying.
Before we discuss the statistics of sampling, the two main approaches to sampling,
probability sampling and non-probability sampling, and their associated methods,
are discussed. Those sampling techniques based on probability involve some form of
random selection while non-probability sampling methods do not.
While both types of sampling approach are commonly used in research, probability
sampling has two main advantages:
(1) it helps to minimize (but not eradicate) sampling error; that is, the extent to
which our sample does not reflect the population; and
(2) it enables us to perform statistical analyses that, at specified levels of statistical
significance, allow us to make inferences from our sample to the population.
While there are a large number of probability sampling techniques that can be
used, four main methods include (1) simple random sampling, (2) systematic random
sampling, (3) stratified random sampling, and (4) cluster random sampling. In some
cases, a number of these techniques may be required in what is known as multi-stage
sampling.
D. Cluster Sampling
For the purpose of cluster random sampling, the example of the 1000 students
is no longer applicable. This is because cluster random sampling is useful when the
population being studied is spread out geographically, perhaps across counties, states,
regions or countries. For example, when a general election is near, opinion-poll
organizations need to assess the general way that the population of a country will vote.
However, it would be infeasible and impractical to sample people from every state or
county, which is where cluster sampling helps. First, every state/county is assigned a
number. Then, a random sample of these states/counties is selected. The researcher can
then choose to perform another probability-based sampling method at the state/county
level to select those individuals to be polled.
In many research settings researchers draw on a variety of probability-based sampling
techniques in what becomes multi-stage sampling.
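A two-stage design of this kind can be sketched in Python as follows. Everything about the sampling frame here is hypothetical: the ten clusters, their names, the fifty residents in each, and the sample sizes are made up purely to illustrate the procedure.

```python
import random

# Hypothetical sampling frame: 10 clusters (e.g., counties), each with 50 residents.
# The cluster names and resident IDs are invented for illustration.
clusters = {
    f"county_{c}": [f"county_{c}_resident_{r}" for r in range(1, 51)]
    for c in range(1, 11)
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Stage 1 (cluster sampling): randomly select 3 of the 10 clusters
selected_clusters = rng.sample(sorted(clusters), k=3)

# Stage 2 (simple random sampling): pick 5 residents within each selected cluster
sample = []
for name in selected_clusters:
    sample.extend(rng.sample(clusters[name], k=5))

print(selected_clusters)
print(len(sample))  # 15
```

Only residents of the selected clusters can enter the sample, which is what keeps the fieldwork geographically concentrated.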
There is a wide variety of non-probability sampling techniques that can be used.
These techniques tend to be popular in student research because they are less costly and
less time-consuming. Two of the main techniques are (1) quota sampling and (2)
convenience sampling. Again, in order to discuss these two sampling methods, we use the
example of 1000 students in a school, of whom a researcher needs to survey 200.
A. Quota sampling
Quota sampling is similar to stratified random sampling in the sense that our
population of students would also be divided into groups and a number from each group
would be sampled based on their relative frequency. However, it differs significantly from
stratified random sampling by not involving a random means of choosing which students
in each group should be sampled. Instead, the choice of which students from each group
should be selected is left to the researcher. While this inevitably saves considerable data
collection time, it does result in a number of potential biases, which may mean that the
sample selected is not representative of the population being studied.
B. Convenience sampling
Convenience sampling involves picking a sample that is simply available; that is,
convenient. Where researchers have limited funds they may choose to collect data from
the most accessible and cheapest source. For example, in selecting 200 students out of
1,000 students, it may be easier for the researcher to access those students that are 16
years old and above because parental consent to be involved in the research is not
necessary, which would otherwise result in the study taking longer to complete, as well as
require the purchase of 200 letters and their associated postage cost. However, while
convenient it would not be possible to make generalizations about the 1,000 students
from the sample of 200 students with any acceptable degree of accuracy.
Definition: The standard deviation of the distribution of sample means is called the
standard error of $\bar{X}$. The standard error measures the standard amount of difference one
should expect between $\bar{X}$ and $\mu$ simply due to chance.
The sample size. It should be intuitively reasonable that the size of a sample should
influence how accurately the sample represents its population. Specifically, a large sample
should be more accurate than a small sample. In general, as the sample size increases, the
error between the sample mean and the population mean should decrease. This rule is
also known as the law of large numbers.
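The shrinking error can be illustrated with a small simulation. The sketch below assumes a made-up population of 10,000 uniformly distributed values and compares the simulated standard error with the standard result σ/√n, which is not derived in this section and is used here only as a reference value.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population: 10,000 values drawn uniformly from [0, 100]
population = [rng.uniform(0, 100) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

def standard_error_estimate(n, repetitions=2_000):
    """Estimate the standard error by simulating many sample means of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.pstdev(means)

results = {n: standard_error_estimate(n) for n in (4, 25, 100)}

for n, simulated in results.items():
    reference = sigma / n ** 0.5  # standard result: sigma / sqrt(n)
    print(n, round(simulated, 2), round(reference, 2))
```

The simulated values fall as n grows, in line with the statement that larger samples tend to give sample means closer to the population mean.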
CHAPTER TEST
II. For the following items, identify the type of data (quantitative or qualitative and the
level of measurement) on the characteristics of household-beneficiaries of the Pantawid
Program under the DSWD.
8. number of family members
9. highest educational attainment of the household head
10. sources of income
11. family form (nuclear, extended family..)
12. type of house dwelling (concrete, wood,...)
13. average family income
14. religious affiliation
i 1 2 3 4 5
x -1 2 1 0 5
y 0 2 3 -1 2
! !
2. !!! 𝑥! !!! 𝑦!
!
3. !!! 𝑦!
2
! ! ! !
4. ! 𝑥! + ! 𝑦!
!
5. 3 !!! 𝑥! + 𝑦!
Chapter 2
Descriptive Statistics
Once you have collected data, what will you do with it? Data can be described and
presented in many different formats.
In this chapter, you will study numerical and graphical ways to describe and
display your data. This area of statistics is called "Descriptive Statistics". You will learn to
calculate, and even more importantly, to interpret these measurements and graphs.
The purpose of putting results of experiments into graphs, charts and tables is twofold.
First, it is a visual way to look at the data, see what happened, and make
interpretations. Second, it is usually the best way to show the data to others. Reading lots
of numbers in the text puts people to sleep and does little to convey information. From an
educational standpoint, students at most levels are required to learn various data
presentation methods, and learning to graph data one has collected oneself from one's
own experiments is considerably more engaging and motivating than learning to graph
using data given by the teacher.
Learning Objectives
The aim of this section is for students to demonstrate how to organize and summarize
data and explain the graphical form or tabular presentation. Also, the learners should be
able to calculate numerical measures, such as central tendency, variability, and measures
of location and explain the derived numerical measures.
Definition 2.1.1: Raw data are the data as originally recorded. Data sheets are where the
data are originally recorded. Data sheets are often hand drawn, but they can also be
printouts from spreadsheet programs like Microsoft Excel. The printout is a blank form
with labels for the variables and other necessary items of information.
Definition of Terms:
Range (R) = Highest value- lowest value
A frequency is the number of times a given datum occurs in a data set.
A relative frequency is the fraction of times an answer occurs. To find the
relative frequencies, divide each frequency by the total number. Relative
frequencies can be written as fractions, percent, or decimals.
Cumulative relative frequency is the accumulation of the previous relative
frequencies. To find the cumulative relative frequencies, add all the previous
relative frequencies to the relative frequency for the current row.
Class limits – are the lowest and highest data values for a class.
Class width – (largest entry – smallest entry) / number of classes
Class boundaries – are the average of the upper limit of one class and the lower
limit of the next class
Relative frequency distribution – is a table listing the relative frequencies of the classes
Percentage distribution – is the table obtained when each relative frequency is multiplied by 100%
Consider the data below on the weights (in kg) of 25 BS Accountancy students
enrolled in Business Statistics:
45 46 46 47 47 47 47
48 48 48 49 49 50 50
50 50 50 51 52 52 52
53 53 53 54 (n=25)
The steps in grouping a large set of data into a frequency distribution may be
summarized as follows:
1. Decide on the number of class intervals required, or use the formula below to
determine the number of classes.
Number of classes (k) = √n = √25 = 5
2. Determine the range.
Range (R) = Highest value – Lowest value
= 54 – 45 =9
3. Divide the range by the number of classes to estimate the approximate width of
the interval.
Class width (c) = 9/5 = 1.8 ≈ 2
4. List the lower class limit of the bottom interval and then the lower class
boundary. Add the class width to the lower class boundary to obtain the upper class
boundary. Write down the upper class limit.
Table 1
Frequency distribution of BS Accountancy students based on their weights
Class Interval    Observations                    Number of Observations (Frequency)
45-46             45, 46, 46                      3
47-48             47, 47, 47, 47, 48, 48, 48      7
49-50             49, 49, 50, 50, 50, 50, 50      7
51-52             51, 52, 52, 52                  4
53-54             53, 53, 53, 54                  4
                                                  n=25
In the table above, the class interval 45-46 is the lowest interval and 53-54 is the
highest interval. In the classes 45-46, 47-48, 49-50, 51-52 and 53-54, the numbers
represent the beginning (lower limit) and end (upper limit) of each class and so are
known as the class limits for that class.
5. Determine the class marks of each interval by averaging the class limits or the class
boundaries.
Class Interval    Number of Observations (Frequency)    Class marks (xi)
45-46             3                                     (45+46)/2 = 45.5
47-48             7                                     (47+48)/2 = 47.5
49-50             7                                     ..
51-52             4                                     ..
53-54             4                                     (53+54)/2 = 53.5
                  n=25
6. Determine the cumulative frequencies (less than and greater than)
-for less than basis (<Cf), simply add the frequencies starting from the lowest class
interval to the highest interval
-for greater than basis (>Cf), add the frequencies starting from the highest interval
to the lowest interval
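7. Determine the true class boundaries for each class. Divide the difference between the
lower limit of one class and the upper limit of the preceding class by two. Subtract the
obtained value from the lower limit and add it to the upper limit of each class. For the
weights above, the difference is 47 – 46 = 1, so 0.5 is subtracted from each lower limit
and added to each upper limit; the class 45-46 has boundaries 44.5 and 46.5.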
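The seven steps can be traced on the 25 weights with the Python sketch below. It assumes whole-number data (so the gap between consecutive classes is 1 and the boundaries sit 0.5 beyond the limits) and rounds the class width up rather than to the nearest integer, which gives the same width of 2 for this data set. The variable names are illustrative.

```python
import math
from collections import Counter

# Weights (kg) of the 25 students from the text
weights = [45, 46, 46, 47, 47, 47, 47, 48, 48, 48, 49, 49, 50, 50,
           50, 50, 50, 51, 52, 52, 52, 53, 53, 53, 54]

n = len(weights)
k = round(math.sqrt(n))                   # step 1: number of classes, sqrt(25) = 5
data_range = max(weights) - min(weights)  # step 2: range = 54 - 45 = 9
width = math.ceil(data_range / k)         # step 3: 9 / 5 = 1.8, taken as 2

counts = Counter(weights)
rows = []
less_than_cf = 0
for j in range(k):
    lower = min(weights) + j * width      # step 4: lower class limit
    upper = lower + width - 1             # upper class limit (integer data)
    freq = sum(counts[v] for v in range(lower, upper + 1))
    mark = (lower + upper) / 2            # step 5: class mark
    less_than_cf += freq                  # step 6: "less than" cumulative frequency
    greater_than_cf = n - less_than_cf + freq   # "greater than" cumulative frequency
    boundaries = (lower - 0.5, upper + 0.5)     # step 7: true class boundaries
    rows.append((lower, upper, freq, mark, less_than_cf, greater_than_cf, boundaries))

for row in rows:
    print(row)
```

Each printed row holds the class limits, frequency, class mark, the two cumulative frequencies, and the class boundaries, in that order.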
Example:
Suppose we collect data on the peso amount that each student in a class spent on
textbooks this semester. The 36 amounts are as follows:
205 233 195 214 225 247 198 186 202 236 227 214
226 231 257 207 221 188 218 225 245 208 197 232
190 186 204 162 215 226 186 207 236 275 220 205
A statistical graph is a tool that helps you learn about the shape or distribution of a
sample. The graph can be a more effective way of presenting data than a mass of numbers
because we can see where data clusters and where there are only a few data values.
Newspapers and the Internet use graphs to show trends and to enable readers to compare
facts and figures quickly.
Statisticians often graph data first in order to get a picture of the data. Then, more
formal tools may be applied.
Some of the types of graphs that are used to summarize and organize data are the dot
plot, the bar chart, the histogram, the stem-and-leaf plot, the frequency polygon (a type of
broken line graph), pie charts, and the boxplot. In this chapter, we will briefly look at the
different graphs.
To construct a bar graph we start with horizontal and vertical axes and label the quantity
being studied horizontally from left to right. The marking along the horizontal axis should
correspond to the limits of the classes in the above frequency distribution. The
corresponding frequency in each class is measured vertically upward. A vertical bar is
then drawn across each class interval with height equal to the frequency for that class.
Teams may choose from three types of bar charts, depending on the type of data they have
and what they want to stress:
Grouped bar charts divide data into groups within each category and show comparisons
between individual groups as well as between categories. (They give more useful
information than a simple total of all the components.)
Stacked bar charts, like grouped bar charts, use grouped data within categories.
(They make clear both the sum of the parts and each group's contribution to that total.)
Illustrations:
Consider the given table:
Table 1
Frequency distribution of DepEd Teachers based on their Financial Self-Efficacy
Level of FSE (Class Interval)    Number of Observations (Frequency)    %
Figure 1:
Step 1. Taking the data to be charted, calculate the percentage contribution for each
category. First, total all the values. Next, divide the value of each category by the total.
Then, multiply the quotient by 100 to create a percentage for each value.
Step 2. Draw a circle. Using the percentages, determine what portion of the circle will be
represented by each category. This can be done by eye or by calculating the number of
degrees and using a protractor. By eye, divide the circle into four quadrants, each
representing 25 percent.
Step 3. Draw in the segments by estimating how much larger or smaller each category is.
Calculating the number of degrees can be done by multiplying the percent by 3.6 (a circle
has 360 degrees) and then using a protractor to draw the portions.
Step 4. Provide a title for the pie chart that indicates the sample and the time period
covered by the data. Label each segment with its percentage or proportion (e.g., 25
percent or one quarter) and with what each segment represents (e.g., people who returned
for a follow-up visit; people who did not return).
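The arithmetic in Steps 1 to 3 can be sketched in Python. The category counts below are hypothetical, invented only to show the calculation; the category labels borrow the follow-up example from Step 4.

```python
# Hypothetical category counts, made up for illustration
counts = {"returned for follow-up": 30, "did not return": 50, "unknown": 20}

total = sum(counts.values())  # Step 1: total all the values

slices = {}
for category, value in counts.items():
    percent = value / total * 100   # Step 1: share of the total as a percentage
    degrees = percent * 3.6         # Step 3: 1 percent of a circle is 3.6 degrees
    slices[category] = (percent, degrees)
    print(f"{category}: {percent:.1f}% -> {degrees:.1f} degrees")
```

The percentages should add up to 100 and the angles to 360 degrees, which is a quick check before drawing.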
Caution
Be careful not to use too many notations on the charts. Keep them as simple as possible
and include only the information necessary to interpret the chart.
Do not draw conclusions not justified by the data. For example, determining whether a
trend exists may require more statistical tests and probably cannot be determined by the
chart alone. Differences among groups also may require more statistical testing to
determine if they are significant.
Whenever possible, use bar or pie charts to support data interpretation. Do not assume
that results or points are so clear and obvious that a chart is not needed for clarity.
A chart must not lie or mislead! To ensure that this does not happen, follow these
guidelines:
Bar and pie charts can be used in defining or choosing problems to work on, analyzing
problems, verifying causes, or judging solutions. They make it easier to understand data
because they present the data as a picture, highlighting the results. This is particularly
helpful in presenting results to team members, managers, and other interested parties.
Bar and pie charts present results that compare different groups. They can also be used
with variable data that have been grouped. Bar charts work best when showing
comparisons among categories, while pie charts are used for showing relative proportions
of various items in making up the whole (how the "pie" is divided up).
2.3 Line graph. Line graphs are used to show data points over time. Each line is for a
single treatment (independent variable). The x-axis shows the time interval and the y-
axis depicts the values of the dependent variable. The graph can have data points shown
(Graph A) or just the lines (as in Graph B, below).
3. Histogram – is plotted with the class boundaries on the x-axis and the frequencies on
the y-axis. The histogram differs from a bar chart in that the bases of each bar are the
class boundaries rather than the class limits. The use of class boundaries for the bases
eliminates the spaces between the bars to give the solid appearance.
To construct a histogram, first decide how many bars or intervals represent the data.
Many histograms consist of 5 to 15 bars or classes for clarity. Choose the starting
point to be less than the smallest data value. A convenient starting point is a lower value
carried out to one more decimal place than the value with the most decimal places. For
example, if the value with the most decimal places is 6.1, a convenient starting point is
6.05. We say that 6.05 has more precision. If the value with the most decimal places is
2.23, a convenient starting point is 2.225. Also, when the starting point and other
boundaries are carried to one additional decimal place, no data value is likely to fall on a
boundary.
zero frequency. These two points will enable us to connect both ends to the horizontal
axis, resulting in a polygon. We can obtain the frequency polygon very quickly from the
histogram by joining the midpoints of the tops of adjacent rectangles and then adding the
two intervals at each end.
6. Stem-and-Leaf Plot
One simple graph, the stem-and-leaf graph or stem plot, comes from the field of
exploratory data analysis. It is a good choice when the data sets are small. To create the
plot, divide each observation of data into a stem and a leaf. The leaf consists of one digit.
For example, 23 has stem 2 and leaf 3. Four hundred thirty-two (432) has stem 43 and
leaf 2. Five thousand four hundred thirty-two (5,432) has stem 543 and leaf 2. The
decimal 9.3 has stem 9 and leaf 3. Write the stems in a vertical line from smallest to
largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing
order next to their corresponding stem.
Example 1
For Susan Dean's spring pre-calculus class, scores for the first exam were as follows
(smallest to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88;
90; 92; 94; 94; 94; 96; 100
Stem-and-Leaf Diagram
Stem Leaf
3 3
4 2,9,9
5 3,5,5
6 1,3,7,8,8,9,9
7 2,3,4,8
8 0,3,8,8,8
9 0,2,4,4,4,6
10 0
The stem plot shows that most scores fell in the 60s, 70s, 80s, and 90s. Seven out of the
30 scores listed, or approximately 23% of the scores, were in the 90s or 100, a fairly high
number of As.
The stem plot is a quick way to graph and gives an exact picture of the data. You want to
look for an overall pattern and any outliers. An outlier is an observation of data that does
not fit the rest of the data. It is sometimes called an extreme value. When you graph an
outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes
(for example, writing down 50 instead of 500) while others may indicate that something
unusual is happening. It takes some background information to explain outliers. In the
example above, there were no outliers.
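The stem plot for the exam scores in Example 1 can be generated with a short Python sketch. It assumes whole-number data, so the leaf is the last digit and the stem is everything before it; the function name `stem_and_leaf` is chosen here for illustration.

```python
from collections import defaultdict

# Exam scores as listed in Example 1
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74,
          78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 96, 100]

def stem_and_leaf(values):
    """Group integer values by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 94 -> stem 9, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = ",".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

A stem with no observations is simply not printed by this sketch; a hand-drawn plot would normally still list it so that gaps in the data remain visible.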
2.00 2 . 34
9.00 2 . 567788999
7.00 3 . 0023444
11.00 3 . 55566777889
11.00 4 . 00011223334
5.00 4 . 55568
18.00 5 . 000001112222334444
5.00 5 . 55677
1.00 6 . 0
1.00 6 . 8
7. Box-plot
Box plots, also called box-and-whisker plots or box-whisker plots, give a good graphical
image of the concentration of the data. They also show how far the extreme values are from
most of the data. A box plot is constructed from five values: the minimum value, the first
quartile, the median, the third quartile, and the maximum value. We use these values
to compare how close other data values are to them.
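The five values can be computed with Python's standard library, as in the sketch below. It reuses the exam scores from the stem-and-leaf example as sample data. Textbooks and software use slightly different conventions for quartiles, so the `method="inclusive"` setting is an assumption made here and the quartiles may differ from a hand calculation that follows another rule.

```python
import statistics

# Exam scores reused from the stem-and-leaf example in this chapter
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74,
          78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 96, 100]

# quantiles(..., n=4) returns the three cut points that split the data into quarters
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

five_number_summary = (min(scores), q1, median, q3, max(scores))
print(five_number_summary)
```

The box spans the first to the third quartile, the line inside the box marks the median, and the whiskers extend toward the minimum and maximum.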
Practice Exercises
length of time in months patients live once starting the treatment. Two researchers each
follow a different set of 40 AIDS patients from the start of treatment until their deaths.
The following data (in months) are collected.
Researcher 1: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13;
21; 22; 10; 12; 8; 40; 32; 26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34
Researcher 2: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21;
21; 16; 12; 18; 41; 22; 16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29
Organize the Data
Complete the tables below using the data provided.
Researcher 1
Survival Length (in months)    Frequency    Relative Frequency    Cumulative Rel. Frequency
2. Below are scores in the Mathematics examination of fourth year students from
Batangas National High School
48 83 89 52 60 70 66 68 77 88 56
41 50 59 92 96 58 60 74 97 62 76
47 86 71 49 67 98 91 87 66 96 84
77 51 60 57 80 91
Class interval
48 83 89 52 60 70 66 68 77 88 56
41 50 59 92 96 58 60 74 97 62 76
47 86 71 49 67 98 91 87 66 96 84
77 51 60 57 80 91 96 100 49 48 50
55 56 62 69 75 86 76 79 84 98 92
49 58 79 86 59 66 69 68 78 81 85
Class interval Frequency <Cf >Cf <RCf >RCf
(6)-167 5 9 36 12.5
178-(8) 1 1 40 (20)
4.
177; 205; 210; 210; 232; 205; 185; 185; 178; 210; 206; 212; 184; 174; 185; 242;
188; 212; 215; 247; 241; 223; 220; 260; 245; 259; 278; 270; 280; 295; 275; 285;
290; 272; 273; 280; 285; 286; 200; 215; 185; 230; 250; 241; 190; 260; 250; 302;
265; 290; 276; 228; 265
ACTIVITY 1
Data Collection
The activity below will give the students an actual experience on data collection of their
classmates’ personal profile. Each student will be required to have information of at least
10 students in their class.
PERSONAL DATA SHEET
NAME: _________________________________________________
COURSE:________________________________________________
Weighted Average last Semester:________
Classification of Students: _____Regular _____Irregular
Age as of last birthday:
Religion:
Number of Members in the Family: ___3-5 ___6-8
___9-11 ___12 and above
Birth Order: ____First ____Second ____Third ____Fourth _____Others(Please
Specify)
Height: (in cm)_______
Weight: (in kgs) ________
Blood Type:_______
Daily Allowance:
_____below P100 _____P100-P149 _____P150-P199
_____P200-P249 _____P250-P299 _____P300 and above
Communication Network Use: (example Smart, Globe, etc.) ___________
Social Networking Site
Monthly Income of the Family:
_______below P10,000 _______P10,000-P14,999
_______P15,000-P19,999 _______P20,000-P24,999
_______P25,000-P29,999 _______P30,000 and above
Educational Attainment of Parents:
Father Mother
____Elementary Graduate ____Elementary Graduate
____High School Undergraduate ____High School Undergraduate
____High School Graduate ____High School Graduate
____Vocational ____Vocational
____College Undergraduate ____College Undergraduate
____College Graduate ____College Graduate
____with Masteral Degree ____with Masteral Degree
____with Doctoral Degree ____with Doctoral Degree
Occupation of Parents
Father Mother
__________ Self-employed __________
__________ Government employee __________
__________ non-Government employee __________
__________ Unemployed __________
Part II. Assess the familiarity of students with the following school officials:
School Officials Yes No
University President
Vice-president for Academic Affairs
Director of Office of Student Affairs
University Registrar
University Librarian
Accountant
Part III. Assess the level of satisfaction of the students on the following offices of the
university
Offices Highly satisfied Satisfied Moderately satisfied Least satisfied
Library
IGP
ICT
TAO
Registrar
Cashier
Scholarship
Dean’s office
Practice Exercises
A. Graphical Presentation
1. 162 186 186 186 188 190 195 197 198 202 204 205
205 207 207 208 214 214 215 218 220 221 225 225
226 226 227 231 232 233 236 236 245 247 257 275
3. Compute the less-than and greater-than cumulative frequencies for the table below, and then
construct the ogives.
Packages Number of bags
(kgs)
120-129 14
110-119 46
100-109 58
90-99 76
80-89 68
70-79 62
60-69 48
50-59 22
40-49 6
n=
4. Test scores for a college statistics class held during the day are:
99; 56; 78; 55.5; 32; 90; 80; 81; 56; 59; 45; 77; 84.5; 84; 70; 72; 68; 32; 79; 90
Test scores for a college statistics class held during the evening are:
98; 78; 68; 83; 81; 89; 88; 76; 65; 45; 98; 90; 80; 84.5; 85; 79; 78; 98; 90; 79; 81; 25.5
Descriptive statistics are used to describe, or summarize, data in ways that are
meaningful and useful. For example, it would not be useful to know that all of the
participants wore blue shoes. However, it would be useful to know how spread out the
anxiety ratings were. Descriptive statistics is at the heart of all quantitative analysis.
So how do we describe data? There are two ways in which we describe data: measures of
central tendency and measures of variability, or dispersion.
In describing a set of data, one must compute some of its numerical values. These
numerical values are descriptive measures. The most important and useful descriptive
measures are the measures of central tendency such as mean, median, and mode. The
focus of discussion in this section is the measures of central tendency for ungrouped data
or a set of data with a total number of observations (N) less than or equal to 30.
Definition of Terms
1. Mean. The average score in the distribution. It is also called the arithmetic mean (or
the weighted mean, when each entry carries a weight) and is denoted by x̄ (read as “x bar”).
2. Median. The middle score in the distribution. It is denoted by x̃ (read as “x curl”).
3. Mode. The most frequent or most commonly appearing score in the distribution. It
is denoted by x̂ (read as “x hat”). The kinds of mode are (a) No mode – mode does not
exist, (b) Unimodal – single mode, (c) Bimodal – two modes, (d) Trimodal – three
modes, and so on.
x̄ = (x1 + x2 + x3 + … + xn) / N
where:
x1 = first observation
x2 = second observation
x3 = third observation
.
.
.
xn = last observation
N = total number of observations.
The weighted mean is
x̄ = (w1x1 + w2x2 + w3x3 + … + wnxn) / (w1 + w2 + w3 + … + wn)
where:
x1 , x2 , x3 , … , and xn = entries or scores
w1 , w2 , w3 , … , and wn = number of times each individual
entry occurs
2. Median
The following Steps can be considered in finding the median.
(a) Arrange the raw scores from highest to lowest or vice versa.
(b) Locate the middle score/s. For an odd number of raw scores, the middle score is the
median. For an even number of raw scores, add the two middle scores and divide
the sum by 2; the result is the median.
3. Mode
Just find out the most frequent score in the given raw data. The most frequent score is
the mode.
Solution: (Note: Have the students show solutions to the above problem, then
check whether they correctly apply/follow the principles/techniques involved. Let
them discuss their answer in class.)
Mean = ?
x̄ = (31 + 54 + 85 + 19 + 27 + 73 + 88) ÷ 7
= 377 ÷ 7
x̄ = 53.86. Thus, 53.86 is the mean of the codes of the defective
computers in the technician's general inventory.
Median = ?
Arrange the codes from highest to lowest or vice versa.
88
85
73
54
31
27
19
Locate the middle code. Thus, x̃ = 54 is the median code of the set of defective
computers in the technician's general inventory.
Mode = ?
There is no mode, because no code occurs more than once in the set of defective
computers in the technician's general inventory.
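The three measures for the seven codes can be cross-checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the module.

```python
from statistics import mean, median, multimode

codes = [31, 54, 85, 19, 27, 73, 88]

x_bar = mean(codes)        # arithmetic mean: sum of the codes divided by 7
x_tilde = median(codes)    # middle value of the sorted codes
modes = multimode(codes)   # every value tied for the highest frequency

# When each value occurs exactly once, multimode returns all of them,
# which corresponds to the "no mode" case described in the text.
has_mode = len(modes) < len(set(codes))

print(round(x_bar, 2), x_tilde, modes if has_mode else "no mode")
```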
Practice Exercises
Consider the following sets of data to compute for mean, median, and mode and
interpret the obtained numerical values.
1. Ages of 12 faculty members in the CAS department of a certain University.
25 40 27 31
30 34 26 36
35 38 28 39
4 6 6 5 7 7 6 6 5 5
8 5 8 2 1 0 0 4 2 3
6 5 5 5 4 6 5 4 6 4
3 8 9 6 7 4 1 9 3 5
11 7 22 35 9 8 23 35 32 47
50 32 25 20 18 45 18 28 33 26
15 46 29 28 44
Sigma Notation
Consider a variable X defined on a given universe. We may denote its first value as x1, the
second as x2, and so on. In general, xi is the observation for variable X made on the ith
individual.
Given a set of n observations represented by x1, x2, …, xn, we express their sum as
n
∑xi = x1 + x2 + x3 + … + xn
i=1
where:
∑ = summation symbol
i = index of the summation
xi = summand
1 = lower limit of the index
n = upper limit of the index
= 8(x3 + x4 + x5 + x6 + x7) .
2. If c is a constant, then
n
∑c = nc .
i=1
Example 2:
10
∑5 = 10(5)
i=1
= 50 .
Example 3:
3
∑(2xi – 4yi) = ? for x1 = 9 , x2 = 11 , x3 = 4
i=1 y1 = -8 , y2 = 3 , y3 = 0 .
Solution:
3 3 3
∑(2xi – 4yi) = 2∑xi – 4∑yi
i=1 i=1 i=1
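The splitting step in Example 3 can be checked numerically. The sketch below evaluates both sides of the identity for the given x and y values; the final value it prints is computed here and is not stated in the module.

```python
x = [9, 11, 4]
y = [-8, 3, 0]

# Left side: sum the expression (2*xi - 4*yi) term by term.
lhs = sum(2 * xi - 4 * yi for xi, yi in zip(x, y))

# Right side: pull the constants out and sum x and y separately.
rhs = 2 * sum(x) - 4 * sum(y)

print(lhs, rhs)
```

Both sides agree, which is what the rule for constants and sums of terms predicts.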
Practice Exercises
7
a) Expand ∑(xi + 3) in the simplest form.
i=1
x̄ = ∑fM / N
where:
f = frequency
M = midpoint
N = total frequency (sum of all class frequencies)
Practice Exercises
(a) Daily sales (in Php) of computer store owners for the months of June and July.
Sales frequency
7,000 – 7,999 3
8,000 – 8,999 7
9,000 – 9,999 15
10,000 – 10,999 12
11,000 – 11,999 8
Compute the arithmetic mean.
2.2 Skewness
It is the degree of asymmetry, or departure from symmetry, of a distribution. The
useful formulas are given below.
Sk1 = (Mean – Mode) / SD
Sk2 = 3(Mean – Median) / SD
2. Negatively Skewed – skewed to the left. The numerical value of a mean is less than
the values of median and mode. There are more low scores in a distribution. The value
of skewness is negative.
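As an illustration of the Sk2 formula, the sketch below applies it to the seven codes 31, 54, 85, 19, 27, 73, 88 used in the central tendency example. It assumes SD is the sample standard deviation with divisor N – 1, which is what `statistics.stdev` computes; the variable names are illustrative.

```python
from statistics import mean, median, stdev

codes = [31, 54, 85, 19, 27, 73, 88]

# Sk2 = 3(Mean - Median) / SD, with SD taken as the sample standard deviation.
sk2 = 3 * (mean(codes) - median(codes)) / stdev(codes)

print(round(sk2, 3))
```

The mean (53.86) lies slightly below the median (54), so the coefficient is slightly negative, which is the negatively skewed case.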
Measures of Skewness
1. Quartile Coefficient of Skewness (SQC)
SQC = (Q3 – 2Q2 + Q1) / (Q3 – Q1)
Practice Exercises
Consider the data below and then do what is asked in each case or situation.
A. Given the set of scores: 2 , 7 , 4 , 7 , 8 , 9 , 5.
1. Determine the skewness(Sk1).
2. Find SQC and SPC .
B. Suppose a committee has 15 members and the heights in centimeter are as follows:
181 205 189 185 190 191 191 200 192 185 186 188
181 202 186.
Find the following:
3. Skewness (Sk2)
4. SQC
5. SPC
2.4 Measures of the Spread of Data
1. Quartiles – are values that divide the set of data into 4 equal parts. These values are
denoted by Q1 , Q2 , and Q3 , where 25% of the data fall below Q1 , 50% fall
below Q2 , and 75% fall below Q3.
2. Deciles – are values that divide the set of data into 10 equal parts. These values are
denoted by D1 , D2 , D3 , … , and D9 , where 10% of the data fall below D1 , 20%
fall below D2 , 30% fall below D3 , … , and 90% fall below D9.
3. Percentiles – are values that divide the set of data into 100 equal parts. These values
are denoted by P1 , P2 , P3 , … , and P99 , where 1% of the data fall below P1 , 2%
fall below P2 , 3% fall below P3 , … , and 99% fall below P99.
Equivalent Values:
a) Q1 = P25
b) Q2 = D5 = P50 = Median
c) Q3 = P75
2. Mean Absolute Deviation – is the mean of the absolute deviations from the average
score taken in a given set of data. It is denoted by MAD.
4. Standard Deviation – is the square root of the mean of squared deviations from the
average score in a given distribution. It is considered as the most important and
reliable measure of dispersion and is denoted by SD.
7. Variation Ratio – it tells how homogeneous or heterogeneous the given data are. This
is denoted by VR.
N ≤ 30
1. Quartiles
Steps:
Qn = nN ÷ 4
where:
n = quartile rank
1.3 Round off the result of the formula in Step 1.2 to the nearest whole
number. Locate the quartile in the array using the rounded-off value, counting
from the lowest score going up.
2. Deciles
Steps:
Dn = nN ÷ 10
where:
n = decile rank
2.3 Round off the result of the formula in Step 2.2 to the nearest whole
number. Locate the decile in the array using the rounded-off value, counting
from the lowest score going up.
3. Percentiles
Steps:
Pn = nN ÷ 100
where:
n = percentile rank
3.3 Round off the result of the formula in Step 3.2 to the nearest whole
number. Locate the percentile in the array using the rounded-off value,
counting from the lowest score going up.
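The counting rule shared by quartiles, deciles, and percentiles can be sketched as one small function. Two points are assumptions rather than statements from the module: halves are rounded up (the worked examples round 3.5 to 4 and 2.5 to 3), and the rank is kept between 1 and N. The function name `located_value` is hypothetical.

```python
import math
from fractions import Fraction

def located_value(data, n, parts):
    """Return the n-th quantile for a split into `parts` parts (4, 10, or 100)."""
    ordered = sorted(data)                          # array from lowest to highest
    position = Fraction(n * len(ordered), parts)    # nN / 4, nN / 10, or nN / 100
    rank = math.floor(position + Fraction(1, 2))    # nearest whole number, halves up (assumed)
    rank = min(max(rank, 1), len(ordered))          # assumption: keep the rank inside the array
    return ordered[rank - 1]                        # count from the lowest score going up

codes = [31, 54, 85, 19, 27, 73, 88]
q2 = located_value(codes, 2, 4)      # position 3.5, rank 4
q3 = located_value(codes, 3, 4)      # position 5.25, rank 5
p41 = located_value(codes, 41, 100)  # position 2.87, rank 3

print(q2, q3, p41)
```

Exact fractions are used for the position so that values such as 3.5 are rounded the same way every time.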
4. Range
R = HV – LV
where:
HV = highest value
LV = lowest value
5. Mean Absolute Deviation
MAD = ∑|xk – x̄| / N
where:
x̄ = mean
N = total number of observations
6. Quartile Deviations
IR = Q3 – Q1
where:
Q1 = first quartile
Q3 = third quartile
SIR = IR/2 = (Q3 – Q1)/2
where:
Q1 = first quartile
Q3 = third quartile
DD = D9 – D1
where:
D1 = first decile
D9 = ninth decile
PD = P90 – P10 or PD = DD
where:
P10 = tenth percentile
P90 = ninetieth percentile
7. Standard Deviation
$s = \sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 / (N - 1)}$
where:
x̄ = mean
8. Variance
$s^2 = V = \sum_{k=1}^{n} (x_k - \bar{x})^2 / (N - 1)$
where:
x̄ = mean
9. Coefficient of Variation
CV = s/x̄
where:
s = standard deviation
x̄ = mean
10. Variation Ratio
VR = 1 – HV/No
where:
HV = highest value
No = sum of all the values (as used in the worked examples that follow)
Solution: (Note: Have the students show solutions to the above problem, then check whether they correctly apply/follow the
principles/techniques involved. Let them discuss their answer in class.)
Mean = ?
x ̄= (31 + 54 + 85 + 19 + 27 + 73 + 88) ÷ 7
= 377 ÷ 7
x ̄= 53.86 Ans.
Median = ?
88
85
73
54 Median code
31
27
19
Mode = ?
Q2 = ?
88
85
73
54 Q-2
31
27
19
Count 4 from the lowest code going up. The one that is marked is the second quartile.
Thus, Q2 = 54.
D7 = ?
88
85
73 D7
54
31
27
19
Count 5 from the lowest code going up. The one that is marked is the seventh decile. Thus,
D7 = 73.
P41 = ?
88
85
73
54
31 P41
27
19
Count 3 from the lowest code going up. The one that is marked is the 41st percentile. Thus,
P41 = 31.
Range = ?
R = HV – LV
= 88 – 19
R = 69 Ans.
MAD = ?
MAD = ∑|xk – x̄| / N, summed over k = 1 to n
x̄ = 53.86
xk |xk – x̄|
88 34.14
85 31.14
73 19.14
54 0.14
31 22.86
27 26.86
19 34.86
MAD = 169.14/7
MAD = 24.16 Ans.
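The table of absolute deviations can be reproduced in a few lines. This is a minimal sketch with illustrative variable names; it works with the unrounded mean, so individual deviations may differ from the table in the last decimal place.

```python
codes = [88, 85, 73, 54, 31, 27, 19]

x_bar = sum(codes) / len(codes)                 # mean of the seven codes
deviations = [abs(x - x_bar) for x in codes]    # |xk - mean| for each code
mad = sum(deviations) / len(codes)              # mean of the absolute deviations

print(round(sum(deviations), 2), round(mad, 2))
```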
Quartile Deviations
a) IR = ?
IR = Q3 – Q1
Q3 = ?
Q1 = ?
88
85
73 Q3
54
31
27 Q1
19
Count 5 from the lowest code going up. The one that is marked is the third quartile. Count
2 from the lowest code going up. The one that is marked is the first quartile. Thus, Q3 = 73
and Q1 = 27.
IR = 73 – 27
IR = 46 Ans.
b) SIR = ?
SIR = IR/2
= 46/2
SIR = 23 Ans.
c) DD = ?
DD = D9 – D1
D9 = ?
D1 = ?
88
85 D9
73
54
31
27
19 D1
Count 6 from the lowest code going up. The one that is marked is the ninth decile. Count 1
from the lowest code going up. The one that is marked is the first decile. Thus, D9 = 85
and D1 = 19.
DD = 85 – 19
DD = 66 Ans.
d) PD = ?
PD = P90 – P10 or PD = DD
PD = 66 Ans
s=?
$s = \sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 / (N - 1)}$
(xk – x̄)²
1165.54
969.70
366.34
0.02
522.58
721.46
1215.22
s = √(4,960.86/(7 − 1))
s = √(4,960.86/6)
s = √826.81
s = 28.75 Ans.
s2 = ?
s2 = (s)2
s2 = (28.75)2
s2 = 826.56 Ans.
CV = ?
CV = s/x̄
CV = 28.75/53.86
CV = 0.53 Ans.
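The standard deviation, variance, and coefficient of variation above can be verified with the standard `statistics` module, whose `stdev` and `variance` functions use the same N – 1 divisor as the formulas in this section. The sketch below uses illustrative variable names.

```python
from statistics import mean, stdev, variance

codes = [31, 54, 85, 19, 27, 73, 88]

s = stdev(codes)          # sample standard deviation (divisor N - 1)
v = variance(codes)       # sample variance (divisor N - 1)
cv = s / mean(codes)      # coefficient of variation

print(round(s, 2), round(v, 2), round(cv, 2))
```

The unrounded variance is 826.81, the value that appears under the radical in the solution. Squaring the already rounded s = 28.75 gives 826.56, which explains the small difference between the two figures.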
VR = ?
VR = 1 – HV/No
VR = 1 – 88/377
VR = 1 – 0.23
VR = 0.77 Ans.
Example 2. The estimated radiation levels in milliroentgens per hour are as follows: 0.08,
0.22 , 0.34 , 0.13 , 0.25 , 0.31 , 0.10 , 0.13 , 0.08 , and 0.20 in the display areas of 10
computer stores in a certain City. Compute the measures of central tendency, Q3 , D6 , P70
, Range, MAD, SIR, SD, V, CV, and VR.
Solution: (Note: Have the students show the solution to the above problem, then check whether they correctly apply/follow the
principles/techniques involved. Let them discuss their answer in class.)
Mean = ?
Median = ?
Arrange the estimated radiation levels from highest to lowest or vice versa.
0.34
0.31
0.25
0.22
0.20
0.13
0.13
0.10
0.08
0.08
Locate the two middle estimated radiation levels. Add them and then divide the result by
2.
x̃ = (0.20 + 0.13) ÷ 2
= 0.33 ÷ 2
x̃ = 0.165 Ans.
Mode = ?
x̂1 = 0.13
x̂2 = 0.08
Q3 = ?
Arrange the estimated radiation levels from highest to lowest or vice versa.
0.34
0.31
0.25 Q3
0.22
0.20
0.13
0.13
0.10
0.08
0.08
Count 8 from the lowest estimated radiation level going up. The one that is marked is the
third quartile. Thus, Q3 = 0.25.
D6 = ?
nN/10 = 6(10)/10 = 6
Arrange the estimated radiation levels from highest to lowest or vice versa.
0.34
0.31
0.25
0.22
0.20 D6
0.13
0.13
0.10
0.08
0.08
Count 6 from the lowest estimated radiation level going up. The one that is marked is the
sixth decile. Thus, D6 = 0.20.
P70 = ?
Arrange the estimated radiation levels from highest to lowest or vice versa.
0.34
0.31
0.25
0.22 P70
0.20
0.13
0.13
0.10
0.08
0.08
Count 7 from the lowest estimated radiation level going up. The one that is marked is the
70th percentile. Thus, P70 = 0.22.
Range = ?
R = HV – LV
R = 0.34 – 0.08
R = 0.26 Ans.
MAD = ?
MAD = ∑|xk – x̄| / N, summed over k = 1 to n
x̄ = 0.18
xk |xk – x̄|
0.34 0.16
0.31 0.13
0.25 0.07
0.22 0.04
0.20 0.02
0.13 0.05
0.13 0.05
0.10 0.08
0.08 0.10
0.08 0.10
MAD = 0.80/10
MAD = 0.08 Ans.
SIR = ?
Q3 = ?
Arrange the estimated radiation levels from highest to lowest or vice versa.
0.34
0.31
0.25 Q3
0.22
0.20
0.13
0.13
0.10 Q1
0.08
0.08
Count 8 from the lowest estimated radiation level going up. The one that is marked is the
third quartile. Thus, Q3 = 0.25.
Q1 = ?
Count 3 from the lowest estimated radiation level going up. The one that is marked is the
first quartile. Thus, Q1 = 0.10.
SIR = (Q3 – Q1)/2
= (0.25 – 0.10)/2
= 0.15/2
SIR = 0.075 Ans.
s=?
$s = \sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 / (N - 1)}$
(xk – x̄)²
2.56 x 10⁻²
1.69 x 10⁻²
4.90 x 10⁻³
1.60 x 10⁻³
4 x 10⁻⁴
2.50 x 10⁻³
2.50 x 10⁻³
6.40 x 10⁻³
1 x 10⁻²
1 x 10⁻²
s = √(0.0808/(10 − 1))
s = √(0.0808/9)
s = √0.00898
s = 9.48 x 10⁻² Ans.
s² = ?
s² = (s)²
s² = (9.48 x 10⁻²)²
s² = 8.99 x 10⁻³ Ans.
CV = ?
CV = s/x̄
CV = 9.48 x 10⁻²/0.18
CV = 0.53 Ans.
VR = ?
VR = 1 – HV/No
VR = 1 – 0.34/1.84
VR = 1 – 0.18
VR = 0.82 Ans.
Practice Exercises
Ungrouped Data (N ≤ 30). Consider the following sets of data to compute for mean, median,
mode, Q3 , D8 , P45 , range, MAD, s, s2 , CV, VR, PD, DD, IR, and SIR.
1. Ages of 12 faculty members in the College of Arts and Sciences department of a certain
University.
25 40 27 31
30 34 26 36
35 38 28 39
48 65 68 52 71 70 60 64 52 53
63 58 59 56 47 64 51 49 63 45
11 7 22 35 9 8 23 35 32 47
50 32 25 20 18 45 18 28 33 26
15 46 29 28 44
CHAPTER TEST
A. Identification
Identify the terms that best describe the sentences below.
1. Statistical table that can be obtained if you group the observations into non-overlapping
classes and show the number of observations occurring in each class.
2. The original score obtained when scoring a test.
3. A type of the graph that makes use of a geometric figure, usually a circle representing a
whole and is divided into parts whose size is proportional to their values.
4. Proportion of observations falling in a class, obtained by dividing the class frequency by
the total number of observations.
5. They remove discontinuity between classes and consider the true range of values.
6. These are the average of the upper limit of one class and the lower limit of the next
class.
7. The ___ of a class is the total of all class frequencies up to and including the present
class.
8. The difference between the largest and lowest value in a given sample.
9. It gives us the number of occurrence of a measurement
10. It is the difference between two consecutive lower class or two consecutive higher class
boundaries.
11. It consists of horizontal scales for values of data being represented.
12. It is obtained by taking the average of the lower and upper limit of a given subclass.
13. The value obtained by dividing the range by the class interval.
B. Multiple Choice
Choose the letter corresponding to the correct answer for each item.
1. The measure of central tendency that denotes the most popular value in a set of observations is
called:
a. mean b. median
c. mode d. cannot be determined
4. The measure of central tendency that is directly affected by extreme values is the:
a. mean b. mode
c. range d. median
7. Consider a set of test scores from last year's Stat 103 final examination. Suppose that the data
were doubled. Which of the following will change: the median, the mean, or the standard deviation?
9. Based on the daily Farm prices Survey by the Bureau of Agricultural Statistics, yesterday price
(in pesos) of galunggong according to 5 stall owners are: 30, 75, 80, 75, 75 per kilo. For this data
set, the most appropriate measure of central tendency is:
a. mean b. median
c. mode d. range
10. For which of the following is the mean not an appropriate measure of central tendency?
a. average daily temperature in Batangas City
11. Paul got a final grade in his three subjects as follows: 1.5, 2.0, 1.25 which are 3, 5 and 2 units
respectively. Find the weighted average of Paul.
a. 1.58 b. 1.8
c. 1.7 d. 1.65
12. The following are temperature in degree centigrade of key cities in the world:
25.5, 30.2, -10.4, 20.5, -0.4, 13.2, -5.4, 21.4, -4.0
What is the median temperature?
a. -5.4 b. 21.2
c. 13.2 d. 30.2
13. The average weight of 10 contestants in the supermodel search is 114 pounds. Nine of the contestants
have weights of 101, 125, 118, 128, 106, 115, 99, 118, and 109 pounds. What must be the weight of the
tenth contestant, in pounds?
a. 111 b. 121
c. 120 d. 131
15. The monthly salaries of a sample of 225 NSCB employees in Makati City ranged from as low as
P7,041 to as high as P24,548. The FDT (frequency distribution table) of these monthly salaries has
a class size equal to:
a. 1091.1 b. 1408.2
c. 1167 d. 1636
7-9 2
10-12 10
13-15 24
16-18 43
19-21 50
Total
2. Find the following based on the student’s scores on a 15-point test presented in the given
distribution below: (5points)
Scores Number of students
70-72 1
67-69 4
64-66 8
61-63 5
58-60 2
Solution:
Standard deviation, cv, P90-P10, Q3-Q1
Compute for the skewness and kurtosis
Chapter 3
Probability
In mathematics itself, the fundamental principles or theories of probability provide the
foundation of Statistics. Probability is widely used in everyday life. The concepts of
probability can be useful to the students in making decisions when they do not know for
sure what the outcome will be. They should have a better understanding of language
patterns needed in the study of probability. Let us try to consider some of these principles
involving probability.
Learning Objectives
The aim of this section is for the students to clarify the rules of probability and to familiarize
themselves with the special types of discrete probability distributions such as binomial,
hypergeometric, and Poisson.
n1 · n2 · … · nk ways .
Example 1.
How many 2-digit numbers could be formed from the digits 2, 3, 4, 5, 6, 7, and 8, if
repetition is not allowed ?
Solution:
n1 = 7
n2 = 6
n1 · n2 = 7 · 6
= 42 numbers.
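The count of 42 can be confirmed by listing every 2-digit number directly. The sketch below uses `itertools.permutations`, which never reuses a digit, so it matches the "repetition is not allowed" condition; the variable names are illustrative.

```python
from itertools import permutations

digits = [2, 3, 4, 5, 6, 7, 8]

# Each ordered pair (tens digit, units digit) of different digits is one number.
numbers = [10 * tens + units for tens, units in permutations(digits, 2)]

print(len(numbers))
```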
By definition 0! = 1.
Example 1.
Solution:
a) 1! = 1
b) 2! = 2(1) = 2
c) 3! = 3(2)(1) = 6
d) 4! = 4(3)(2)(1) = 24
Example 2.
Compute 8! · 3! .
Solution:
8! · 3! = 8(7)(6)(5)(4)(3)(2)(1)·(3)(2)(1)
= 241,920 Ans.
Example 3.
Compute 4!/2! .
Solution:
4!/2! = 4(3)(2!)/2!
= 12 Ans.
Example 4.
Simplify (n + 1)!/n!.
Solution:
(n + 1)!/n! = (n + 1)(n!) / n!
= (n + 1) Ans.
3.3 Permutations
Example 1.
In the word THEORY, how many different arrangements of letters can be made ?
Solution:
6P6 = 6!
= 6(5)(4)(3)(2)(1)
= 720 arrangements.
Example 1.
In how many ways can a teacher assign the 4 key positions to organize a classroom
activity to 7 equally qualified students ?
Solution:
n = 7
r = 4
7P4 = 7!/(7 – 4)!
= 7! / 3!
= (7)(6)(5)(4)(3!) / 3!
= 840 ways.
3. Circular Permutation
(n – 1) P(n – 1) = (n – 1)!
Example 1.
In how many ways can 7 players stand around a circle with seven markings?
Solution:
(7 – 1) P(7 – 1) = (7 – 1)!
= 6!
= (6)(5)(4)(3)(2)(1)
= 720 ways.
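The three permutation examples above can be checked with Python's `math` module (`math.perm` requires Python 3.8 or later). This is a sketch with illustrative variable names.

```python
import math

theory = math.perm(6, 6)          # all 6 letters of THEORY: 6P6 = 6!
positions = math.perm(7, 4)       # 4 key positions from 7 students: 7P4 = 7!/3!
circle = math.factorial(7 - 1)    # 7 players around a circle: (7 - 1)!

print(theory, positions, circle)
```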
P = N! / (n1! · n2! · … · nk!)
where:
N = n1 + n2 + … + nk
Example 1.
How many permutations of the letters in the word PROBABILITY can be made?
Solution:
P = 11!/(1! · 1! · 1! · 2! · 1! · 2! · 1! · 1! · 1!)
= (11)(10)(9)(8)(7)(6)(5)(2)(3)
= 9,979,200 permutations.
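The same count can be obtained by tallying the repeated letters automatically. In this sketch the helper name `arrangements` is hypothetical; `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from math import factorial, prod

def arrangements(word):
    """Number of distinct orderings of the letters of `word`."""
    counts = Counter(word)   # how many times each letter occurs
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(arrangements("PROBABILITY"))
```

Only B and I occur twice in PROBABILITY, so the denominator reduces to 2! · 2! = 4.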
3.4 Combinations
Combination is a selection of objects with no attention given to the order of the objects.
n Cn =1
Example 1.
Solution:
8 C8 = 1 committee.
nCr = n! / [(n – r)! r!]
Example 1.
The TIMES Organization is forming a group of seven to be made up of four from the
males and three from the females. How many ways are there of selecting the group if
seven nominees come from the males and six nominees come from the females ?
Solution:
Males Females
n = 7 n = 6
r = 4 r = 3
7C4 · 6C3 = (7!/(3! 4!))(6!/(3! 3!))
= (35)(20)
= 700 ways
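The product of the two combinations can be checked with `math.comb` (Python 3.8 or later). The variable names in this sketch are illustrative.

```python
from math import comb

male_choices = comb(7, 4)      # 4 males chosen from 7 nominees
female_choices = comb(6, 3)    # 3 females chosen from 6 nominees

# The two selections are made independently, so the counts multiply.
groups = male_choices * female_choices

print(male_choices, female_choices, groups)
```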
3. Combination in a Series
nC1 + nC2 + … + nCn = 2^n – 1
Example 1.
In how many ways can a president of the class assign at most five of his classmates to
clean the room every Friday?
Solution:
2^n – 1 = 2^5 – 1
= 32 – 1
= 31 ways.
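The shortcut can be compared with the term-by-term sum. The sketch below assumes, as the solution does, that the president chooses from n = 5 classmates and must pick at least one.

```python
from math import comb

n = 5

# Add the number of ways to choose 1, 2, ..., n classmates.
term_by_term = sum(comb(n, r) for r in range(1, n + 1))
shortcut = 2 ** n - 1

print(term_by_term, shortcut)
```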
Example 1.
A coin is tossed 8 times. If the head turns up six times in those eight tosses, what
is the experimental probability that the head occurs?
Solution:
P(head) = 6/8
= 3/4
= 0.75 or 75%
Expected Probability. Probability may also be expressed as the ratio of the number of
desired outcomes to the total number of possible expected outcomes in an experiment.
Example 1.
Consider a single die. What is the probability of getting 2 dots ?
Solution:
P(2 dots) = 1/6
Note:
For convenient and easy recall of the formula, experimental probability and expected
probability can both be denoted by P(E), defined by the formula
P(E) = (number of outcomes in which E occurs) / (total number of outcomes).
Marginal Probability. The term refers to a probability of the occurrence of a single event
or an event satisfying only one characteristic.
Example 1.
Let us consider a card being picked from an ordinary well-shuffled deck of cards. The probability of a
queen is called a marginal probability.
Joint Probability. The term refers to a probability of two events occurring simultaneously
in a single trial. It is also the probability of one event satisfying two or more
characteristics. Let P(E1 ∩ E2) denote the joint probability of E1 and E2 .
Example 1.
Let us consider a card being picked from an ordinary well-shuffled deck of cards. The probability of a
card that is both a jack and a diamond is a joint probability.
Mutually and Non-mutually Exclusive Events. Two events E1 and E2 are mutually
exclusive if it is impossible for both E1 and E2 to occur simultaneously in a single trial, i.e.,
the joint probability of E1 and E2 is zero. If E1 and E2 can occur simultaneously, in a single
trial, then they are not mutually exclusive events.
Example 1.
Let us consider a card being drawn from an ordinary well-shuffled deck of cards. The event of a king
and the event of a queen are mutually exclusive, while the event of a jack and the event of a
diamond are non-mutually exclusive.
Conditional Probability. The probability that an event E2 will occur given that some event
E1 has already occurred is called conditional probability, symbolized by P(E2 | E1) and
computed as P(E2 | E1) = P(E1 ∩ E2) / P(E1).
Example 1.
The Lot owner estimates that the probability that he will sell Lot A is 0.86, the probability
that he will sell Lot B is 0.80, and the probability that he will sell both Lots A and B is
0.48. What is the probability that the owner will sell Lot B, given that he already sold Lot
A?
Solution:
P(B | A) = P(A ∩ B) / P(A)
= 0.48 / 0.86
= 0.56 (approximately)
Conjunction Probability. The term is associated with events happening together, meaning
one event and another event occurring at the same time. Events, however, may be
independent or dependent on each other.
When the occurrence of one event does not influence the probability of the occurrence of
the other event, these events are said to be independent. Independent events are not the
counterparts of disjoint sets; disjoint sets correspond to mutually exclusive events.
If we would like to find the probability that two or more independent events will happen,
we follow the formula:
P(E1 and E2) = P(E1) · P(E2)
Example 1.
The probability that Rose will win a contest is 40% and the probability that May will win
in another contest is 60%. What is the probability that Rose and May will win ?
Solution:
P(Rose and May will win) = P(Rose will win) · P(May will win)
= (0.40)(0.60)
= 0.24 or 24%
Disjunction Probability. The term is associated with several events that happen either
separately or simultaneously. It is concerned with “either-or” relationships.
When the events are mutually exclusive, the probability that one or the other event will occur is
equal to the sum of their individual probabilities. It is represented by P(E1 or E2) = P(E1) + P(E2).
Example 1.
What is the probability that in a single toss of two dice, the sum will be 8 or 10 ?
Solution:
Sample points which give the sum of 8 are: (3 , 5), (5 , 3), (2 , 6), (6 , 2), and
(4 , 4). Thus, the total number of sample points in the sample space that give the sum of 8
is 5.
Sample points which give the sum of 10 are: (6 , 4), (4 , 6), and (5 , 5). Thus, the
total number of sample points in the sample space that give the sum of 10 is 3.
P(8 or 10) = P(8) + P(10)
= 5/36 + 3/36
= 8/36
= 2/9
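The sample points can also be enumerated by machine. The sketch below lists all 36 outcomes of two dice and counts those with a sum of 8 or 10; exact fractions avoid rounding, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))           # all 36 ordered outcomes
sum_8 = [r for r in rolls if sum(r) == 8]              # sample points with sum 8
sum_10 = [r for r in rolls if sum(r) == 10]            # sample points with sum 10

# A sum cannot be 8 and 10 at once, so the two events are mutually exclusive.
p = Fraction(len(sum_8) + len(sum_10), len(rolls))

print(len(sum_8), len(sum_10), p)
```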
Practice Exercises
Solve the following problems completely.
A. Principles of Counting
1. There are five consecutive odd numbers from 1 to 9. How many different 3-digit
numbers can be formed?
2. How many different 5-digit numbers can be formed from the 9 digits 1, 2, 3, 4, 5,
6, 7, 8, and 9?
3. If the bowling coach has already selected the team of six members, then how
many ways can he prepare a throwing order?
4. Obtain the number of different arrangements that can be formed from five
Identification Cards?
5. In how many ways can a fly enter the house by one window and leave by a
different window if five windows are to be kept open?
6. A newly opened building has 8 doors providing access, in how many ways can a
president enter the building by one door and leave by a different door?
7. How many different arrangements, each consisting of five different letters, can
be formed from the letters of the word “counters” if each arrangement is to
begin and end with a vowel?
8. Suppose that 4 black umbrellas and 4 white umbrellas are arranged in a row.
How many different arrangements of 8 umbrellas can be made in a row, if
umbrellas (a) of the same color are to be kept together and (b) of alternate color
are considered?
9. In how many ways can a red ball and a blue ball be selected from an urn with 10 balls
of different colors?
10. How many possible 4-digit numbers can be made up from the integers 4, 3, 2,
1, 8 if no integer is to be repeated?
B. Factorial Notation
a) 5! b) 6! c) 7!
d) 8! e) 9!
a) n! / (n + 3)!
b) (n + 2)! / n!
I. Permutations
1. How many motor vehicle number plates can be made (a) for old plate numbers,
if each plate contains 3 different letters followed by 3 different digits? ; (b) for
old plate numbers, if the first digit cannot be zero? ; (c) for new standardized
plate numbers, if each plate contains 2 different letters followed by 4 different
digits?; and (d) for new standardized plate numbers, if the first digit cannot be
zero?
2. (a) Obtain the number of ways in which 4 students can sit in a row. (b) How
many ways are there if two of the students insist on sitting next to one another?
3. How many ways can 5 candles of different colors be placed in a circular holder?
4. There are 7 letters in the word COMPANY. How many 4 letter words can be
permuted of (a) different letters?, (b) consonants only?, (c) begin and end in a
consonant?, (d) begin with a vowel?, (e) contain the letter C?, (f) begin with M
and end in a vowel?, (g) begin with N and also contain Y?, and (h) contain both
vowels?
5. How many different marks, each consisting of 7 asterisks marked in a letter can
be made from 3 black, 1 red, and 3 blue asterisks?
6. Determine the number of permutations that can be formed from all the letters of
each place name: Mississippi, Tennessee, Alaska, and Philippines.
7. How many ways can 9 teachers be seated on a bench if there are only 5 seats
available?
8. Determine the number of ways in which 7 stars of different sizes are hung in a
circle with strings
10. How many positive integers with 2 different digits, in which 0 cannot be the first
digit, are (a) less than 50?, (b) greater than 80?, (c) divisible by 2?, (d)
even?, (e) odd?, and (f) there in all?
II. Combinations
1. Determine the number of ways a group of 4 boys and 3 girls can be selected from 6
boys and 5 girls.
4. Find the number of ways a coach can make a choice of one or more players from 8
eligible players.
5. A group has 8 businessmen and 3 housewives. (a) In how many ways can the group
manager choose a representative of 5? (b) How many of them will contain at least one
housewife? (c) How many of them will contain exactly one housewife?
6. (a) In how many ways can a sales lady with 15 associates invite 7 of them to attend the
party?, (b) if 3 of them are neighbors from far away and will not attend separately?, and
(c) if 3 of them will not attend together?
7. The points N, O, P, Q, R, S, T, U, V, and W are in a plane, and no three of them are on the
same line. (a) How many lines are determined by the given points? (b) How many of these
lines do not pass through S or T? (c) How many quadrilaterals are formed by the given
points? (d) How many of these quadrilaterals contain the point R? (e) How many of these
quadrilaterals contain the side VW?
8. A person is to guess the content of 8 out of 11 small boxes. (a) How many guesses has
he? (b) How many if he must guess the content of the first 2 boxes? (c) How many if he
must guess the content of the first or second box but not both? (d) How many if he must
guess the content of exactly 3 of the first 5 boxes? (e) How many if he must guess the
content of at least 3 of the first 5 boxes?
9. A bingo player has 5 different cards. How many ways can he select different cards?
10. There are 26 letters in the English alphabet of which 21 are consonants. How many 7
letter words can be formed (a) if there are 4 different consonants and 3 different vowels?,
(b) if these contain the letter b ?, and (c) if these contain the letters d and n?
1. A coin is tossed three times. What is the probability that at least 1 tail will occur?
2. A die is loaded in such a way that an odd number is twice as likely to occur as an even
number. If E is the event that a number less than 5 occurs on a single toss of the die, then
what is the P(E)?
3. Seven different designs of plates are placed at random in a dish drainer with space for
each plate. What is the probability that the rose and the fruit designs will be next to each
other?
4. A paper clip is picked up from a small box with 4 yellow clips, 3 green clips, and 7
white clips. Obtain the probability of each of the following events: (a) the clip is yellow;
(b) the clip is green; and (c) the clip is either green or white.
6. A basket contains 3 orange balls, 2 blue balls, and 1 green ball. Another basket
contains 8 white balls, 2 red balls, and 2 brown balls. What is the probability of obtaining
a blue ball from the first basket and a white ball from the second basket in a single draw
from each basket?
7. From a deck of 52 cards, a single card is being drawn. Obtain the probability of each of
the following events: (a) queen, (b) club, and (c) a king or a heart.
8. When a pair of dice is rolled, what is the probability of getting each of the following
events: (a) sum of 5 and (b) sum of 8?
9. In a row of 7 circles drawn on the floor, 7 players are assigned to stand in random
order. Determine the probability that 3 particular players will be standing next to each other.
10. The data below shows the random distributions of freshman students according to
gender and course where they enrolled in a certain University
Elementary Education 10 25
Secondary Education 8 18
Information Technology 22 32
Computer Science 15 31
Accounting Management 25 14
If one student is chosen at random, what is the probability that the student is: (a) male?, (b) female?, (c) a male
Accounting Management student?, (d) a female Computer Science student?, (e) an Accounting
Management student?, (f) a male Secondary Education student?, (g) a male Information Technology student?,
and (h) an Elementary Education student?
11. Data below shows the gender distribution and employment status of 500 persons
selected at random.
Male 220 80
Female 40 160
a) What is the probability that a man is chosen given that the person chosen is employed?
b) What is the probability that a woman is chosen given that the person chosen is not
employed?
12. For a regularly scheduled bus in a certain terminal, the probability that it leaves on time is
P(L) = 0.85, the probability that it arrives on time is P(A) = 0.96, and the probability that it leaves and arrives
on time is P(L ∩ A) = 0.77. Determine the probability that a bus (a) arrives on
time given that it left on time and (b) leaves on time given that it arrived on time.
13. Consider the data on educational attainment of 300 persons selected at random.
Elementary 56 46
High School 45 35
College 36 26
Masters 28 10
Doctoral 10 8
14. The probability that Family A attends a birthday party is P(A) = 0.52, while the probability that Family B
attends the party is P(B) = 0.44. The probability that Family A attends the party given
that Family B does is P(A|B) = 0.76. What is the probability that: (a)
Families A and B will both attend the party?, (b) Family B will attend the party given that
Family A does?, and (c) at least 1 family will attend the party?
15. The probability that a student will submit the project on time is 0.87, the probability
that a teacher will check it on time is 0.83, and the probability that it will be submitted on time
and checked on time is 0.48. Determine the probability of the following situations: (a)
the project will be submitted on time given that it will be checked on time and (b) the project
will be checked on time given that it will be submitted on time.
B(x) = nCx p^x q^(n – x)
where:
p = probability of successes
q = 1 – p = probability of failures
n = number of trials
Example 1.
Solution:
p=½
q=1–p
=1–½
=½
x=2
n=3
B(x) = nCx p^x q^(n – x)
= 3C2 (1/2)^2 (1/2)^1
= (3)(1/4)(1/2)
= 3/8 or 0.375 Ans.
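The binomial formula can also be checked numerically. The following is a minimal Python sketch, assuming the values from the solution above (n = 3, x = 2, p = 1/2); the function name binomial_pmf is illustrative only and does not come from this module.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Values from the solution above: n = 3 trials, x = 2 successes, p = 1/2
print(binomial_pmf(2, 3, 0.5))  # 0.375
```

The same function can be reused for the practice exercises by changing x, n, and p.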
where:
n = random sample
k = possible successes
Example 1.
In an ordinary deck of 52 cards, 5 cards are picked. Determine the probability that three
will be hearts.
Solution:
N = 52
n=5
k = 13
x=3
P(x) = (kCx)((N – k)C(n – x)) / (NCn)
= (13C3)(39C2) / (52C5)
= 0.08 Ans.
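The card example can be verified with a short computation. This is a sketch only, assuming the symbols used in the solution (N = population size, n = sample size, k = possible successes in the population, x = successes in the sample); the function name hypergeom_pmf is hypothetical.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """Probability of exactly x successes in a sample of n drawn without
    replacement from a population of N that contains k successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Values from the example: 52 cards, 5 drawn, 13 hearts, exactly 3 hearts
p = hypergeom_pmf(3, 52, 5, 13)
print(round(p, 4))  # 0.0815, which rounds to the 0.08 given above
```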
In a Poisson distribution, usually the events occur continuously. Let us say for example,
the number of successful calls received by a call center agent within a given period of
time. The useful formula in this discrete probability distribution is given by,
P(x) = (e^(–𝜆))(𝜆^x) / x!
where:
Note:
n = random sample
p = probability
Example 1.
Assuming that a user of a brand new cellular phone receives an average of 7 messages per
minute, what is the probability that exactly three messages will be received in a randomly
selected minute?
Solution:
x=3
𝜆=7
P(x) = (e^(–𝜆))(𝜆^x) / x!
= (0.000912)(343) / 6
= 0.312816 / 6
= 0.052 Ans.
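As a quick check of the arithmetic, the Poisson formula can be evaluated directly. The sketch below assumes the values of the example (an average of 7 messages per minute and exactly 3 messages); the name poisson_pmf is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average number is lam."""
    return exp(-lam) * lam**x / factorial(x)

# Values from the example: average of 7 messages per minute, exactly 3 received
p = poisson_pmf(3, 7)
print(round(p, 3))  # 0.052
```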
Practice Exercises
A. Binomial Distribution
1. If an ordinary die is tossed 4 times, then what is the probability of getting exactly three
3’s?
2. The need for some amount of monthly salary to save in a certain bank is given as the
reason for 10% of all employees. Determine the probability that precisely 3 of the next 5
employees need to save a certain amount of his monthly salary.
3. A lottery player has a probability of 0.89 in winning. What is the probability that
exactly 5 of the next 7 players will win in a lottery?
4. A study was conducted by student researchers in a certain University about eating fried
calamari. The study revealed that approximately 85% believe that “Fried calamari is a safe
street food”. What is the probability that precisely 3 of the next 5 people selected at
random will be of the opinion that “Fried calamari is not a safe street food.” ?
5. As surveyed by an inspector, the residents in a certain City showed that 54% preferred
dark gray telephone over any other color available. What is the probability that exactly 13
of the next 24 telephones installed in this city will be dark gray?
B. Hypergeometric Distribution
1. There are 3000 telephones installed in a new subdivision, 1000 have pushbuttons.
What is the probability that exactly 3 will be talking on dial telephones if 10 people are
called at random?
2. Out of 10000 residents in a village, 6000 are against a new policy. If 13 residents
of the said village are selected at random and asked their opinion, then what is the
probability that exactly 6 favor the new policy?
4. There are only 30 female employees out of 150 in a certain Company. If 10 are
chosen at random to attend the seminars, then what is the probability that exactly 3
females are selected?
5. An eco-bag contains 3 green balls, 2 blue balls, and 4 red balls. In a random
sample of 5 balls, find the probability that exactly 2 red balls are chosen.
C. Poisson Distribution
1. During the rainy season the average number of days school is closed due to flood
in a certain City is 3.5 per week. What is the probability that the schools in the said
City will close for 5 days per week during the rainy season?
4. The probability that a patient dies from a certain infectious disease is 0.003. What is the
probability that fewer than 3 of the next 3000 patients so infected will die?
5. One in every 300 students makes erasures in filling out the proposal slip
during enrolment. If 3000 forms are examined at random, then what is the probability
that 3, 4, or 5 forms will have erasures?
Chapter 4
Normal Probability Distribution and the Central Limit
Theorem
This chapter introduces you to the most important continuous probability
distribution – the normal distribution – and its applications, including the sampling
distribution that approximates to a normal distribution through the central limit theorem.
The chapter includes four lessons: normal distribution, standard normal distribution,
applications of the normal curve, and central limit theorem. Before the presentation of
lessons, the intended learning outcomes are enumerated and these will serve as your
guide on what should be learned in this module. Each lesson also starts with the
objectives, followed by a concise discussion of the concepts, then by examples – problems
with solutions, and finally by exercises, which are parallel to the given examples. After the
last lesson is the “end of section test”, which you are required to take.
The exercises presented after each lesson are formative assessments to check your
learning progress. You will not be graded in these exercises but you are expected to
complete all of these. A few days after you have submitted your answers or solutions, the
correct solutions and feedback to your answers will be given to you by your instructor.
The Chapter Test is a summative assessment and your score in this test will be recorded
and will be part of your final grade (see the syllabus for details on the grading system).
In case there are concepts and examples that you fail to understand in this chapter, do not
worry – that is NORMAL. Just ask and you will be answered. Always remember, there is
no CENTRAL LIMIT in learning.
Learning Objectives
The aim of this section is for students to recognize the characteristics of a normal
distribution and learn how to use the empirical rule in solving normal distribution
problems.
At the end of this section, the students should be able to familiarize themselves with the
characteristics of a normal distribution and apply the concepts of normal distribution to
solve some problems.
Main Discussion
Figure 4.1. A normal distribution (curve centered at the mean µ)
The term normal distribution technically refers to a family of infinitely many normal
distributions. Normal distributions differ in their means and standard deviations. The
normal curve also depends on these two factors, mean and standard deviation. Figure 4.2
shows three normal distributions with different means and standard deviations. The
mean determines the location of the center of the curve while the standard deviation
determines its height and width. A smaller standard deviation has a tall and narrow curve
(left-most curve in green) while a bigger standard deviation has a short and wide curve
(right-most curve in black).
Figure 4.2. Normal distributions with different means and standard deviations
The normal distribution is mathematically defined by a normal equation (but don’t worry
about this equation – you will not be using this). The precise definition is as follows:
Definition 4.1
The probability distribution corresponding to the density function for the normal curve
with parameters μ and σ is called the normal distribution with mean μ and standard
deviation σ. This is given by
f(x) = (1 / (σ√(2π))) e^(–(x – μ)² / (2σ²)), for –∞ < x < ∞
The notation X ~ N(μ, σ) is read, "the random variable X is normally distributed with mean μ
and standard deviation σ."
Every normal curve, regardless of its mean and standard deviation, conforms to the
following:
· About 68% of the area under the curve (or of the data) falls within one standard
deviation of the mean
· About 95% of the area under the curve (or of the data) falls within two standard
deviations of the mean
· About 99.7% of the area under the curve (or of the data) falls within three
standard deviations of the mean
Collectively, the above is known as the empirical rule or the 68-95-99.7 rule (see also
Figure 4.3).
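The three percentages of the empirical rule can be reproduced numerically. The sketch below uses Python's standard statistics.NormalDist class (available in Python 3.8 and later); the helper name area_within is an illustrative assumption.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

The computed values round to the 68%, 95%, and 99.7% quoted in the rule, which is why the rule is described as an approximation.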
Examples
The empirical rule can be used to solve some probability problems. However, since the
rule is an approximation or estimation, use this only if explicitly stated to do so. Some
examples are given below.
Example 4.1.
The result of an examination that was found to be closely approximated by a normal
distribution had a mean score of 65 and standard deviation of 10. If 120 students took
the exam, use the empirical rule to find how many of them got:
a. higher than 85?
b. lower than 55?
c. between 75 and 95?
Solution:
Figure 4.4. Graph for Example 4.1 (normal curve with the scores 35, 45, 55, 65, 75, 85, and 95 marked at intervals of one standard deviation)
a. With the mean score of 65 and standard deviation of 10, the score of 85 is 2
standard deviations above the mean. By empirical rule in a normal distribution,
47.5% of the data lie between the mean and two standard deviations above the
mean (i.e. half of 95% or 34%+13.5% as can be seen in Figure 4.4). Another
property of the normal distribution is that 50% of the data falls below the mean.
Adding 50% to 47.5% and subtracting the sum from 100% results in 2.5%. Therefore,
approximately
2.5%(120) = 0.025(120) = 3
students scored higher than 85.
b. With the mean score of 65 and standard deviation of 10, the score of 55 is 1
standard deviation below the mean. By empirical rule in a normal distribution,
34% of the data lie between the mean and 1 standard deviation below the mean
(i.e. half of 68% as can be seen in Figure 4.4). Another property of the normal
distribution is that 50% of the data falls above the mean. Adding 50% to 34%
and subtracting the sum from 100% results in 16%. Therefore, approximately
16%(120) = 0.16(120) = 19.2 ≈ 19
students scored lower than 55.
c. With the mean score of 65 and standard deviation of 10, the scores 75 and 95
are, respectively, 1 and 3 standard deviations above the mean. By empirical rule
in a normal distribution, 34% (half of 68%) of the data lie between the mean
and 1 standard deviation above the mean while 49.85% (half of 99.7%) lie
between the mean and 3 standard deviations above the mean. Subtracting 34%
from 49.85% resulted in 15.85% (which is the same as 13.5%+2.35% as can be
seen in Figure 4.4). Therefore, approximately
15.85%(120) = 0.1585(120) = 19.02 ≈ 19
students scored between 75 and 95.
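The three parts of Example 4.1 can be recomputed from the one-sided percentages of the empirical rule. This is only a sketch: the list BANDS and the function percent_between are hypothetical names introduced for illustration.

```python
# One-sided percentages from the empirical rule (68-95-99.7):
# 0 to 1 SD, 1 to 2 SD, 2 to 3 SD, and beyond 3 SD from the mean.
BANDS = [34.0, 13.5, 2.35, 0.15]

def percent_between(lo_sd, hi_sd):
    """Percent of data from lo_sd to hi_sd standard deviations on one side
    of the mean; hi_sd = 4 stands for 'everything beyond 3 SD'."""
    return sum(BANDS[lo_sd:hi_sd])

students = 120
higher_than_85 = students * percent_between(2, 4) / 100  # above +2 SD
lower_than_55 = students * percent_between(1, 4) / 100   # below -1 SD, by symmetry
between_75_95 = students * percent_between(1, 3) / 100   # from +1 SD to +3 SD

print(round(higher_than_85), round(lower_than_55), round(between_75_95))  # 3 19 19
```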
Example 4.2.
The average loan of faculty members from the cooperative of a large university is
PHP25,000 with a standard deviation of PHP8,000. If 1,630 of these faculty members
have loans between PHP17,000 and PHP41,000, how many faculty members have
loans? (Assume that the data are normally distributed. Use the empirical rule.)
Solution:
With the average loan of PHP25,000 and standard deviation of PHP8,000, the loan
PHP17,000 is 1 standard deviation below the mean and PHP41,000 is 2 standard
deviations above the mean. By empirical rule in a normal distribution, 34% (half of
68%) of the data lie between the mean and 1 standard deviation below the mean while
47.5% (half of 95%) lie between the mean and 2 standard deviations above the mean.
Adding 34% and 47.5% resulted in 81.5% (which is the same as 34%+34%+13.5% as
can be seen in Figure 4.5). Thus, 81.5% of the faculty members have loans between
PHP17,000 and PHP41,000. Letting X be the total number of faculty members with
loans,
81.5%(X) = 1,630
X = 1,630/.815
X = 2,000
Exercises
1. A survey of 150 adults on the time they spent on social media per day is
approximately normally distributed with a mean of 142 minutes and standard
deviation of 28 minutes. Use the empirical rule to find how many of them
spent:
a. between 58 and 198 minutes.
b. more than 86 minutes.
c. less than 170 minutes.
The aim of this section is for students to learn what the standard normal distribution is
and how to use the table of areas under the standard normal curve to determine
probabilities and z-scores.
At the end of this section, the students should be able to describe the characteristics of the
standard normal distribution and use the table of areas under the standard normal curve
to determine the probability given the z-scores and the z-scores given the area or
probability.
Main Discussion
A normal distribution of whatever mean and standard deviation can be standardized with
mean equal to zero and standard deviation equal to one (see Figure 4.6). This is done by
converting data values X to z-scores using the formula:
z = (X – µ) / σ
Definition 4.2
The standard normal distribution is the normal distribution with a mean of 0 and a
standard deviation of 1.
The graph of the standard normal distribution is called the standard normal curve
(see Figure 4.7).
The conversion from any normal distribution to the standard normal distribution is often
done to solve normal probability distribution problems with greater ease. Calculators and
tables of areas under the standard normal curve are used to solve such problems. The
approximate area of any portion under the standard normal curve from the mean to a
particular z-score had already been computed (see Table 4.1). It has to be noted that
providing tables for the infinite number of normal distributions is impossible. Hence
standardized z-scores and areas under the standard normal curve were constructed and
these can now be used to determine the probabilities for all normal distributions.
The given values (main part of the table) are areas under the standard normal curve from
the mean to the positive z-score. Since the normal curve is symmetric about the mean,
these areas are also equivalent to the corresponding symmetric areas from the mean to
the negative z-score. A particular area is also the same as the probability that a score of
interest falls between 0 and a specific or given z-score.
Note that in using the table, the given z-score is equal to the value under the z-column
plus the value in the z-row. Then simply look for the area in the intersection of the column
and row values. For example, to find the probability that z falls between 0 and 1.96, in
symbol P(0<z<1.96) or the area under the standard normal curve from 0 to 1.96, in
symbol P(0 to 1.96), locate the intersection of 1.9 (z-column) and 0.06 (z-row) since 1.96
= 1.9 + 0.06. In there, the area is 0.4750 and hence P(0<z<1.96) = 0.4750 = 47.5%. The
area under the standard normal between 0 and 1.96 is illustrated in Figure 4.8.
Figure 4.8. Area under the standard normal curve from z=0 to z=1.96
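When the table is not at hand, the tabulated area between 0 and a z-score can be computed instead. The sketch below uses Python's standard statistics.NormalDist; the function name area_0_to_z is an illustrative assumption.

```python
from statistics import NormalDist

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z.
    By symmetry, a negative z gives the same area as its positive value."""
    return NormalDist().cdf(abs(z)) - 0.5

print(round(area_0_to_z(1.96), 4))  # 0.475, the table entry 0.4750
```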
Note further that Table 4.1 includes z-scores only up to 3.09, but this does not mean that 3.09 is
the highest z-score. In the standard normal curve, the z-scores extend infinitely in both
directions and cannot all be included in the table. As mentioned earlier, the
mean plus or minus 3 standard deviations includes practically all data values in a
distribution. In terms of the areas shown in Table 4.1, the area between 0 and 3.09 is already
0.4990, which is practically half of the total area under the curve.
It should also be cautiously noted that not all standard normal distribution tables that can
be found from different references, especially the internet, have the same presentation.
There are other standard normal distribution tables that present areas to the right or to
the left of z-score, instead of areas between 0 and positive or negative z-score, as
discussed above. Hence, be extra careful on using such areas in problem solving, which
may be different from the examples presented herein.
Examples
The following are examples on using the table of areas under the standard normal curve:
Example 4.3
Find the area under the standard normal curve:
a. between z = 0 and z = 2.54
b. between z = -0.87 and z = 0
c. to the right of z = -1.20
d. to the right of z = 1.33
e. to the left of z = 2.49
f. to the left of z = -1.58
g. between z = -1.82 and z = 1.50
h. between z = -1.62 and z = -1.00
i. between z = 1.43 and z = 2.01
Solution:
a. Area between z = 0 and z = 2.54 or P(0<z<2.54)
From Table 4.1, the area in the intersection of 2.5 (z-column) and 0.04 (z-row) is 0.4945.
Hence, P(0<z<2.54) = 0.4945. This area is illustrated in Figure 4.9.
Figure 4.9. Area between z = 0 and z = 2.54
b. Area between z = -0.87 and z = 0 or P(-0.87<z<0)
From Table 4.1, the area in the intersection of 0.8 (z-column) and 0.07 (z-row) is 0.3078,
which is the area between 0 and 0.87. Due to symmetry, this area is the same as the area
between 0 and -0.87. Hence, P(-0.87<z<0) = 0.3078. This area is illustrated in Figure
4.10.
Figure 4.10. Area between z = -0.87 and z = 0
c. Area to the right of z = -1.20 or P(z>-1.20)
From Table 4.1, the area in the intersection of 1.2 (z-column) and 0.00 (z-row) is 0.3849,
which is the area between 0 and 1.20. Due to symmetry, this area is the same as the area
between 0 and -1.20. Moreover, the area from 0 to the right is 0.5, which is to be added to
0.3849 to find the required area. Thus, P(z>-1.20) = 0.3849 + 0.5 = 0.8849. This area is
illustrated in Figure 4.11.
Figure 4.11. Area to the right of z = -1.20
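d. Area to the right of z = 1.33 or P(z>1.33)
From Table 4.1, the area in the intersection of 1.3 (z-column) and 0.03 (z-row) is 0.4082,
which is the area between 0 and 1.33. Moreover, the area from 0 to the right is 0.5, from
which 0.4082 is to be subtracted to find the
required area. Thus, P(z>1.33) = 0.5 – 0.4082 = 0.0918. This area is illustrated in Figure
4.12.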
Figure 4.12. Area to the right of z = 1.33
e. Area to the left of z = 2.49 or P(z<2.49)
From Table 4.1, the area in the intersection of 2.4 (z-column) and 0.09 (z-row) is 0.4936,
which is the area between 0 and 2.49. Moreover, the area from 0 to the left is 0.5, which is
to be added to 0.4936 to find the required area. Thus, P(z<2.49) = 0.4936 + 0.5 = 0.9936.
This area is illustrated in Figure 4.13.
Figure 4.13. Area to the left of z = 2.49
f. Area to the left of z = -1.58 or P(z<-1.58)
From Table 4.1, the area in the intersection of 1.5 (z-column) and 0.08 (z-row) is 0.4429,
which is the area between 0 and 1.58. Due to symmetry, this area is the same as the area
between 0 and -1.58. However, this is not yet the required area. Further, the area from 0
to the left is 0.5, where 0.4429 is to be subtracted to find the required area. Thus, P(z<-
1.58) = 0.5 – 0.4429 = 0.0571. This area is illustrated in Figure 4.14.
Figure 4.14. Area to the left of z = -1.58
g. Area between z = -1.82 and z = 1.50 or P(-1.82<z<1.50)
From Table 4.1, the area in the intersection of 1.8 (z-column) and 0.02 (z-row) is 0.4656,
which is the area between 0 and 1.82. Due to symmetry, this area is the same as the area
between 0 and -1.82. Further, the area in the intersection of 1.5 (z-column) and 0.00 (z-
row) is 0.4332, which is the area between 0 and 1.50, and this is to be added to 0.4656 to
find the required area. Hence, P(-1.82<z<1.50) = 0.4656 + 0.4332 = 0.8988. This area is
illustrated in Figure 4.15.
Figure 4.15. Area between z = -1.82 and z = 1.50
h. Area between z = -1.62 and z = -1.00 or P(-1.62<z<-1.00)
From Table 4.1, the area in the intersection of 1.6 (z-column) and 0.02 (z-row) is 0.4474,
which is the area between 0 and 1.62. Due to symmetry, this area is the same as the area
between 0 and -1.62. Further, the area in the intersection of 1.0 (z-column) and 0.00 (z-
row) is 0.3413, which is the area between 0 and 1.00. Due to symmetry, this area is the
same as the area between 0 and -1.00. The required area is the difference between 0.4474
and 0.3413. That is, P(-1.62<z<-1.00) = 0.4474 – 0.3413 = 0.1061. This area is illustrated
in Figure 4.16.
Figure 4.16. Area between z = -1.62 and z = -1.00
i. Area between z = 1.43 and z = 2.01 or P(1.43<z<2.01)
From Table 4.1, the area in the intersection of 1.4 (z-column) and 0.03 (z-row) is 0.4236,
which is the area between 0 and 1.43. Further, the area in the intersection of 2.0 (z-
column) and 0.01 (z-row) is 0.4778, which is the area between 0 and 2.01. The required
area is the difference between 0.4778 and 0.4236. That is, P(1.43<z<2.01) = 0.4778 –
0.4236 = 0.0542. This area is illustrated in Figure 4.17.
Figure 4.17. Area between z = 1.43 and z = 2.01
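The add-or-subtract reasoning used in Example 4.3 corresponds to three basic operations on the cumulative area to the left of a z-score. The sketch below uses Python's standard statistics.NormalDist; the helper names right_of, left_of, and between are assumptions made for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def left_of(z):
    """Area to the left of z."""
    return Z.cdf(z)

def right_of(z):
    """Area to the right of z."""
    return 1 - Z.cdf(z)

def between(z1, z2):
    """Area between z1 and z2, where z1 < z2."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(right_of(-1.20), 4))       # 0.8849
print(round(left_of(-1.58), 4))        # 0.0571
print(round(between(-1.82, 1.50), 4))  # 0.8988
```

Because the table entries are already rounded to four decimal places, an answer obtained by subtracting two table values can differ from the computed value in the fourth decimal place.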
Example 4.4
Given the following areas under the standard normal curve between 0 and a z-score:
a. 0.4949; find the z-score at the right of 0.
b. 0.4908; find the z-score at the right of 0.
c. 0.3461; find the z-score at the left of 0.
d. 0.3275; find the z-score at the left of 0.
e. 0.4861; find the z-scores.
f. 0.4765; find the z-scores.
Solution:
It is not difficult to look for areas in the table of areas under the standard normal curve
since the areas are presented from lowest (0.0000 for z = 0.00) to highest (0.4990 for z =
3.09). From Table 4.1, the area 0.4949 corresponds to z = 2.57. This answer is illustrated
in Figure 4.18.
Figure 4.18. Area between z = 0 and z = 2.57
If the given area is not in the table of areas under the standard normal curve, the two
areas that sandwiched this should be looked into. The area 0.4908 is not in Table 4.1.
However, the area that is immediately lower than 0.4908 is 0.4906, which corresponds to
z = 2.35 while the area that is immediately higher than this is 0.4909, which corresponds
to z = 2.36. The area that is closer to 0.4908 gives the required z-score. Subtractions of
the areas give:
0.4908 – 0.4906 = 0.0002
0.4909 – 0.4908 = 0.0001
Therefore, the required z-score is 2.36. This answer is illustrated in Figure 4.19.
Figure 4.19. Area between z = 0 and z = 2.36
Note that a more accurate z-score can be found through a process called interpolation.
However, if the answer is to be rounded to the nearest hundredths, then the answer will
be the same as the above.
From Table 4.1, the area 0.3461 corresponds to z = 1.02. But since the required z-score is
at the left of 0, then by symmetry, z = -1.02. This answer is illustrated in Figure 4.20.
Figure 4.20. Area between z = -1.02 and z = 0
The area 0.3275 is not in Table 4.1. However, the area that is immediately lower than
0.3275 is 0.3264, which corresponds to z = 0.94 while the area that is immediately higher
than this is 0.3289, which corresponds to z = 0.95. Subtractions of the areas give:
0.3275 – 0.3264 = 0.0011
0.3289 – 0.3275 = 0.0014
Hence, 0.3264 is closer to 0.3275 and this corresponds to z = 0.94. But since the required
z-score is at the left of 0, then by symmetry, z = -0.94. This answer is illustrated in Figure
4.21.
Figure 4.21. Area between z = -0.94 and z = 0
There are two z-scores that correspond to a particular area under the standard normal
curve. One is at the left of 0 and the other is at the right of 0. From Table 4.1, the area
0.4861 corresponds to z = 2.20, but by symmetry, this also corresponds to z = -2.20.
These answers are illustrated in Figure 4.22.
Figure 4.22. Area between z = -2.20 and z = 0 and between z = 0 and z = 2.20
The area 0.4765 is not in Table 4.1. However, the area that is immediately lower than
0.4765 is 0.4761, which corresponds to z = 1.98 while the area that is immediately higher
than this is 0.4767, which corresponds to z = 1.99. Subtractions of the areas give:
0.4765 – 0.4761 = 0.0004
0.4767 – 0.4765 = 0.0002
Hence, 0.4767 is closer to 0.4765 and this corresponds to z = 1.99. By symmetry, this
also corresponds to z = -1.99. These answers are illustrated in Figure 4.23.
Figure 4.23. Area between z = -1.99 and z = 0 and between z = 0 and z = 1.99
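The reverse lookup in Example 4.4 (from an area to a z-score) can be sketched with the inverse of the cumulative area function. The code below uses inv_cdf from Python's standard statistics.NormalDist; the function name z_for_area and its side argument are hypothetical.

```python
from statistics import NormalDist

def z_for_area(area, side="right"):
    """z-score whose area between 0 and z equals `area`.
    `side` tells whether the z-score lies to the 'right' or 'left' of 0."""
    z = NormalDist().inv_cdf(0.5 + area)
    return z if side == "right" else -z

print(round(z_for_area(0.4949), 2))          # 2.57
print(round(z_for_area(0.4908), 2))          # 2.36
print(round(z_for_area(0.3461, "left"), 2))  # -1.02
```

Rounded to the nearest hundredth, the computed z-scores agree with those read from the table.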
Practice Exercises
2. Given the following areas under the standard normal curve between 0 and a z-score:
a. 0.3686; find the z-score at the right of 0.
b. 0.2720; find the z-score at the right of 0.
c. 0.4732; find the z-score at the left of 0.
d. 0.4350; find the z-score at the left of 0.
e. 0.4495; find the z-scores.
f. 0.3781; find the z-scores.
The aim of this section is for students to apply the concepts learned in solving probability
problems involving normal distribution in real-life situations.
At the end of this section, the students should be able to solve probability problems
involving normal distribution in real-life situations.
Main Discussion
The applications on finding the area of a portion under the standard normal curve can be
extended into finding the area of a portion under any normal curve. These involve the
computation of z-scores that correspond to the given X-scores (using the formula given
earlier), conversion of the normal curve into the standard normal curve, and using the
table of areas under the standard normal curve.
If the area or probability is given and the requirement is to find the X-scores, the solution
involves finding first the z-scores and then converting these z-scores into X-scores by
using the formula:
X = µ + zσ
Examples
Example 4.5
Given that IQ scores are normally distributed with a mean of 100 and standard
deviation of 15, what is the probability that a randomly selected person has an IQ score
between 110 and 120?
Solution:
Using the formula z = (X – µ) / σ, convert the given X-scores into their corresponding z-scores
as follows:
z1 = (110 – 100) / 15 = 0.67
z2 = (120 – 100) / 15 = 1.33
From Table 4.1, the area in the intersection of 0.6 (z-column) and 0.07 (z-row) is 0.2486,
which is the area between 0 and 0.67. Further, the area in the intersection of 1.3 (z-
column) and 0.03 (z-row) is 0.4082, which is the area between 0 and 1.33. The required
area is the difference between 0.4082 and 0.2486. That is, P(0.67<z<1.33) = 0.4082 –
0.2486 = 0.1596. Hence, the probability that a randomly selected person has an IQ score
between 110 and 120 is 0.1596 or 15.96%.
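The IQ example can be cross-checked in two ways: with z-scores rounded to two decimals, as the table requires, and with the X-scores used directly. This sketch relies on Python's standard statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 15
iq = NormalDist(mu, sigma)

# z-scores rounded to two decimals, as in the table method
z1 = round((110 - mu) / sigma, 2)  # 0.67
z2 = round((120 - mu) / sigma, 2)  # 1.33
table_style = NormalDist().cdf(z2) - NormalDist().cdf(z1)

# Same probability computed from the unrounded X-scores
direct = iq.cdf(120) - iq.cdf(110)

print(round(table_style, 3), round(direct, 3))
```

The two results differ slightly because the table method rounds the z-scores before the areas are looked up; both are close to 0.16.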
Example 4.6
The Department of Public Health employs a large number of encoders to enter COVID-
19 data into a computer. The time it takes for new encoders to learn the computer
system is known to have a normal distribution with a mean of 90 minutes and standard
deviation of 15 minutes. What is the proportion of new encoders who take less than one
hour to learn the computer system?
Solution:
Using the formula z = (X – µ) / σ, convert the given X-score (1 hour = 60 minutes) into a z-score as
follows:
z = (60 – 90) / 15 = -2.00
Sketching the normal curve (see Figure 4.25), it is clear that P(X<60) = P(z<-2.00).
Figure 4.25. Area to the left of X = 60 or of z = -2.00
From Table 4.1, the area in the intersection of 2.0 (z-column) and 0.00 (z-row) is 0.4772,
which is the area between 0 and 2.00 or between 0 and -2.00. The area at the left of 0 is
0.5 from which 0.4772 is to be subtracted to find the required area. That is, P(z<-2.00) =
0.5 – 0.4772 = 0.0228. Hence, the proportion of new encoders who take less than one
hour to learn the computer system is:
P(X < 60) = 0.0228 or 2.28%
Example 4.7
To qualify for a college scholarship of a certain foundation, the applicant must be
included in the top 10% in an examination administered annually. It was found that the
mean score for the exam is 84 with a standard deviation of 7. Assuming that the scores
are normally distributed, find the minimum score to qualify for the scholarship.
Solution:
After sketching the normal curve (Figure 4.26), bear in mind that the shaded region has
an area of .1000 (10%). Moreover, the area to the right of 0 is 0.5. Hence, the area
between 0 and z is 0.4000 (that is, 0.5 – 0.1000).
Now, in Table 4.1, look for the z-score that corresponds to the area equal to or closest to
0.4000. The closest value to 0.4000 is 0.3997 with the z-score of 1.28. This is also
illustrated in Figure 4.26.
Figure 4.26. Area to the right of X or of z = 1.28
Converting this z-score into an X-score using X = µ + zσ:
X = 84 + (1.28)(7) = 92.96 ≈ 93
Therefore, to qualify for the scholarship (belonging to top 10%), the applicant should have
a score of at least 93 in the examination.
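The reverse lookup used in Example 4.7 (area given, score required) can be sketched in Python as follows. This assumes the standard library's `NormalDist.inv_cdf`; the helper name `cutoff_for_top_fraction` is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

def cutoff_for_top_fraction(top_fraction, mu, sigma):
    """Return (z, X) such that the area to the right of X equals top_fraction."""
    # The area to the LEFT of the cutoff is 1 - top_fraction.
    z = NormalDist().inv_cdf(1 - top_fraction)
    x = mu + z * sigma  # X = mu + z * sigma
    return z, x

# Example 4.7: top 10%, mean 84, standard deviation 7
z, score = cutoff_for_top_fraction(0.10, mu=84, sigma=7)
print(round(z, 2), round(score))
```

The computed z agrees with the table value 1.28 to two decimals, and the cutoff rounds to 93.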
Practice Exercises
Solve the following problems.
1. If the IQ scores are normally distributed with a mean of 100 and standard
deviation of 15, what is the probability that a randomly selected person has an IQ
score between 75 and 95?
3. Scores on a 300-item college admission test are normally distributed with a mean
score of 160 and standard deviation of 30. The university only admits applicants
whose test score belongs to the top 20% per semestral examination. What should
be the minimum score of the applicant to be considered for admission?
Learning Objectives
The aim of this section is for students to learn the basic concepts of the central limit
theorem and how to apply the theorem in determining the probabilities of selecting
possible sample means from a specified population.
At the end of this section, the students should be able to explain the central limit theorem
and apply the theorem to find probabilities of selecting sample means from a given
population.
Main Discussion
Sampling almost always results in what is termed sampling “error”. This, however,
does not refer to an error in using a sampling method. Rather, sampling error is the
difference between a sample statistic (e.g. sample mean, sample standard deviation) and
its corresponding population parameter.
The means for samples of a specified size taken from a population also vary from sample
to sample. If the means of all possible samples of a specified size are organized into a
probability distribution, the sampling distribution of the sample mean is obtained. This is
formally defined as follows:
Definition 4.3
The sampling distribution of the sample mean is the probability distribution of all
possible sample means of a specified sample size.
As such, if all possible random samples are taken from a population and for each sample,
the sample mean is computed; the following important relationships between the
population distribution and the sampling distribution of the sample mean can be noted:
1. The mean of the sample means is exactly equal to the population mean.
2. The dispersion of the sampling distribution of the sample means is narrower
than the population distribution.
3. The sampling distribution of the sample means tends to approximate the
normal probability distribution.
However, it is very cumbersome and sometimes almost impossible to include all possible
random samples for this sampling distribution. For example, if the population consists of
30 data values and 5 data values are to be taken randomly for each sample (simple random
sampling in which a data value may appear in more than one sample but not twice in the
same sample; i.e. sample 1 has 5 data values taken randomly from 30; sample 2 has 5 data
values taken randomly also from 30; etc.), there will be a total of
142,506 possible samples (i.e. combination of 30 observations taking 5 observations at a
time) and so there are 142,506 sample means to be organized into a probability
distribution.
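The count of 142,506 is the number of combinations of 30 items taken 5 at a time, which can be verified with a short Python sketch. The variable names are illustrative; `math.comb` is part of the standard library (Python 3.8 or later).

```python
from math import comb

population_size = 30
sample_size = 5

# Unordered samples in which no data value repeats within a sample
count_samples = comb(population_size, sample_size)
print(count_samples)

# For comparison only: ordered draws in which a value may repeat within a sample
count_ordered_with_repeats = population_size ** sample_size
print(count_ordered_with_repeats)
```

The first count matches the figure in the text; the second shows how quickly the number of possible samples grows under a different sampling scheme.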
Thanks to the central limit theorem, it is no longer necessary to include all possible
random samples. The formal statement of the theorem is as follows:
Theorem 4.1
The central limit theorem states that if random samples of a particular size are
selected from any population, the sampling distribution of the sample mean is
approximately a normal distribution. This approximation improves with larger samples.
The question now is how large each random sample should be for the sampling distribution
of the sample mean to approximate a normal distribution closely. One rule of thumb is to
recognize that the more skewed the population distribution is, the larger the samples
needed to obtain a normal distribution.
Moreover, it has been shown that if the population is normally distributed, then for any
sample size, the sampling distribution of the sample mean will also be normal. If
the population distribution is skewed, it may require samples of 30 or more observations
to observe the normality feature. The central limit theorem further indicates that,
regardless of the shape of the population distribution, the sampling distribution of the
sample mean will move toward the normal probability distribution and the larger the
number of observations in each sample, the stronger the convergence.
Take note of the difference between the number of samples and the number of
observations in each sample. The rule of thumb of 30 refers to the number of observations
(n) in each sample, not to the number of samples drawn. The number of observations may
be less than 30 if the population is known to have a normal distribution, but it is
recommended to be 30 or more if the population is skewed or if the population
distribution is unknown. Each sample should have the same specified size n, with its data
values taken randomly from the population (simple random sampling, in which a
population data value may belong to more than one sample). It is this number of
observations that should be made larger to have a stronger convergence, that is, for the
sampling distribution to approximate the normal distribution very closely.
The central limit theorem does not say anything about the dispersion of the sampling
distribution of the sample mean or about the comparison of the mean of the sampling
distribution of the sample mean to the mean of the population. However, it can be
demonstrated that the mean of the sampling distribution is the population mean, i.e.,
$\mu_{\bar{X}} = \mu$, and if the standard deviation of the population is $\sigma$, the
standard deviation of the sample means is $\sigma/\sqrt{n}$,
where n is the number of observations in each sample. That is,
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$, which is called the standard
error of the mean (its longer name is standard deviation of the sampling distribution of
the sample mean).
1. The mean of the distribution of the sample means will be exactly equal to the
population mean if all possible samples of the same size from a given
population are included. That is, $\mu_{\bar{X}} = \mu$.
Even if not all possible samples are included, it can still be expected that the mean
of the distribution of the sample means is close to the population mean.
2. There will be less dispersion in the sampling distribution of the sample mean
than in the population. The standard deviation of the distribution of the sample
means is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
It has to be noted also that the standard error of the mean decreases if the size of the
sample is increased.
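Both relationships can be checked by brute force on a very small population. The sketch below is only an illustration under stated assumptions: the four-value population is hypothetical, and every ordered sample of size 2 is enumerated, allowing a value to repeat within a sample, which is the setting in which the standard deviation of the sample means equals σ/√n exactly.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]  # hypothetical toy population
n = 2                      # number of observations in each sample

# Enumerate every possible ordered sample of size n and record its mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation
mean_of_means = mean(sample_means)
se = pstdev(sample_means)   # standard deviation of the sample means

print(mu, mean_of_means)
print(sigma / sqrt(n), se)
```

The two numbers on each printed line agree: the mean of the sample means equals the population mean, and their standard deviation equals σ/√n, which is smaller than σ.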
Recall that the z-score of a single observation is computed by
$z = (X - \mu)/\sigma$
In this formula, X is the value of the random variable, µ is the population mean and σ is
the population standard deviation.
However, in the case of the distribution of the sample mean $\bar{X}$, instead of X, the
value of one observation, the formula for finding the z-score is
$z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$
when the population standard deviation is known. In the above formula, the numerator is
the sampling error while the denominator is the standard error of the mean.
When the population standard deviation is unknown, the formula for z-score is
$z = (\bar{X} - \mu)/(s/\sqrt{n})$
where s is the sample standard deviation. If the sample has at least 30 observations, the
sample standard deviation is used as an estimate of the population standard deviation.
Carefully note further that there are three different means in the above formulas: the
population mean $\mu$, the sample mean $\bar{X}$, and the mean of the sample means
$\mu_{\bar{X}}$, as well as three different standard deviations: the population standard
deviation $\sigma$, the sample standard deviation s, and the standard deviation of the
sample means $\sigma_{\bar{X}}$.
Examples
Example 4.8
The tax value of all registered vehicles in the country has a mean of PHP675,000 and a
standard deviation of PHP210,000. Suppose 30 random samples of size 100 vehicles
each are drawn from the population of vehicles. What are the mean and standard
deviation of the sampling distribution?
Solution:
With the central limit theorem, the sampling distribution of 30 sample means is said to
approximate a normal distribution. Hence the mean and standard deviation of the
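sampling distribution of the sample mean are determined as follows:
Mean: $\mu_{\bar{X}} = \mu$ = PHP675,000
Standard deviation (standard error): $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 210,000/\sqrt{100}$ = PHP21,000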
Example 4.9
amount contained in the bottles is 60.1mL. What is the probability of finding a sample of
25 bottles that contain a mean amount of 60.1mL or more?
Solution:
From Table 4.1, the area in the intersection of 2.5 (z-column) and 0.00 (z-row) is 0.4938,
which is the area between 0 and 2.50. The area to the right of z = 2.50 is P(z>2.50) = 0.5
– 0.4938 = 0.0062. The graph is shown in Figure 4.27.
Hence, the probability of finding a sample of 25 bottles that contain a mean amount of
60.1mL or more is only 0.0062 or 0.62%.
Example 4.10
The Department of Labor states that the mean daily wage of construction workers in a
region is PHP470. A survey of 100 construction workers in the region showed that their
mean daily wage is PHP455. If the sample standard deviation is PHP70, what is the
probability of selecting a sample consisting of 100 construction workers with a mean daily
wage of less than PHP455?
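Solution:
Since only the sample standard deviation is given, the z-score of the sample mean is
computed as follows:
$z = (\bar{X} - \mu)/(s/\sqrt{n}) = (455 - 470)/(70/\sqrt{100}) = -2.14$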
From Table 4.1, the area in the intersection of 2.1 (z-column) and 0.04 (z-row) is 0.4838,
which is the area between 0 and 2.14 or between 0 and -2.14. The area to the left of z = -
2.14 is P(z<-2.14) = 0.5 – 0.4838 = 0.0162. The graph is shown in Figure 4.28.
Hence, the probability of selecting a sample consisting of 100 construction workers with a
mean daily wage of less than PHP455 is 0.0162 or 1.62%.
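The z-score of a sample mean and the corresponding tail area in Example 4.10 can be reproduced with the following Python sketch. The function name `z_for_sample_mean` is an assumption made for illustration; the tail area comes from the standard library's `NormalDist` instead of Table 4.1.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(sample_mean, mu, sd, n):
    """z-score of a sample mean: sampling error divided by the standard error."""
    standard_error = sd / sqrt(n)
    return (sample_mean - mu) / standard_error

# Example 4.10: claimed mean 470, sample mean 455, s = 70, n = 100
z = z_for_sample_mean(455, mu=470, sd=70, n=100)
p_less = NormalDist().cdf(z)  # area to the left of z
print(round(z, 2), round(p_less, 4))
```

The z-score rounds to -2.14. The computed area differs from the tabulated 0.0162 only in the fourth decimal place, because the table uses z rounded to two decimals.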
Practice Exercises
1. Thirty samples with 100 vehicles for each sample were selected from the list of all
registered vehicles in the country using simple random sampling with replacement. The
tax values of the selected vehicles for each sample were recorded and the sample means
were computed. Suppose the mean and standard deviation of the sample means are
respectively, PHP700,000 and PHP30,000, find the population mean and population
standard deviation.
2. Suppose that in Example 4.9, a random sample of 25 bottles has a mean amount of
59.6mL. What is the probability of finding a sample of 25 bottles that contain a mean
amount of 59.6mL or less?
3. Government data shows that the mean daily wage of factory workers in a particular
city is PHP490. A survey of 49 factory workers in that city revealed that their mean daily
wage is PHP512. If the sample standard deviation is PHP70, what is the probability of
selecting a sample consisting of 49 factory workers with a mean daily wage of more than
PHP512?
Chapter 5
Confidence Intervals
INTRODUCTION
This module introduces you to the concepts of point estimates and confidence intervals.
The chapter begins with the definition of a point estimate and the confidence interval.
Then, it proceeds to (1) confidence interval for the population mean with a known
population standard deviation or a large sample; (2) confidence interval for the
population mean with an unknown population standard deviation and a small sample;
(3) confidence interval for a
population proportion; and (4) finite population correction factor. The last lesson is about
the factors being considered in choosing an appropriate sample size. Each lesson starts
with the objectives and is followed by a concise discussion of the concepts. Sample problems
with solutions are also given together with exercises, which are parallel to the given
examples. After the last lesson is the “end of module test”, which you are required to take.
The list of references is also provided at the end of the chapter. The references are
carefully selected to help you learn the contents of this module and these references are
all available for free on the internet.
The exercises presented after the lessons are formative assessments to check your
learning progress. You will not be graded in these exercises but you are expected to
complete all of these. A few days after you have submitted your answers or solutions, the
correct solutions and feedback to your answers will be given to you by your instructor.
The End of Module Test is a summative assessment and your score in this test will be
recorded and will be part of your final grade (see the syllabus for details on grading
system).
After finishing this chapter, your CONFIDENCE INTERVAL will be wide enough for you
to learn the next modules and eventually succeed in this course.
Learning Objectives
The aim of this section is for students to learn the concept of estimation and confidence
intervals.
At the end of this section, the students should be able to define a point estimate and a
confidence interval.
Main Discussion
Estimation is the process of inferring the value of a population parameter from
information drawn from a sample. A point estimate is a single value used to estimate a
population value. For example, thousands of students have taken an examination but
their average score in the exam is not known (or is impractical to compute). However,
there is a need to report at least an estimate of such average results. Hence, 100 students
were selected at random, their scores were taken, and the average score was computed.
Such average score is called sample mean and this can be used as point estimate of the
unknown population mean.
However, a point estimate is only a single value. Oftentimes, a more informative approach
is to present a range of values in which the population parameter is expected to be
included. Such a range of values is called a confidence interval.
Definition 5.1
Point estimate is the statistic, computed from sample information, which is used to
estimate the population parameter.
The sample mean, sample standard deviation, and sample proportion are point estimates
of population mean, population standard deviation, and population proportion,
respectively. However, these point estimates only tell a part of the story. Although a point
estimate is expected to be close to a population parameter, more often than not, there is a
need to measure how close it really is. The confidence interval serves this purpose.
Definition 5.2
Confidence interval is a range of values constructed from sample data so that the
population parameter is likely to occur within that range at a specified probability,
which is called level of confidence.
For example, if the mean daily wage of construction workers in the country is estimated
to be PHP470, the range of this estimate might be from PHP450 to PHP490. The level of
confidence that the population mean is within the range or interval can be described by a
probability
statement. For instance, one might state that: “I am 90% sure that the mean daily wage of
all construction workers in the country is between PHP450 and PHP490.”
Through the information developed about the shape of a sampling distribution of the
sample mean, an interval that has a specified probability of containing the population
mean can be located. From the results of the central limit theorem, the following
statements can be deduced:
1. Ninety-five percent of the sample means will be within 1.96 standard deviations of
the population mean.
2. Ninety-nine percent of the sample means will be within 2.58 standard deviations of
the population mean.
The standard deviation mentioned above is the standard deviation of the sampling
distribution of the sample mean, which is usually called the standard error. The intervals
computed through the above are respectively called the 95 percent confidence interval
and the 99 percent confidence interval.
The values 1.96 and 2.58 are z-scores, which can be easily determined using the table of
areas under the standard normal curve. The 95 percent and 99 percent refer to the
percent of similarly constructed intervals that would include the parameter being
estimated. For example, the 95 percent refers to the middle 95% of the observations.
Dividing 95% by 2, the result is 47.5% or 0.4750, which is the area from 0 to z-score in the
standard normal curve. Using the table of areas under the standard normal curve (see
Table 4.1 in Chapter 4), the corresponding z-score is 1.96. This is illustrated in Figure 5.1.
Thus, the probability of being in the interval -1.96 to 1.96 is 0.9500 or 95%.
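The z-scores 1.96 and 2.58 can also be obtained without the table. The sketch below assumes the standard library's `NormalDist.inv_cdf`; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """z-score that encloses the middle `confidence` area of the standard normal curve."""
    tail = (1 - confidence) / 2  # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.95, 0.99):
    print(level, round(critical_z(level), 2))
```

For the 95% level, the area in each tail is 0.025, so the function looks up the z-score with area 0.975 to its left, which is equivalent to the area of 0.4750 between 0 and z described above.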
Practice Exercises
Show the (a) 90% confidence interval and (b) 99% confidence interval in the standard
normal curve.
Learning Objectives
The aim of this section is for students to learn how to compute and interpret confidence
intervals for estimating the mean of a population with known standard deviation or with
an unknown standard deviation but the sample is large.
At the end of this section, the students should be able to compute and interpret the
confidence interval for population mean with known standard deviation or a large sample.
Main Discussion
Here, take the case of determining the 95% confidence interval. Suppose a study is
conducted on the monthly school expenses of college students. Computations revealed
that the sample mean is PHP2,880 and the standard deviation (i.e. the “standard error”)
of the sample mean is PHP240. Also assume that the sample is large enough to
approximate the normal distribution. Hence, the 95% confidence interval is between
PHP2,409.60 and PHP3,350.40, computed by PHP2,880 ± 1.96(PHP240). Moreover, if
200 samples of the same size were selected from the population and the corresponding
200 confidence intervals were determined, it is expected to find the population mean in
about 190 of the 200 confidence intervals.
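The "about 190 of the 200" statement can be illustrated by simulation. The following is a rough Python sketch, not a proof. It assumes a normally distributed population with mean 2,880 and standard deviation 1,440, sampled 36 observations at a time so that the standard error is 240. The population standard deviation, the sample size, and the random seed are assumptions chosen only for illustration.

```python
import random
from math import sqrt

def simulate_coverage(mu, sigma, n, trials, z=1.96, seed=1):
    """Count how many of `trials` confidence intervals contain the population mean."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n observations and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lower = sample_mean - z * standard_error
        upper = sample_mean + z * standard_error
        if lower <= mu <= upper:
            hits += 1
    return hits

print(simulate_coverage(mu=2880, sigma=1440, n=36, trials=200))
```

The printed count varies with the seed but should stay near 190, that is, near 95% of 200.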
In the above example, the standard error of the sampling distribution of the sample mean
was given (result of computation) as PHP240. Recall that this is the standard error of the
sample means discussed in the previous topic (Central Limit Theorem).
For the case when the population standard deviation is known, recall that the formula for
determining this standard error is
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$
However, in most situations, the population standard deviation is unknown and in such
cases, the standard error is estimated as follows:
$s_{\bar{X}} = s/\sqrt{n}$
The size of the standard error is affected by two values: the standard deviation and the
sample size (i.e. the number of observations in a sample). If the standard deviation is
large, then the standard error will also be large. As the sample size is increased, the
standard error decreases, which means that there is less variability in the sampling
distribution of the sample mean. This conclusion is logical since the estimate with a large
sample is more precise than the estimate from a small sample.
As provided by the central limit theorem, using a large sample will make the sampling
distribution of the sample mean approximate the normal distribution. If the sample mean
is normally distributed, then the standard normal curve and the z-scores can be used in
the computations.
When the number of observations in the sample is at least 30, the 95 percent confidence
interval is computed as follows:
$\bar{X} \pm 1.96\,s/\sqrt{n}$
while, for any level of confidence, the confidence interval for population mean with
unknown population standard deviation but using a large sample is computed by:
$\bar{X} \pm z\,s/\sqrt{n}$
where z is the z-score corresponding to the chosen level of confidence.
Examples
The following example shows the details for determining the confidence interval of the
population mean with unknown population standard deviation but using a large sample
and interpreting the result.
Example 5.1
The Education Department wants to have information on the mean family income of the
families of students in public elementary schools. A random sample of 100 families
revealed a sample mean of PHP22,000 with a standard deviation of PHP2,000. The
department seeks answers for the following:
2. What is a reasonable range of values for the population mean? (Assume that
the department decides to use the 95% level of confidence.)
Solution:
1. For a large sample, the sample mean is a point estimate of the population mean.
Thus, the population mean family income is estimated to be PHP22,000.
2. If the Education Department decides to use the 95% level of confidence, the
corresponding confidence interval is computed as follows:
$\bar{X} \pm z\,s/\sqrt{n} = 22,000 \pm 1.96(2,000/\sqrt{100}) = 22,000 \pm 392$
Thus, the confidence interval or the reasonable values of the population mean family
income is from PHP21,608 to PHP22,392. (Note: These values are called confidence
limits; PHP21,608 is the lower confidence limit and PHP22,392 is the upper
confidence limit.)
3. Suppose many samples of 100 families each were selected. Then for each sample,
the mean, standard deviation and 95% confidence interval were computed. It is
expected that about 95% of these confidence intervals contain the true population
mean. Thus, about 5% of the intervals do not contain the population mean and this is
attributed to the so-called sampling error, which is the risk assumed when selecting
the level of confidence.
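The large-sample interval for Example 5.1 can be computed with a few lines of Python. This is a minimal sketch; the function name `z_confidence_interval` is illustrative. With the given data the margin of error is 1.96 × 2,000/√100 = 392.

```python
from math import sqrt

def z_confidence_interval(sample_mean, sd, n, z=1.96):
    """Large-sample confidence interval: sample_mean ± z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 5.1: n = 100 families, sample mean 22,000, standard deviation 2,000
low, high = z_confidence_interval(22_000, 2_000, 100)
print(f"{low:,.0f} to {high:,.0f}")  # prints: 21,608 to 22,392
```

Passing a different `z` (for instance 2.58 for the 99% level) widens the interval accordingly.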
Practice Exercises
The mean daily sales of a fast food outlet is PHP780,000 for a sample of 60 days. The
standard deviation of the sample is PHP125,000.
1. What is the estimated mean daily sales of the population? What is this estimate
called?
Learning Objectives
The aim of this section is for students to recognize the characteristics of the t distribution
and learn how to compute confidence intervals for estimating the mean of a population
with an unknown standard deviation and a small sample.
At the end of this section, the students should be able to describe the t distribution and
compute the confidence interval for population mean with unknown standard deviation
and a small sample.
Main Discussion
In the previous section, the z-scores and the standard normal curve were used to
determine the confidence interval for a particular level of confidence. That is applicable
because either:
2. The shape of the population and the population standard deviation are unknown
but the number of observations in the sample is at least 30.
With the assumption that the population of interest is normal or nearly normal, the
following are the characteristics of the t distribution:
3. The t distribution is more spread out and flatter at the center than the standard
normal distribution. But as the sample size increases, the t distribution approaches the
standard normal distribution because the error in using s to estimate σ decreases with
larger samples.
Because the t distribution has a greater spread than the z distribution, the value of t for a
given level of confidence is larger in magnitude than the corresponding z value. This is
illustrated in Figure 5.3 using a 95% level of confidence.
Using the t distribution, the confidence interval for population mean with unknown
population standard deviation and using a small sample is computed by:
$\bar{X} \pm t\,s/\sqrt{n}$
The value of t in the above formula is determined using the Student’s t distribution table
found in the appendix. How to find this value of t from the table is explained in the next
example.
Example 5.2
The Higher Education Commission wants to estimate the mean daily school expenses of
college students. A sample of 16 students revealed a sample mean of PHP90 with a
standard deviation of PHP12. Construct a 95% confidence interval for the population
mean.
Solution:
The first thing to note in using the t distribution is that there is a need to assume that the
population distribution is normal. Although there is no clear evidence, the assumption
that the daily school expenses of students are normally distributed is reasonable (many
students are expected to spend close to the average and few to very few will spend much
more or much less than the average). Since the population standard deviation is unknown and the
sample size is small, the use of z distribution is inappropriate. Further, given a small
sample size, if the assumption for a normal population is unreasonable both the z and the
t distributions are inappropriate and therefore an appropriate nonparametric test should
be used. In this case, however, since the assumption of normal population distribution is
reasonable and that the sample standard deviation has been known or computed, the t
distribution can be used.
To find the value of t in the t distribution table (see Table 5.1), there is a need to
determine the number of degrees of freedom or df. The number of degrees of freedom is
the number of observations in the sample minus the number of samples, written n – 1. In
this case, df = n – 1 = 16 – 1 = 15. The df in Table 5.1 is in the first column and in there,
locate the row df = 15. Next, locate the column for 95% confidence interval. Then look for
the value of t at the intersection of “row 15” and “column 95%” and that value is 2.131.
The 95% confidence interval for the population mean is now computed as follows:
$\bar{X} \pm t\,s/\sqrt{n} = 90 \pm 2.131(12/\sqrt{16}) = 90 \pm 6.39$
Hence the confidence interval for the mean daily school expenses of all students is
between PHP83.61 and PHP96.39.
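The small-sample interval of Example 5.2 can be sketched in Python as follows. Because the Python standard library has no t-distribution quantile function, the t value is passed in as read from the table (2.131 for df = 15 at the 95% level); the function name `t_confidence_interval` is an assumption for illustration.

```python
from math import sqrt

def t_confidence_interval(sample_mean, s, n, t_value):
    """Small-sample confidence interval: sample_mean ± t * s / sqrt(n)."""
    margin = t_value * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 5.2: n = 16 students, sample mean 90, s = 12, df = 15
low, high = t_confidence_interval(90, 12, 16, t_value=2.131)
print(f"{low:.2f} to {high:.2f}")
```

The margin of error is 2.131 × 12/4 = 6.393, so the limits are about 83.61 and 96.39.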
Practice Exercises
2. Construct the 90% confidence interval of the population mean for the case given in
Example 5.2.
Learning Objectives
The aim of this section is for students to learn how to compute confidence intervals for
estimating the proportion of a population and interpret the results.
At the end of this section, the students should be able to compute the confidence interval
for population proportion and interpret the results.
Main Discussion
Proportion refers to the fraction, ratio or percent indicating the part of the sample or
the population having a particular trait of interest. For example, a recent survey indicated
that 89 out of 100 students in a particular university favored the reopening of the university
through online classes during the time of COVID-19 pandemic. The sample proportion is
89/100, or .89, or 89%. If p is the sample proportion, X is the number of “successes”, and
n is the number of items sampled; then the sample proportion is as follows:
$p = X/n$
The population proportion (in symbol, π) refers to the percent of successes in the
population. To develop a confidence interval for a population proportion, the following
assumptions should be met:
b. There are only two possible outcomes, usually labeled as “success” and
“failure”.
c. The probability of success remains the same from one trial to the next.
d. The trials are independent, which means the outcome of one trial does not
affect the outcome of another.
2. The values nπ and n(1 – π) should both be greater than or equal to 5. By this
condition, the standard normal distribution can be used to construct a confidence
interval.
The confidence interval for a population proportion has the form
$p \pm z\,\sigma_p$
where $\sigma_p$ is the standard error of the proportion, which measures the variability in the
sampling distribution of the sample proportion. This standard error is computed by
$\sigma_p = \sqrt{p(1 - p)/n}$
Hence, a confidence interval for a population proportion can be constructed using the
formula:
$p \pm z\sqrt{p(1 - p)/n}$
Example 5.3
The government is considering the restoration of death penalty for drug crimes.
Lawmakers are considering drafting and passing a law if at least two-thirds of the
population favors death penalty. A survey on 1,200 adult citizens revealed that 68%
favored the death penalty for drug crimes.
Solution:
1. The sample proportion, 68% or .68 is the point estimate of the population
proportion.
2. With the z-score of 1.96 corresponding to the 95% level of confidence, the 95%
confidence interval is computed by
$p \pm z\sqrt{p(1 - p)/n} = .68 \pm 1.96\sqrt{.68(1 - .68)/1,200} = .68 \pm .0264$
Thus, the confidence interval for the population proportion is between .6536 and .7064 (or
roughly between 65% and 71%).
3. The confidence limits or endpoints of the interval are .6536 and .7064. The lower
endpoint, .6536, is less than .6667 (two-thirds), so the interval includes values below
two-thirds. Hence, it is not likely that
lawmakers will pass a law on death penalty.
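The computation in Example 5.3 can be sketched in Python. The function names `proportion_interval` and `normal_approx_ok` are hypothetical, and the sample proportion p is used in place of π when checking the condition that both expected counts are at least 5.

```python
from math import sqrt

def normal_approx_ok(p, n):
    """Check that n*p and n*(1 - p) are both at least 5 (using p as a stand-in for pi)."""
    return n * p >= 5 and n * (1 - p) >= 5

def proportion_interval(p, n, z=1.96):
    """Confidence interval for a proportion: p ± z * sqrt(p(1 - p)/n)."""
    standard_error = sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin

# Example 5.3: 68% of 1,200 adults in favor, 95% level of confidence
assert normal_approx_ok(0.68, 1200)
low, high = proportion_interval(0.68, 1200)
print(round(low, 4), round(high, 4))
print(low < 2 / 3 < high)  # does the interval include two-thirds?
```

The limits round to .6536 and .7064, and the final check confirms that two-thirds lies inside the interval.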
Practice Exercises
Learning Objectives
The aim of this section is for students to learn how to adjust the confidence interval using
the finite population correction factor.
At the end of this section, the students should be able to compute the confidence interval
of a population parameter using the information from a sample taken from a finite
population.
Main Discussion
The discussion and examples in the previous sections involve populations of interest that
are unknown and considered very large or “infinite”. A population that has a fixed known
upper bound is finite. If the sample was taken from a finite population, there is a need to
make some adjustments in the computation of the standard error of the sample means or
the standard error of the sample proportions.
For a finite population, where the total number of objects is N and the size of the sample
is n, the following adjustment, called the finite population correction factor (fpc), is
made to the standard error:
$\text{fpc} = \sqrt{(N - n)/(N - 1)}$
The standard error of the sample mean, with a finite population correction factor is
computed by:
$\sigma_{\bar{X}} = (\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}$
The standard error of the sample proportion, with a finite population correction factor is
computed by:
$\sigma_p = \sqrt{p(1 - p)/n}\,\sqrt{(N - n)/(N - 1)}$
Suppose the population is 1,000 and the sample is 100. Then the fpc is computed as
follows:
$\sqrt{(1,000 - 100)/(1,000 - 1)} = \sqrt{900/999} = .9492$
This implies that when the standard error is multiplied by this fpc, the standard error is
reduced by about 5% (i.e. 1 – .9492 = .0508). The reduction in the size of the standard
error yields a smaller range of values in estimating the population mean or population
proportion. If the sample is increased to 300, then the fpc is:
$\sqrt{(1,000 - 300)/(1,000 - 1)} = \sqrt{700/999} = .8371$
which will reduce the standard error by about 16% (i.e. 1 – .8371 = .1629) and will further
make the range of values smaller and the estimation better.
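On the other hand, if the sample is only 30, then the fpc is:
$\sqrt{(1,000 - 30)/(1,000 - 1)} = \sqrt{970/999} = .9854$
which will reduce the standard error by about 1.5% only. Similar computation for the
sample size of 10 will result in a reduction of the standard error by a negligible 0.45%.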
When the sample size is less than 5% of the population, the effect of the fpc is quite small.
Hence, the usual rule is to apply the fpc if the fraction of the sample to the population is at
least 5%. For n/N < .05, the fpc may not be applied. In this case of 1,000 as population,
the fpc may be ignored if the sample size is less than 50 (since 50/1,000 = .05).
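The effect of the sample size on the fpc can be tabulated with a short Python sketch. The function name `fpc` and the 5% threshold helper are illustrative; the population size of 1,000 follows the discussion above.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def should_apply_fpc(N, n, threshold=0.05):
    """Usual rule: apply the fpc when the sample is at least 5% of the population."""
    return n / N >= threshold

N = 1000
for n in (10, 100, 300):
    factor = fpc(N, n)
    reduction = (1 - factor) * 100  # percent reduction in the standard error
    print(n, round(factor, 4), f"{reduction:.2f}%", should_apply_fpc(N, n))
```

For n = 100 and n = 300 the factors round to .9492 and .8371, while for n = 10 the reduction is only about 0.45%, which is why the correction is usually skipped for small sampling fractions.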
Examples
Example 5.4
There are 240 students in Mathematics in the Modern World who took the midterm
examination. A random sample of 30 students revealed that the mean score in the exam
is 82 with a standard deviation of 7. Construct a 99% confidence interval for the
population mean score.
Solution:
Since the population is finite and the sample constitutes more than 5% of the population
(i.e. n/N = 30/240 = 0.125 = 12.5%), use the formula for constructing confidence interval
for the population mean with finite population correction factor. The computation is as
follows:
Thus, the 99% confidence interval for the population mean score is between 79 and 85.
Example 5.5
The study given in Example 5.4 also revealed that 24 out of the 30 randomly selected
students passed the exam. Construct the 95% confidence interval for the population
proportion.
Solution:
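The sample proportion is 24/30 = .80 (or 80%). Since n/N = 30/240 = 12.5%, the finite
population correction factor applies. The 95% confidence interval for the population
proportion is computed by:
$p \pm z\sqrt{\dfrac{p(1-p)}{n}}\sqrt{\dfrac{N-n}{N-1}} = .80 \pm 1.96\sqrt{\dfrac{(.80)(.20)}{30}}\sqrt{\dfrac{240-30}{240-1}} = .80 \pm .13$
Hence, the 95% confidence interval for population proportion is between .67 and .93 or
between 67% and 93%.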
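Examples 5.4 and 5.5 can be reproduced with a minimal Python sketch. The function names are illustrative. The critical value for Example 5.4 is an assumption, since the printed computation is not shown: z = 2.58 is used below, and the t value for 29 degrees of freedom (2.756) gives a slightly wider interval that rounds to the same limits.

```python
import math

def fpc(N, n):
    """Finite population correction factor."""
    return math.sqrt((N - n) / (N - 1))

def ci_mean_fpc(xbar, s, n, N, crit):
    """Confidence interval for a mean, with the fpc applied to the standard error."""
    margin = crit * (s / math.sqrt(n)) * fpc(N, n)
    return xbar - margin, xbar + margin

def ci_prop_fpc(p, n, N, crit):
    """Confidence interval for a proportion, with the fpc applied."""
    margin = crit * math.sqrt(p * (1 - p) / n) * fpc(N, n)
    return p - margin, p + margin

# Example 5.4: N = 240, n = 30, mean 82, s = 7 (crit = 2.58 is an assumption)
mean_lo, mean_hi = ci_mean_fpc(82, 7, 30, 240, crit=2.58)
# Example 5.5: 24 of 30 passed, 95% level of confidence
prop_lo, prop_hi = ci_prop_fpc(24 / 30, 30, 240, crit=1.96)
print(round(mean_lo), round(mean_hi), round(prop_lo, 2), round(prop_hi, 2))
```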
Practice Exercises
There are 960 applicants who took the entrance examination for an academic program. A
random sample of 100 students revealed that the mean score in the exam is 77 with a
standard deviation of 10.
2. Suppose 65 out of the 100 students passed the exam, construct a 99% confidence
interval for the population proportion.
Learning Objectives
The aim of this section is for students to learn how to choose an appropriate sample size
for a statistical study.
At the end of this section, the students should be able to compute the appropriate sample
size using the formulas for determining the sample size for estimating the population
mean or the population proportion.
Main Discussion
When designing a particular statistical study, an important concern that usually arises is
the number of observations that should be included in the sample, which is called the
sample size. If the sample is too large, money is wasted in collecting the data.
But if the sample is too small, the resulting conclusions will be uncertain. The
appropriate sample size depends on three factors: the level of confidence desired, the
maximum allowable error, and the variation in the population.
The first factor is the level of confidence. The researcher is the one who decides and
selects a level of confidence. Technically, any value from 0 to 100 percent can be chosen,
but the most commonly used levels of confidence are 95% and 99%. The 95% level of
confidence corresponds to a z-score of 1.96 and the 99% level of confidence corresponds
to a z-score of 2.58. The higher the level of confidence selected, the larger the size of the
corresponding sample.
The second factor is the allowable error. The maximum allowable error (in symbol, E or
e) is the amount that is added to and subtracted from the sample mean or sample proportion
to determine the confidence limits or endpoints of the confidence interval. It is the
amount of error the researcher is willing to tolerate. It is also one-half the width of the
corresponding confidence interval. A small allowable error will require a large sample
while a large allowable error can permit a smaller sample.
The third factor is the population standard deviation. If the population is widely
dispersed, a large sample is required. But if the population is “concentrated” or
homogeneous, the sample size may be smaller. If the population standard deviation is
unknown, it may be necessary to use an estimate. The following are suggestions for
finding an estimate of the population standard deviation:
1. Use a comparable study. Use this approach when there is an estimate of the
dispersion available from another study. Information from government agencies that
regularly sample a population of interest may be useful in providing an estimate of the
population standard deviation. If a standard deviation observed in a previous study is
thought to be reliable, it may also be used in a current study to help in determining an
approximate sample size.
3. Conduct a pilot study. This is a common method that is used to determine the
validity and reliability of a questionnaire. A pilot study usually has a small sample. From
this small sample, the standard deviation may be computed and used to determine the
appropriate sample size.
The interaction among these three factors and the sample size is expressed as:
$E = z \dfrac{s}{\sqrt{n}}$
Solving this equation for n yields the following formula for finding the sample size for
estimating the population mean:
$n = \left(\dfrac{z s}{E}\right)^2$
where n is the sample size; z is the standard normal score corresponding to the desired
level of confidence; s is an estimate of the population standard deviation; and E is the
maximum allowable error.
The result of the computation for n is not always a whole number. The usual practice is to
round up the fractional result to the next whole number. For example, 176.14 should be
rounded up to 177.
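The procedure described above can be adapted to determine the sample size with regard
to population proportion. The formula for determining the sample size for estimating the
population proportion is:
$n = p(1-p)\left(\dfrac{z}{E}\right)^2$
where p is an estimate of the population proportion. If no estimate is available, p = .50
is used, since it yields the largest required sample size.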
Example 5.6
Solution:
The sample size for determining the 95% confidence interval for the population mean is
computed as follows:
The computed value of 216.09 is rounded up to 217. Thus, the sample size should be 217
to meet the specifications.
The sample size for determining the 99% confidence interval for the population mean is
computed as follows:
The computed value of 374.42 is rounded up to 375. Thus, the sample size should be 375
to meet the specifications.
Note in the above example that an increase in the level of confidence requires a larger
sample.
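The two computations in Example 5.6 can be sketched in Python using the usual formula n = (zs/E)², rounded up. Only the ratio s/E enters the formula, and the printed results 216.09 and 374.42 are consistent with s/E = 7.5. The values s = 750 and E = 100 below are placeholders chosen only to have that ratio; they are an assumption, not figures taken from the example.

```python
import math

def sample_size_mean(z, s, E):
    """Sample size for estimating a mean: n = (z * s / E)^2, rounded up."""
    return math.ceil((z * s / E) ** 2)

S_ASSUMED, E_ASSUMED = 750, 100   # hypothetical values with s/E = 7.5

n_95 = sample_size_mean(1.96, S_ASSUMED, E_ASSUMED)   # 95% level of confidence
n_99 = sample_size_mean(2.58, S_ASSUMED, E_ASSUMED)   # 99% level of confidence
print(n_95, n_99)
```

Holding s and E fixed, moving from z = 1.96 to z = 2.58 raises the required sample size, which is the point made in the note above.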
Example 5.7
Suppose the study in Example 5.6 also intends to estimate the proportion of entry-level
accountants who want to enroll in a master's program in the following year. The student
wants the estimate to be within .10 of the population proportion. The desired level of
confidence is 95% and no estimate of the population proportion is available. What
should be the sample size?
Solution:
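A minimal Python sketch of this computation follows. It assumes the usual formula n = p(1 – p)(z/E)² and, because no estimate of the population proportion is available, the conservative choice p = .50. The function name is illustrative.

```python
import math

def sample_size_proportion(z, E, p=0.5):
    """Sample size for estimating a proportion: n = p(1 - p)(z / E)^2, rounded up."""
    return math.ceil(p * (1 - p) * (z / E) ** 2)

# Example 5.7: within .10 of the population proportion, 95% level of confidence
n_required = sample_size_proportion(z=1.96, E=0.10)
print(n_required)
```

Under these assumptions the unrounded result is about 96.04, so a sample of 97 would meet the specifications.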
Practice Exercises
1. A study is intended to determine the mean amount of time the teachers are
spending in watching television during weekdays. A pilot survey indicated that the mean
time per week is 10 hours with standard deviation of 3 hours. It is desired to estimate the
mean viewing time to be within 15 minutes (one-fourth hour). How many teachers should
be surveyed if the study will use a 95% level of confidence?
2. Suppose the President wants an estimate of the proportion of the adult population
who support the government’s policy on war against drugs and directs an agency to
conduct a survey. The President wants the estimate to be within .05 of the true
proportion. The President’s political allies who conducted their own survey estimated the
proportion supporting the policy to be 80% or .80. Assuming a 95% level of confidence:
Chapter 6
Fundamentals of Hypothesis
This chapter will introduce the students to the major topic of Inferential Statistics, using
sample statistics to estimate values of population parameters. The formal steps of
hypothesis testing will be discussed and how it will be used in testing claims about a
population using real-world problems. The students will be acquainted with the
assumptions of each statistical test and how each test may be used in testing claims about
a mean of small or large samples. At the end of the chapter the students are expected to
answer the practice exercises in order to reinforce the learning acquired.
Learning Objectives
The aim of this section is for students to calculate simple statistics in testing hypotheses
made about population parameters, to explain the assumptions of each statistical test,
and to analyze the results as applied to real-life situations.
The following statements are examples of hypotheses that can be tested by the procedures
presented in this chapter.
● A Human Resource researcher claims that the job satisfaction of an employee
working in a Manufacturing company affects their productivity.
● An insurance company claims that an average lifespan of an individual in this
generation is equal to 65.
● The teacher of a public school claims that using technology in teaching Statistics
will increase students' performance.
written with the symbol =, ≤, or ≥). For the mean the null hypothesis will be stated in only one of
the three possible forms.
Example:
Ho: The mean performance of General Engineering in Mathematics in the Modern
World is equal to 90. (Ho: μ=90) or
Ho: The mean performance of General Engineering in Mathematics in the Modern World
is less than or equal to 90. (Ho: µ ≤ 90) or
Ho: The mean performance of General Engineering in Mathematics in the Modern World
is greater than or equal to 90. (Ho: µ ≥ 90)
Definition 6.3 Alternative Hypothesis (denoted by Ha) is the statement that must
be true if the null hypothesis is false. For the mean, the alternative hypothesis will be
stated in only one of three possible forms.
Ha: The mean performance of General Engineering in Mathematics in the Modern World
is not equal to 90. (Ha: µ ≠ 90) or
Ha: The mean performance of General Engineering in Mathematics in the Modern
World is greater than 90. (Ha: µ > 90) or
Ha: The mean performance of General Engineering in Mathematics in the Modern
World is less than 90. (Ha: µ < 90).
Definition 6.4 Critical Region is the set of all values of the test statistic that would
cause us to reject the null hypothesis.
Two-Tailed, One tailed (Right -Tailed or Left Tailed) and Decision Rule
Two-tailed test
Right-tailed test
Left-tailed test
Definition 6.5 Critical Value is the value (or values) that separates the critical region
from the values of the test statistic that would not lead to rejection of the null hypothesis.
The critical values depend on the nature of the null hypothesis, the relevant sampling
distribution, and the level of significance α.
Example:
Suppose the null hypothesis is Ho: the individual found at the crime scene is innocent.
If Ho is true and the individual is judged innocent and released, then the judgment is
correct.
Type I error: The individual is judged guilty when in fact innocent (a true Ho is
rejected), and is sentenced to the death penalty.
Type II error: The individual is judged innocent when in fact guilty (a false Ho is not
rejected), and is released from prison.
Between the two errors, the type I error poses a greater consequence than the type II
error in this case.
● Identify the specific claim or hypothesis to be tested and put it in symbolic form.
● Give the symbolic form that must be true when the original claim is false.
● Of the two symbolic expressions obtained so far, let the null hypothesis Ho be the one that contains the condition of equality. Ha is the other statement.
● Select the significance level α based on the seriousness of a type I error. Make α small if the consequences of rejecting a true Ho are severe. The values of .05 and .01 are very common.
● Identify the statistic that is relevant to this test and its sampling distribution.
● Determine the test statistic, the critical values, and the critical region. Draw a graph and include the test statistic, critical value(s), and critical region.
● Reject Ho if the test statistic is in the critical region. Fail to reject Ho if the test statistic is not in the critical region.
Do you reject Ho?
● Yes (Reject Ho): The sample data support the claim that ...
● No (Fail to reject Ho): There is not sufficient sample evidence to support the claim that ...
A. One-Sample Z-Test
A one-sample z-test is used to test whether a population parameter is
significantly different from some hypothesized value.
Example:
Test the claim that a population mean exceeds 40. You have a sample of 50 items for
which the sample mean is 42 and sample standard deviation is 8. Use a significance level
of .05.
Following the methods of hypothesis testing:
Solution: Given μ = 40, n = 50, sample mean = 42, s = 8
Step 1: The claim is that the population mean exceeds 40 (μ > 40).
Step 2: Ho: μ ≤ 40, Ha: μ > 40 (right-tailed). The claim does not contain the condition of
equality, so it becomes the alternative hypothesis.
Step 3: Significance level (α) is .05
Step 4: Use the one-sample z-test; since n > 30, the sampling distribution of the mean is
assumed to be normal by the central limit theorem.
Step 5: Test statistic
$z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{42 - 40}{8/\sqrt{50}} = 1.768$
Step 6: For a right-tailed test at α = .05, the critical value is +1.65 and the rejection
region is z > 1.65.
Step 7: Since the test statistic of 1.768 is higher than the critical value of 1.65,
reject the null hypothesis. There is sufficient evidence to support the claim that the
population mean exceeds 40.
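The arithmetic of this example can be checked with a short Python sketch. It treats the claim "the population mean exceeds 40" as the alternative hypothesis Ha: μ > 40, so the test is right-tailed; that reading and the function name are assumptions of the sketch. NormalDist comes from the Python standard library.

```python
import math
from statistics import NormalDist

def one_sample_z(xbar, mu0, s, n):
    """z statistic for a one-sample test of the mean."""
    return (xbar - mu0) / (s / math.sqrt(n))

z_stat = one_sample_z(xbar=42, mu0=40, s=8, n=50)
z_crit = NormalDist().inv_cdf(1 - 0.05)   # right-tailed critical value at alpha = .05
reject_h0 = z_stat > z_crit
print(round(z_stat, 3), round(z_crit, 3), reject_h0)
```

Under this right-tailed reading, the statistic of about 1.768 lies beyond the critical value of about 1.645.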
Example :
Listed below are the waiting time (in minutes) for customers in order to be assisted by
bank employees:
3.5 4.3 5.7 10 5.8 6.2 7.4 8.2 9.4
sample mean= 6.722 s=2.207
The bank claims that the mean waiting time for customers is 6.0 mins. At .01
significance level, test the bank’s claim.
[Figure: two-tailed rejection regions, t < -3.3554 and t > +3.3554]
Step 7: The test statistic is t = (6.722 – 6.0)/(2.207/√9) = 0.98. Since 0.98 is less than
the critical value of 3.3554, fail to reject the null hypothesis. There is not sufficient
evidence to warrant rejection of the bank's claim that the mean waiting time of customers
is 6.0 minutes.
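As a check on the numbers, the t statistic can be computed directly from the nine listed waiting times. This is a minimal sketch; the function name is illustrative and the critical value 3.3554 is simply copied from the text rather than computed.

```python
import math
from statistics import mean, stdev

waits = [3.5, 4.3, 5.7, 10, 5.8, 6.2, 7.4, 8.2, 9.4]   # minutes

def one_sample_t(data, mu0):
    """t statistic for a one-sample test of the mean (sample standard deviation)."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))

t_stat = one_sample_t(waits, mu0=6.0)
T_CRIT = 3.3554                      # two-tailed, alpha = .01, df = 8 (from the text)
reject_h0 = abs(t_stat) > T_CRIT
print(round(mean(waits), 3), round(stdev(waits), 3), round(t_stat, 2), reject_h0)
```

Computed this way, the statistic is roughly 0.98, well inside the interval between the two critical values, so the decision not to reject Ho is unchanged.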
Test Statistic for Test of Means, Varying Sample Size , Population Standard
deviation known or unknown
Example
Statistic   Sample 1   Sample 2
n           40         40
Mean        79.6       84.2
s           12.4       12.2
$z_c = \dfrac{(79.6 - 84.2) - 0}{\sqrt{\dfrac{12.4^2}{40} + \dfrac{12.2^2}{40}}} = \dfrac{-4.6}{2.750} = -1.67$
[Figure: two-tailed rejection regions, z < -1.65 and z > +1.65]
Step 7: Since the absolute value of the test statistic, 1.67, is higher than the critical
value of 1.65, reject the null hypothesis. There is sufficient evidence to warrant
rejection of the claim that the difference between the two population means is equal to
zero.
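The two-sample z statistic for the summary data above (n = 40 and 40, means 79.6 and 84.2, standard deviations 12.4 and 12.2) can be recomputed with a short Python sketch. The function name is illustrative, and the hypothesized difference of zero follows the "– 0" in the formula.

```python
import math

def two_sample_z(x1, s1, n1, x2, s2, n2, d0=0.0):
    """z statistic for the difference of two means, large independent samples."""
    return ((x1 - x2) - d0) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

z_stat = two_sample_z(79.6, 12.4, 40, 84.2, 12.2, 40)
Z_CRIT = 1.65                        # two-tailed critical value shown in the figure
print(round(z_stat, 2), abs(z_stat) > Z_CRIT)
```

With these inputs the statistic comes out near -1.67, which is just beyond the critical value of -1.65, so it is worth redoing the computation by hand before drawing a conclusion.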
Sample   n    Mean   s²
A        10   200    50
B        10   185    25
[Figure: two-tailed rejection regions, t < -2.1006 and t > +2.1006]
Step 7: Since the test statistic of 5.477 is higher than the critical value of 2.1006,
reject the null hypothesis. There is sufficient evidence that the two population means
differ. It appears that whether participants are in group A or group B does have an effect
on their scores.
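The statistic 5.477 can be reproduced from the summary table with a pooled-variance t-test. The sketch below assumes equal population variances and independent samples, which is what the pooled formula requires; the function name is illustrative.

```python
import math

def pooled_t(x1, var1, n1, x2, var2, n2):
    """t statistic for two independent samples using the pooled variance."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)   # pooled variance
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Sample A: n = 10, mean 200, variance 50; Sample B: n = 10, mean 185, variance 25
t_stat = pooled_t(200, 50, 10, 185, 25, 10)
df = 10 + 10 - 2
print(round(t_stat, 3), df)
```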
n 40 20
s 8.65 4.93
[Figure: two-tailed rejection regions, t < -2.3924 and t > +2.3924]
Step 7: Since the test statistic of 1.896 is lower than the critical value of 2.3924,
fail to reject the null hypothesis. There is not sufficient evidence to reject the claim
that the mean times of all long distance calls made in the two divisions are the same.
Example:
Consider the paired sample data given below. The sample of pre training weights and the
sample of post training weights are dependent samples because each pair is matched
according to the person involved.
Subject A B C D E F
Before 99 62 74 59 70 73
After 94 62 66 58 70 76
The sample mean of the differences (d) is 1.83 and standard deviation of
differences is 3.97.
[Figure: two-tailed rejection regions, t < -2.5706 and t > +2.5706]
Step 7: Since the test statistic of 1.13 is less than the critical value of 2.5706,
fail to reject the null hypothesis. There is not sufficient evidence to warrant rejection
of the claim that the mean difference is equal to zero; that is, there is not sufficient
evidence to conclude that the training has an effect on the weight of the participants.
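The paired-sample computation can be checked from the Before and After weights with a brief Python sketch. Differences are taken as Before minus After, which matches the stated mean difference of 1.83; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [99, 62, 74, 59, 70, 73]
after = [94, 62, 66, 58, 70, 76]

diffs = [b - a for b, a in zip(before, after)]   # one difference per subject
d_bar = mean(diffs)                               # mean of the differences
s_d = stdev(diffs)                                # standard deviation of the differences
t_stat = d_bar / (s_d / math.sqrt(len(diffs)))    # paired t statistic, df = n - 1
print(round(d_bar, 2), round(s_d, 2), round(t_stat, 2))
```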
Chapter Test
I. Identification
1. A statement or prediction of the relationship between or among variables.
2. A set of values of the test statistic that is chosen before the experiment to define the
conditions under which the null hypothesis will be rejected.
3. The test statistic for independent samples when population variances are known.
4. The rejection of the null hypothesis when in fact it is true.
5. The acceptance of null hypothesis when it is false.
6. It is used when the critical region is located on both sides of the distribution or range of
values for the test statistic.
7. An assertion that does not indicate as to whether the difference falls within the positive
or negative end of the distribution.
8. These merely imply that there is no sufficient statistical evidence to believe otherwise.
9. This kind of statistics is concerned more with generalizing information or making
inferences about population.
10. It is used when the critical region is located at only one extreme of distribution or
range of values for the test statistic.
90 85 80 78
89 86 82 76
88 84 83 75
94 83 81 77
93 88 80 75
2. A leading brand of powdered orange juice claims that the Vitamin C content of their
product is 60 mg per serving on the average. To test the claim 9 samples were analyzed at
random by a group of Biochemistry students and yielded the following results.
Determination No.   1 2 3 4 5 6 7 8 9
3. A recent survey found out that high school students spend an average of 6.8 hours per
week watching television. A random sample of 36 high school students revealed that the
mean number of hours they watched TV during the past week is 6.2 hours with a standard
deviation of 0.5 hours. Test the hypothesis that the mean number of hours spent by the
high school students is not significantly lower than 6.8 hours. Use a 0.05 level of
significance.
4. Test the hypothesis that the average content of containers of particular lubricant is 10
liters if the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1,
9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use a 0.01 level of significance and assume that the
distribution of contents is normal.
5. In a study conducted at the Virginia Polytechnic Institute and State University, the
plasma ascorbic acid levels of pregnant women were compared for smokers versus
nonsmokers. Thirty-two women in the last three months of pregnancy, free of major
health disorders, and ranging in age from 15 to 32 years were selected for the study. Prior
to the collection of 20 ml of blood, the participants were told to avoid food high in
ascorbic acid content. From the blood samples, the following plasma ascorbic acid values
of each subject were determined in milligrams per 100 milliliters.
Nonsmokers Smokers
0.97 0.48
0.72 0.71
1.0 0.98
0.81 0.68
0.62 1.18
1.32 1.36
1.24 0.78
0.99 1.64
0.74 1.24
0.88 1.18
0.94
1.16
0.86
0.85
0.58
0.57
0.64
0.98
1.09
0.92
0.78
Chapter 7
The Chi-square distribution
The Chi-Square (𝜒²) Test is a distribution-free or non-parametric test. It is used
to test significance for data presented in frequencies or nominal forms.
Learning Objectives
The aim of this section is for the students to test significance for data presented in
frequencies or nominal forms.
𝜒² = 𝛴[(O – E)² / E]
where:
O = observed frequency
E = expected frequency
Degrees of Freedom (df)
df = k – 1
where:
k = number of categories
To compare the computed Chi-square value with the tabular Chi-square value, refer to the
table in the appendix.
Example 1.
The Librarian of a certain University decided to find out whether the Mathematics
books were equally borrowed throughout the day. Apply the steps in hypothesis testing
using 0.01 level of significance.
Algebra 8
Calculus 5
Probability 10
Statistics 10
Trigonometry 12
Solution:
S3. At α = 0.01
df = k – 1
= 5 – 1
= 4
Category   O    E   O – E   (O – E)²   (O – E)²/E
1          8    9   -1      1          0.11
2          5    9   -4      16         1.78
3          10   9   1       1          0.11
4          10   9   1       1          0.11
5          12   9   3       9          1.00
𝜒²c = 3.11
Ho is accepted.
H1 is rejected.
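The goodness-of-fit computation above can be sketched in Python. The expected frequency is the total of 45 borrowings spread equally over the 5 categories. The tabular value 13.277 for df = 4 at the 0.01 level is the standard chi-square table entry and is entered here as a constant, not computed.

```python
observed = {"Algebra": 8, "Calculus": 5, "Probability": 10,
            "Statistics": 10, "Trigonometry": 12}

total = sum(observed.values())
expected = total / len(observed)          # equal borrowing: 45 / 5 = 9 per category
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

CHI2_TABULAR = 13.277                     # table value, df = 4, alpha = 0.01
reject_h0 = chi2 > CHI2_TABULAR
print(round(chi2, 2), df, reject_h0)
```

Because the computed value is far below the tabular value, the null hypothesis of equal borrowing is not rejected.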
S6. Interpretation
The test of independence, also called a test of proportions, uses a two-way contingency
table in which both rows and columns are considered. It is used when two variables are
involved, each consisting of two or more categories. The formula is given by,
𝜒² = 𝛴[(O – E)² / E]
where:
O = observed frequency
E = expected frequency
E = (TR x TC) / T
where:
TR = row total
TC = column total
T = grand total
df = (R – 1)(C – 1)
where:
R = number of rows
C = number of columns
Example 1.
O 2 5 7
VS 28 30 58
S 16 12 28
US 3 1 4
Total 49 48 97
Formulate the null and the alternative hypotheses, then use the steps in hypothesis
testing at 5% level of significance.
Solution:
Ho: There is no significant relationship between the civil status and the performance
rating of teachers.
H1: There is a significant relationship between the civil status and the performance
rating of teachers.
S3. At α = 0.05
df = (R – 1)(C – 1)
= (4 – 1)(2 – 1)
= (3)(1)
=3
𝜒²c = 2.9161
Ho is accepted.
H1 is rejected.
S6. Interpretation
There is no significant relationship between the civil status and the performance rating
of teachers. Thus, the civil status does not affect the performance rating of teachers.
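The computed value 2.9161 can be reproduced from the observed table with a short Python sketch. Each expected frequency is (row total × column total) / grand total. The tabular value 7.815 for df = 3 at the 0.05 level is the standard chi-square table entry, entered as a constant.

```python
# Observed frequencies: rows are ratings O, VS, S, US; columns are the two civil-status groups
table = [[2, 5], [28, 30], [16, 12], [3, 1]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi2 += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)
CHI2_TABULAR = 7.815                      # table value, df = 3, alpha = 0.05
print(round(chi2, 4), df, chi2 > CHI2_TABULAR)
```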
Practice Exercises
Consider the following situations below. Apply the steps in hypothesis testing at a
specified level of significance.
1. A sales agent sells three models of houses. In a recent sales period, he sold 21 units
of row houses, 32 units of bungalow houses, and 29 units of two-storey houses. At
α = 0.01, find out whether the home owners (buyers) have the same preference for the
three models.
2. a) The 25 coated peanuts of five different colors such as green, orange, purple, red,
and yellow are placed in a canister. At random, a coated peanut is picked 100 times with
replacement and its color is observed. The results are as follows:
Colors   Frequency
Green    20
Orange   18
Purple   15
Red      17
Yellow   30
Determine whether the following coated peanuts of 7 green, 8 orange, 3 purple, 2 red, and
5 yellow are inside the canister at α = 5%.
Face Frequency
1 23
2 17
3 53
4 36
5 24
6 27
4. The number of students who passed and failed an examination given to classes A
and B are given below. Is there any difference in the performance of two classes at 0.05
level of significance?
Result   Class A   Class B
Passed   30        35
Failed   10        15
5. Random samples of students are chosen from the public high school and the
parochial high school of a certain community. These are then classified into five
socioeconomic classes according to the parent’s occupation. The 30 students from the
parochial school included 2 whose fathers were classified professional or managerial, 0
semi-professional, 12 skilled workers, 14 semi-skilled, and 2 unskilled. The 60 students
from the public school were classified 4 professional or managerial, 9 semi-professional,
18 skilled workers, 22 semi-skilled, and 7 unskilled. Are the students from public and
parochial high schools different in terms of socioeconomic classes according to parents’
occupation at 1% level of significance?
Chapter 8
The F- distribution
The comparison of two population means was discussed in the measures of difference using
the t-test. However, researchers often need to compare more than two population means,
such as when evaluating different teaching methods, product designs, or market strategies.
In this case, it is not advisable to do comparisons by taking the samples two at a time;
for example, if there are 5 samples, then 10 tests need to be conducted. Moreover, the
standard deviation for the difference between two sample means would have to be
considered for each pair.
Instead of comparing in pairs, an analysis of variance can be used, for which a single test
is done. Analysis of variance is a technique in inferential statistics designed to test
whether or not more than two samples or groups are significantly different from each other.
The test takes all the samples simultaneously. It was developed by Sir Ronald A. Fisher,
and the F-test used in analysis of variance (ANOVA) is named after him. It was first used
for agricultural research. Today, it is applicable to almost any discipline. In this
chapter, one-way analysis of variance (ANOVA 1) or F-test is the focus of discussion.
Learning Objectives
The aim of this section is for the students to learn and use analysis of variance for
comparing several populations in a single test, instead of comparing them in pairs.
The one-way analysis of variance (ANOVA 1) or F-test is used when there is only one
category being considered as an independent variable. A hypothesis that can be tested is a
null hypothesis in which there is no significant difference among the samples. The
formula used in this test is given by,
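F = MSSb / MSSw
where:
MSSb = SSb/dfb (mean sum of squares between columns)
MSSw = SSw/dfw (mean sum of squares within columns)
where:
dfT = RC – 1
dfb = C – 1
dfw = dfT – dfb
where:
R = number of rows
C = number of columns
The sums of squares are computed as:
SST = 𝛴x² – (𝛴x)²/N
SSb = 𝛴(Tc²)/R – (𝛴x)²/N
SSw = SST – SSb
where:
x = an individual observation
Tc = total of a column
N = RC = total number of observations
Source of variation   SS    df    MSS    F
Between column        SSb   dfb   MSSb   MSSb/MSSw
Within column         SSw   dfw   MSSw
Total                 SST   dfT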
Example 1.
1 80 66 86
2 86 71 91
3 88 86 96
4 92 76 94
Establish the null and the alternative hypotheses by employing the steps in
hypothesis testing at α = 5%.
Solution:
Ho: There are no significant differences in the scores obtained by the 3 teams.
H1: There are significant differences in the scores obtained by the 3 teams.
S3. At α = 0.05
dfb = C – 1
= 3 – 1
= 2
dfT = RC – 1
= 4(3) – 1
= 12 – 1
= 11
dfw = dfT – dfb
= 11 – 2
= 9
SST = 𝛴x² – (𝛴x)²/N
= 86302 – (1012)²/12
= 86302 – 85345.33
= 956.67
SSb = 𝛴(Tc²)/R – (𝛴x)²/N
= 343806/4 – 85345.33
= 85951.5 – 85345.33
= 606.17
SSw = SST – SSb
= 956.67 – 606.17
= 350.50
MSSw = SSw/dfw
= 350.50/9
= 38.94
MSSb = SSb/dfb
= 606.17/2
= 303.09
Fc = MSSb/MSSw
= 303.09/38.94
Fc = 7.78
S5. Decision
Ho is rejected.
H1 is accepted.
S6. Interpretation
There are significant differences in the scores obtained by the 3 teams. Thus, the team
with the highest score is considered the winner.
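The one-way ANOVA computation in this example can be reproduced with a brief Python sketch that follows the same sums of squares. The column labels "Team 1" to "Team 3" are placeholders for the three columns of scores. The tabular value F = 4.26 for 2 and 9 degrees of freedom at the 0.05 level is the standard F-table entry, entered as a constant.

```python
scores = {
    "Team 1": [80, 86, 88, 92],
    "Team 2": [66, 71, 86, 76],
    "Team 3": [86, 91, 96, 94],
}

columns = list(scores.values())
C, R = len(columns), len(columns[0])      # number of columns and rows
N = R * C
all_x = [x for col in columns for x in col]

correction = sum(all_x) ** 2 / N                                  # (sum of x)^2 / N
ss_total = sum(x * x for x in all_x) - correction                 # SST
ss_between = sum(sum(col) ** 2 for col in columns) / R - correction   # SSb
ss_within = ss_total - ss_between                                 # SSw

df_b = C - 1
df_w = (N - 1) - df_b
F = (ss_between / df_b) / (ss_within / df_w)

F_TABULAR = 4.26                          # table value, df = (2, 9), alpha = 0.05
print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2), round(F, 2))
```

Note that a significant F only indicates that at least one column mean differs from the others; it does not by itself identify which one.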
Practice Exercises
Consider the following situations below. Apply the steps in hypothesis testing at a
specified level of significance.
1. Four groups of 3 players each were having their bowling competition. Listed below
are their bowling scores. Determine whether there is unusual variation among the 4
groups at 1% level of significance.
Player Group 1 Group 2 Group 3 Group 4
1 92 94 81 84
2 72 89 86 87
3 87 84 99 89
2. Enumerated are the mileages obtained after several road tests were run using 5
different brands of gasoline on a certain automobile. (Use α = 0.05)
Test   Brand A   Brand B   Brand C   Brand D   Brand E
1 32 58 35 62 53
2 28 60 51 57 66
3 39 47 41 54 67
4 45 39 57 52 47
3. Use α = 0.01 to find the significant differences in the book allowance received by
the group of 8 college students from 3 different year levels during the first
semester.
Year Level
I II III
23 8 7 11 1
13 29 8 0 0
11 29 7 3 0
9 28 4 9 0
11 22 7 10 0
Chapter 9
Linear Regression and Correlation
Learning Objectives
The aim of this section is for students to explain the direction and strength of a linear
correlation between two factors, be able to calculate the correlation coefficient,
simple linear regression equation and the coefficient of determination, and
analyze the results of test for significance.
Definition 9.1.1 A correlation exists between two variables when one of them is
related to the other in some way.
Assumptions
1. The sample of paired (x,y) data is a random sample
2. The pairs of (x,y) data have a bivariate normal distribution.
Definition 9.1.2 The linear correlation coefficient r measures the strength of the
linear relationship between the paired x and y values in a sample.
A scatter plot displays the strength, direction, and form of the relationship between two
quantitative variables. A correlation coefficient measures the strength of that
relationship. Calculating a Pearson correlation coefficient requires the assumption that
the relationship between the two variables is linear.
https://www.westga.edu/academics/research/vrc/assets/docs/scatterplots_and_correlation_notes.pdf
Facts about Correlation
1. The order of variables in a correlation is not important
2. Correlations provide evidence of association not causation
3. r has no units and does not change when the units of measure of x , y or both are
changed
4. Positive r values indicate positive association between the variables, and negative
r values indicate negative association
5. The correlation r is always a number between -1 and 1
Pearson r: Assumptions
+1.0 Perfect
0 No Association
Example:
A study was conducted to investigate the effect of students’ performance in their basic
subjects on their performance in their major subjects.
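Student   Basic (x)   Major (y)
1         89          83
2         78          75
3         92          89
4         83          80
5         87          82
6         94          88
Solution:
Student   Basic (x)   Major (y)   xy      x²      y²
1         89          83          7387    7921    6889
2         78          75          5850    6084    5625
3         92          89          8188    8464    7921
4         83          80          6640    6889    6400
5         87          82          7134    7569    6724
6         94          88          8272    8836    7744
Total     523         497         43471   45763   41303
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
r = [6(43471) – (523)(497)] / [√(6(45763) – (523)²) √(6(41303) – (497)²)]
= 895/[(32.388)(28.443)]
= 0.97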
The result shows that students who have a high performance in their basic
subjects tend to also have a high performance in their major subjects.
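The correlation for the six students can be recomputed with a short Python sketch using the raw-sum form of the Pearson formula. The function name is illustrative.

```python
import math

basic = [89, 78, 92, 83, 87, 94]   # x
major = [83, 75, 89, 80, 82, 88]   # y

def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

r = pearson_r(basic, major)
print(round(r, 2), round(r ** 2, 2))   # r and the coefficient of determination
```

The value of r is about 0.97, a strong positive linear association, and r² is about 0.94.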
R-squared (R2) is a statistical measure that represents the proportion of the variance for a
dependent variable that's explained by an independent variable or variables in a
regression model.
The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the
linear association between x and y.
9.2 REGRESSION
The most commonly used form of regression is linear regression, and the most common
type of linear regression is called ordinary least squares regression.
Linear regression uses the values from an existing data set consisting of measurements of
the values of two variables, X and Y, to develop a model that is useful for predicting the
value of the dependent variable, Y, for given values of X. The model is Y = a + bX + e,
where:
● Y is the value of the Dependent variable (Y), what is being predicted or explained
● a or Alpha, a constant; equals the value of Y when the value of X=0
● b or Beta, the coefficient of X; the slope of the regression line; how much Y changes
for each one-unit change in X.
● X is the value of the Independent variable (X), what is predicting or explaining the
value of Y
● e is the error term; the error in predicting the value of Y, given the value of X (it is
not displayed in most regression equations).
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
Example:
Given:
Cigarette consumption per capita (x)   Psychiatric admissions, % (y)
3522                                   0.20
3597                                   0.22
4171                                   0.23
4258                                   0.29
3993                                   0.31
3971                                   0.33
4042                                   0.33
4053                                   0.32
Solution:
Y = a + bX + e
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
b = 0.00012
y = -0.2102 + 0.00012X
The linear model shows a positive relationship between per capita cigarette consumption
and the percentage of psychiatric admissions. For every one-unit increase in per capita
cigarette consumption, the predicted psychiatric admissions increase by about 0.00012
percentage points.
To find the predicted percentage of psychiatric admissions given a per capita
cigarette consumption of 3650 (equivalent to 10 cigarettes per day):
Given x = 3650, y = ?
y = -0.2102 + 0.00012(3650) ≈ 0.2415 percentage points in psychiatric admissions
(computed with the unrounded slope, b ≈ 0.0001238)
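The computation in this example can be checked with a short script. The following is a minimal sketch, not part of the original module: the function name `least_squares` and the variable names are illustrative assumptions, and the code applies the raw-sum formulas for a and b to the eight data pairs given above.

```python
# Least-squares intercept (a) and slope (b) from the raw-sum formulas:
#   a = (Σy·Σx² − Σx·Σxy) / (n·Σx² − (Σx)²)
#   b = (n·Σxy − Σx·Σy)   / (n·Σx² − (Σx)²)
def least_squares(xs, ys):
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

# Data from the example: per capita cigarette consumption (X)
# and percentage of psychiatric admissions (Y).
cigarettes = [3522, 3597, 4171, 4258, 3993, 3971, 4042, 4053]
admissions = [0.20, 0.22, 0.23, 0.29, 0.31, 0.33, 0.33, 0.32]

a, b = least_squares(cigarettes, admissions)
predicted = a + b * 3650  # prediction at x = 3650

print(f"a = {a:.4f}, b = {b:.7f}, predicted y at x = 3650: {predicted:.4f}")
```

Because the slope is very small relative to X, rounding b to 0.00012 before multiplying changes the prediction noticeably; keeping full precision until the final step reproduces the value 0.2415.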
2. A teacher would like to determine whether the students' scores in Algebra have an effect
on their scores in Calculus. What will be the estimated score in Calculus if a student got
a score of 25 in Algebra?
Student    Algebra (X)    Calculus (Y)
1          17             73
2          21             66
3          11             64
4          16             61
5          15             70
6          11             71
7          24             90
8          27             68
9          19             84
10         8              52
Solution:
Y = a + bX + e
\[ a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \]
\[ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} \]
Student    X     Y     XY     X²    Y²
10         8     52    416    64    2704
y = 52.6898 + 1.018X
Given that the student got a score of 25 in Algebra, the estimated score in Calculus
is: y = 52.6898 + 1.018(25) = 78.1398
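The same fit can be obtained from deviations about the means, which is algebraically equivalent to the raw-sum formulas: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The sketch below is an illustration only; the function name `fit_line` and the variable names are assumptions, not part of the original module.

```python
# Least-squares line using deviations from the means.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Scores of the ten students in the example.
algebra = [17, 21, 11, 16, 15, 11, 24, 27, 19, 8]
calculus = [73, 66, 64, 61, 70, 71, 90, 68, 84, 52]

a, b = fit_line(algebra, calculus)
estimate = a + b * 25  # estimated Calculus score for an Algebra score of 25

print(f"a = {a:.4f}, b = {b:.4f}, estimate at x = 25: {estimate:.4f}")
```

With the unrounded slope the estimate is about 78.15; the value 78.1398 in the text comes from rounding b to 1.018 before multiplying.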
Practice Exercises
1. Definition of Population growth rate: The average annual percent change in the population, resulting from a
surplus (or deficit) of births over deaths and the balance of migrants entering and leaving
a country. The rate may be positive or negative. The growth rate is a factor in determining
how great a burden would be imposed on a country by the changing needs of its people for
infrastructure (e.g., schools, hospitals, housing, roads), resources (e.g., food, water,
electricity), and jobs. Rapid population growth can be seen as threatening by neighboring
countries.
http://www.indexmundi.com/philippines/population_growth_rate.html#sthash.ENIsbsIW.dpuf
Country      2000  2001  2002  2003  2004  2005  2006  2007   2008   2009   2010   2011   2012
Philippines  2.07  2.03  1.99  1.92  1.88  1.84  1.8   1.76   1.99   1.96   1.93   1.9    1.87
Japan        0.18  0.17  0.15  0.11  0.08  0.05  0.02  -0.09  -0.14  -0.19  -0.24  -0.28  -0.08
Australia    1.02  0.99  0.96  0.93  0.9   0.87  0.85  0.82   1.22   1.2    1.17   1.15   1.13
a. Construct a scatter plot diagram of the population growth rate of the Philippines,
Japan and Australia. Explain the trend as revealed in the scatter plot diagram.
2. Definition of Inflation rate (consumer prices): This entry furnishes the annual
percent change in consumer prices compared with the previous year's consumer prices.
Inflation is when the prices of most goods and services continue to creep upward. When
this happens, your standard of living falls. That's because each peso buys less, so you have
to spend more to get the same goods and services.
If inflation is mild, it can actually spur further economic growth. If prices rise slowly and
gradually, it can encourage people to buy now and avoid future price increases. This
increases demand, driving further economic growth. In this way, a healthy economy can
usually sustain a 2% inflation rate.
Country      1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Philippines  6.8   5     6     3.1   3.1   5.5   7.6   6.2   2.8   9.3   3.2   3.8   4.8
Definition of Birth rate: This entry gives the average annual number of births during a
year per 1,000 persons in the population at midyear; also known as crude birth rate. The
birth rate is usually the dominant factor in determining the rate of population growth. It
depends on both the level of fertility and the age structure of the population.
Country      2000   2001   2002   2003  2004  2005   2006   2007   2008   2009   2010   2011   2012
Philippines  27.85  27.37  26.88  26.3  25.8  25.31  24.89  24.48  26.42  26.01  25.68  25.34  24.98
Definition of Industrial production growth rate: This entry gives the annual
percentage increase in industrial production (includes manufacturing, mining, and
construction).
Country  1999  2000  2003  2004  2005  2006  2007  2008  2009  2010  2011
b. Relate the following economic indicators namely, inflation rate, birth rate, and
industrial production growth rate to the GDP per capita of the Philippines. Give your
insights.
EXERCISE 9.2
1. A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows:
Temperature (x)    Converted sugar (y)
1.0                8.1
1.1                7.8
1.2                8.5
1.3                9.8
1.4                9.5
1.5                8.9
1.6                8.6
1.7                10.2
2. A study was made by Citimart Incorporation to determine the relation between their
weekly advertising expenditures and sales. The following data were recorded.
Advertising expenditures    Sales
40                          385
20                          400
25                          395
20                          365
30                          475
50                          440
40                          490
20                          420
50                          560
40                          525
25                          480
50                          510
3. The marketing manager of a large supermarket chain would like to use shelf space
to predict the sales of goods. A random sample of 10 equal-sized grocery stores is
selected, with the following results:
Store    Shelf space (feet)    Weekly sales (hundreds of pesos)
2        5                     2.2
3        5                     1.4
4        10                    1.9
5        10                    2.4
6        10                    2.6
7        15                    2.3
8        15                    2.7
9        15                    2.8
10       20                    2.6
b. Use the least squares method to find the regression coefficients a and b
(y = a + bX).
d. Predict the weekly sales (in hundreds of pesos) of pet food for stores with 8
feet of shelf space for pet food.
4. The following data represent the value of Philippine exports and imports from
2001 to 2010 for various countries:
b. Compute r.
c. What conclusion can you reach about the relationship between exports and
imports?
Practice Exercises
1. The following data represent the value of exports and imports of the Philippines
from 2001 to 2005 (in thousands). Compute the correlation coefficient r. What
conclusion can be made about the effect of exports on imports? Use the Data Analysis
tool in Excel.
What is the predicted productivity of an employee given the increase in salary (in
thousands of Php)?
416 11.9
375 7.3
237 10.6
207 22.9
200 6.5
193 15.2
156 18.2
155 21.7
140 31.5
38 4
42 3
29 11
31 5
28 9
15 6
24 14
17 9
19 10
11 15
8 19
19 17
3 10
14 14
6 18
SEMESTRAL PROJECT
Republic Act (RA) No. 10625 or the Philippine Statistical Act of 2013 mandates the
Philippine Statistics Authority (PSA) to prepare, in consultation with the PSA Board, a
Philippine Statistical Development Program (PSDP). Specifically, section 24 of RA 10625
states that the PSDP shall consist of all statistical activities to be undertaken by the
Philippine Statistical System (PSS) in response to the requirements of government
planning and policy formulation. Part of the goals of the PSDP is to provide adequate,
timely, reliable and relevant statistics for evidence-based decision making. It also intends
to increase awareness, understanding, appreciation, and trust of the general public in
statistics. Some of the outputs of PSDP are Demographic and Social Statistics, Economic
Statistics, Environment and Multi-domain Statistics.
Official statistics are numerical data-sets, produced by official governmental
agencies mainly for administrative purposes, including the Census, crime figures, health
data, income and employment rates, as well as those based on government-sponsored
social surveys. Official statistics comply with international classifications and
methodologies and meet the principles of impartiality, reliability, relevance, cost-
effectiveness, confidentiality and clarity.
Students enrolled in Stat 101 will be required to submit a statistical report applying
various statistical concepts and methodologies using Official Statistics. The statistical
report is a way of presenting large amounts of data in a convenient form. Hence, students
will apply their statistical analysis skills, learn methods and tools, and practice writing
a readable report.
Date due:
Percent equivalent in the final grade: 20%
Task:
Prepare a statistical report utilizing Official Statistics in the Philippines. The final report
will be presented in the class for evaluation.
Specific Guidelines
1. Begin by collecting data from the Philippine Statistical Yearbook on the PSA website:
https://psa.gov.ph/products-and-services/publications/philippine-statistical-yearbook
2. Prepare the statistical Report
2.1 Introduction of the Statistical Report
In the Introduction, explain why you chose this topic. If you set out
to answer specific questions or test hypotheses, state them. Also, describe
the data collected and explain the importance of your work in
this context.
2.4 Conclusion
Here you give a summary of your results and explain their meaning
and context in your study. You need to mention also if you reject or
fail to reject your hypothesis.
2.5 Bibliography
● Use APA Citation Style to format references in your critique, and be sure to
cite page numbers for all quoted passages. Also see the web link:
http://www.apastyle.org/.
2.6 Appendix
Present the computations used in the statistical analysis.
Evaluation Criteria:
Use the evaluation criteria below as a checklist for ensuring that you meet the assignment
requirement before you submit your report.
1. Are the tables and graphs presented complete and consistent with the
obtained data and information?
2. Is your description of the tables and graphs consistent with the values
presented?
Appendix
Continuation of the F distribution table
Rows: degrees of freedom within columns. Columns: degrees of freedom between columns (1, 2, 3, 4, 5, 6, 7, …, ∞).
Degrees of freedom (df) 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
1 0.000039 0.00016 0.00098 0.0039 0.0158 2.71 3.84 5.02 6.63 7.88
2 0.0100 0.0201 0.0506 0.1026 0.2107 4.61 5.99 7.38 9.21 10.60
3 0.0717 0.115 0.216 0.352 0.584 6.25 7.82 9.35 11.34 12.84
4 0.207 0.297 0.484 0.711 1.064 7.78 9.49 11.14 13.28 14.86
5 0.412 0.554 0.831 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.676 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.989 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.96
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95
120 83.85 86.92 91.58 95.70 100.62 140.23 146.57 152.21 158.95 163.64