
Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write.

Chapter One

Statistics, for some time now, has been tagged as the sexiest field there is.
This is owing to the many disciplines the field reaches. Because data is
valued everywhere, the field touches everyone. All of us need processed
data, called information, to analyze further, make plans, and ultimately decide.

"I keep saying the sexy job in the next ten years will be statisticians."
- Varian, 2009

[Statistics|101] Introduction
Learning Objectives
By the end of this module, each student should be able:

To define and explain the basic concepts of Statistics
To know and differentiate the different fields of Statistics
To be familiar with the many applications of Statistics
To follow properly the steps in a statistical inquiry
To have an appreciation of or interest in the field of Statistics

In its singular sense, Statistics is the art and science of collection,
organization, analysis, interpretation, and presentation of data.

In its plural sense, Statistics is a set of numerical data.

DATA empowers!
[Diagram: massive volumes of data, Data Processing, pertinent information, Decision Making]

Statistics is the most important science in the whole world; for upon it
depends the practical application of every science and of every art; the
one science essential to all political and social administration, all
education, and all organization based on experience, for it only gives
results of our experience.
- Florence Nightingale

Welcome to the world of Statistics

of numbers, of meaning, of proper decision making.

Statistics is the art and science of uncertainty.

The original meaning of the word statistics is science of states, and in its early
existence it was also called political arithmetic. Although the use of the term
started as late as the 18th century, the practice of collecting and analyzing data
dates back to the early biblical times.

The term was first used in statisticum collegium (Modern Latin), meaning a lecture
course on state affairs. The first statisticians were each called a statista (Italian),
which means one skilled in statecraft, or a politician. In 1770, statistics was formally
defined as a science dealing with data about the condition of a state or a community
(political arithmetic). The German word Statistik was coined by the political scientist
Gottfried Achenwall (1719-1772) in his paper Vorbereitung zur
Staatswissenschaft. At present, the applications of statistics have expanded
from political science to almost all fields of knowledge.

Probability Theory
Probability theory was inspired mainly by games of chance. The first mathematical analysis of
games of chance was undertaken by Italian mathematicians in the 16th century. The main results
were initially obtained by Gerolamo Cardano (1501-1576) about 1565 and written up in a book. There
were many wrong propositions in this book. Although Cardano later realized the errors and eventually
corrected them, he didn't remove the false statements from the book. This book was not published in
his lifetime because Cardano kept it secret. Accused of being a heretic in 1570, he was arrested,
dismissed, and denied the rights to lecture publicly and to have his books printed.

The theory of probability had already been discovered by Pierre de Fermat (1601-1665), Blaise
Pascal (1623-1662), and Christiaan Huygens (1629-1695) by the time Cardano's book was
rediscovered almost a hundred years later. The Chevalier de Méré, a notorious gambler who was a
close friend of Pascal, prompted mathematicians to study probability theory by posing a gambling
problem that offered them a real challenge. This led to a lengthy correspondence
between Fermat and Pascal, and brought about not only a solution to the problem at hand, but
to other more general problems as well. In 1655, the young Dutchman Huygens learned of the
results of Fermat and Pascal, but not the proofs and arguments, and he wrote a short book on
probability theory called How to Reason in Dice Games. It is fair to say that this book represents
the real beginning of probability theory as a mathematical subject.

Inferential Statistics
The successful development of probability theory did not immediately lead to a
theory of inferential statistics. In 1346, the world faced the most infectious and lethal
disease of all time, the Black Plague. After that time, the plague recurred regularly
until 1712. Statistics (or political arithmetic, as it was named in its first years of existence)
was then the art of deducing estimates and properties of quantities which cannot be
observed directly. The pioneer seems to be an English tradesman and haberdasher of
small wares, John Graunt (1620-1674). In 1661, he published a book called Natural and
Political Observations upon Bills of Mortality. He used the bills to estimate birth
mortality, the number of inhabitants, the number of years to recover the former
population level after a plague epidemic, etc. His methods were bold but dubious, yet
surprisingly they agreed with later and more reliable observations. Soon the
mathematicians involved in the study of probability theory took up the challenge to
invent rigorous methods to estimate unknown quantities, particularly to compute
reliable life-tables, which became an important tool for the emerging life insurance

Population is the collection of all
elements under consideration in a
statistical inquiry.

Sample is a part or a subset of the
population from which information is collected.
A manufacturer of kerosene heaters wants to determine if customers are
satisfied with the performance of their heaters. Towards this goal, 5,000
of its 200,000 customers are contacted and each is asked, "Are you
satisfied with the performance of the kerosene heater you purchased?"

Population: set of all customers of the kerosene heater manufacturer

Sample: set of all contacted customers of the kerosene heater manufacturer
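
The selection of the 5,000 contacted customers can be sketched in Python. This is only an illustration: the slide does not say how the customers were chosen, so simple random sampling is an assumption here, and the customer IDs 1 to 200,000 and the function name draw_sample are hypothetical.

```python
import random

# Hypothetical customer IDs 1..200000 standing in for the manufacturer's customer list.
POPULATION_SIZE = 200_000
SAMPLE_SIZE = 5_000


def draw_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct customer IDs chosen by simple random sampling."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(range(1, population_size + 1), sample_size)


sample_ids = draw_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=101)

# Share of the population that ends up in the sample.
sampling_fraction = SAMPLE_SIZE / POPULATION_SIZE
print(len(sample_ids), sampling_fraction)
```

Every ID in sample_ids belongs to the population, which is what makes the sample a subset of it.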


Another example
Research Problem:
What is the average expenditure of households in Metro Manila?

Population: set of all households in Metro Manila

Sample: set of barangays in Metro Manila

Sample: set of all households in Makati and Quezon City
Sample: set of selected streets in Metro Manila
Sample: _________________________________________________

The elements of the population can be individuals, objects, animals,
geographic areas, and so on.

One may think: Why do we have to get a sample from the population
when the population is always available? Sampling is required,
recommended, and in some cases even inevitable for at least two
reasons: studying the whole population is very costly in terms of time
and resources, and sometimes it is infeasible to study the whole
population.

Given that we are dealing only with the sample, how can we
be sure that the inferences about the population are close to the true
value or scenario of the population? The whole Statistics process is
carried out quantitatively and rigorously in such a way that error is minimized or
measured. Statistics has its foundations in the theory of probability.
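
A small simulation can make the idea of measurable sampling error concrete. The sketch below is an illustration only: the population of 10,000 elements, the true proportion of 0.60, the sample size of 500, and the 200 repetitions are all made-up values, and simple random sampling is assumed.

```python
import random

# Hypothetical population: 10,000 elements, 60% of which have the attribute (coded 1).
TRUE_PROPORTION = 0.60
population = [1] * 6_000 + [0] * 4_000


def sample_proportions(population, sample_size, repetitions, seed=0):
    """Draw repeated simple random samples and return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repetitions)
    ]


estimates = sample_proportions(population, sample_size=500, repetitions=200)

# How far did the sample proportions stray from the true value?
largest_error = max(abs(p - TRUE_PROPORTION) for p in estimates)
average_estimate = sum(estimates) / len(estimates)
print(round(average_estimate, 3), round(largest_error, 3))
```

Each sample gives a slightly different estimate, but the estimates cluster around the true proportion, and the size of the typical error is something probability theory lets us quantify.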

Variable is a characteristic or attribute of
the elements in a collection that can
assume different values for the different
elements.

Observation is a realized value of a
variable.

Data is the collection of observations.

[Figure: students labeled with their observed values for six variables: sex, height (in m),
no. of children, favorite color, program, and age. Example labels: F, 1.63, 0, blue, BS Stat, 20.]

Variable Possible Observations

S = sex of a student Male, Female

N = number of members in a household n = 0,1,2,3,...

H = height (in cm) of a basketball player h>0

The Office of Admissions is studying the relationship between the score
in the entrance examination during application and the general
weighted average (GWA) upon graduation among graduates of the
university from 2000 to 2005.

Population: set of all graduates of the university from the years 2000 to 2005

Variables of interest: score in the entrance examination and
general weighted average (GWA)
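
One common way to measure the relationship between two such variables is the Pearson correlation coefficient. The slide does not say which method the Office of Admissions uses, so this choice is an assumption, and the six score and GWA pairs below are invented purely for illustration (they also assume a grading scale where a lower GWA is better).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


# Hypothetical observations for six graduates (not real data).
exam_scores = [78, 85, 62, 90, 71, 88]
gwa = [2.10, 1.75, 2.60, 1.50, 2.30, 1.60]

r = pearson_r(exam_scores, gwa)
print(round(r, 3))
```

A value of r near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates a weak one.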

A summary measure is a single numeric
figure that describes a particular feature
of the whole collection.

Regardless of whether we are using data collected from every
element of the population or data from a sample, it would still
be difficult to understand what all these numeric figures
convey. To give meaning to these numbers, it is necessary to
summarize and condense the information contained in this
collection of observations into a single numeric figure that
describes a particular feature of the whole collection.

Parameter is a summary measure describing
a specific characteristic of the population. It is
computed using population data.

Statistic is a summary measure describing a
specific characteristic of the sample. It is
computed using sample data.

A manufacturer of kerosene heaters determined that 172,000 customers
out of the 200,000 were satisfied with the performance of their heaters. Its
competitor company wants to verify this claim by asking a sample of
10,000 customers the same question, "Are you satisfied with the
performance of the kerosene heater you purchased?" It was revealed that
6,450 customers were satisfied.

Parameter: proportion of satisfied customers in the population
of the manufacturer

P = 172,000 / 200,000 = 0.86

Statistic: proportion of satisfied customers in the sample of the
competitor

p = 6,450 / 10,000 = 0.645
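
The two computations above can be reproduced with a few lines of Python. This is a minimal sketch; the function name proportion is an illustrative choice, and the counts are the ones given in the example.

```python
def proportion(count_with_attribute, total):
    """Proportion of elements that have the attribute of interest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count_with_attribute / total


# Parameter: computed from the whole population of 200,000 customers.
P = proportion(172_000, 200_000)

# Statistic: computed from the competitor's sample of 10,000 customers.
p = proportion(6_450, 10_000)

print(f"P = {P:.3f}, p = {p:.3f}")
```

The formula is the same in both cases. What makes one a parameter and the other a statistic is whether the data come from the whole population or from a sample.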

Mr. Donaldo Chan, a candidate for Vice Mayor in Orion, Bataan,
wants to find out if there is a need to intensify his campaign efforts
against his opponents. He requested the services of a group of
students to interview 1,000 of the 3,000 registered voters of Orion,
Bataan. The survey results showed that 75% of the 1,000 voters in the
sample will vote for him as Vice Mayor.

a. Identify the population and the sample.

b. Identify the variable of interest.
c. Identify the parameter and the statistic.

The average weekly allowance of students last year at a private high
school was Php 600.00 per week, based on an enrollment of 1,080
students. The third year students who did not have this information
interviewed 50 students and found their average weekly allowance
last year to be Php 550.00.

a. Identify the population and the sample.

b. Identify the variable of interest.
c. Identify the parameter and the statistic.

There are two major areas or fields of Statistics:

Mathematical Statistics and Applied Statistics

Mathematical (or Theoretical) Statistics is
concerned with the development of the
mathematical foundations of the methods
used in Applied Statistics.

The study of mathematical statistics permits us to
understand the rationale behind the methods we use in
analysis and to establish new theories that will validate
the use of new statistical methods or modifications of
existing statistical methods in solving research problems
that are more complex.

Applied Statistics is concerned with the
procedures and techniques used in the
collection, presentation, organization,
analysis, and interpretation of data.

The study of applied statistics allows us to select and
properly implement the most appropriate statistical
methods that will provide solutions to the research
problem.

What's your favorite question?

Applied Statistics is further divided into two major areas:

Descriptive Statistics and Inferential Statistics

Descriptive Statistics comprises those
methods concerned with the collection,
description, and analysis of a set of data
without drawing conclusions or inferences
about a larger set.

The main concern is simply to describe the set of data
such that otherwise obscure information is brought out.

Conclusions apply only to the data on hand.

Given the daily sales performance for a product for the
previous year, we can draw a line chart or a column
chart to emphasize the upward/downward movement of
the series. Likewise, we can use descriptive statistics to
calculate a quantity index per quarter to compare the
sales by quarter for the previous year.
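
As a rough illustration of the quarterly comparison, the sketch below totals daily sales by quarter and expresses each quarter relative to the first quarter (first quarter = 100). The daily figures are made up, the year 2023 is arbitrary, and this simple base-quarter ratio is only one possible way to build such an index.

```python
from collections import defaultdict
from datetime import date, timedelta


def quarterly_totals(daily_sales):
    """Sum units sold per quarter from a mapping of date -> units sold."""
    totals = defaultdict(int)
    for day, units in daily_sales.items():
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
        totals[quarter] += units
    return dict(sorted(totals.items()))


# Hypothetical daily sales for one year (not real data).
start = date(2023, 1, 1)
daily_sales = {start + timedelta(days=i): 100 + (i % 7) for i in range(365)}

totals = quarterly_totals(daily_sales)

# Simple index with the first quarter as the base period (base = 100).
index = {q: round(100 * total / totals[1], 1) for q, total in totals.items()}
print(totals)
print(index)
```

Nothing here reaches beyond the data on hand: the totals and the index only describe last year's sales.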

Inferential Statistics comprises those methods
concerned with making predictions or
inferences about a larger set of data using
only the information gathered from a subset
of this larger set.
The main concern is not merely to describe but actually to predict and
make inferences based on the information gathered.

Conclusions are applicable to a larger set of data of which the data on
hand is only a subset. These conclusions are made under conditions of
uncertainty because we use only partial information. Conclusions
will be subject to some error, and probability theory will help us
understand the possible errors that can be committed.

Election polls make use of inferential statistics to predict
the winners for the coming election based on data
collected from a sample of registered voters.
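
To show how a poll result carries a measurable error, here is a minimal sketch of an approximate 95% interval for a proportion. It assumes a simple random sample and uses the usual normal approximation; the poll of 1,200 voters with 540 supporters is hypothetical, and real polls often use more elaborate designs.

```python
from math import sqrt


def proportion_interval(successes, n, z=1.96):
    """Sample proportion with an approximate interval (normal approximation).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical poll: 540 of 1,200 sampled voters favor the candidate.
p_hat, low, high = proportion_interval(540, 1_200)
print(f"estimate {p_hat:.3f}, interval ({low:.3f}, {high:.3f})")
```

The interval is a statement about all registered voters, not just those sampled, which is what makes this an inferential rather than a descriptive use of the data.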

Descriptive Statistics versus Inferential Statistics

Descriptive: A bowler wants to find his bowling average for the past 12 games.
Inferential: A bowler wants to estimate his chance of winning a game based on his
current season averages and the averages of his opponents.

Descriptive: A housewife wants to determine the average weekly amount she spent on
groceries in the past 3 months.
Inferential: A housewife would like to predict, based on last year's grocery bills, the
average weekly amount she will spend on groceries for this year.

Descriptive: A politician wants to know the exact number of votes he received in the
last election.
Inferential: A politician would like to estimate, based on an opinion poll, his chance of
winning in the upcoming election.

Identify whether each of the following situations belongs to the field of
Descriptive or Inferential Statistics.

a. A badminton player wants to know his average score for the past 10 games.
b. Wincy wants to determine the variability of his six exam scores in Algebra.
c. Pat would like to forecast the average monthly electricity bill she will pay for
the next year based on her average monthly bill in the past year.
d. Novie wants to determine the proportion spent on transportation during the
past four months using the daily records of expenditure that she keeps.
e. Vinse wishes to determine the number of families not eating three times a
day in the sample used for their survey.
f. A politician wants to determine the total number of votes his rival obtained
in the past election based on his copies of the tally sheet of electoral returns.
g. A politician wants to determine the total number of votes his rival obtained
in the sample used in the exit poll.

[Figure: fields where Statistics is applied, among them Sports Sciences, Social Sciences, and Tourism]

Statistics is used in virtually every field and is seen in everyday
life. Wherever there are data and numbers, there is Statistics.
Statistics appears in the news, survey results,
speeches made by politicians, stock prices, and Peso-Dollar
exchange rates, to name a few. We see and hear numbers
almost constantly on billboards and in TV and radio advertisements.
In your bachelor's course, you will certainly work with
some kind of data. The following presents the applications
of Statistics to various fields.


Share some studies or
research in your field
where Statistics is used.


A statistical inquiry is a designed
research that provides the information
needed to solve a research problem.

What makes the statistical inquiry distinct from other
types of investigation is that it allows us to arrive at the
answers to the research problem through the objective
examination of data collected from the elements under
study. This means that in conducting a statistical inquiry,
we would have to follow an organized and systematic
process of collecting and analyzing data pertinent to
answering the stated problem.

[Diagram of what a statistical inquiry can do: describe, reveal, clarify, justify, identify, forecast]

1. describe the characteristic of the elements in the population under
study through the computation or estimation of a parameter such as
the proportion, total, and average;

2. compare the characteristics of the elements in the different
subgroups in the population through contrasts of their respective
summary measures;

3. justify an assertion made by the researcher about a particular
characteristic of the population or subgroups in the population;

4. determine the nature and strength of relationships among the
different variables of interest;

5. identify the different groups of inter-related variables under study;

6. reveal the natural groupings of the elements in the population
based on the values of a set of variables;

7. determine the effects of one or more variables on a response
variable;

8. clarify patterns and trends in the values of a variable over time or
space;

9. predict the value of a variable based upon its relationship with
another variable; and

10. forecast future values of a variable using a sequence of
observations on the same variable taken over time.

Regardless of the complexity of the research problem at hand, a researcher
can complete any one of these inquiries by following these basic steps.

Step 1: Identify the problem.

Step 2: Plan the study.

Step 3: Collect the data.

Step 4: Explore the data.

Step 5: Analyze the data and interpret the results.

Step 6: Present the results.

Any statistical inquiry must begin with a clearly stated research problem.

This is the heart of the whole research process.

The stated problem is the basis of all the actions that the researchers will
take in the other stages of the research process.

Brainstorm on certain issues such as:

Rationale for conducting the investigation
Significance of the study
Scope and limitations of the study
Assumptions the researchers have to make
Expected output of the research
Definition of terms
Operational definition of the exact population of interest

Suppose the researchers want to determine if there is an association between the
price and production of lumber. They would have to answer the following questions
first to obtain a precise statement of the problem.

What kind of lumber will be included in the study? Will all types of lumber be
included or just one specific type?
Will the study include the whole production of lumber or only lumber produced
for sale?
What price for lumber will be used, the market price or the factory price?
What is the scope of the study? Are all the regions of the country included or
just a specific region or province only?
What period is covered in the study?

After answering these questions, the researchers may finally state the problem as
"What is the relationship between the total mahogany production of Mindanao and
the market price of mahogany in the past 10 years?"

The statement of the research problem is usually in the form of a question.

However, there are also other ways of stating the problem. It can also be in
the form of a statement.

Another way of further refining the statement of the problem is by
formulating a hypothesis. A hypothesis is an educated guess by the
researcher, a possible answer to the research problem based on his study
of the literature, own experiences, and previous observations. The
hypothesis must be well-defined and testable.

After stating the problem, the researchers must list down all of the specific
objectives or the specific information needed that will help them answer
the stated problem.

In the form of a question:
What are the factors affecting the job performance of an employee?

In the form of a statement:
This study proposes to describe the relationship among job satisfaction,
salary, quality of relationship with the supervisor, and job performance.

In the form of a hypothesis:
As the salary rises, the quality of relationship with the supervisor improves,
and job satisfaction goes up, the job performance of an employee increases.

In coming up with a plan, the researchers need to consider all the outputs
of Step 1.

The concrete output of Step 2 is the investigators' research design.

Basic Elements of a Research Design

List of variables in the study
Design of the instrument to measure the variables
Data collection method
Sampling design if data will be collected from a sample
Experimental design if data will be collected through an experiment
Methods for data analysis

The research design is a detailed discussion
of the methods and strategies for data
collection and analysis that the investigators
plan to use in order to meet all of the
specific objectives of the study.

An effective research design is as simple as possible and,
at the same time, cost-efficient.

Here, the investigators carry out the data collection plans
specified in the research design.

In addition, the researchers take extra measures to ensure
the quality of the data collected.

If the collected data were incomplete, outdated, inaccurate,
or worse yet, fabricated, then it would be useless to proceed
with data analysis.

Prior to data analysis, the investigators need to explore
and understand the essential features of their data.

This process allows them to determine if their data satisfy
the assumptions made in the derivation of the statistical
technique that they will use for analysis.

This process will also reveal to them if their data exhibit
any peculiarities that will create problems in the analysis.
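
A first pass at exploring a data set might look like the sketch below. It is only an illustration: the nine values are invented, missing entries are assumed to be coded as None, and the 1.5 x IQR rule for flagging unusual values is a common convention rather than something prescribed in these slides.

```python
from statistics import mean, median, quantiles


def explore(values):
    """Basic summary of a list of numbers in which missing entries are None."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(present, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return [v for v in present if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical observations with one missing entry and one suspicious value.
data = [12, 15, 14, None, 13, 16, 15, 98, 14]

summary = explore(data)
outliers = flag_outliers(data)
print(summary)
print(outliers)
```

A flagged value is not automatically an error. It is a prompt to go back to the collected data and check before proceeding with the analysis.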

Analysis follows after collecting and organizing the data.

The investigators examine all of the results in tables, charts, estimated
summary measures, and tests of hypothesis.

They need to check that they were able to meet all of the specified
objectives and to answer the research problem, and give recommendations
on how the results can be useful in decision making.

The investigators also double check the results that contradict existing
theories or the earlier hypothesis made. They may have committed errors
in data collection or analysis. If not, they would have to propose possible
explanations for these results or suggest future statistical inquiries that
could help explain the inconsistency.

After analyzing the data and interpreting the results,
the investigators must present these results in a clear
and concise manner to the users of the research.

The presentation must also include a discussion of the
whole research process. This will help the users
evaluate for themselves the reliability and credibility
of the presented information.


Top 3 Learning Points

Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write.

Chapter One
