
Introduction to

Statistics

NATURE OF PROBABILITY
AND STATISTICS

cdcjaurigue
Contents

1 Introduction
2 Descriptive and Inferential Statistics
3 Types and Classification of Data
4 Data Collection
5 Sampling Techniques

Objectives

Explain the importance of statistics.

Demonstrate knowledge of all statistical terms.

Differentiate between the two branches of statistics.

Identify and classify types of data.

Identify the four basic sampling techniques.

Why study Statistics?

✓ Data are everywhere


▪ The Philippine economy posted a GDP
growth of 6.4 percent in the first quarter of
2012 from 4.9 percent the previous year.
▪ With compensation of our overseas workers
on the rebound, the Net Primary Income (NPI)
grew by 4.0 percent pushing the GNI/GNP
growth to 5.8 percent from 3.5 percent in
2011.

Why study Statistics?
Data are everywhere
▪ GDP gained momentum growing by 2.5 percent
in the first quarter of 2012 while GNI grew by a
slower pace of 1.3 percent.
▪ The Agriculture, Hunting, Forestry and Fishery
sector posted a turnaround growth of 2.5
percent in the first quarter from two consecutive
quarters of decline.
▪ Industry slowed down to 2.2 percent growth
from 3.3 percent in the previous quarter.

Why study Statistics?
Data are everywhere
▪ The Services sector accelerated to 2.6 percent from 1.3
percent in the previous quarter as all its subsectors
recorded positive growth.
▪ With projected population reaching 95.2 million, per
capita
- GDP grew by 4.6 percent
- GNI grew by 4.0 percent
- HFCE grew by 4.9 percent
• HFCE (Household Final Consumption Expenditure) has been
growing robustly since the fourth quarter of 2010

Source: http://www.nscb.gov.ph
Why study Statistics?
✓ Statistical techniques are used to make
many decisions that affect our lives
▪ Insurance companies use statistical analysis to set
rates for home, automobile, life, and health insurance.
▪ Laguna Lake Development Authority is monitoring the
water quality of Laguna Lake. They periodically take
water samples to establish the level of contamination
and maintain the level of quality.
▪ Medical researchers study the cure rates for diseases
using different drugs and different forms of treatment

Why study Statistics?

✓ Knowledge of statistical methods will help you
understand how decisions are made and give you a
better understanding of how they affect you.
▪ No matter what your career is, you will make
professional decisions that involve data

Why study Statistics?

▪ To make an informed decision you need to:
• Determine whether the existing information is
adequate or additional information is required
• Gather additional information, if needed, in such
a way that it does not provide misleading results
• Summarize the information in a useful and
informative manner
• Analyze the available information
• Draw conclusions and make inferences while
assessing the risk of an incorrect conclusion
What does a statistician do?

❖ not just someone who calculates shooting
averages at basketball games or tabulates the
results of a poll

❖ trained in statistical science
▪ trained in collecting numerical information (data),
evaluating it, and drawing conclusions from it
▪ they determine what information is relevant in a
given problem and whether the conclusions drawn
from a study are to be trusted

What is Statistics?

❖ a branch of mathematics taking and
transforming numbers into useful
information for decision makers
❖ method for processing and analyzing
numbers
❖ method for helping reduce the
uncertainty inherent in decision making

Statistics Defined

▪ science that deals with the
• collection
• classification
• analysis and
• interpretation of information or data
to make decisions, solve problems, and
design products and processes

Types of Statistics

Descriptive statistics
utilizes numerical and graphical methods to look for
patterns in a data set, to summarize the information
revealed in a data set, and to present the information
in a convenient form.

Inferential statistics
utilizes sample data to make estimates, decisions,
predictions, or other generalizations about a larger
set of data or population.

Descriptive Statistics

❖Collect data
▪ survey

❖Present data
▪ tables and graphs

❖Characterize data
▪ sample mean = X i

n
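
As a minimal sketch of the "characterize data" step, the sample mean can be computed straight from its definition. The weights below are made-up illustrative values, not data from these slides.

```python
from statistics import mean

# Hypothetical sample of n = 5 weights in pounds (illustrative values only)
weights = [118, 125, 121, 117, 124]

n = len(weights)
sample_mean = sum(weights) / n  # X-bar = (sum of X_i) / n

# The standard library computes the same quantity
assert sample_mean == mean(weights)
print(sample_mean)
```

The hand-written formula and `statistics.mean` agree; either is fine for small data sets.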
Inferential Statistics

❖ Estimation
▪ Estimate the population mean
weight using the sample mean
weight
❖ Hypothesis testing
▪ Test the claim that the
population mean weight is 120
pounds

Drawing conclusions about a large group of
individuals based on a subset of the large group.
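
The two ideas above can be sketched in a few lines. This assumes a small made-up sample of weights and uses a one-sample t statistic as one common way to test the claim; the slides do not say which test is intended, so treat that choice as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of weights in pounds (illustrative values only)
sample = [118, 125, 121, 117, 124, 122, 119, 126]

n = len(sample)
x_bar = mean(sample)  # estimation: the sample mean estimates the population mean
s = stdev(sample)     # sample standard deviation (divides by n - 1)

# Hypothesis testing: how far is the sample mean from the claimed value of 120,
# measured in standard errors?
mu_0 = 120
t_stat = (x_bar - mu_0) / (s / sqrt(n))

print(f"estimate of the population mean: {x_bar}")
print(f"t statistic with {n - 1} degrees of freedom: {t_stat:.3f}")
```

Deciding whether to reject the claim would still require comparing the t statistic with a critical value (or computing a p-value), which this sketch leaves out.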
Population vs. Sample

population
the complete collection of individuals, items, or
data under consideration in a statistical study

sample
the portion of the population selected for analysis

Population vs. Sample

[Figure: a population made up of the letters a to z, and a sample drawn from it: b, c, g, i, n, o, r, u, y]

Population vs. Sample

Measures used to describe the population are called parameters.

Measures computed from sample data are called statistics.
Population vs. Sample

Population: All MCL engineering students
Sample: Engineering students taking IE101

Population: The CEOs of all private companies in Laguna
Sample: The CEOs of private companies in Cabuyao

Population: Colleges and Universities in CALABARZON
Sample: Private colleges in CALABARZON

Why Sample?

❖Less time consuming than a census

❖Less costly to administer than a census

❖It is possible to obtain statistical results
of a sufficiently high precision based on
samples

Strive for representative samples to reflect
the population of interest accurately!
Statistic
❖ a summary measure computed to describe a
characteristic of the sample drawn from the
population or to draw inferences about the population

Example.
• 100 owners of a certain car reported 85 problems
in the first 90 days of ownership.
• The statistic “85” describes the number of
problems per 100 cars during the first 90 days of
ownership.
• It suggests that the entire population of owners of
these cars experience an average of 0.85
problems per car.
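
The arithmetic behind the 0.85 figure is just the sample total divided by the sample size. A minimal sketch using only the numbers from the example above (the variable names are illustrative):

```python
problems_reported = 85  # problems reported by the sampled owners
cars_sampled = 100      # owners (cars) in the sample

# Sample statistic used as an estimate for the whole population of owners
problems_per_car = problems_reported / cars_sampled
print(problems_per_car)
```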

Parameter

❖ a numerical measure that describes a
characteristic of a population

A measure used to describe the population is called a parameter.
A measure computed from sample data is called a statistic.
Other Terminologies

Variable
❖ a characteristic of interest concerning the individual
elements of a population or a sample
❖ often represented by a letter such as x, y, or z

Observation
❖ the value of a variable for one particular element
from the sample or population

Data set
❖ consists of the observations of a variable for the
elements of a sample

Example
All engineering majors taking IE101 are polled and each one is asked if
they approve or disapprove of the student council’s policies.

Variable
❖ the opinion of the engineering student taking IE101
of the council’s policies
Let
x = 0, if disapprove
x = 1, if approve

Observation
❖ “approve” or “disapprove”, coded as 0 or 1

Data set
❖ consists of the observations of all the engineering
students taking IE101, for example 1,0,0,1,0,...,1,1
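
The 0/1 coding above can be sketched as follows. The list of responses is hypothetical, since the slides do not give the actual poll answers.

```python
# Hypothetical poll responses (illustrative only)
responses = ["approve", "disapprove", "disapprove", "approve",
             "disapprove", "approve", "approve"]

# x = 1 if approve, x = 0 if disapprove
data_set = [1 if r == "approve" else 0 for r in responses]

# With 0/1 coding, the mean of x is the proportion who approve
proportion_approve = sum(data_set) / len(data_set)

print(data_set)
print(proportion_approve)
```

Coding a two-category variable as 0 and 1 is convenient because sums and means of the codes become counts and proportions.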

Data Types

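Qualitative (categorical) data
Examples:
◼ Marital Status
◼ Political Party
◼ Eye Color
(Defined categories)

Quantitative (numerical) data, discrete
Examples:
◼ Number of Children
◼ Defects per hour
(Counted items)

Quantitative (numerical) data, continuous
Examples:
◼ Weight
◼ Voltage
(Measured characteristics)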

Classification of Data

Qualitative (or categorical) data
consist of labels, category names, ratings,
rankings, and such for which representation on
a numerical scale is not naturally meaningful.

Quantitative (or numerical) data
are counts or measurements for which
representation on a numerical scale is naturally
meaningful.

Qualitative data

Examples
Satisfaction ratings (on a scale from “not satisfied”
to “very satisfied”) by users of a website
Party affiliation (Liberal, Nacionalista, Pwersa ng
Masang Pilipino, Lakas Kampi, Bangon Pilipinas,
etc.) of voters
Eye colors (blue, brown, or so on) of babies
Names (first and last) of a group of students who
took an exam

Qualitative data

Examples
Student numbers of a group of engineering
students
Foremost colors (red, yellow, orange, or so on) of
flowers in a garden
Sex (male or female) of users of a website

Quantitative data

Examples
Daytime temperature readings (in degrees
Fahrenheit) in a 30-day period
Heights (in centimeters) of plants in a plot of land
Number (0, 1, 2, or so on) of people attending a
conference
Distances (in miles) traveled by students
commuting to school
Quantitative data

Examples
Heights (in inches) of girls in a classroom
Number (0, 1, 2, or so on) of students in a
classroom
Number (0, 1, 2, or so on) of teachers in favor of
school uniforms
Ages (in months) of children in a preschool

Quantitative Data

Discrete Data
quantitative data that are countable using a
finite count, such as 0, 1, 2, and so on
integer-valued

Continuous Data
quantitative data that can take on any value
within a range of values on a numerical scale in
such a way that there are no gaps, jumps, or
other interruptions
real-valued
Discrete or Continuous?

Examples
Daytime temperature readings (in degrees
Fahrenheit) in a 30-day period: continuous
Heights (in centimeters) of plants in a plot of
land: continuous
Number (0, 1, 2, or so on) of people attending
a conference: discrete
Distances (in miles) traveled by students
commuting to school: continuous

Discrete or Continuous?

Examples
Heights (in inches) of girls in a classroom: continuous
Number of students in a classroom: discrete
Number of teachers in favor of school uniforms: discrete
Ages of MATH110 students: continuous

Levels of Measurement

1 Ratio
2 Interval
3 Ordinal
4 Nominal
Nominal Scale
the lowest level of data
applied to data that are used for category
identification
characterized by data that consist of names,
labels, or categories only
data cannot be arranged in an ordering
scheme
arithmetic operations are not performed for
nominal data
Nominal Scale
Qualitative variable: possible nominal level data values

Blood type: A, B, AB, O
Province of residence: Laguna, Batangas, Cavite, Rizal, Quezon
Type of crime: misdemeanor, felony
Color of road signs: red, white, blue, green
Religion: Christian, Muslim, etc.
Ordinal Scale

the next higher level of data
applied to data that can be arranged in some
order, but differences between data values
either cannot be determined or are meaningless
characterized by data that apply to categories
that can be ranked
data can be arranged in an ordering scheme
arithmetic operations are not performed on
ordinal level data
Ordinal Scale

Qualitative variable: possible ordinal level data values

Product rating: Poor, good, excellent
Socioeconomic class: Lower, middle, upper
Pain level: None, low, moderate, severe

Interval Scale
applied to data that can be arranged in some order
and for which differences in data values are
meaningful
results from counting or measuring
data can be arranged in an ordering scheme and
differences can be calculated and interpreted
the value zero is arbitrarily chosen for interval data
and does not imply an absence of the characteristic
being measured
ratios are not meaningful for interval data
Examples: temperature (in degrees Celsius or Fahrenheit), IQ scores
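
A short sketch of why ratios are not meaningful on an interval scale: the same two temperatures give different ratios in Celsius and in Fahrenheit, because each scale places zero arbitrarily. The two readings and the helper function are illustrative assumptions.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Two hypothetical readings in degrees Celsius
a_c, b_c = 20.0, 10.0
a_f, b_f = celsius_to_fahrenheit(a_c), celsius_to_fahrenheit(b_c)

# The ratio changes with the unit, so "twice as hot" has no meaning here
ratio_c = a_c / b_c
ratio_f = a_f / b_f

# Differences stay comparable: they only pick up the constant factor 9/5
diff_c = a_c - b_c
diff_f = a_f - b_f

print(ratio_c, ratio_f)
print(diff_c, diff_f)
```

The ratio is 2.0 in Celsius but 1.36 in Fahrenheit, while the difference of 10 Celsius degrees is always 18 Fahrenheit degrees.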
Ratio Scale

the highest level of measurement
applied to data that can be ranked and for
which all arithmetic operations including division
can be performed
results from counting or measuring
data can be arranged in an ordering scheme
and differences and ratios can be calculated
and interpreted

Ratio Scale

data has an absolute zero and a value of zero
indicates a complete absence of the
characteristic of interest
Examples:
wages
height
weight
units of production
changes in stock prices
distance between branch offices
grams of fats consumed per day
Data Collection

data can be obtained in a variety of ways:


Data from a published source
Data from a designed experiment
Data collected observationally
Data from a survey
− telephone
− mail questionnaires
− personal interviews
− surveying records
Sampling Techniques

Nonstatistical Sampling
▪ Convenience
▪ Judgment

Statistical Sampling
▪ Simple Random
▪ Systematic
▪ Stratified
▪ Cluster
Nonstatistical Sampling

❖Convenience
▪ Collected in the most convenient
manner for the researcher

❖Judgment
▪ Based on judgments about who in the
population would be most likely to
provide the needed information
Statistical Sampling

❖To obtain samples that are unbiased,
statisticians use four methods of sampling.
❖Items of the sample are chosen based on
known or calculable probabilities.

Statistical Sampling (Probability Sampling)
▪ Simple Random
▪ Stratified
▪ Systematic
▪ Cluster


Simple Random Sampling

❖Every possible sample of a given size has
an equal chance of being selected
❖Selection may be with replacement or
without replacement
❖The sample can be obtained using a table
of random numbers or a computer random
number generator
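
A minimal sketch of simple random sampling with a computer random number generator, using Python's standard `random` module. The numbered frame of 64 individuals, the sample size, and the fixed seed are assumptions made for illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# Hypothetical frame of N = 64 numbered individuals
population = list(range(1, 65))
n = 8

# Without replacement: no individual can appear twice
without_replacement = rng.sample(population, n)

# With replacement: the same individual may be drawn more than once
with_replacement = rng.choices(population, k=n)

print(without_replacement)
print(with_replacement)
```

`sample` draws without replacement and `choices` draws with replacement, matching the two selection modes listed above.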
Stratified Random Sampling
❖ Divide population into subgroups (called strata)
according to some common characteristic
▪ Example. gender, income level
❖ Select a simple random sample from each
subgroup
❖ Combine samples from subgroups into one sample

[Figure: population divided into 4 strata, with a sample drawn from each stratum]
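
The stratified procedure can be sketched as below. The population, the four stratum labels, and the choice of taking the same number of units from every stratum are all assumptions for illustration; proportional allocation is another common choice.

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is reproducible

# Hypothetical population of 40 units, each tagged with one of 4 strata
population = [(i, f"stratum_{i % 4}") for i in range(40)]

# Step 1: divide the population into strata
strata = {}
for unit in population:
    strata.setdefault(unit[1], []).append(unit)

# Step 2: select a simple random sample from each stratum
# Step 3: combine the per-stratum samples into one sample
per_stratum = 3
sample = []
for members in strata.values():
    sample.extend(rng.sample(members, per_stratum))

print(sample)
```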
Systematic Random Sampling
❖ Decide on sample size: n
❖ Divide ordered (e.g., alphabetical) frame of
N individuals into groups of k individuals:
k = N/n
❖ Randomly select one individual from the 1st
group
❖ Select every kth individual thereafter
Example: N = 64, n = 8, so k = N/n = 8
(the first group is the first 8 individuals in the frame)
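
The steps above can be sketched with the slide's own numbers, N = 64 and n = 8. The numbered frame and the fixed seed are assumptions for illustration.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is reproducible

N, n = 64, 8
k = N // n  # size of each group: 64 / 8 = 8

# Hypothetical ordered frame of N numbered individuals
frame = list(range(1, N + 1))

# Randomly select one individual from the first group of k individuals
start = rng.randrange(k)  # index 0 .. k-1

# Then select every kth individual thereafter
sample = frame[start::k]

print(sample)
```

Because N is an exact multiple of n here, the slice always returns exactly n individuals, one from each group.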
Cluster Sampling
❖Divide population into several “clusters,”
each representative of the population
(e.g., regions)
❖Select a simple random sample of
clusters
▪ All items in the selected clusters can be used, or
items can be chosen from a cluster using another
probability sampling technique
[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
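
A minimal sketch of cluster sampling with 16 clusters, as in the figure. The cluster contents, the number of clusters selected (4), and the decision to use every item in the selected clusters are assumptions for illustration.

```python
import random

rng = random.Random(2)  # fixed seed only so the sketch is reproducible

# Hypothetical population divided into 16 clusters of 5 items each
clusters = {c: [f"cluster{c}_item{j}" for j in range(5)] for c in range(16)}

# Select a simple random sample of clusters
chosen = rng.sample(sorted(clusters), 4)

# Use all items in the selected clusters
sample = [item for c in chosen for item in clusters[c]]

print(chosen)
print(len(sample))
```

To subsample within clusters instead, the last step could apply `rng.sample` to each selected cluster.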
Computers and Calculators

❖Computers and calculators make
numerical computation easier.
❖Many statistical packages are
available. One example is MINITAB.
Several models of calculators can now
be used to do statistical calculations.
❖Data must still be understood and
interpreted.
Surveys

❖ A survey is a way to ask a lot of
people a few well-constructed
questions.

❖ The survey is a series of unbiased
questions that the subject must
answer.
Surveys

Advantages
❖ Efficient way of collecting information
from a large number of people.
❖ Relatively easy to administer.
❖ Wide variety of information can be collected.
❖ They can be focused.

Disadvantages
❖ They depend on the subjects’ motivation,
honesty, memory and ability to respond.
❖ Answers could lead to vague data.
Designing a Survey
❖ Surveys can take different forms. They can be
used to ask only one question or they can ask a
series of questions. We can use surveys to test
out people’s opinions or to test a hypothesis.
❖ When designing a survey, the following steps
are useful:
1. Determine the goal of the survey.
2. Identify the sample population.
3. Choose an interviewing method.
4. Decide what questions you will ask, in what
order, and how to phrase them.
5. Conduct the interview and collect the information.
6. Analyze the results.
Design of Experiment
❖ Design of experiments (DOE) is a branch of applied
statistics that deals with planning, conducting,
analyzing, and interpreting controlled tests to
evaluate the factors that control the value of a
parameter or group of parameters.

❖ DOE allows multiple factors to be manipulated to
determine their effect on a desired output. By
manipulating multiple inputs at the same time,
DOE can identify important interactions that
may be missed when experimenting with one
factor at a time. All possible combinations can
be investigated (a full factorial design), or only a
portion (a fractional factorial design).
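
As a sketch of "all possible combinations or only a portion", the runs of a two-level design can be enumerated with `itertools.product`. The three factor names are hypothetical, and the half fraction shown uses the standard defining relation I = ABC, which is one common way to pick a portion rather than something the slides specify.

```python
from itertools import product

# Three hypothetical factors, each at two coded levels: -1 (low) and +1 (high)
factor_names = ("temperature", "pressure", "time")
levels = (-1, 1)

# Full factorial: every combination of levels, 2 * 2 * 2 = 8 runs
full_factorial = list(product(levels, repeat=len(factor_names)))

# Half fraction: keep only runs where the product of the three levels is +1
half_fraction = [run for run in full_factorial if run[0] * run[1] * run[2] == 1]

for run in full_factorial:
    print(dict(zip(factor_names, run)))
print(len(full_factorial), len(half_fraction))
```

The half fraction needs 4 runs instead of 8, at the cost of no longer being able to separate some effects from one another.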
