1.intro - Stats - Population - Samples - Inferential Stats - Distributions


Introduction to Statistics

Populations and Samples

• A population is the set of all items or individuals of interest
  Examples: All likely voters in the next election
            All parts produced today
            All sales receipts for November

• A sample is a subset of the population
  Examples: 1000 voters selected at random for interview
            A few parts selected for destructive testing
            Every 100th receipt selected for audit
A population is a collection of persons, objects, or items of interest.
The population can be a widely defined category, such as “all automobiles,”
or it can be narrowly defined, such as “all Ford Mustang cars produced from
2002 to 2020.”
A population can be a group of people, such as “all workers presently
employed by TCS,” or it can be a set of objects, such as “all dishwashers
produced on February 23, 2021, by the General Electric Company.”
The researcher defines the population to be whatever he or she is studying.

A sample is a portion of the whole and, if properly taken, is representative
of the whole.

For example, in conducting quality control experiments to determine the
average life of lightbulbs, a lightbulb manufacturer might randomly sample
only 75 lightbulbs during a production run.

Because of time and money limitations, a human resources manager might
take a random sample of 40 employees instead of using a census to measure
company morale.
Population vs. Sample
[Figure: the population contains the items a through z; the sample is the subset b, c, g, i, n, o, r, u, y.]
Why Sample?
• Less time consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of a sufficiently high precision based on samples
Statistical Sampling
Items of the sample are chosen based on known or calculable probabilities.

Probability Samples
• Simple random
• Stratified
• Systematic
• Cluster

Simple Random Samples
Every individual or item from the population has an equal chance of
being selected

Selection may be with replacement or without replacement

Samples can be obtained from a table of random numbers or computer random number generators
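As a minimal sketch of both selection modes, Python's standard `random` module can serve as the computer random number generator. The population of 26 lettered items, the sample size of 9, and the seed are illustrative assumptions, not values from the slides.

```python
import random

# Hypothetical population: 26 items labelled a..z.
population = [chr(code) for code in range(ord("a"), ord("z") + 1)]

rng = random.Random(42)  # seeded so the draw is reproducible

# Without replacement: each item can appear at most once in the sample.
srs_without = rng.sample(population, k=9)

# With replacement: the same item may be drawn more than once.
srs_with = rng.choices(population, k=9)

print("without replacement:", srs_without)
print("with replacement:   ", srs_with)
```

Every item has the same chance of selection on each draw; the only difference is whether a drawn item is put back before the next draw.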
Stratified Samples
Population divided into subgroups (called strata) according to some
common characteristic
Simple random sample selected from each subgroup
Samples from subgroups are combined into one

[Figure: population divided into 4 strata]
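The three steps above can be sketched in Python as follows. The helper name `stratified_sample`, the 40-employee frame, the four departments used as strata, and the choice of 3 items per stratum are all hypothetical; the sketch also assumes every stratum holds at least `per_stratum` items.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` items from each stratum.

    `stratum_of` maps an item to its stratum label (the common characteristic).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    combined = []
    for label in sorted(strata):
        # Simple random sample (without replacement) within each subgroup.
        combined.extend(rng.sample(strata[label], per_stratum))
    return combined  # samples from the subgroups combined into one

# Hypothetical frame: 40 employees split evenly across 4 departments (the strata).
employees = [(f"emp{i:02d}", ["A", "B", "C", "D"][i % 4]) for i in range(40)]
sample = stratified_sample(employees, stratum_of=lambda e: e[1], per_stratum=3, seed=1)
print(sample)
```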

Systematic Samples
Decide on sample size: n
Divide frame of N individuals into groups of k individuals: k = N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter

[Figure: frame of N = 64 individuals in groups of k = 8, with the first group marked]
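A minimal Python sketch of the systematic procedure, using the slide's N = 64 and k = 8 (so n = 8). The function name `systematic_sample` and the seed are hypothetical, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of size n from an ordered frame (assumes len(frame) % n == 0)."""
    N = len(frame)
    k = N // n                                # group size: k = N / n
    start = random.Random(seed).randrange(k)  # random pick within the first group
    return [frame[start + j * k] for j in range(n)]  # then every kth individual

frame = list(range(1, 65))  # N = 64 individuals, numbered 1..64
sample = systematic_sample(frame, n=8, seed=7)
print(sample)
```

Only the first choice is random; every later selection is fixed by the step size k.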
Cluster Samples
Population is divided into several “clusters,” each representative
of the population
A simple random sample of clusters is selected
All items in the selected clusters can be used, or items can be chosen
from a cluster using another probability sampling technique

[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
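A sketch of the simplest variant, where every item in each selected cluster is kept. The 64-item population, its split into 16 clusters of 4, the choice of 3 clusters, and the helper name `cluster_sample` are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: randomly select whole clusters and keep all their items."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n_clusters)  # simple random sample of clusters
    return {i: clusters[i] for i in sorted(chosen)}

# Hypothetical population of 64 items divided into 16 clusters of 4.
items = list(range(64))
clusters = [items[i:i + 4] for i in range(0, 64, 4)]
selected = cluster_sample(clusters, n_clusters=3, seed=3)
for idx, members in selected.items():
    print(f"cluster {idx}: {members}")
```

To sample within the chosen clusters instead, one could apply another probability technique (for example `random.sample`) to each selected cluster's items.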
Key Definitions
A population is the entire collection of things under consideration
A parameter is a summary measure computed to describe a
characteristic of the population

A sample is a portion of the population selected for analysis

A statistic is a summary measure computed to describe a characteristic
of the sample
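To make the parameter/statistic distinction concrete, the sketch below computes the same summary measure (the mean) once over a whole population and once over a sample. The population is simulated, not real data: 1,000 hypothetical lightbulb lifetimes drawn from a normal distribution with an assumed mean of 1,000 hours and standard deviation of 50, sampled 75 at a time.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population: 1,000 simulated lightbulb lifetimes in hours.
population = [rng.gauss(1000, 50) for _ in range(1000)]
sample = rng.sample(population, 75)

mu = statistics.mean(population)  # parameter: describes the population
x_bar = statistics.mean(sample)   # statistic: describes the sample and estimates mu
print(f"population mean (parameter) = {mu:.1f}")
print(f"sample mean (statistic)     = {x_bar:.1f}")
```

In practice the population mean is usually unknown; the sample mean is what the researcher can actually compute.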
Inferential Statistics
• Making statements about a population by examining sample results

Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)

Tools of Business Statistics
Descriptive statistics
Collecting, presenting, and describing data
For example, if an instructor produces statistics to summarize a class’s examination
effort and uses those statistics to reach conclusions about that class only, the
statistics are descriptive.

Inferential statistics
Drawing conclusions and/or making decisions concerning a population
based only on sample data
Descriptive Statistics
Collect data
e.g. Survey, Observation

Present data
e.g. Charts and graphs

Characterize data
e.g. Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Inferential statistics
One application of inferential statistics is in pharmaceutical research.

Some new drugs are expensive to produce, and therefore tests must
be limited to small samples of patients.

Utilizing inferential statistics, researchers can design experiments
with small randomly selected samples of patients and attempt to
reach conclusions and make inferences about the population.

One particularly useful tool for grouping data is the frequency
distribution, which is a summary of data presented in the form of class
intervals and frequencies.
When constructing a frequency distribution, the business researcher
should first determine the range of the raw data.

The range often is defined as the difference between the largest and
smallest numbers.

The range for the data in Table 2.1 is 9.7 (12.0–2.3).
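The range calculation is a one-liner in Python. Since the full data table is not reproduced here, this sketch uses only the two extreme values quoted in the text (12.0 and 2.3); the helper name `data_range` is hypothetical. Rounding guards against binary floating-point error in the subtraction.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(round(data_range([12.0, 2.3]), 1))  # 9.7
```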

Number of classes
The second step in constructing a frequency distribution is to determine how
many classes it will contain.
One rule of thumb is to select between 5 and 15 classes.
If the frequency distribution contains too few classes, the data summary may be
too general to be useful.
Too many classes may result in a frequency distribution that does not aggregate
the data enough to be helpful.
The final number of classes is arbitrary.
The business researcher arrives at a number by examining the range and
determining a number of classes that will span the range adequately and also be
meaningful to the user.
Class interval
After selecting the number of classes, the business researcher must
determine the width of the class interval.
An approximation of the class width can be calculated by dividing
the range by the number of classes.
For the data in Table 2.1, using six classes, this approximation would be 9.7/6 ≈ 1.62.
Normally, the number is rounded up to the next whole number,
which in this case is 2.
The frequency distribution must start at a value equal to or lower
than the lowest number of the ungrouped data and end at a value
equal to or higher than the highest number.

The lowest unemployment rate is 2.3 and the highest is 12.0, so the
business researcher starts the frequency distribution at 1 and ends it at 13.
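The class-interval steps can be checked with a short Python sketch. It uses the extremes 2.3 and 12.0, six classes, rounding up to a whole number, and a start value of 1, all taken from the text; the check that the classes span the data mirrors the rule stated above.

```python
import math

low, high = 2.3, 12.0   # smallest and largest values from the text
n_classes = 6

approx_width = (high - low) / n_classes  # 9.7 / 6 ≈ 1.62
width = math.ceil(approx_width)          # rounded up to the next whole number -> 2

start = 1                                # at or below the lowest value
edges = [start + i * width for i in range(n_classes + 1)]
assert edges[0] <= low and edges[-1] >= high  # the classes must span all the data

for lower, upper in zip(edges, edges[1:]):
    print(f"{lower}-under {upper}")      # 1-under 3, 3-under 5, ..., 11-under 13
```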

Class Midpoint
The midpoint of each class interval is called the class midpoint
and is sometimes referred to as the class mark.
It is the value halfway across the class interval and can be
calculated as the average of the two class endpoints.
For example, in the frequency distribution table, the midpoint of the
class interval 3–under 5 is 4, or (3 + 5)/2.
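Assuming the six width-2 classes from 1 to 13 described in the class-interval step, every class midpoint is the average of its two endpoints:

```python
edges = [1, 3, 5, 7, 9, 11, 13]  # endpoints for "1-under 3" through "11-under 13"
midpoints = [(lower + upper) / 2 for lower, upper in zip(edges, edges[1:])]
print(midpoints)  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```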
Relative Frequency
Relative frequency is the proportion of the total frequency that is in any
given class interval in a frequency distribution.

Relative frequency is the individual class frequency divided by the total frequency.

For example, from the frequency distribution table, the relative frequency for the class interval 5–under 7 is 13/60 ≈ .2167.
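A minimal sketch of the definition: divide each class frequency by the total. The helper name `relative_frequencies` and the frequencies `[3, 5, 2]` are hypothetical; only the 13-out-of-60 figure comes from the text.

```python
def relative_frequencies(freqs):
    """Each class frequency divided by the total frequency."""
    total = sum(freqs)
    return [f / total for f in freqs]

print(relative_frequencies([3, 5, 2]))  # [0.3, 0.5, 0.2]
print(round(13 / 60, 4))                # 0.2167, the 5-under 7 class from the text
```

The relative frequencies of all classes always sum to 1 (up to floating-point rounding).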
Cumulative Frequency
The cumulative frequency is a running total of frequencies through the
classes of a frequency distribution.

The cumulative frequency for each class interval is the frequency for that
class interval added to the preceding cumulative total

The concept of cumulative frequency is used in many areas, including
sales cumulated over a fiscal year, sports scores during a contest
(cumulated points), years of service, points earned in a course, and costs
of doing business over a period of time.
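The running total described above is exactly what `itertools.accumulate` computes. The class frequencies here are hypothetical:

```python
from itertools import accumulate

freqs = [3, 5, 2, 6]                  # hypothetical class frequencies
cumulative = list(accumulate(freqs))  # each class adds its frequency to the preceding total
print(cumulative)  # [3, 8, 10, 16]
```

The last cumulative frequency always equals the total number of observations.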
Construct a frequency distribution for these data. Calculate and display
the class midpoints, relative frequencies, and cumulative frequencies for
this frequency distribution.
The range of the data is 1.33 (7.68–6.35).
Number of classes: 7
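The exercise data are not reproduced here, so the sketch below uses hypothetical values that share the stated extremes (6.35 and 7.68). It ties the steps together in one hypothetical helper, `frequency_distribution`. With 7 classes, the approximate width is 1.33/7 = 0.19. Rounding up to a whole number does not suit data this narrow, so the sketch assumes rounding up to 0.2 instead. Starting at 6.3, seven classes then end at 7.7, which covers the data.

```python
from itertools import accumulate

def frequency_distribution(data, start, width, n_classes):
    """Return rows of (lower, upper, midpoint, frequency, relative freq, cumulative freq).

    Classes are "lower-under upper"; edges are rounded to limit floating-point drift.
    """
    edges = [round(start + i * width, 10) for i in range(n_classes + 1)]
    if min(data) < edges[0] or max(data) >= edges[-1]:
        raise ValueError("classes do not span the data")
    freqs = [sum(lo <= x < hi for x in data) for lo, hi in zip(edges, edges[1:])]
    total = len(data)
    rows = []
    for (lo, hi), f, cum in zip(zip(edges, edges[1:]), freqs, accumulate(freqs)):
        rows.append((lo, hi, round((lo + hi) / 2, 10), f, f / total, cum))
    return rows

# Hypothetical values, not the exercise data.
values = [6.35, 6.42, 6.58, 6.81, 7.05, 7.12, 7.19, 7.40, 7.55, 7.68]
for row in frequency_distribution(values, start=6.3, width=0.2, n_classes=7):
    print("%.1f-under %.1f  mid %.1f  f=%d  rel=%.3f  cum=%d" % row)
```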
Thank You
