
10/5/2014

Basic Statistics

About This Module …

Probability quantifies the likelihood of a specific outcome or
set of outcomes of a random event and is useful for
planning in uncertain environments. Computing probabilities
often involves enumerating the possible permutations or
combinations of items from a population.

Basic statistics is a collection of techniques useful for
making decisions about a process or population based on
an analysis of information contained in a sample of that
population.¹
Statistics is also described as the science of decision making
under uncertainty.²

¹ Douglas C. Montgomery, Introduction to Statistical Quality Control, Third Edition, 1997
² Edward Dudewicz, Juran’s Quality Control Handbook, Fourth Edition, 1988

Importance of Basic Statistics

We live in a statistical world.
• Statistics have a pervasive influence on our lives …
– Every day there is another poll
– Sampling is being used to perform many aspects of
the census
– All major economic indicators are based on samples
– TV ratings are based on samples
– Statistics determine insurance rates
• Quantum physics has demonstrated that probability
determines the structure and operation of everything.
• Statistics are a major enabler of Six Sigma.

Types of Sampling

1. Simple Random – Every item has an equal chance of being
selected.
2. Systematic – Sampling according to some defined norm,
e.g. every 10th unit or every hour.
3. Stratified – Dividing the population into subgroups based
on some common characteristic, e.g. select from each
pallet, shift, or machine.
4. Cluster – The population is divided into clusters that are
each representative of the entire population, e.g. batches in
batch processing.
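
The four types of sampling can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the population of 100 units, the `shift` attribute used for stratifying, the `batch-N` cluster names, and all function names are hypothetical and do not come from the slides.

```python
import random

def simple_random(population, k, rng):
    """Simple random: every item has an equal chance of selection."""
    return rng.sample(population, k)

def systematic(population, step, start=0):
    """Systematic: take every `step`-th item, beginning at index `start`."""
    return population[start::step]

def stratified(population, key, k_per_stratum, rng):
    """Stratified: group items by `key`, then sample randomly within each group."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    return {name: rng.sample(items, min(k_per_stratum, len(items)))
            for name, items in strata.items()}

def cluster(clusters, n_clusters, rng):
    """Cluster: pick whole clusters at random and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 units, alternating between two shifts.
units = [{"id": i, "shift": "day" if i % 2 == 0 else "night"} for i in range(100)]
# Hypothetical clusters: five batches of 20 consecutive units each.
batches = {f"batch-{b}": [u for u in units if u["id"] // 20 == b] for b in range(5)}

srs = simple_random(units, 10, rng)
sys_sample = systematic(units, 10)
strat = stratified(units, lambda u: u["shift"], 5, rng)
clus = cluster(batches, 2, rng)
```

Systematic sampling is the only one of the four that needs no random draw once the starting point is fixed, which is why it is convenient on a production line.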


Types of Data

1. Continuous Data

• Continuous data results from measurement on a
continuous scale such as length, weight, time,
temperature etc.
• Here the basic data is on a continuous scale.
• Divisions of the units make sense.
Example: ½ second or ½ millimeter makes sense.

Types of Data

2. Discrete Data

Discrete data results from counting the occurrences of events,
e.g.
1) Dents or scratches on painted parts
2) No. of calls per hour in a call center
3) Guests arriving in a hotel every hour
4) Number of customer complaints per week/month
To identify whether data is discrete, the test is that
“Half the unit does not make sense”. ½ of a scratch, ½
of a call, or ½ of a guest does not make sense. Thus this
data is discrete.

Discrete Data

• Attribute or countable data
• Examples
– No. of units unfit for sale, no. of imperfections on an
automobile, no. of successes in n trials, good/bad, yes/no,
pass/fail, etc.
• Uses
– Computing proportions (defects per unit, calls per associate,
or transactions per day)
– Categorizing counts (types of defects, calls, or transactions)

Measurement Scales

1. NOMINAL SCALE
Data is classified into categories.
Example: List of machines by type
List of people by state
This is only a nominal grouping. No order or
importance is attached.
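
The two uses listed for discrete data, computing proportions and categorizing counts, can be sketched in a few lines of Python. The defect log and the number of inspected units below are made-up values for illustration only.

```python
from collections import Counter

# Hypothetical inspection log: one entry per defect found, tagged by type.
defects = ["scratch", "dent", "scratch", "paint run", "scratch", "dent"]
units_inspected = 50  # assumed number of units examined

# Proportion: total defects divided by units inspected.
defects_per_unit = len(defects) / units_inspected

# Categorized counts: how many defects of each type were found.
counts_by_type = Counter(defects)

print(f"defects per unit: {defects_per_unit:.2f}")
for defect_type, count in counts_by_type.most_common():
    print(f"{defect_type}: {count}")
```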


Measurement Scales

2. ORDINAL SCALE
Refers to positions in a series.
Order is important.
Precise difference in the order is not defined.
Examples: First, second, third etc. in a race.

Measurement Scales

3. INTERVAL SCALE
Differences are meaningful.
No absolute zero.
Ratios do not make sense.
Examples: Temperature measurements. The difference between
10 °C and 20 °C is meaningful, but (the effect of) 20 °C is
not double (the effect of) 10 °C.
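
The temperature example for the interval scale can be checked numerically. The sketch below compares the two readings on the Celsius scale and on the Kelvin scale, which does have an absolute zero. The helper name `celsius_to_kelvin` is introduced here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale whose zero is absolute zero."""
    return celsius + 273.15

low_c, high_c = 10.0, 20.0

difference_c = high_c - low_c   # differences are meaningful on an interval scale
naive_ratio = high_c / low_c    # arithmetic gives 2.0, but it has no physical meaning
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(difference_c, naive_ratio, round(kelvin_ratio, 3))
```

On the Kelvin scale the ratio comes out only slightly above 1, which is why 20 °C cannot be called "twice as hot" as 10 °C.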

Measurement Scales

4. RATIO SCALE
Meaningful differences.
Absolute zero exists.
Ratios make sense.
Example: A length of 2 meters is double 1 meter.
A time of 2 hours is double one hour.

Data Integrity and Accuracy

• Data integrity determines whether the information being
measured truly represents the desired attribute.
• Data accuracy determines the degree to which individual
or average measurements agree with an accepted
standard or reference value.
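
Data accuracy, as defined above, can be expressed as the gap between measurements and a reference value. The following is a minimal sketch, with an assumed reference value and hypothetical repeat readings.

```python
from statistics import mean

reference_value = 10.00  # assumed accepted value of a reference standard
readings = [10.02, 9.98, 10.05, 10.03, 10.02]  # hypothetical repeat readings

# Accuracy of individual measurements: each reading minus the reference.
individual_errors = [r - reference_value for r in readings]

# Accuracy of the average measurement: mean reading minus the reference.
bias = mean(readings) - reference_value

print(f"average deviation from reference: {bias:+.3f}")
```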


Descriptive Statistics

Descriptive statistics explain the characteristics of a
sample or population including but not limited to:
– measure of the center of the population or sample
– measure of variability of the population or sample
– measure of location and frequency, and cumulative
distributions

Examples are:
– average processing time for this sample
– amount of variation in the process
– type of distribution of data in this sample group

Inferential Statistics

• Inferential Statistics – Used to make decisions about a
population from sample data, or to predict future
performance based on past data, e.g.
– based on the sample, what is the average for the
population?
– based on the sample, what is the error rate for this process?
– based on the sample, how many customers will receive
defective product?
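
The first inferential question, what the sample says about the population average, is commonly answered with a confidence interval. The sketch below uses a normal approximation, which is an assumption made here and not a method given in the slides. For small samples a t-based interval is usually preferred. The processing times are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean (assumed method)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(n)            # z times the standard error
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical processing times (minutes) from one sample.
processing_times = [12.1, 11.8, 12.6, 12.0, 11.9, 12.4, 12.2, 12.3, 11.7, 12.0]
low, high = mean_confidence_interval(processing_times)

print(f"estimated population mean lies between {low:.2f} and {high:.2f}")
```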

Summarizing Data - 1

We need to be able to describe the sample that we obtain
from the population.

• Measures of Location
– Mean
– Mode
– Median

Describing a Distribution

• Where is the center of the sample?
– Location
• What is the spread of the data?
– Variability
• What is the shape of the distribution?
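
The three measures of location (mean, mode, median) and a simple measure of spread can be computed with Python's standard `statistics` module. The sample values below are hypothetical.

```python
from statistics import mean, median, multimode

sample = [4, 7, 7, 8, 9, 10, 25]  # hypothetical measurements

centre_mean = mean(sample)      # arithmetic average
centre_median = median(sample)  # middle value of the sorted data
modes = multimode(sample)       # most frequent value(s), returned as a list
data_range = max(sample) - min(sample)  # simplest measure of spread

print(centre_mean, centre_median, modes, data_range)
```

The single large value (25) pulls the mean above the median, which is a quick hint that the distribution is skewed to the right.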


Central Tendency

Central tendency describes where most of the data is
located and covers several key measurements,
including mean (sample and population), median, and
mode. Range, standard deviation, and variance are
measures of spread (dispersion) around the central point.

Mean

• The population mean, μ, is the average of all
observations from a population.
• The calculation of the population mean is identical to the
calculation of the sample mean.
• The sample mean, X̄, can be used as an estimate of the
population mean if the entire population is not available
to compute the population mean.
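
Written out, the two means share the same form. The symbols N (population size), n (sample size), and x_i (the i-th observation) are notation assumed here and do not appear on the slide.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i
```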

Standard Deviation

The standard deviation and variance of a set of data both
describe the amount of variation in the data set by
measuring, and averaging, how much each value in the
data set actually varies from the calculated mean.

Variance

Population
Sample
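
The slide's formulas did not survive extraction, so the sketch below uses the standard textbook definitions. The population variance averages the squared deviations over N, the sample variance divides by n - 1, and the standard deviation is the square root of the variance. The data values are hypothetical, and the results are checked against Python's `statistics` module.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

centre = mean(data)
squared_deviations = [(x - centre) ** 2 for x in data]

# Population: treat the data as the whole population, divide by N.
population_variance = sum(squared_deviations) / len(data)
population_std = sqrt(population_variance)

# Sample: treat the data as a sample, divide by n - 1.
sample_variance = sum(squared_deviations) / (len(data) - 1)
sample_std = sqrt(sample_variance)

# The standard library gives the same results.
print(population_variance, pvariance(data))
print(sample_variance, variance(data))
print(population_std, pstdev(data), sample_std, stdev(data))
```

Dividing by n - 1 makes the sample variance slightly larger than the population formula would, which compensates for measuring deviations from the sample mean and not from the true population mean.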


Sample vs. Population

• Population - Collection of all items of data under
consideration. Also called the universe.
• Parameter - Significant values from the population are
referred to as population parameters.
• Sample - A subset of items selected from the population
and upon which the analysis will be performed.
• Statistic – Any descriptive value calculated from the
sample group’s values.

Box and Whisker Plots

Box-and-whisker plots use five key data points to
graphically compare data produced from different sources
(different machines, operators, work centers, etc.).

Box and Whisker Plots

1. The ends of the box are the first and third quartiles, Q1
and Q3.
2. The median forms the centerline (vertical line) within the
box.
3. The high and low data points (i.e., the range) serve as
end points to lines that extend from the box (the
whiskers). Each whisker (including outliers) contains
25% of the data.
4. The box serves as the middle half of the data, containing
50% of the distribution.
5. Asterisks or diamonds represent data outside the range
(outliers).
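
The five data points behind a box-and-whisker plot can be computed directly. The sketch below uses `statistics.quantiles` with the inclusive method. The outlier rule (points more than 1.5 × IQR beyond the box) is a common convention assumed here, because the slides do not state which rule they use. The cycle times are hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def flag_outliers(data, k=1.5):
    """Flag points more than k * IQR beyond the box (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1  # interquartile range: the width of the box
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

machine_a = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical cycle times
summary = five_number_summary(machine_a)
outliers = flag_outliers(machine_a)

print(summary, outliers)
```

Different tools compute quartiles slightly differently, so the box edges drawn by a plotting package may not match these values exactly.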


Box and Whisker Plot Benefits

• Explores data and draws informal conclusions when two
or more variables are present
• Visually represents the center, spread, and overall range
• Provides a graphic summary of a data set
• Indicates if the distribution is skewed and highlights
potentially unusual observations
• Is useful with a large number of data sets
• Shows outliers
• Helps when comparing data sets where at least one is
non-normal
