04 - ISI Six Sigma Analysis

SIX SIGMA
Indian Statistical Institute, SQC & OR Unit, Mumbai

APPROACH TO ANALYZE
Develop a focused problem statement → Process Analysis → Organize potential causes → Hypothesis Testing and Regression Analysis → Design of Experiments

IN THE ANALYSIS PHASE YOU WILL...
Brainstorm on X’s
Find out which X’s affect Y and in what manner
Ultimately find which X’s are critical to move the Y in the desired direction
IN MEASURE PHASE, WE DEALT WITH Y’s.

IN ANALYSIS PHASE, WE WILL DISCOVER
& DEAL WITH X’s.
Understanding Process

UNDERSTANDING A PROCESS
To better understand your process, you will:
− Create a flowchart of your process.
− Identify which of your process steps are value-added and which are nonvalue-added.
− Determine cycle time and identify bottlenecks.
− Look for errors or inefficiencies that contribute to defects.

FLOWCHARTS
Flowcharts are tools that make a process visible.
[Figure: sample flowchart. Start, Step 1, Step 2, Step 3, then a Decision with Yes and No branches leading through Steps 4 to 6 to End.]

TYPES OF FLOWCHARTS USEFUL FOR UNDERSTANDING PROCESS FLOW
− Activity flowcharts
− Deployment flowcharts
[Figure: thumbnail of a deployment flowchart with columns Sales, Technical, Shipping, Coordinator]

ACTIVITY FLOWCHARTS
Activity flowcharts are specific about what happens in a process. They often capture decision points, rework loops, complexity, etc.
[Figure: activity flowchart of the Hotel Check-out Process]
1. Approach front desk
2. Is there a line? YES: go to 3. NO: go to 4.
3. Wait
4. Step up to desk
5. Clerk available? NO: go to 6. YES: go to 7.
6. Wait
7. Give room number
8. Check bill
9. Charges correct? NO: go to 10. YES: go to 11.
10. Correct charges
11. Pay bill
Features called out in the figure:
− Process name
− Clear direction of flow (top to bottom or left to right)
− Numbered steps
− Key of symbols (Start/End, Action/Task, Decision, Sequence)
− Consistent level of detail
− Clear starting and ending points
− Date of creation or update & name of creator

DEPLOYMENT FLOWCHARTS
Deployment flowcharts show the detailed steps in a process and which people or groups are involved in each step.
They are particularly useful in processes that involve the flow of information between people or functions, as they help highlight handoff areas.
Features called out in the figure:
− People or groups listed across the top
− Steps listed in column of person or group doing step or in charge
− Time flows down the page
− Horizontal lines clearly identify handoffs
[Figure: deployment flowchart of the Invoicing Process, with columns Sales, Billing, Shipping, Customer and Elapsed Time (marks at 5 days and 10 days)]
Steps shown in the figure:
1. Delivers goods
2. Notifies sales of completed delivery
3. Sends invoice to customer
4. Notifies billing of invoice
5. Files invoice
6. Receives and records payment
7. Reviews weekly report of overdue accounts
8. Receives delivery
9. Records receipt and claims against this delivery
10. Receives invoice
11. Checks invoice against receipt
12. Pays bill

WHICH FLOWCHARTING TECHNIQUE SHOULD I USE?
Basic Flowchart
• To identify the major steps of the process and where it begins and ends
• To illustrate where in the process you will collect data
Activity Flowchart
• To display the complexity and decision points of a process
• To identify rework loops and bottlenecks
Deployment Flowchart
• To help highlight handoff areas in processes between people or functions
• To clarify roles and indicate dependencies
Which flowchart do you intend to use for your project?

HOW TO CREATE FLOWCHARTS
When creating a flowchart, work with a group so you can get multiple viewpoints.
− Brainstorm action steps
  • Write these on self-stick notes or on a flipchart
  • Make sure to include the steps that occur when things go wrong
− Arrange the steps in sequence
  • Be consistent in the direction of flow: time should always flow from top to bottom, or from left to right
  • Use appropriate flowchart symbols
− Check for missing steps or decision points
− Number the steps

FOUR PERSPECTIVES
Flowcharts can map four different perspectives on a process:

− What you think the process is.
− What the process really is.
− What the process could be.
− What the process should be.
At this stage of a DMAIC project, you are trying to define the current
situation, as it is. Therefore, your flowchart(s) should map what is really
happening in the process.

PHOTOCOPYING PROCESS
Process Steps [As we think]
Put original on glass → Close lid → Adjust settings → Press START → Remove originals and copies

COPY PROCESS
[Figure: detailed flowchart of the copy process.
Action steps: Take original, Place original, Clean, Select size, Select orientation, Select number, Find paper, Find knife, Open box, Find help, Start copier, Stop copier, Adjust, Fix, Remove original, Collect copies, Staple copies, Clear modes, Leave.
Decision points: Copier in use?, Wait?, Glass dirty?, Paper?, Box open?, Knife?, Paper loaded?, Copy made?, Quality OK?, Adjust?, Another page?, Fix problem?]

VALUE-ADDED AND NONVALUE-ADDED STEPS
Value-Added Step:
− Customers are willing to pay for it.
− It physically changes the product.
− It’s done right the first time.
Nonvalue-Added Step:
− Is not essential to produce output.
− Does not add value to the output.
− Includes:
 Defects, errors, omissions.
 Preparation/setup, control/inspection.
 Over-production, processing, inventory.
 Transporting, motion, waiting, delays.

EXAMPLES
Value-Added Activities
• Entering order
• Ordering materials
• Preparing drawing
• Assembling
• Legally mandated testing
• Packaging
• Shipping to customer
Nonvalue-Added Activities
• Waiting
• Storing
• Staging
• Counting
• Inspecting
• Recording
• Obtaining Approvals
• Testing
• Reviewing
• Copying
• Filing
• Revising/Reworking
• Tracking

FLOWCHART WITH WASTE HIGHLIGHTED
You can turn an activity flowchart into an opportunity flowchart by highlighting the steps that add waste and complexity to the process.
Example: Changeover process
[Figure: changeover flowchart with 19 numbered steps, running from "Stop packing line" to "Start production". It includes adjustment steps (settings, length, speed, stapler), tool pick-up, machine cleaning, test runs, carton loading and foil refilling, and decision points such as "Same product?", "Timing okay?", "Speed OK?", "Closed?" and "Right product?".]

Organization of Potential Cause

Organizing for Potential Causes
− Understand the need to organize potential causes visually
− Know when and how to construct a C&E diagram
− Know when and how to create and use a tree diagram or an affinity diagram

BRAINSTORMING
What is Brainstorming?
• Brainstorming is a simple but effective technique for
generating many ideas from a group of people within a
short span of time to solve a given problem
BASIC RULES FOR BRAINSTORMING

➢ Defer evaluation
➢ Fantasize freely
➢ Generate quantity
➢ Build on ideas

DEFER EVALUATION
• Put critical faculties in cold storage- even constructive
criticism. This is to ensure a proper climate of
acceptance of all sorts of ideas. No idea should be
treated as stupid.
FANTASIZE FREELY
• Don’t operate with your brakes on. The participants
are encouraged, urged to let themselves go and
generate ideas, no matter how fanciful these ideas are.

GENERATE QUANTITY
• Generate as many ideas as possible. A pearl diver will
be more successful in finding pearls, perhaps the pearl,
when he brings up 200 oysters than when he surfaces
only 15-20 oysters.
BUILD ON IDEAS
• An idea from one participant is often built up more effectively by
another participant.

PRINCIPLES OF BRAINSTORMING
• Deferment of evaluation develops the appropriate psychologically safe climate for ideation
• The uniqueness of each participant’s knowledge is tapped to develop new insights
• Ideas of one participant tend to trigger off ideas in the brains of other group members
• Free association encourages fruitful ideation
• The pressure of time-bound sessions in a non-threatening atmosphere is conducive to a high productivity of ideas
CAUSE AND EFFECT DIAGRAM
• To generate in a structured manner, maximum
number of ideas regarding possible causes for a
problem by using brainstorming technique.
HOW TO PREPARE CAUSE AND EFFECT DIAGRAM
• Clarify the problem
• Gather members for discussion
• Conduct Brainstorming session
• Group the causes (Man, Material, Machine, Method etc.)
• Draw the cause and effect chart
• Check for missing information
• Determine the importance or significance of causes
STRUCTURE OF A CAUSE AND EFFECT DIAGRAM
[Figure: fishbone diagram with four main branches (Man, Machine, Material, Method) leading to the Problem / Effect]
Each of the main branches has many potential subbranches that further subdivide the potential causes.
Cause and Effect Diagram for High Petrol Consumption
[Figure: fishbone diagram. Effect: High Petrol Consumption. Main branches: Procedure, Driver, Vehicle, Road, Maintenance, Materials.
Causes shown on the branches include: impatience, poor anticipation, bad attitude, poor skill, wrong gears, riding on clutch, inexperience, lack of awareness, spark plugs, contacts, carburetor, fuel mix, high H.P. engine, cylinders, body shape, spurious spares, traffic restrictions (crossings, one way, no turn), frequent stops, circuitous road, speed breakers, potholes, steep road, incorrect tyre pressure, negligence, ignorance, irregular or poor servicing, clogged filters, false economy, inferior petrol, impurities, incorrect octane no., additives, and oil with incorrect viscosity, low level, or not changed.]

USES OF CAUSE AND EFFECT DIAGRAM
• To investigate and list down the cause and effect
relationship of problem under investigation.
• Analyze the problem to trace the real root cause.
• To help stratification for collection of further data
to confirm relationship.
• To help evolve counter-measure.

Identifying Potential Causes: Review
− Start with a narrow problem definition

− List potential root causes
− Organize potential causes using a cause-and-effect
diagram, tree diagram, or affinity diagram
− Visual displays can be a powerful communication tool
− Now it is time to revisit and finalize the focused problem
statement.
− None of these causes has been verified; cause verification
is the next step in the DMAIC cycle

Validation of Potential Cause

Strategy
[Figure: decision flowchart for validating potential causes]
− Potential causes: too many? If yes, use a prioritization matrix.
− Controllable? If no, search for controllable causes; if that is not feasible, develop a strategy to minimize impact.
− Relationship known? If no, establish the relationship.
− Desired state known? If yes, use the investigation method (GEMBA).
− Data available? If yes, use a test of hypothesis. If no, plan for experimentation and collect data.

Validation of Causes
 GEMBA (Work Place) Investigation

 List each cause and verify them through workplace
observations.
Causes: Bunching of car
Specifications/Desired states: Maximum 2 at a time
Observations: 6 out of 10 moments
Remarks: May be a potential root cause for delay in servicing

Hypotheses of Means
$H_o: \mu_o = x$
$H_a: \mu_o < x$
Hypotheses of Standard Deviations
$H_o: \sigma_A = \sigma_B$
$H_a: \sigma_A > \sigma_B$
Introduction to Hypothesis Testing

Introduction to Inferential statistics
In various real-life situations, we are interested in making some decision about population or process parameters like the mean (µ), variation (σ or σ2) and proportion (P).
In such situations, we generally take the decision based on a sample collected from the population or process.
Inferential statistics is an area of statistics where we learn how sample statistics (i.e. x-bar, s, s2) can be used to make a decision about the population or process parameters.

Red Apple Garden Green Apple Garden
1. What are the average weights of red and green apples in the garden?
2. I think the weights of red and green apples are not the same.
3. I think red apples are more uniform than green apples.
4. I think the percentage of rotten apples is higher for red apples than for green apples in the garden.

Inferential statistics: Estimation vs. Hypothesis testing
Statistical inference has two branches: parameter estimation and hypothesis testing.
Parameter estimation
1. Point estimation: the value of a single statistic such as the sample mean or sample proportion.
2. Confidence interval (CI) estimation: a CI includes the population parameter with some known probability (e.g. the 95% CI of the mean is (10, 20)).
Hypothesis testing
It is a decision-making procedure about a hypothesis regarding population parameters. A hypothesis is a claim or an assertion about a population parameter that is yet to be proved.
Ex: the time spent on mobile phones is different for males and females.
Hypothesis Testing
We want to take a practical problem and change it to a statistical
problem
We use relatively small samples to estimate population parameters
✓ There is always a chance that we can select a “weird” sample
✓ Sample may not represent a “typical” set of observations
Inferential statistics allows us to estimate the probability of getting a
“weird” sample
Example
✓ If we wanted to know whether a coin was “fair”, we could flip it a number
of times and track how many heads we saw
✓ By chance we would expect about 50% of the flips to be heads
✓ If we flipped the coin 10 times and got 10 heads, we would be
fairly confident the coin is not fair
✓ There is about one chance in 1000 (exactly 1 in 1024) that we could have gotten 10
heads with a fair coin
✓ Therefore, we would say we are willing to take a 0.1% chance of
being wrong about our “unfair” coin
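The coin example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name binomial_pmf is chosen here for illustration and is not part of the course material.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of 10 heads in 10 flips if the coin really is fair (p = 0.5).
prob_all_heads = binomial_pmf(10, 10, 0.5)
print(f"P(10 heads in 10 flips of a fair coin) = {prob_all_heads:.6f}")
```

The result is 1/1024, which is the "about 0.1%" risk of wrongly calling a fair coin unfair.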

Overall Approach
Practical Problem Statistical Problem
y = f ( x1 , x2 ,..., x k )
Practical Solution Statistical Solution

Statistical thinking will one day be as necessary for
efficient citizenship as is the ability to read and write.
H.G. Wells Circa 1925
Key Terms
Ho = Null Hypothesis, Ha = Alternative Hypothesis, P-Value = Probability Value

Formulating Null and Alternative hypothesis
Practical problem → Statistical problem
Null hypothesis (H0): H0 is a statement about the status quo or normal condition of the population parameter, and it is generally expressed with an equality (‘=’) sign. H0 is believed to be true unless it can be shown to be incorrect beyond a reasonable doubt.
Alternative hypothesis (Ha): Ha represents a specific claim or inference about the population parameter that contradicts the null hypothesis (H0); in most situations it is the alternative hypothesis that we would like to prove. Ha is generally expressed with a greater than (‘>’), less than (‘<’) or not equal to (‘≠’) sign.
An example: After an improvement, the manager of a restaurant wants to determine whether the waiting time to place an order has changed from its earlier value of 5 min.
H0 : µ = 5 min. ; Ha : µ ≠ 5 min.
Note: A hypothesis is always about population or process parameters, not about sample statistics. Therefore, one should use the symbols µ, σ, σ2 when writing hypotheses.

Hypothesis Testing
What is it for Statisticians ?
Ho: Mean Group A = Mean Group B
Ha: Mean Group A ≠ Mean Group B
Ho: Slope of the line is 0
Ha: Slope of the line is not 0
Ho: Variance Group A = Variance Group B
Ha: Variance Group A ≠ Variance Group B
Ho: Variable X is independent of Variable Y
Ha: Variable X is not independent of Variable Y

Hypothesis Testing
What is it for the Average Person ?
Ho: Age doesn’t matter in a company’s hiring practices
Ha: Age does matter in a company’s hiring practices
Ho: Data is Normal
Ha: Data is not Normal
Ho: Batch X Avg. Cycle Time = Batch Y Avg. Cycle Time
Ha: Batch X Avg. Cycle Time ≠ Batch Y Avg. Cycle Time
Ho = _______________________________________
Ha = _______________________________________
Statistical problem → Statistical solution
In a test of hypothesis we collect the evidence in the form of sample data. Then, an appropriate statistical test is used to examine whether the evidence is against or in favour of H0.
Two approaches to the statistical solution:
1. Critical value approach: According to this approach, the evidence (sample) is against H0 when the test statistic falls within the critical region. Therefore we reject H0.
2. P-value approach: According to this approach, the evidence (sample) is against H0 when the p-value is low. Therefore we reject H0.
Note: We will use the p-value approach in this session.
Fundamentals of Hypothesis Testing
Based on what we know, we form a hypothesis to explain something that we don’t know
Generally, this hypothesis takes the form of: Y=f(x1,x2...xn)
We devise a test of the hypothesis by testing the effect of the x’s on Y
We assume that the null hypothesis is true
We then look for compelling evidence against the null hypothesis
If the evidence is strong enough to reject the null hypothesis, then we accept the alternative hypothesis; otherwise we fail to reject the null hypothesis

Steps to carry out hypothesis testing
Start: Hypothesis testing typically begins with a theory, a claim, or an assertion about a particular population parameter (like mean, variation or proportion).
Step 1: Understand whether the problem is about mean, variation or proportion.
Step 2: Formulate the problem with H0 and Ha.
Step 3: Collect sample data (evidence) from the population/process.
Step 4: Select the appropriate statistical test:
• Test for mean
• Test for variance
• Test for proportion
Step 5: Carry out the test using sample data and get the p-value.
Step 6: Interpret the p-value:
i. p-value > 0.05: The sample does not contradict the null hypothesis (H0), so your research claim (Ha) is not supported.
ii. p-value <= 0.05: The sample is evidence against the null hypothesis (H0), so your research claim (Ha) is supported.
Note: 0.05 is called the significance level; it will be discussed later.
Hypothesis Testing
Test for means
1. One sample t-test: Compares mean of one sample with specified value.
2. Two sample t-test: Compares the mean between two independent samples.
3. Paired t-test: Compares the mean between two dependent samples.
4. ANOVA: Compares the mean among multiple samples.
Test for variance
1. One variance or Chi-square test: Compares variance of one sample with specified value.
2. Two variance or F-test: Compares the variance between two samples.
3. Equal variance or Bartlett's test: Compares the variance among multiple samples.
Test for proportion
1. One Proportion test: Compares proportion of one sample with specified value.
2. Two proportion test: Compares the proportion between two samples.
3. Chi-square test: Compares the proportion among multiple samples.
Note: Tests for means and variance mentioned in this diagram assume data follow normal distribution. For non-normal data equivalent non-parametric tests are used.
Hypothesis testing using Minitab and Excel
Test for means
Test for proportion
Test for variance

Steps in Hypothesis Testing
1. Define the Practical Problem
2. State the Objectives (Create the Statistical Problem)
3. Establish the Hypotheses
- State the Null Hypothesis (Ho)
- State the Alternative Hypothesis (Ha).
4. Decide on appropriate statistical test
5. State the Alpha and Beta level (usually 5% and 10% respectively)
6. Establish the Effect Size (Delta) and estimate Sample Size
7. Develop the Sampling Plan and Select Samples
8. Conduct test and collect data
9. Calculate the test statistic (z, t, or F) from the data.
10. Determine the probability of that calculated test statistic occurring by
chance.
11. If that probability is less than alpha, reject Ho and accept Ha. If that
probability is greater than alpha, do not reject Ho. ( Practically Accept H0)
12. Replicate results and translate statistical conclusion to practical solution.

Six Sigma Roadmap
Y = f( x1, x 2 , x 3 ,..., xk )
Remember this simple equation ?
DATA TYPE :Discrete
Counts of Discrete Events ( 1, 2, 3, 4 Defects)
Qualitative Descriptions
Democrat / Republican
Good / Bad
Machine 1 / Machine 2
Continuous
Decimal sub-divisions are meaningful
Time, Weight, Thickness, etc...

Scenario #1
A Supervisor wants to know if two operators add
significantly different amounts of Material A during the blending process
What’s the Y ? _____________ Type of Data ? ______________
What’s the X ? _____________ Type of Data ? ______________
What type of tool would you use ? ________________________

Scenario #2
The Personnel Department wants to see if there is a link between age
(old and young) and whether that person gets hired

Scenario #3
A team wants to see if there is relationship between
ambient temperature and the viscosity of a material

Scenario #4
For outstanding payment analysis, Sales dept. wants to see if there is a
link between amount of monthly outstanding and various dealers.

Scenario #5
For accident analysis, safety dept. wants to see if there is a link
between unit weight per container and injuries to consumers

Tool Roadmap
Test of Comparisons:
Comparison Type    Y = Continuous: Mean      Y = Continuous: Variance   Y = Discrete: Defective   Y = Discrete: Defects
Against Standard   1 Sample t                Chi-Square Test            1 sample p                1 sample defect rate
Between Two        2 Sample t OR Paired t    F-test                     2 Sample p                2 sample defect rate
Among Many         ANOVA                     Bartlett's Test            Chi-Square test           Chi-square
Note: The tests mentioned for Y (Continuous) are applicable only when Y follows a Normal Distribution. In case Y does not satisfy Normality, we need to use Non Parametric tests. For carrying out ANOVA, the condition of ‘Equality of variance’ must be satisfied.
Test of Modelling (X = Continuous):
Y = Continuous: Regression
Y = Discrete: Logistic Regression
Review
✓ Introduced the basic concepts of

Hypothesis Testing
✓ Linked Hypothesis Testing to upcoming

topics

Hypothesis Testing of Means

Hypothesis Testing of Means-Roadmap
Comparing Means
− 1 Factor
  • 1 Sample, σ known: 1-sample Z-test
  • 1 Sample, σ not known: 1-sample t-test
  • 2 Samples, independent: 2-sample t-test
  • 2 Samples, paired: Paired t-test
  • 2 or more samples: One way ANOVA
− 2 Factors: Two way ANOVA
− 3 or more factors: ANOVA GLM

Test of Hypotheses… Small Samples
What if:
• We wanted to compare a sample mean with a
hypothesized population mean
• The number of observations was less than 30
• The population standard deviation is not known…
This would be a “one-sample” test (because we selected
a single random sample and compared its mean to a µ).
We can use a test statistic that follows the t-distribution

Characteristics of Student’s t-Distribution
The t-distribution has the following properties:
− It is continuous, bell-shaped, and symmetrical
about zero like the z-distribution.
− There is a family of t-distributions sharing a mean
of zero but having different standard deviations.
− The t-distribution is more spread out and flatter at
the center than the z-distribution, but approaches
the z-distribution as the sample size gets larger.

Testing for the Population Mean: Small Sample, Population Standard
Deviation Unknown
The test statistic for the one sample case is given by:
$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
EXAMPLE 1
The current rate for producing 5 amp fuses at General Electric Co. is
250 per hour. A new machine has been purchased and installed that,
according to the supplier, will increase the production rate. A sample of
10 randomly selected hours from last month revealed the mean hourly
production on the new machine was 256, with a sample standard
deviation of 6 per hour. At the .05 significance level can General Electric
conclude that the new machine is faster?

EXAMPLE 1 continued
Step 1: $H_0: \mu \le 250$   $H_1: \mu > 250$
Step 2: Level of Significance: .05 (one-tailed test)
Step 3: Select Test Statistic: $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
Step 4: Decision Rule: H0 is rejected if t > 1.833, df = 9, or if the p value is less than 0.05
Step 5: Compute t, p value using software and decide:
$t = \frac{256 - 250}{6 / \sqrt{10}} = 3.16$
H0 is rejected. The new machine is faster.
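The calculation in Step 5 can be reproduced with a short Python sketch. It uses only the standard library and the critical-value approach, taking the critical value 1.833 (df = 9, one-tailed, alpha = .05) from Step 4; the variable names are illustrative.

```python
from math import sqrt

# Figures from Example 1
mu0 = 250.0        # hypothesized mean (current production rate per hour)
x_bar = 256.0      # sample mean on the new machine
s = 6.0            # sample standard deviation
n = 10             # number of sampled hours
t_critical = 1.833  # one-tailed critical value for df = 9, alpha = .05 (from Step 4)

standard_error = s / sqrt(n)
t_stat = (x_bar - mu0) / standard_error
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.2f}, critical value = {t_critical}, reject H0: {reject_h0}")
```

Because the computed t (about 3.16) exceeds 1.833, the sketch reaches the same decision as the slide.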

MINITAB EXAMPLE
OPEN DATA SET: WORKTIME.MPJ
A helpdesk uses Call Work Time (the length of time in which an operator is actively processing a customer’s question) as a productivity indicator.
The manager is interested in whether a particular operator meets
the average CWT target of 30 seconds on a given day.
Commands:
Graph > Time Series Plot
Stat > Basic Statistics > Display Descriptive Statistics (graphical
summary)
Stat > Basic Statistics > 1-Sample t
Stat > Basic Statistics > 1-Sample t > Graphs

Comparing Two Independent Population Means
(Two-Sample t-Test)
Answers the question: “Are the means of the two samples equal?”, i.e., “Could the two sample means come from identical populations?”
To conduct this test, three assumptions are required:
− The populations must be normally or approximately
normally distributed.
− The populations must be independent.
− The population variances must be equal.

Pooled Sample Variance and Test Statistic
Here, the two sample variances must be pooled to form a single estimate of the unknown population variance (because we assumed equal std dev).
Pooled Sample Variance:
$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
Thus, the Test Statistic:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$

Pooled Sample Variance and Test Statistic
In the two-sample t-test, determining the Student’s t is accomplished in three steps:
Step 1: Calculate the sample standard deviations ($s_1$ and $s_2$)
Step 2: Pool the sample variances:
$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
Step 3: Determine t:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$

t-Test
Comparing Two Independent Samples
We will now compare the mean values of two groups. We will

use an attributive factor (input) and quantitative output.
We use the file “compare.mtw”. The assembly line is compared
with respect to yield both before and after the modification.
There are two ways to enter the data:
− Enter the “before” yield in C1 and the “after” yield in C2. This
method is called “unstacked”.
− Enter all the values for the yield in C1 and the “status” in C2.
Minitab identifies C2 as the index variable (subscript).
The second method is preferable. We always want to have
differing variables in different columns and the same variables
in the same columns. There is one column for each input
variable and one column for each output variable.
First we will use the “unstacked” method, so that we can later
look at the Stack function in Minitab.

EXAMPLE 2
A recent EPA study compared the highway fuel

economy of domestic and imported passenger
cars. A sample of 15 domestic cars revealed a
mean of 33.7 mpg with a standard deviation of
2.4 mpg. A sample of 12 imported cars revealed
a mean of 35.7 mpg with a standard deviation of
3.9. At the .05 significance level can the EPA
conclude that the mpg is higher on the imported
cars? (Let subscript 1 be associated with
domestic cars.)

EXAMPLE 2 continued
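Step 1: $H_0: \mu_2 \le \mu_1$   $H_1: \mu_2 > \mu_1$
Step 2: Significance Level: .05
Step 3: Select Test Statistic: $t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$ (the difference is taken as imported minus domestic, so that a large positive t supports H1)
Step 4: Formulate Decision Rule:
H0 is rejected if t > 1.708, df = 25 or if p < 0.05
Step 5: Calculate and decide: t = 1.64 (Verify.)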

H0 is not rejected. There is insufficient sample evidence
to claim a higher mpg on the imported cars.
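To verify t = 1.64, the pooled-variance calculation can be sketched in Python. This uses only the standard library, takes the difference as imported minus domestic to match H1, and uses the critical value 1.708 (df = 25) quoted in Step 4; the function name pooled_t is an illustrative choice.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic for (mean_a - mean_b), assuming equal variances."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


# Example 2: imported cars (subscript 2) versus domestic cars (subscript 1)
t_stat = pooled_t(35.7, 3.9, 12, 33.7, 2.4, 15)
t_critical = 1.708  # one-tailed, df = 25, alpha = .05 (from Step 4)
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The statistic falls just short of the critical value, so H0 is not rejected.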

Hypothesis Testing Involving Paired Observations
Independent samples are samples that are not related in

any way.
Dependent samples are samples that are paired or
related in some fashion.
− For example, if you wished to buy a car you would look at
the same car at two (or more) different dealerships and
compare the prices.
Use the paired t-test when the samples are dependent:
A paired t-test examines whether the mean difference between paired
observations is 0.
The paired t-test can also be used to evaluate whether the mean
difference is equal to a specific value.
Observations must be paired…related in some way. For example
-- Weights recorded for individuals before and after an exercise program
-- Measurements taken on the same process with two different
measurement devices.

Paired t-Test...
A paired t-test can answer such questions as:
- Does a new program improve the service level?
- Has a process change resulted in a process improvement?
In a paired t-test
- The data must be continuous
- The data must be random
- The population of the differences should be normally distributed.
- The following test statistic should be used:
$t = \frac{\bar{d}}{s_d / \sqrt{n}}$
Where:
$\bar{d}$ is the average of the differences between paired observations
$s_d$ is the standard deviation of the differences
n is the number of paired observations
The average of the differences between paired observations, $\bar{d}$, is computed using the formula:
$\bar{d} = \frac{\sum d}{n}$
The standard deviation of the differences, $s_d$, is computed using the formula:
$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$

Paired Comparison
• Another good example of paired comparison is the comparison of
measurements performed using an online system, to
measurements performed in a lab using the same samples.
• This method is also suitable for examining measurement systems
to determine whether testers obtain the same mean value using the
same samples.
• Let’s look at the file shoe.mtw.
• We’re testing shoe material. We have a sample of 10 boys, and
each boy wears two shoes, each of a different material.
• In this case, the boys represent blocks.

Paired Comparison
Material Wear and Tear - Shoes

Boy Material A Material B
1 13.2(L) 14.0(R)
2 8.2(L) 8.8(R)
3 10.9(R) 11.2(L)
4 14.3(L) 14.2(R)
5 10.7(R) 11.8(L)
6 6.6(L) 6.4(R)
7 9.5(L) 9.8(R)
8 10.8(L) 11.3(R)
9 8.8(R) 9.3(L)
10 13.3(L) 13.6(R)

Paired Comparison
T-Test of the Mean
Test of mu = 0.000 vs mu not = 0.000
Variable   N    Mean   StDev  SE Mean      T       P
Delta     10  -0.410   0.387    0.122  -3.35  0.0085
[Figure: t-distribution for 9 degrees of freedom with the observed value marked against the 5%, 2.5% and 1% tail areas; x-axis: T-Value, y-axis: Prob]
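The paired analysis of the shoe data can be reproduced with a small Python sketch. It uses only the standard library and computes Delta as Material A minus Material B for each boy; the two-sided critical value 2.262 (df = 9, alpha = .05) is a standard t-table value and is not taken from the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Wear measurements per boy (shoe data from the table)
material_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
material_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]

# Paired t-test: work on the within-boy differences only
deltas = [a - b for a, b in zip(material_a, material_b)]
n = len(deltas)
d_bar = mean(deltas)
s_d = stdev(deltas)  # sample standard deviation (n - 1 in the denominator)
t_stat = d_bar / (s_d / sqrt(n))

t_critical = 2.262  # two-sided, df = 9, alpha = .05 (standard table value)
significant = abs(t_stat) > t_critical

print(f"mean diff = {d_bar:.3f}, sd = {s_d:.3f}, t = {t_stat:.2f}, significant: {significant}")
```

Pairing removes the large boy-to-boy variation, which is why a small mean difference of about -0.41 is still detectable.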

The “Incorrect” Analysis
We are using the same data and will analyze it again, this time by
comparing two independent samples.
Minitab: Stat>Basic Statistics>2-Sample t...
Two Sample T-Test and Confidence Interval
Two sample T for Mtrl A vs Mtrl B
          N    Mean   StDev  SE Mean
Mtrl A   10   10.63    2.45     0.78
Mtrl B   10   11.04    2.52     0.80
95% CI for mu Mtrl A - mu Mtrl B: ( -2.74; 1.92)
T-Test mu Mtrl A = mu Mtrl B (vs not =): T = -0.37 P = 0.72 DF = 18
Both use Pooled StDev = 2.4
Why is one analysis significant and the other one is not?
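As a rough illustration of the "incorrect" analysis, the same shoe data can be treated as two independent samples in Python. This sketch uses only the standard library and a pooled standard deviation; the two-sided critical value 2.101 (df = 18, alpha = .05) is a standard t-table value and is not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance

material_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
material_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]

n_a, n_b = len(material_a), len(material_b)

# Pooled variance ignores the pairing, so boy-to-boy variation stays in the error term
pooled_var = ((n_a - 1) * variance(material_a) + (n_b - 1) * variance(material_b)) / (n_a + n_b - 2)
standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean(material_a) - mean(material_b)) / standard_error

t_critical = 2.101  # two-sided, df = 18, alpha = .05 (standard table value)
significant = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, significant: {significant}")
```

The mean difference is unchanged, but the standard error is now dominated by differences between boys, so the test no longer detects the material effect.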
Analysis of Variance (ANOVA)
Black Belt Training

Objectives
To know the concept of variance analysis.

To be able to perform simple analyses with 1 and 2 input
factors.
To be able to determine the mathematical model.
To be able to check the model prerequisites.
To determine the practical significance.
To know the concept of blocking and be able to use simple
Randomized Block Designs.
To be able to perform the ANOVA in Minitab and interpret
the results.

ANOVA (Variance Analysis)
• Previously, we discussed the testing of hypotheses using 2 mean values (t-Test).
• ANOVA is used to test hypotheses with 2 or more mean values.
$H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_A$: At least one $\mu_k$ is different
Advantage:
To test the NULL HYPOTHESIS (all 4 mean values are equal), we would have
to test hypotheses for 6 combinations using the technique previously described
(t-test). Using the ANOVA technique, we can decide whether to “reject the null
hypothesis” or “keep the null hypothesis” with a single test.

ANOVA -- Underlying Assumptions
The F distribution is also used for testing the

equality of more than two means using a
technique called analysis of variance
(ANOVA). ANOVA requires the following
conditions:
− The populations being sampled are normally
distributed.
− The populations have equal standard deviations.
− The samples are randomly selected and are
independent.

Questions Asked by ANOVA
Are the average distances achieved with each dimple pattern the same?
Do the 4 samples come from the same population?
$H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Are some of the 4 population means different?
$H_1$: At least one $\mu_k$ is different

Analysis of Variance Procedure
The Null Hypothesis: the population means are the same.
The Alternative Hypothesis: at least one of the means is different.
The Test Statistic: $F = \frac{\text{between sample variance}}{\text{within sample variance}}$
Decision rule: For a given significance level $\alpha$, reject the null hypothesis if F (computed) is greater than F (table) with numerator and denominator degrees of freedom.

Example: Comparing More than Two Groups
We are using the example file “Diets.mtw”.
Twenty-four animals were fed using one of four diets. Diet is the input variable (factor); blood clotting time is the output variable (response).
The diets were assigned to the animals randomly. Blood samples were taken and tested in a random sequence. Why?
DIET A   DIET B   DIET C   DIET D
  62       63       68       56
  60       67       66       62
  63       71       71       60
  59       64       67       61
           65       68       63
           66       68       64
                             63
                             59

Performing ANOVA in Minitab
We perform ANOVA in Minitab: Stat > ANOVA > One-way
One-way Analysis of Variance
Analysis of Variance for Coagtime
Source DF SS MS F P
Diet 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ---+---------+---------+---------+---
1 4 61.000 1.826 (------*------)
2 6 66.000 2.828 (-----*----)
3 6 68.000 1.673 (----*-----)
4 8 61.000 2.619 (----*----)
---+---------+---------+---------+---
Pooled StDev = 2.366 59.5 63.0 66.5 70.0
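
The DF, SS, MS and F columns of this output can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the clotting times are grouped by diet as in the Diets example; the function name `one_way_anova` is illustrative and not part of Minitab.

```python
# Clotting times per diet, transcribed from the Diets example
diets = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}

def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between: how far each group mean lies from the grand mean
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within: scatter of the observations around their own group mean
        ss_within += sum((x - group_mean) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value

df_b, df_w, ss_b, ss_w, f_value = one_way_anova(list(diets.values()))
print(df_b, df_w, round(ss_b, 2), round(ss_w, 2), round(f_value, 2))
```

The printed values correspond to the Diet and Error rows of the table (DF 3 and 20, SS 228 and 112, F 13.57). The p-value needs the F distribution, which the Python standard library does not provide, so it is left to Minitab or an F-table.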

ANOVA Table
The ANOVA table is an important result of ANOVA.

One-Way Analysis of Variance
Analysis of Variance on Coag Time
Source DF SS MS F P
Diet 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00

If the p-value is less than 5%, there is a difference in the mean value of at least one group. In this case we reject the null hypothesis, which states that the mean values of all groups are equal. The mean value of at least one diet is different from the others.
An F-value of this magnitude may also occur randomly, but only at a frequency of 1 per 10,000 occasions. That corresponds to getting heads thirteen times in a row with a fair coin.
The F-value is near 1.00 when the group mean values are similar. In this case the F-value is much higher.

Let’s Try This Example
A golf ball designer needs to choose between 4 dimple patterns and is concerned with their effect on the distance a golf ball travels.
There are 24 golf balls with 4 dimple patterns.
Dimple pattern is the input variable; distance traveled is the output variable.
Golf balls were assigned randomly to Iron Byron, who was using the USGA approved test driver. The golf balls were tested in random order. (Why?)

Dimple 1  Dimple 2  Dimple 3  Dimple 4
277       281       304       250
268       299       295       277
281       317       317       268
263       286       299       272
          290       304       281
          295       304       286
                              281
                              263

Variance

Hypothesis Testing of Variances-Roadmap
Comparing Variances
1 Sample: 1 variance test (Chi-Square test)
2 Samples: 2 variance test (F-Test, Levene’s test)
2 or more samples: Test for equal Variance (Bartlett’s test, Levene’s Test)

Characteristics of F-Distribution
There is a “family” of F Distributions.
Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to ∞. As F → ∞ the curve approaches the X-axis.

Variance Homogeneity Applications
There are three main applications for the F-Test:
− A prerequisite for performing the 2-sample t-Test.
− To examine variance homogeneity during process improvement.
Remember, there are two primary ways to improve processes:
  • Reduce variation
  • Increase tolerances
− During variance analysis (ANOVA), where equal variances are one of the underlying assumptions.

Test for Equal Variances
For the two tail test, the test statistic is given by:
$F = S_1^2 / S_2^2$
$S_1^2$ and $S_2^2$ are the sample variances for the two samples.
The null hypothesis is rejected if the computed test statistic is greater than the critical (table) value at significance level α/2 with the numerator and denominator degrees of freedom.

Variance Homogeneity
If the data is normally distributed, either of the following tests
may be used:
− The F-Test
Only used if there are two distributions
(Minitab performs this test under
Stat>ANOVA>Test for Equal Variance ...).
− Bartlett's Test
May be used for two or more distributions
(Minitab also performs this test under
Stat>ANOVA> Test for Equal Variance ...).
If the data is not normally distributed, there is only one option:

− Levene's-Test
This test may be used for two or more distributions
(Minitab also performs this test under
Stat>ANOVA> Test for Equal Variance ... ).

Variance Homogeneity
The test indicates a significant difference if the calculated p-value is lower than the specified alpha value (the acceptable risk; 95% confidence has an alpha value of 0.05 or 5%). A calculated p-value of less than 0.05 indicates a significant difference between the variances of the two distributions.
Please note that the test only indicates a statistically significant difference. It does not indicate which of the two (or more) processes is better.
Statistical significance does not necessarily mean practical significance.

F-Test
Performing the Test
The F-Test compares the variances of two distributions.
The calculation can be performed manually using a calculator. The formula is:
$F_{calc} = s_1^2 / s_2^2$
where $s_1^2$ = variance of one distribution, $s_2^2$ = variance of the other distribution. The larger variance always serves as the numerator.
The critical F-value is read from an F-table, where n - 1 is the number of degrees of freedom for the numerator and for the denominator.
If the numerator and denominator have different sample sizes, the correct degrees of freedom must be used for each.

Example
Open the file compare.mpj
The data must be stacked to perform the test
In Minitab: Stat > ANOVA > Test for Equal Variance

Response Factor
89.7 1
81.4 1
84.5 1
84.8 1
87.3 1
79.7 1
85.1 1
81.7 1
83.7 1
84.5 1
84.7 2
86.1 2
83.2 2
91.9 2
86.3 2
79.3 2
82.6 2
89.1 2
83.7 2
88.5 2

Test for Equal Variances for Response
F-Test: Test Statistic: 0.632, P-Value: 0.505
Levene's Test: Test Statistic: 0.775, P-Value: 0.390
[Figure: 95% confidence intervals for sigmas by factor level; boxplots of raw data (Response) for factor levels 1 and 2]
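
The F-test statistic for this example can be recomputed from the stacked data. The sketch below uses only the Python standard library; the variable names are illustrative. It shows the ratio in the order level 1 / level 2, and also the "larger variance on top" convention used for the manual table lookup.

```python
from statistics import variance

# Response values for factor levels 1 and 2, transcribed from the example
level_1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
level_2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

var_1 = variance(level_1)   # sample variance (n - 1 in the denominator)
var_2 = variance(level_2)

f_ratio = var_1 / var_2                                  # level 1 over level 2
f_larger_on_top = max(var_1, var_2) / min(var_1, var_2)  # convention for F-table lookup
degrees_of_freedom = (len(level_1) - 1, len(level_2) - 1)

print(round(f_ratio, 3), round(f_larger_on_top, 3), degrees_of_freedom)
```

The first ratio corresponds to the F-test statistic of 0.632 in the output; the p-value itself requires the F distribution and is left to Minitab or a table.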

F-Test
F-Test: Test Statistic: 0.632, P-Value: 0.505
Levene's Test: Test Statistic: 0.775, P-Value: 0.390
• Depending on whether the samples are normally distributed or not, either the F-test (both samples normally distributed) or Levene's test (at least one sample not normally distributed) must be interpreted.
• This means that each sample must be checked for normal distribution prior to the test: Stat > Basic Statistics > Normality Test
Because the samples are normally distributed, we interpret the F-test. With a P-Value of 0.505 we can see no significant difference between the variances.

Proportion
Black Belt Training

Comparing Proportions
Attribute Data
Now we are comparing Proportions Data

One Factor
− One Sample: One Sample Proportion
Minitab: Stat - Basic Stats - 1 Proportion
If P-value < 0.05 a difference exists between the proportion and target.
− Two Samples: Two Sample Proportion
Minitab: Stat - Basic Stats - 2 Proportions
If P-value < 0.05 the proportions are different
− Two or More Samples: Chi Square Test (Contingency Table)
Minitab: Stat - Tables - Chi-Square Test
If P-value < 0.05 at least one proportion is different

Two Factors
− Chi Square Test (Contingency Table)
Minitab: Stat - Tables - Chi-Square Test
If P-value < 0.05 the factors are not independent

1 Proportion Test
Comparison: P vs. target value
Practical Question (example): “Is the population proportion statistically different from the target value?”
Statistical Question:
Ho : P = target value
Ha : P ≠ target value

Comparing Two Proportions
This test is used to determine if the process defect rate (or proportion, p) of one sample differs by a certain amount D from that of another sample (e.g., before and after your improvement actions). Here D = 0.
The hypotheses:
H0: p1 - p2 = 0
Ha: p1 - p2 ≠ 0
The test statistic is calculated as follows:
$$Z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$$
where
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
This is compared to $Z_{critical} = Z_{\alpha/2}$
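
As a sketch of how this statistic is evaluated, the snippet below applies it to the hiring counts used in the Chi-Square example of this material (30 hired out of 180 older applicants, 45 hired out of 275 younger applicants). The function name `two_proportion_z` is illustrative, and the two-sided p-value is taken from the standard normal distribution via `math.erfc`.

```python
from math import sqrt, erfc

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic and its two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Two-sided tail area of the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

z_obs, p_value = two_proportion_z(30, 180, 45, 275)
print(round(z_obs, 3), round(p_value, 3))
```

For a 2 x 2 table the square of this Z statistic equals the Chi-Square statistic, so the two tests lead to the same p-value.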

Chi Square - Test For Independence
           East  West
Product A  32    135
Product B  32    80
Product C  42    98

Remember this Example?
The Personnel Department wants to see if there is a link between age (old and young) and whether that person gets hired.
What’s the Y? Got Hired
Type of Data? Discrete
What’s the X? Age
Type of Data? Discrete
Test: Chi-Square

The Data
Hired Not Hired Total
Old 30 150 180
Young 45 230 275
Total 75 380 455
How Do You Make The Decision Here?

The Hypothesis
With the Chi-Square Test for Independence, statisticians assume most variables in life are independent, therefore:
Ho: Data is Independent (Not Related)
Ha: Data is Dependent (Related)
If the P Value is <.05 , then reject Ho

Step #1
We must develop an Observed Frequency Table by
breaking our 2 variables into different levels:
Age: Old & Young

Hiring Practices: Hired & Not Hired
We then collect data to perform the analysis.
Hired Not Hired
Old 30 150
Young 45 230

Step #2
Calculate Column & Row Totals
        Hired   Not Hired   Total
Old     30      150         180
Young   45      230         275
Total   75      380         455

Step #3
Develop an expected frequency table. That is, what should this table look like if these 2 factors are really independent?
Hired Not Hired
Old
Young
How do we do that?

Step #3 Continued
Develop an expected frequency table. That is, what should this table look like if these 2 factors are really independent?
A cell’s expected frequency is:
(Column Total) * (Row Total) / Grand Total
For the Old / Hired cell: (75 x 180) / 455 = 29.67
        Hired   Not Hired   Total
Old     29.67   ___         180
Young   ___     ___         275
Total   75      380         455

Step #3 Continued
29.67 is what we would expect if the 2 factors are really independent.
You finish the table!
        Hired   Not Hired   Total
Old     29.67   150.33      180
Young   45.33   229.67      275
Total   75      380         455

Step #4
Subtract the expected value from the observed (O-E)
        Hired               Not Hired
Old     30 - 29.67 = 0.33   -0.33
Young   -0.33               0.33

Step #5
Square the Differences (O-E)^2
        Hired                   Not Hired
Old     (0.33)*(0.33) = 0.11    0.11
Young   0.11                    0.11

Step #6
Compute the Relative Squared Differences (O-E)^2 / E
        Hired                   Not Hired
Old     0.11 / 29.67 = 0.004    0.001
Young   0.002                   0.000

So What?
If the factors are independent, the sum of the relative squared differences approximately follows a Chi Square distribution!
[Figure: Chi Square distribution curve, horizontal axis from 0 to 5]
If there is independence, we expect the sum to be close to 0. The further away from 0 we are, the more likely the variables are dependent. To help us make that decision, we will rely on the P value.
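
Steps 1 to 6 can be sketched in a few lines of Python using the hiring data. This is an illustration, not the Minitab procedure: the variable names are assumptions, and the p-value uses a closed form that is valid only for 1 degree of freedom, as in this 2 x 2 table.

```python
from math import sqrt, erfc

# Observed frequency table (Step 1)
observed = [[30, 150],    # Old:   Hired, Not Hired
            [45, 230]]    # Young: Hired, Not Hired

# Row and column totals (Step 2)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequencies under independence (Step 3)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of relative squared differences (Steps 4 to 6)
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Upper-tail probability of the Chi Square distribution; this formula holds only for df == 1
p_value = erfc(sqrt(chi_sq / 2))

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_sq, 3), df, round(p_value, 3))
```

The statistic is very close to 0 and the p-value is far above 0.05, so these data give no reason to reject independence.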

Chi Square - Test For Independence
1. Collect Data
2. Run Minitab Tables Chi-Square Command
3. Evaluate The P Value
4. Examine Contingency Table
5. Make Decision

Analyzing The Data In Minitab
Chi-Square Test
Expected counts are printed below observed counts
        Hired   Not Hire   Total
1       30      150        180
        29.67   150.33
2       45      230        275
        45.33   229.67
Total   75      380        455
Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
DF = 1, P-Value = 0.932
Note: The observed and expected counts are the same values you calculated a moment ago.
A P-Value! What Decision Would You Make?

Chi-Square Comments
Chi-Square is the least “insightful” and usually one of the more “difficult to analyze” tools we learned this week. But that is what happens when we deal with attribute data.
Every cell needs an expected frequency of at least FIVE for the Chi-Square Test to work reliably.
Your data should have been gathered to assure randomness. Beware of other hidden factors (X’s).

Exercise 1
A) You determine the faulty orders from 2 regions.

Faulty Orders Correct Orders
Region 1 110 420
Region 2 110 400
➢ Is there a difference between the regions? P = .........

B) You receive additional information regarding the faulty orders.
Error 1 Error 2 Correct Orders
Region 1 90 20 420
Region 2 60 50 400
➢ Is there a difference between the regions? P = ..........
What are your conclusions?

Decision risk in hypothesis testing
Hypothesis testing involves the risk of reaching an incorrect conclusion, as the decision is taken based on sample observations. You might wrongly reject H0 (Type-I error) or wrongly fail to reject H0 (Type-II error).
Type-I error: Rejecting the null hypothesis (H0) when it is actually true is called a Type-I error. The probability of a Type-I error is the α-risk or level of significance (α). (Ex: Adjusting a machine when it is not required)
Type-II error: Failing to reject the null hypothesis when it is actually false is called a Type-II error. The probability of a Type-II error is the β-risk. (Ex: Not adjusting a machine when it requires adjustment)
Note: Traditionally, decision makers choose a significance level (α) of 0.01, 0.05 or 0.1 depending on the cost of making a Type-I error. The most commonly used significance level is 0.05.

P-value : The observed level of significance
The p-value is the probability, calculated from the sample data under the assumption that H0 is true, of obtaining a result at least as extreme as the one observed.
A small p-value indicates strong evidence against H0. Therefore, we reject H0.
On the other hand, a large p-value indicates little or no evidence against H0. Therefore, we cannot reject H0.
Many people confuse these rules, mistakenly believing that a high p-value is reason for rejection. One can avoid this confusion by remembering the following:
If the p-value is low, then H0 must go.

Hypothesis and Decision Risk
When accepting or rejecting a hypothesis, we do so with a known degree of risk and confidence.
To do so, we specify in advance of the investigation the magnitude of decision risk and test sensitivity which is acceptable.
Once this has been accomplished, we have the information necessary to determine an ideal sample size.
We must then consider the practical limitations of cost, time and available resources in order to arrive at a rational sampling plan.

About the Null Hypothesis...
The Null Hypothesis (Ho) is assumed to be true.
This is like the defendant being presumed to be “Not Guilty”.
Remember: The Indian justice system is NOT based on “guilty until proven innocent”.
We don’t assume that our experiment has an effect until the probability of “no effect” is too small to believe.
You are the prosecuting attorney - you must provide evidence beyond a “reasonable doubt”.
NOTE: Not Guilty ≠ Innocent

Decision Errors
In deciding to Reject or Not, we could make one of two decision errors.
The Truth    Your Decision: Accept Ho    Your Decision: Reject Ho
Ho True      Correct                     Type I Error (α-Risk)
Ho False     Type II Error (β-Risk)      Correct

Example: A Trial
The Truth           Jury’s Decision: He’s Not Guilty   Jury’s Decision: He’s Guilty
Actually Innocent   Correct                            Type I Error (α-Risk)
Actually Guilty     Type II Error (β-Risk)             Correct
Consequence of a Type I Error: Innocent Man Goes to Jail
Consequence of a Type II Error: Criminal goes Free

Example 1
A pharmaceutical dispenser that is supposed to dispense 25 ml of agent was calibrated to dispense 25 ml quantities into 10 previously-weighed containers. The actual quantities dispensed were:
25.01 ml, 24.89 ml, 25.10 ml, 24.95 ml, 24.97 ml,
25.04 ml, 25.08 ml, 24.91 ml, 25.07 ml, 24.85 ml
Test the null hypothesis that the dispenser provides 25 ml of agent against the alternative hypothesis that this is not the case.
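
One way to set up this test is a two-sided one-sample t-test. The sketch below computes only the test statistic with the Python standard library; the 5% significance level and the table value 2.262 (two-sided, 9 degrees of freedom) are assumptions chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Dispensed quantities in ml, transcribed from Example 1
dispensed = [25.01, 24.89, 25.10, 24.95, 24.97,
             25.04, 25.08, 24.91, 25.07, 24.85]
target = 25.0

n = len(dispensed)
x_bar = mean(dispensed)
s = stdev(dispensed)                        # sample standard deviation
t_stat = (x_bar - target) / (s / sqrt(n))   # one-sample t statistic

t_critical = 2.262   # assumed: t-table value for alpha = 0.05 (two-sided), df = 9
reject_h0 = abs(t_stat) > t_critical

print(round(x_bar, 3), round(s, 4), round(t_stat, 3), reject_h0)
```

In Minitab the same comparison is available under the 1-sample t procedure, which also reports the p-value.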

Example 2
Water hardness is measured in order to determine calcium ion concentrations (in ppm). The hardness of water in the hot and cold water lines of a manufacturing process was measured. A technician objected, stating that warm water was “harder” than cold water. The hardness of the various samples is as follows:
Warm water: 133.4, 135.4, 137.1, 138.4, 136.3, 137.1, 133.3, 136.5, 137.6, 139.5
Cold water: 134.1, 134.7, 136.0, 131.7, 134.7, 135.2, 135.9, 135.6, 135.8, 132.2
Test the null hypothesis that the warm and cold water have the same calcium concentration, against the alternative hypothesis that warm water has a higher concentration.

Example 3
A chemical company manufactures paint thinner. The content of ethyl alcohol in the paint thinner is set at 3%. To determine whether the manufacturing process has exceeded the 3% threshold, 20 samples of thinner are taken. The ethyl alcohol concentrations were determined as follows:
4.2, 5.3, 3.5, 4.3, 3.7, 3.2, 3.5, 2.8, 3.5, 3.7,
2.8, 3.3, 2.7, 3.0, 3.1, 3.0, 3.7, 3.3, 3.4, 2.3
Test the null hypothesis that the process is unchanged (3% ethyl alcohol), against the alternative hypothesis that the mean value of the process is more than 3%.

Example 4
A manufacturer of foils has implemented a new process to reduce the weight of the
product. The foil strength is an important variable affecting the weight. Foils in eight
different strengths were manufactured using both the old and new methods. The weight
(in grams) for each combination is shown below:
Strength Old Process New Process

1 154 152
2 159 152
3 169 171
4 176 167
5 183 182
6 199 194
7 200 204
8 213 208
Test the null hypothesis that there is no difference in weight between the old and new process, against the alternative hypothesis that the new process has reduced the weight of the foil.
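
Because each strength is measured under both processes, one reasonable setup is a paired t-test on the differences Old - New. The sketch below computes the test statistic only; the 5% level and the one-sided table value 1.895 for 7 degrees of freedom are assumptions made here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Foil weights in grams for strengths 1 to 8, transcribed from Example 4
old = [154, 159, 169, 176, 183, 199, 200, 213]
new = [152, 152, 171, 167, 182, 194, 204, 208]

# Pairing by strength removes the strength-to-strength variation
diffs = [o - n for o, n in zip(old, new)]
n_pairs = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_stat = d_bar / (s_d / sqrt(n_pairs))   # paired t statistic

t_critical = 1.895   # assumed: t-table value for alpha = 0.05 (one-sided), df = 7
reject_h0 = t_stat > t_critical

print(diffs, round(d_bar, 3), round(t_stat, 3), reject_h0)
```

An unpaired two-sample test would mix the large weight differences between strengths into the error term, which is why pairing is the more sensitive choice for this layout.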

Example 5
Two brands (A and B) of air conditioner dust filters were tested to determine
whether one was better than the other. All filters were tested on the same system,
and the dust quantity (in grams) filtered over a 6 hour period was measured. The
data obtained for the two filters is as follows:
Filter A: 9.1, 11.8, 1.5, 7.2, 4.2, 9.6, 8.7, 10.2, 4.4, 7.8, 4.3
Filter B: 15.6, 9.3, 16.9, 5.1, 14.5, 19.0, 10.3, 12.5, 13.3, 16.1, 2.6
Test the null hypothesis that there is no difference in average dust quantity filtered against the alternative hypothesis that one filter is better than the other.

Correlation and
Regression

Correlation
❖ If two variables X and Y are related such that Y increases or decreases as X changes, a correlation is said to exist between them.
❖ A scatter diagram is a chart that pictorially depicts the relationship between two such data types.
Some Examples of Relationship
•Cutting speed and tool life
•Moisture content and thread elongation
•Breakdown and equipment age
•Temperature and lipstick hardness
•Striking pressure and electrical current
•Temperature and percent foam in soft drinks
Scatter Diagram of Automotive Speed vs. Mileage
[Figure: scatter plot; x-axis: Speed (km/h), 25 to 75; y-axis: Mileage (km/Lit), 15 to 40]

Scatter diagram
• A scatter diagram depicts the relationship as a
pattern that can be directly read.
• If Y increases with X, then X and Y are positively
correlated.
• If Y decreases as X increases, then the two types of
data are negatively correlated.
• If no significant relationship is apparent between X
and Y, then the two data types are not correlated.

Correlation (r): The Strength of the Relationship
[Figure: six scatter plots of Y against X]
Pattern                                 r       R2
Strong Positive Correlation             .95     90%
Moderate Positive Correlation           .70     49%
No Correlation                          .006    .0036%
Strong Negative Correlation             -.90    81%
Moderate Negative Correlation           -.73    53%
Other Pattern - No Linear Correlation   -.29    8%

DATA ON CONVEYOR SPEED AND SEVERED LENGTH
Sl. No. | Conveyor Speed (cm/sec) | Severed Length (mm) | Sl. No. | Conveyor Speed (cm/sec) | Severed Length (mm)
1 8.1 1046 16 6.7 1024
2 7.7 1030 17 8.2 1034
3 7.4 1039 18 8.1 1036
4 5.8 1027 19 6.6 1023
5 7.6 1028 20 6.5 1011
6 6.8 1025 21 8.5 1030
7 7.9 1035 22 7.4 1014
8 6.3 1015 23 7.2 1030
9 7.0 1038 24 5.6 1016
10 8.0 1036 25 6.3 1020
11 8.0 1026 26 8.0 1040
12 8.0 1041 27 5.5 1013
13 7.2 1029 28 6.9 1025
14 6.0 1010 29 7.0 1020
15 6.3 1020 30 7.5 1022

Scatter Diagram for Conveyor Speed and Severed Length
[Figure: scatter plot; x-axis: Conveyor Speed (cm/sec), 5 to 9; y-axis: Severed Length (mm), 1000 to 1050]

USES OF SCATTER DIAGRAM
❖ If an increase in Y depends on an increase in X, then, if X is controlled, Y will be naturally controlled.
❖ If Y increases only somewhat when X is increased, then Y seems to have causes other than X.

REGRESSION
❖ Regression is the prediction of dependent variable
from knowledge of one or more other independent
variables.
❖ Regression Analysis is a statistical technique for
estimating the parameters of an equation relating
a particular value of dependent variable to a set of
independent variables. The resulting equation is
called Regression Equation.
❖ Linear regression is the regression in which the
relationship is linear.
❖ Curvilinear regression is the regression in which
the best fitting line is a curve.
SIMPLE LINEAR REGRESSION
❖ Only a single predictor variable or independent
variable ‘X’ (e.g.: cutting speed) and a response
variable or dependent variable ‘Y’ (e.g: tool life).
The regression equation is

Y = a+b X

where, Y = Predicted value of Y
a = Intercept (the predicted value of Y when X = 0)
b = Slope of the line (the amount of difference in Y
associated with a 1 - unit difference in X)
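
As an illustration of Y = a + bX, the sketch below fits a least-squares line to the conveyor speed and severed length data listed earlier and also computes the correlation coefficient r. Least squares is the usual fitting method but is an assumption here, since the slides do not name one; variable names are illustrative.

```python
from math import sqrt

# Conveyor speed (cm/sec) and severed length (mm), Sl. No. 1 to 30
speed = [8.1, 7.7, 7.4, 5.8, 7.6, 6.8, 7.9, 6.3, 7.0, 8.0,
         8.0, 8.0, 7.2, 6.0, 6.3, 6.7, 8.2, 8.1, 6.6, 6.5,
         8.5, 7.4, 7.2, 5.6, 6.3, 8.0, 5.5, 6.9, 7.0, 7.5]
length = [1046, 1030, 1039, 1027, 1028, 1025, 1035, 1015, 1038, 1036,
          1026, 1041, 1029, 1010, 1020, 1024, 1034, 1036, 1023, 1011,
          1030, 1014, 1030, 1016, 1020, 1040, 1013, 1025, 1020, 1022]

n = len(speed)
x_bar = sum(speed) / n
y_bar = sum(length) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in speed)
syy = sum((y - y_bar) ** 2 for y in length)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, length))

b = sxy / sxx               # slope: change in length per 1 cm/sec change in speed
a = y_bar - b * x_bar       # intercept: predicted length at speed 0
r = sxy / sqrt(sxx * syy)   # correlation coefficient

residuals = [y - (a + b * x) for x, y in zip(speed, length)]

print(f"Y = {a:.1f} + {b:.2f} X, r = {r:.2f}, R2 = {100 * r * r:.0f}%")
```

The intercept lies far outside the observed speed range (5.5 to 8.5 cm/sec), so it should be read as a fitting constant rather than as a realistic length at zero speed.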

Design of Experiment

STATISTICALLY DESIGNED EXPERIMENTS
◆ A statistically designed experiment permits simultaneous consideration of all the possible factors that are suspected to have a bearing on the quality problem under investigation, so that even if interaction effects exist, a valid evaluation of the main effects can be made. Scanning a large number of variables is one of the simpler objectives that a statistically designed experiment readily fulfills in many problem situations.
◆ Even a limited number of experiments would enable the experimenter to uncover the vital factors on which further trials would yield useful results. The approach has a number of merits: it is quick, reliable and efficient.

Objectives Of Experimentation
The following are some of the objectives of experimentation in an
industry :
➢ Improving efficiency or yield

➢ Finding optimum process settings
➢ Locating sources of variability
➢ Correlating process variables with product characteristics
➢ Comparing different processes, machines, materials etc
➢ Designing new processes and products.

Design of experiments
Design of experiments (DOE) is a valuable tool to optimize product and process designs, to accelerate the development cycle, to reduce development costs, to improve the transition of products from research and development to manufacturing and to effectively troubleshoot manufacturing problems. Today, Design of Experiments is viewed as a quality technology to achieve product excellence at the lowest possible overall cost.

Traditional approach
One-factor-At-A-Time
This is a traditional method of experimentation which tests, then changes, one factor
at a time to allow for observation and comparison. Note on the example below, all 8
factors are varied one-at-a-time . It is efficient because it takes only 16 runs.
•A1 and A2 are evaluated by comparing Result - 1 and Result - 2

•B1, B2 and B3 are evaluated by comparing Result-2, Result-3 and Result-4.
•C1, C2, and C3 are evaluated by comparing Result-4, Result-5 and Result-6
•Etc.
Run No. A B C D E F G H Result
1 1 1 1 1 1 1 1 1 Result 1
2 2 1 1 1 1 1 1 1 Result 2
3 2 2 1 1 1 1 1 1 Result 3
4 2 3 1 1 1 1 1 1 Result 4
5 2 3 2 1 1 1 1 1 Result 5
6 2 3 3 1 1 1 1 1 Result 6
7 2 3 3 2 1 1 1 1 Result 7
8 2 3 3 3 1 1 1 1 Result 8
9 2 3 3 3 2 1 1 1 Result 9
10 2 3 3 3 3 1 1 1 Result 10
11 2 3 3 3 3 2 1 1 Result 11
12 2 3 3 3 3 3 1 1 Result 12
13 2 3 3 3 3 3 2 1 Result 13
14 2 3 3 3 3 3 3 1 Result 14
15 2 3 3 3 3 3 3 2 Result 15
16 2 3 3 3 3 3 3 3 Result 16
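
The run count of this one-factor-at-a-time plan can be compared with the number of combinations a full factorial over the same factors would contain. The sketch assumes the level counts read from the table above (A at 2 levels, B to H at 3 levels each).

```python
from math import prod

# Levels per factor as read from the table: A has 2, B to H have 3 each
levels = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3, "G": 3, "H": 3}

# OFAT: one baseline run, then one extra run for every additional level of each factor
ofat_runs = 1 + sum(k - 1 for k in levels.values())

# Full factorial: every combination of every level
full_factorial_runs = prod(levels.values())

print(ofat_runs, full_factorial_runs)   # 16 4374
```

The small run count is what makes the plan look efficient, but 16 runs cover only a tiny share of the possible combinations, so interactions between factors cannot be seen.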

Traditional approach
Problem: Current Car gas mileage is 20 mpg. Would like to
get 30 mpg.
We might try:
− Change brand of gas
− Change octane rating
− Drive Slower
− Tune-up Car
− Wash and wax car
− Buy new tires
− Change Tire Pressure
What if it works?
What if it doesn’t?
“Survey Says”: These variables greatly affect MPG

One-Factor-At-A-Time
Problem: Fuel economy we want is 30 MPG

Try changing each input variable at two settings believed to be
associated with dramatically changing fuel economy. See what
happens.
Speed Octane Tire Pressure MPG
55 85 30 23
60 85 30 29
60 90 30 23
60 85 35 24
How many more Combinations would you need to figure out the best
combination of variables? (3 Variables at two settings; 2x2x2 = 8 total)
How can you explain the above results? (Combination 2 is the answer)
If there were more variables, how long would it take to get a good solution?
(Multiply by another 2 for each one)
What if there’s a specific combination of two or more variables that leads to
the best mileage? (Too hard for me to figure out; What do you think?)

Full Factorial Experiment
Problem: Want 30 MPG
Speed Octane Tire Pressure MPG
55 85 30 23 (OFAT run)
60 85 30 29 (OFAT run)
55 90 30 37
60 90 30 23 (OFAT run)
55 85 35 37
60 85 35 24 (OFAT run)
55 90 35 30
60 90 35 36
What conclusion do you make now?
(Murphy is alive and well!)
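
With all 8 combinations available, the average effect of each variable can be estimated from the table. The sketch below computes each main effect as the mean MPG at the high setting minus the mean MPG at the low setting; the helper name `main_effect` is illustrative.

```python
# (Speed, Octane, Tire Pressure, MPG) for the 8 full factorial runs
runs = [
    (55, 85, 30, 23), (60, 85, 30, 29), (55, 90, 30, 37), (60, 90, 30, 23),
    (55, 85, 35, 37), (60, 85, 35, 24), (55, 90, 35, 30), (60, 90, 35, 36),
]
factors = ["Speed", "Octane", "Tire Pressure"]

def main_effect(runs, index):
    """Mean MPG at the high setting minus mean MPG at the low setting."""
    low, high = sorted({run[index] for run in runs})
    low_mpg = [run[3] for run in runs if run[index] == low]
    high_mpg = [run[3] for run in runs if run[index] == high]
    return sum(high_mpg) / len(high_mpg) - sum(low_mpg) / len(low_mpg)

effects = {name: main_effect(runs, i) for i, name in enumerate(factors)}
print(effects)   # {'Speed': -3.75, 'Octane': 3.25, 'Tire Pressure': 3.75}
```

The main effects are modest compared with the swings between individual runs (23 to 37 MPG), which suggests that the combinations of settings matter more than any single variable.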
TERMINOLOGY USED IN D.O.E.
EXPERIMENT: A planned set of operations which leads to a corresponding set of observations. The purpose of experimentation is to ensure that the experimenter obtains the data relevant to the task of decision making in an economical way.
OUTCOME (RESPONSE): The numerical result of a trial based on a given treatment combination is called Outcome or Response.
The response may be:
− Continuous or measurement type and follows a normal distribution
− Continuous or measurement type but does not follow normal distribution
− Discrete or count type and does not follow normal distribution
E.g.: diameter of a shaft, No. of rejected cylinders etc.

FACTOR (X) - The parameters of the process which are deliberately varied from trial
to trial. This could be qualitative or quantitative. e.g. Speed, feed, coolant rate,
operator skill.
LEVELS OF A FACTOR - The alternative values of a factor considered in the
experiment are called its levels.
e.g.: Speed 400 rpm, circular wheel etc.
TREATMENT COMBINATION - The set of levels of all factors employed in a given trial
is called treatment or treatment combination.
EXPERIMENTAL UNIT : It is a generic term used to denote the group of material to
which a treatment is applied in a single trial of the experiment.
BALANCED TEST - Where number of samples in each treatment combination is same.

EFFECT OF FACTOR :
MAIN EFFECT: The change in the average response produced by a change in the
level of the factor is called “Main Effect” of that factor.
INTERACTION EFFECT: If the effect of one factor is different at different levels of another factor, the two factors are said to interact (or) to have interaction.
The interaction between factors A and B is termed “First Order Interaction” or “Two Factor Interaction” and is denoted by AxB.
If the interaction between two factors A and B is different at different levels of a third factor C, then there is said to be interaction among three factors. This is referred to as “Second Order Interaction” or “Three Factor Interaction” and is denoted by AxBxC.

Interactions
Y = f (X1, X2). But if X2 = f (X1), then changing X1 will give a Y other than the predicted one, since X2 also changes automatically.
The same holds true for a change of X2.
e.g.: leakage of dome welded components is a function of current and electrode thickness, but current also depends on electrode thickness.
Hence there is interaction between electrode thickness and current.

An example to understand interaction
[Figure: FINISH plotted at Speed X and Speed Y]
Changing feed from level A to level B betters finish.
But this effect is more pronounced at speed level Y than at speed level X.
Hence there is an interaction between speed & feed.
REPLICATION: Replication is a repetition of the whole experiment in order to estimate experimental error and increase precision (detect smaller changes).
EXPERIMENTAL ERROR: The failure of two identically treated experimental units to give the same value.
STEPS IN DESIGNING AND ANALYZING
1. Statement of the problem.
2. Formulation of hypothesis.
3. Planning of the experiment.
a) Choosing an appropriate experimental technique.
b) Examination of possible outcomes to make sure that the experiment
provides the required information.
c) Consideration of possible results from the point of view of statistical
analysis.
4. Collection of data, after performing the experiment according
to the plan.
5. Statistical Analysis of the data.
6. Drawing conclusions with appropriate level of significance.
7. Verification or evaluation of results (conclusions).
8. Drawing final conclusions and recommendations.
PLANNING FOR EXPERIMENTATION
The various steps to be followed in this direction are listed below :

➢ Selection of area of study : Pareto analysis
➢ Proof of the need for experimentation
➢ Brain storming and Cause & Effect diagram : To list all the possible factors
➢ Classification of factors
➢ Interactions to be studied
➢ Response and type of model for analysis

Classification of factors
Tools like brainstorming and cause & effect diagrams help in identifying factors and preparing a complete list of the factors involved in any experiment.
Factors listed can be classified into three categories :
1. Experimental Factors
Experimental factors are those which we really experiment with by varying them at
various levels.
2. Control Factors
Control Factors are those which are kept at a constant (controlled) level throughout
experimentation.
3. Error or Noise Factors
Error or Noise factors are those which can neither be changed at our will nor can
be fixed at one particular level. Effect of these factors causes the error component
in the experiment and as such these factors are termed as error or noise factors.
Note : At the planning stage itself all the factors viz. Experimental, Control and error should
be recognized. This will help to tackle them appropriately during experimentation.

PLANNING FOR EXPERIMENTATION
State what you want
What is my response(s)?
What are my factors?
Choose the levels of the factors
Decide on the design
Run the design and collect the data
Analyze the data and obtain results
Run a confirming test on the settings

Requisites of DOE
UNBIASEDNESS
PRECISION
INDUCTIVE SCOPE
CLEARLY DEFINED OBJECTIVES
Fulfillment of the requirements
1. RANDOMISATION
2. REPLICATION AND
3. LOCAL CONTROL OR ERROR CONTROL

Replication
Definition
Replication means repeating all the experimental conditions (or running a combination) two or more times.
− This does not mean measuring an experimental unit twice
− It does mean repeating a certain set of conditions and measuring the new output
− Two replicates means that for an 8-run design you will do 16 runs in one experiment
  • Minitab will randomize all the runs (including replicates) at the same time
  • If for some reason you cannot, or choose not to, do all the runs at the same time, you need to be concerned about blocking (a topic we’ll discuss later in this module)
− One replicate really means no replication

Randomization: the Experimenter’s Insurance
Definition
To assign the order in which the experimental trials will be run
using a random mechanism
− It is not the standard order
− It is not running trials in an order that is convenient
− To create a random order, you can “pull numbers from a hat” or have
Minitab randomize the sequence of trials for you
Why?
− Averages the effect of any lurking variables over all of the factors in the experiment
  • Prevents the effect of a lurking variable from being mistakenly attributed to another factor
− Helps validate statistical conclusions made from the experiment
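
A random run order can also be produced outside Minitab. The sketch below shuffles an 8-run design with two replicates (16 runs in total) using Python's `random` module; the fixed seed is an assumption made only so that the printed order can be reproduced.

```python
import random

# Standard order: 8 treatment combinations, each run in 2 replicates
standard_order = [(run, replicate) for replicate in (1, 2) for run in range(1, 9)]

rng = random.Random(2024)   # assumed seed, used only to make the example repeatable
run_order = rng.sample(standard_order, k=len(standard_order))

# Print the randomized sequence: position, treatment combination, replicate
for position, (run, replicate) in enumerate(run_order, start=1):
    print(position, run, replicate)
```

All 16 runs, including the replicates, are randomized together, in line with the note on replication; if the runs must be split across days or batches, blocking is needed instead of one overall shuffle.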

Local control
Local control means blocking, grouping or balancing the experimental units. Balancing is done by replicating all the treatment combinations the same number of times under different conditions. Local control makes the test more sensitive and powerful by reducing the experimental error.

A WORD OF ADVICE
It is observed that only 2 to 6 variables end up being vital few.
Try to keep the design simple by utilizing your experience to decide
which are the most likely factors unless you know nothing of the
process.
The above calls for judgement which sometimes can be wrong.
REMEMBER:
The Experiment is Run to Understand Reality, Not the Data

Full Factorial Experiments
Wear of pin is an important criterion affecting the field life of a component.
It is believed that hardness of pin is an important parameter affecting it.
Hence experiments are carried out to check wear on:
− Pin of hardness in range of 60 - 62 RC
− Pin of hardness in range of 66 - 68 RC

Seek the answers to the following questions
What is your response?
How many Factors [f]?
How many Levels [L]?
The experiment is L^f
How many combinations/runs are possible?
How many runs do you plan to carry out?

SEEK THE ANSWERS TO THE FOLLOWING QUESTIONS
What is the response? Wear
How many Factors [f]? 1
How many Levels [L]? 2
The experiment is L^f: 2^1
How many combinations/runs are possible? 2
How many runs do we plan to carry out? 4 [Taking 2 replications]
HENCE IT IS A 2^1 FULL FACTORIAL.

2^2 Full factorial experiments
It is believed that pin wear depends on
− Hardness
− Oil flow
The levels of hardness are
− 60 - 62 Rc
− 66 - 68 Rc
The levels of oil flow are
− 20 cc / min
− 120 cc / min

Number of Factors : 2
Number of Levels : 2
Possible Runs : 2^2 = 4
Nos. we plan to carry out: 4
Hence it is a 2^2 full factorial experiment.
Similarly you have 2^3 and 2^4 full factorial experiments for 3 and 4
factors respectively.
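
The L^f count can be checked by listing every combination of factor levels. The sketch below is illustrative only; the function name `full_factorial` and the dictionary keys are assumptions, while the levels are those of the pin wear example.

```python
from itertools import product


def full_factorial(factors):
    """List every combination of levels for the given factors.

    `factors` maps a factor name to its list of levels; the result has
    one dict per run, so its length is the product of the level counts
    (L^f when every factor has L levels).
    """
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# Levels taken from the pin wear example (key names are hypothetical)
pin_factors = {
    "hardness_Rc": ["60-62", "66-68"],
    "oil_flow_cc_per_min": [20, 120],
}

pin_runs = full_factorial(pin_factors)
for run in pin_runs:
    print(run)
print("Number of runs:", len(pin_runs))
```

Adding a third or fourth two-level factor to the dictionary gives the 8 and 16 combinations of the 2^3 and 2^4 designs.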

EXAMPLE - 2^2 FACTORIAL EXPERIMENT
Consider a chemical process in silicate manufacturing. It is felt that
Temperature and Concentration contribute to increased residue.
The factors and levels are as below

Factor   -1      +1
Temp.    40°C    80°C
Conc.    Low     High

-1 signifies one level (normally lower) and +1 signifies the other level
(normally higher)

It is now believed that residue depends on concentration of Acid and Temperature of bath.

RUN   CONC.   TEMP. (°C)   RESIDUE
1     Low     40           20.4
2     Low     40           19.3
3     Low     40           17.6
4     Low     40           16.3
5     Low     80           9.7
6     Low     80           16.4
7     Low     80           14.8
8     Low     80           12.3
9     High    40           17.4
10    High    40           17.7
11    High    40           23.2
12    High    40           20.4
13    High    80           15
14    High    80           24
15    High    80           15.6
16    High    80           15.2

WHAT DO WE WANT TO FIND ?
We want to find out:
− Do concentration and temperature have any effect on residue?
− Of concentration and temperature, which is more important?
− What is the ideal and feasible level of the process settings?
− Does any interaction exist between temperature and concentration?
− Is there any problem with the data, and is the model adequate?
How do we find this?
Let us do it together using MINITAB.
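
Before turning to Minitab, the size of the effects can be estimated by hand from the residue data above. The following Python sketch is only an illustration (the variable and function names are assumptions). It computes the cell means, the two main effects and the interaction. It does not test significance, which is what the Minitab ANOVA output provides.

```python
from statistics import mean

# Residue readings from the 16 runs, keyed by (concentration, temperature in °C)
residue = {
    ("Low", 40): [20.4, 19.3, 17.6, 16.3],
    ("Low", 80): [9.7, 16.4, 14.8, 12.3],
    ("High", 40): [17.4, 17.7, 23.2, 20.4],
    ("High", 80): [15, 24, 15.6, 15.2],
}

cell_mean = {cell: mean(values) for cell, values in residue.items()}


def level_mean(position, level):
    """Mean residue over all runs where the factor at `position` is at `level`."""
    return mean(y for cell, ys in residue.items() if cell[position] == level for y in ys)


# Main effect = mean response at the high level minus mean at the low level
conc_effect = level_mean(0, "High") - level_mean(0, "Low")
temp_effect = level_mean(1, 80) - level_mean(1, 40)

# Interaction = half the difference between the temperature effect at
# high concentration and the temperature effect at low concentration
interaction = (
    (cell_mean[("High", 80)] - cell_mean[("High", 40)])
    - (cell_mean[("Low", 80)] - cell_mean[("Low", 40)])
) / 2

print("Cell means:", cell_mean)
print("Concentration effect:", round(conc_effect, 4))
print("Temperature effect:", round(temp_effect, 4))
print("Conc x Temp interaction:", round(interaction, 4))
```

A negative main effect means the residue is lower, on average, at the +1 level of that factor.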

Consider another setup of surface cleaning. It is felt that Time, Temp.
and Conc. are the contributors.
The factors and levels are as below

Factor   -1       +1
Temp.    R.T.     90°C
Time     3 mins   10 mins
Conc.    Low      High
-1 signifies one level (normally lower) and +1 signifies the other level
(normally higher)

HOW MANY FACTORS? 3
HOW MANY LEVELS? 2
HOW MANY RUNS WOULD THERE BE IDEALLY? 8
HOW MANY DO YOU PLAN TO RUN? 8
WHICH EXPERIMENT?
2^3 FULL FACTORIAL EXPERIMENT

EXAMPLE: THE POSSIBLE COMBINATIONS ARE

NO.   TEMP.   TIME      CONC.
1     RT      3 mins    Low
2     90      3 mins    Low
3     RT      10 mins   Low
4     90      10 mins   Low
5     RT      3 mins    High
6     90      3 mins    High
7     RT      10 mins   High
8     90      10 mins   High

This is called an array.
Since it contains all possible combinations, it is a full factorial array.
It is also called an orthogonal array.
If the columns are orthogonal, we can estimate the effect of a variable independently of the
other variables.
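
Orthogonality can be verified numerically by coding the levels as -1 and +1: two columns are orthogonal when the sum of their element-wise products is zero. The Python sketch below is an illustration under that coding (RT, 3 mins and Low as -1; 90, 10 mins and High as +1); the names used are assumptions.

```python
from itertools import combinations, product

names = ("TEMP", "TIME", "CONC")

# product() varies the last position fastest, so each tuple is reversed
# to make TEMP change fastest, matching the order of the table above.
design = [tuple(reversed(levels)) for levels in product((-1, +1), repeat=3)]

columns = {name: [row[i] for row in design] for i, name in enumerate(names)}


def dot(a, b):
    """Sum of element-wise products of two coded columns."""
    return sum(x * y for x, y in zip(a, b))


for row_number, row in enumerate(design, start=1):
    print(row_number, row)

# Every pair of columns should give 0 for an orthogonal array
for first, second in combinations(names, 2):
    print(first, "x", second, "=", dot(columns[first], columns[second]))
```

Each column also sums to zero, which reflects the balance of the design: every level of a factor appears the same number of times.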

Designing the Experiment
Minitab Steps for Designing Full Factorial Experiments:
1. Go to Stat > DOE > Factorial > Create Factorial Design
2. Select the Type of Design and input the Number of Factors
3. Click Designs > select Full Factorial > select the number of Replicates > OK
4. Click Factors > enter the Factor names and Levels > OK
5. Click Options > select Randomization as required (deactivate it if you do not
   want to randomize the run order) > OK
6. Click OK to create the design

The design output, along with the data obtained after conducting the experiment:

StdOrder  RunOrder  CenterPt  Blocks  Temperature  Time     Concentration  Response
1         1         1         1       RT           3 mins   Low            65
11        2         1         1       RT           10 mins  Low            43
13        3         1         1       RT           3 mins   High           61
12        4         1         1       90           10 mins  Low            45
5         5         1         1       RT           3 mins   High           58
15        6         1         1       RT           10 mins  High           50
3         7         1         1       RT           10 mins  Low            50
7         8         1         1       RT           10 mins  High           52
10        9         1         1       90           3 mins   Low            42
8         10        1         1       90           10 mins  High           41
14        11        1         1       90           3 mins   High           43
9         12        1         1       RT           3 mins   Low            65
16        13        1         1       90           10 mins  High           45
6         14        1         1       90           3 mins   High           45
4         15        1         1       90           10 mins  Low            41
2         16        1         1       90           3 mins   Low            44

Note that the second column gives the run order in which the experiment
has to be conducted.
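
As a rough cross-check on what Minitab will report, the effects can be estimated directly from the responses above using contrast columns. This is a minimal Python sketch, not the Minitab procedure. It assumes the coding RT, 3 mins, Low = -1 and 90, 10 mins, High = +1, and the names `runs` and `effect` are hypothetical.

```python
from itertools import combinations
from math import prod

names = ("Temperature", "Time", "Concentration")

# (Temperature, Time, Concentration, Response) in run order, coded -1/+1
runs = [
    (-1, -1, -1, 65), (-1, +1, -1, 43), (-1, -1, +1, 61), (+1, +1, -1, 45),
    (-1, -1, +1, 58), (-1, +1, +1, 50), (-1, +1, -1, 50), (-1, +1, +1, 52),
    (+1, -1, -1, 42), (+1, +1, +1, 41), (+1, -1, +1, 43), (-1, -1, -1, 65),
    (+1, +1, +1, 45), (+1, -1, +1, 45), (+1, +1, -1, 41), (+1, -1, -1, 44),
]


def effect(term):
    """Estimate the effect for a main factor or an interaction.

    The contrast column of a term is the product of the coded columns of
    its factors; the effect is the mean response where the contrast is +1
    minus the mean response where it is -1.
    """
    positions = [names.index(name) for name in term]
    contrast_total = sum(prod(run[i] for i in positions) * run[3] for run in runs)
    return contrast_total / (len(runs) / 2)


# All main effects, two-factor interactions and the three-factor interaction
terms = [combo for size in (1, 2, 3) for combo in combinations(names, size)]
effects = {" x ".join(term): effect(term) for term in terms}

for term_name, value in effects.items():
    print(f"{term_name}: {value}")
```

The order of the rows does not matter for this calculation, which is why the data can be used as recorded in run order.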
