Professional Documents
Culture Documents
Lesson - 4.2 - Exploratory Data Analysis - Analyze - Phase
Lesson - 4.2 - Exploratory Data Analysis - Analyze - Phase
Certification Course
Exploratory Data Analysis
Learning Objectives
Brew Time?
Using Multi-Vari
analysis and
Inconsistent coffee temperature
Regression analysis
Burner?
Look at Interactions
in Data
Cup used?
Multi-Vari analysis is used when you have multiple discrete Xs (like work shift, employee, location) and Y is
continuous (like part length or cycle time).
Multi-Vari Analysis
• Select the process • Sample size is five • The tabulation • Chart is plotted • The observed
where 1 inch plates pieces from each sheet with data with time on X axis values are linked by
are manufactured equipment records contains and the plate appropriate lines.
with 4 equipment the columns with thickness on Y axis.
• Frequency of data time, equipment
collection is every number, and
• Measure its thickness as
thickness within a two hours starting
from 8AM until headers.
specified range of
0.95 – 1.05 inches 2PM
Multi-Vari Analysis
Correlation and Linear Regression
Correlation
-1 0 +1
Movement in both No correlation Movement in both
variables is inverse between the two variables is same
variables
Higher the absolute value of ‘r’, stronger is the correlation between Y and X.
An ‘r’ value of > +0.70 or < -0.70 indicates a strong correlation.
Correlation
Y Y
X X
Y Y
-
X X
Correlation
College College
performance – performance –
Dependent Dependent
variable variable
Correlation Correlation
Coefficient Coefficient
r = 0.29 r = 0.45
Correlation
Strength of
the Significance
r relationship p of the
between two relationship
variables
• 𝐻0 : r = 0
If p < ∝ • 𝐻𝑎 : r ≠ 0
Correlation
Exercise
Be careful with Beware of non-
caution while Check for
claims of linear
Correlating outliers
causality relationships
averaged data
Regression Analysis (𝐑𝟐 )
Regression analysis generates a line on scatter plot that quantifies the relationship between X and Y.
Predict future
values of Y given X,
and X given Y
If a high percentage
of variability in Y (R2 >
70%) is explained by
changes in X
Regress Y on one or
more X’s
simultaneously
Regression Analysis (𝐑𝟐 )
Transfer
function
to
control Y
Vital X
Regression Analysis
Y = A + BX ± C Y = Dependent variable/output/response
X = Independent variable/input/predictor
Using the
transfer
High value of R2
function,
y=0.2119x-
30% variation
0.3091,
due to residual
Chirps/sec =
70% of factors
18.76 when
variability in Y is
Temp = 90F
R2 = 0.6975 explained by X
If 𝑅2 value is small, refer to the Cause and Effect Matrix and study the
relationship between Y and a different X variable.
SLR Using MS Excel
Residual Analysis
90 1
Residual
Percent
0
50
-1
10
1 -2
-2 -1 0 1 2 15 16 17 18 19
Residual Fitted Value
Residual
0
2
-1
1
0 -2
-1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Residual Observation Order
Linear Regression
Multiple Regression
• If a new variable, X2, is added to the r2 model, the impact of X1 and X2 on Y gets
tested.
• 𝑌 = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2 + ⋯ + 𝑏𝑛 𝑋𝑛
• where 𝑋1 , 𝑋2 , … 𝑋𝑛 are multiple independent variables
Nonlinear Regression
Exponential Quadratic
Other nonlinear regressions models are Cubic, Quartic, Power, Logarithmic, and Logistic
Multiple Regression
Multiple Regression
Linear
F = C x (9/5) +32
Box and Cox Transformations
λ Y’
Family of power
-2 Y-2 = 1/Y2
transformations are used for
-1 Y-1 = 1/Y1
-0.5 Y-0.5 = 1/(√(Y))
Converting a dataset to
0 log(Y) use parametric statistics
0.5 Y0.5 = √(Y)
1 Y1 = Y
Any continuous data > 0
2 Y2
A. Cyclical
B. Temporal
C. Special
D. Positional
Knowledge
Check
Which is NOT a variation component analyzed in Multi-Vari studies?
1
A. Cyclical
B. Temporal
C. Special
D. Positional
The variation components analyzed in Multi-Vari studies are positional, cyclical, and temporal. So, option c is the
answer.
Knowledge
Check
Which correlation value indicates a strong negative relationship?
2
A. + 0.90
B. + 0.50
C. - 0.50
D. - 0.90
Knowledge
Check
Which correlation value indicates a strong negative relationship?
2
A. + 0.90
B. + 0.50
C. - 0.50
D. - 0.90
A negative relationship is indicated by a minus sign and the larger the absolute value, the stronger is the relationship.
So, option d is the best choice.
Knowledge
Check A team wants to predict the delivery hours when the independent variable of training hours is
10. The model equation is y = 13x-5 and r value is 0.40. What is the result?
3
A. 0.4
B. 125
C. 0.16
A. 0.4
B. 125
C. 0.16
Although a predicated value for Y could be calculated, it should not be based on the coefficient of correlation value.
Knowledge
Check A team discovers that their output variable is not normal and desires to transform the data.
Using the Box and Cox method the team is provided a λ value of -1. What would an output value
4 of 35.5 transform to?
A. 1.55
B. 0.028
C. 17.75
D. 71
Knowledge
Check A team discovers that their output variable is not normal and desires to transform the data.
Using the Box and Cox method the team is provided a λ value of -1. What would an output value
4 of 35.5 transform to?
A. 1.55
B. 0.028
C. 17.75
D. 71
The -1 lambda transformation is 1/Y and therefore the inverse of 35.5 is 0.028.
Knowledge
Check
Which residual plot is used to see if the residuals have a constant variance?
5
C. Residual Histogram
C. Residual Histogram
The residual versus fit plot checks to see if all residuals randomly center around a center value of 0 to prove constant
variance.
Lean Six Sigma Activities and Tools: Analyze
Activities
❑ Review Project Charter ❑ Process Map Flow ❑ Identify Root Causes ❑ Develop Potential Solutions ❑ Develop SOP’s, Training
❑ Validate High-Level Value ❑ Identify Key Input, Process ❑ Reduce List of Potential ❑ Evaluate, Select, and Plan & Process Controls
Stream Map and Scope and Output Metrics Root Causes Optimize Best Solutions ❑ Implement Solution and
❑ Validate Voice of the ❑ Develop Data Collection ❑ Confirm Root Cause to ❑ Develop ‘To-Be’ Process Ongoing Process
Customer Plan Output Relationship Maps Measurements
& Voice of the Business ❑ Validate Measurement ❑ Estimate Impact of Root ❑ Develop and Implement ❑ Confirm Attainment of
❑ Validate Problem Statement System Causes on Key Outputs Pilot Solution Project Goals
and Goals ❑ Collect Baseline Data ❑ Prioritize Root Causes ❑ Implement 5s Program ❑ Identify Project Replication
❑ Validate Financial Benefits ❑ Determine Process ❑ Statistical Analysis ❑ Develop Full Scale Opportunities
❑ Create Communication Plan Capability ❑ Complete Analyze Tollgate Implementation Plan ❑ Training
❑ Select and Launch Team ❑ Complete Measure Tollgate ❑ Cost/Benefit Analysis ❑ Complete Control Tollgate
❑ Develop Project Schedule ❑ Complete Improve Tollgate ❑ Transition Project to
❑ Complete Define Tollgate Process Owner
❑ Project Charter ❑ Process Mapping ❑ Cause & Effect Matrix ❑ Process Flow ❑ Mistake-Proofing
❑ Voice of the Customer ❑ Data Collection Plan ❑ FMEA Improvement ❑ Standard Operating
❑ SIPOC Map ❑ Statistical Sampling ❑ Hypothesis Tests ❑ Design of Experiments Procedures (SOP’s)
❑ Project Valuation (ROI) ❑ Measurement System ❑ Simple & Multiple (DOE) ❑ Process Control Plans
❑ Stakeholder Analysis Analysis (MSA) Regression ❑ Solution Selection Matrix ❑ Visual Process Control
❑ Communication Plan ❑ Gage R&R ❑ ANOVA ❑ Piloting Tools
❑ Effective Meeting Tools ❑ Control Charts ❑ Components of ❑ Pugh Matrix ❑ Statistical Process Controls
❑ Time Lines, Milestones, ❑ Histograms Variation ❑ Pull System (SPC)
and Gantt Charting ❑ Normality Test ❑ Visual Workplace
❑ Pareto Analysis ❑ Process Capability ❑ Total Productive
Analysis Maintenance
❑ Metrics
❑ Team Feedback Session
Tools
Analyze Tollgate Questions
❑ Has the team analyzed data about the process and its performance to help stratify the problem, understand reasons for
variation in the process, and generate hypothesis as to the root causes of the current process performance?
❑ Has an evaluation been done to determine whether the problem can be solved without a fundamental recreation of the
process? Has the decision been confirmed with the Project Sponsor?
❑ Has the team investigated and validated (or de-validated) the root cause hypotheses generated earlier, to gain confidence that
the “vital few” root causes have been uncovered?
❑ Does the team understand why the problem (the Quality, Cycle Time or Cost Efficiency issue identified in the Problem
Statement) is being seen?
❑ Have learning’s to-date required modification of the Project Charter? If so, have these changes been approved by the Project
Sponsor and the Key Stakeholders?
❑ Have any new risks to project success been identified, added to the Risk Mitigation Plan, and a mitigation strategy put in place?
Note :With answers to these questions you are now ready to move to the Measure Phase.