
Certified Six Sigma - Green Belt Professional
VS-1103

Vskills
www.vskills.in
Proficiency Testing Programme of
Intelligent Communication Systems India Limited
JV of DSIIDC (Govt of NCT Delhi) and TCIL (Govt of India)

Certified Six Sigma - Green Belt Professional
Training Material

This document describes deploying, managing and maintaining quality systems, and other areas of defining, measuring, analyzing, improving and controlling processes by Six Sigma Green Belt professionals. This document is for beginners and intermediate learners.

Copyright© 2014 Cubezoid Solutions Private Limited

Content, design, typesetting and published by Cubezoid Solutions Private Limited,


info@cubezoid.com

All rights reserved

This book is provided on the condition that it shall not by way of trade or otherwise, be lent,
resold, hired out or otherwise circulated without the publisher’s prior consent in any form of
binding or cover other than in which it is published and without a similar condition including this
condition being imposed on the subsequent purchaser and without limiting the rights under the
copyright reserved above, no part of this publication, may be reproduced, stored in or introduced
into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording or otherwise) without the prior permission of the copyright owner and
publisher of the book.

Disclaimer:
Due care and diligence have been taken while editing and printing this book. Neither the author, the publisher nor the printer of the
book holds any responsibility for any mistake that may have crept in inadvertently. Cubezoid Solutions Private Limited, the
publisher, will be free from any liability for damages and loss of any nature arising out of or related to the content. All disputes
are subject to the jurisdiction of the competent courts in Delhi.


TABLE OF CONTENTS

1. Six Sigma and Organization .................................................................................................... 4


1.1. Six Sigma and Organizational Goal.....................................................................................................................4
1.2. Lean Principles.................................................................................................................................................15
1.3. Design for Six Sigma (DFSS) ............................................................................................................................20

2. Define ....................................................................................................................................... 27
2.1. Process Management ........................................................................................................................................27
2.2. Project Management.........................................................................................................................................29
2.3. Management and Planning Tools .....................................................................................................................31
2.4. Business Results ...............................................................................................................................................40
2.5. Team Dynamics and Performance ...................................................................................................................43

3. Measure.................................................................................................................................... 50
3.1. Process Analysis and Documentation ...............................................................................................................50
3.2. Statistics and Probability ...................................................................................................................................53
3.3. Collecting and Summarizing Data.....................................................................................................................60
3.4. Probability Distributions ...................................................................................................................................69
3.5. Measurement System Analysis..........................................................................................................................79
3.6. Control Chart ...................................................................................................................................................86
3.7. Process Capability and Performance ................................................................................................................89

4. Analyze..................................................................................................................................... 97
4.1. Exploratory Data Analysis ................................................................................................................................97
4.2. Hypothesis Testing .........................................................................................................................................104

5. Improve and Control ............................................................................................................ 116


5.1. Design of Experiments (DOE) .......................................................................................................................116
5.2. Statistical Process Control (SPC) ....................................................................................................................127
5.3. Implement and Validate .................................................................................................................................144
5.4. Control Plan ...................................................................................................................................................148


1. SIX SIGMA AND ORGANIZATION

Six Sigma is a results-focused method of quality. It is also a measurement technique that results in fewer defects, which convert into cost savings and competitive advantage.

Sigma (σ) is a mathematical symbol representing one standard deviation from the average or mean. Most control charts set their limits at ±3σ, but Six Sigma extends three more standard deviations. With Six Sigma, there are only 3.4 defective parts per million (PPM); a Six Sigma process operates at a 99.9997% quality level.

1.1. Six Sigma and Organizational Goal


Six Sigma is defined as a methodology that aims at a quasi-perfect production process. It is also
defined as a methodology that aims at a rate of 3.4 defects per million opportunities (DPMO).

In the design phase of any process, the customers' needs and expectations are identified and translated into Critical-To-Quality (CTQ) characteristics. These characteristics are built into the product's design so that it can be manufactured or delivered consistently and economically. Since variability creeps in during delivery or manufacture, tolerance levels are specified, and the company should measure and control the variation. The process performance is then measured to see how the output compares against the specified limits, using the process capability (the ability of the process to generate products that are within the specified limits) and the process stability (the company's ability to predict the process performance based on past experience). Usually SPC is used, with samples tested at specified intervals and an estimate derived for the whole to determine the number of defects.
Continuous Improvement
Continuous improvement involves constantly identifying and eliminating the causes that prevent a
system or process from functioning at its optimum level. The concept of continuous improvement
originated in Japan in the 1970s. It was adopted in many countries, including U.S.A., in the early
1980s. Continuous improvement, and the consequent customer satisfaction, is the principle on which the concept of Lean manufacturing is developed. When this principle is combined with the just-in-time technique, it results in Lean manufacturing. Continuous improvement helps an organization to add value to its products and services by reducing defects, mistakes, etc. and to maximize its potential. As continuous improvement requires constant, ongoing effort, it is essential that top management takes a long-term view and commits itself to its implementation.

Continuous improvement enables organizations to identify and rectify problems as and when they
occur. Thus, it ensures smooth functioning of the processes. Many modern quality improvement
models or tools like control charts, sampling methods, process capability measures, value analysis,
design of experiments, etc. have been influenced by the concept of continuous improvement.
Six Sigma History
The history of Six Sigma encompasses various events which shaped its formation and spread. Six Sigma has evolved over time; it is more than just a quality system like TQM or ISO. The events in Six Sigma's evolution are as follows

Carl Friedrich Gauss (1777-1855) introduced the concept of the normal curve.


Walter Shewhart, in the 1920s, showed that three sigma from the mean is the point where a
process requires correction.
Following the defeat of Japan in World War II, America sent leading experts, including Dr. W.
Edwards Deming, to encourage the nation to rebuild. Leveraging his experience in reducing
waste in U.S. war manufacturing, he offered his advice to struggling emerging industries.
By the mid-1950s, he was a regular visitor to Japan. He taught Japanese businesses to
concentrate their attention on processes rather than results, and to concentrate the efforts of
everyone in the organization on continually reducing imperfection at every stage of the process.
By the 1970s many Japanese organizations had embraced Deming's advice. Most notable is
Toyota, which spawned several improvement practices including JIT and TQM.
Western firms showed little interest until the late 1970s and early 1980s. By then, the success of
Japanese companies had caused other firms to begin to re-examine their own approaches, and
Kaizen began to emerge in the U.S.
Many measurement standards (Zero Defects, etc.) later came on the scene but credit for
coining the term “Six Sigma” goes to a Motorola engineer named Bill Smith. (“Six Sigma” is
also a registered trademark of Motorola). Bill Smith, along with Mikel Harry from Motorola,
had written and codified a research report on the new quality management system that
emphasized the interdependence between a product’s performance in the market and the
adjustments required at the manufacturing point.

Various models and tools emerged which are

Kaizen – It refers to any improvement, one-time or continuous, large or small


TQM – Total Quality Management, organization-wide management of quality based on 14 principles
PDCA Cycle - W. Edwards Deming's Plan Do Check Act cycle
Lean Manufacturing – It focuses on the elimination of waste or "muda" and includes tools
such as Value Stream Mapping, the Five S's, Kanban and Poka-Yoke
JIT – Just in Time, i.e., catering to the needs of the customer when the need occurs.
Six Sigma – It is designed to improve processes and eliminate defects; includes the DMAIC
and DMADV models inspired by PDCA
Quality Pioneers
Various pioneers emerged who helped shape quality principles and laid the foundations for six
sigma. They included

Walter A. Shewhart - He is the pioneer of modern quality control who recognized the need to
separate variation into assignable and un-assignable causes. He is the founder of the control chart
and the originator of the plan-do-check-act cycle. He was the first to successfully integrate statistics,
engineering and economics, and he defined quality in terms of objective and subjective quality.

Dr. W. Edwards Deming – He studied under Shewhart at Bell Laboratories. His major
contributions include developing the 14 points on quality management, a core concept for
implementing total quality management: a set of management practices to help companies
increase their quality and productivity. The 14 points are


Create constancy of purpose for improving products and services.


Adopt the new philosophy.
Cease dependence on inspection to achieve quality.
End the practice of awarding business on price alone; instead, minimize total cost by working
with a single supplier.
Improve constantly and forever every process for planning, production and service.
Institute training on the job.
Adopt and institute leadership.
Drive out fear.
Break down barriers between staff areas.
Eliminate slogans, exhortations and targets for the workforce.
Eliminate numerical quotas for the workforce and numerical goals for management.
Remove barriers that rob people of pride of workmanship, and eliminate the annual rating or
merit system.
Institute a vigorous program of education and self-improvement for everyone.
Put everybody in the company to work accomplishing the transformation.

Joseph Juran - His major contributions are directing most of his work at executives and the field of
quality management, and developing the "Juran Trilogy" for managing quality: quality planning,
quality control, and quality improvement. He also enlightened the world on the concept of the
"vital few, trivial many", which is the foundation of Pareto charts.

Philip Crosby - He stressed quality management and four absolutes of quality, including

Quality is defined by conformance to requirements.
The system for causing quality is prevention, not appraisal.
The performance standard is zero defects, not "close enough".
The measurement of quality is the cost of nonconformance.

Armand Feigenbaum - He developed a systems approach to quality (all organizations must be
focused on quality), emphasizing that the costs of quality may be separated into costs for prevention,
appraisal, and failures (scrap, warranty, etc.)

Kaoru Ishikawa - He developed the concept of true and substitute quality characteristics as

True characteristics are the customer’s view


Substitute characteristics are the producer’s view
Degree of match between true and substitute ultimately determines customer satisfaction

He also advocated the use of the 7 tools and advanced the use of quality circles or worker
quality teams. He also developed the concept of Japanese Total Quality Control

Quality first and not short term profits.


Next process is the customer.
Use facts and data to make presentations.
Respect for humanity as a management philosophy of full participation


Genichi Taguchi - He developed the quality loss function (deviation from the target is a loss to society)
and promoted the use of parameter design (the application of design of experiments), or robust
engineering. The goal is to develop products and processes that perform on target with the smallest
variation and are insensitive to environmental conditions; the focus is on engineering the design.
Value of Six Sigma
The Six Sigma concept was developed at Motorola in the 1980s. Six Sigma can be viewed as a
philosophy, a technique, or a goal.
Philosophy - Customer-focused breakthrough improvement in processes
Technique - Comprehensive set of statistical tools and methodologies
Goal - Reduce variation, minimize defects, shorten the cycle time, improve yield, enhance
customer satisfaction, and boost the bottom line

Six sigma is not about quality for the sake of quality; it is about providing better value to customers,
investors and employees. Six Sigma is a process of asking questions that lead to tangible and
quantifiable answers that ultimately produce profitable results. There are four groups of quality
costs, which are

External failure cost: warranty claims, service cost


Internal failure cost: the costs of labor, material associated with scrapped parts and rework
Cost of appraisal and inspection: materials for samples, test equipment, inspection
labor cost, quality audits, etc.
Cost related to improving poor quality: quality planning, process planning, process control, and
training.

Usually companies are at a 3 sigma level, which translates to 25-40% of annual revenue being consumed
by the cost of quality. Thus, if a company can improve its quality by 1 sigma level, its net income will
increase hugely, by approximately 10 percent.

Furthermore, when the level of process complexity increases (e.g., the output of one sub-process feeds
the input of another sub-process), the rolled throughput yield of the process decreases, the
final outgoing quality level declines, and the cost of quality increases. Project teams with well-
defined projects improve the company's profits.
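Rolled throughput yield (RTY) is the probability that a unit passes every sub-process defect-free, i.e. the product of the individual first-pass yields. A minimal sketch of this calculation, using hypothetical yields (the figures are illustrative, not from the text):

```python
# Illustrative sketch: rolled throughput yield (RTY) of a chained process.
# RTY is the product of the first-pass yields of every sub-process.

def rolled_throughput_yield(yields):
    """Multiply the first-pass yields of each sub-process."""
    rty = 1.0
    for y in yields:
        rty *= y
    return rty

# Four hypothetical sub-processes, each individually at a respectable 95% yield
step_yields = [0.95, 0.95, 0.95, 0.95]
print(f"RTY: {rolled_throughput_yield(step_yields):.4f}")  # ~0.8145
```

Even though each step looks healthy in isolation, the chained yield drops to roughly 81%, which is why complexity drives up the cost of quality.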
Mathematical Six Sigma
The term 'Six Sigma' is drawn from the statistical discipline of process capability studies. Sigma,
represented by the Greek letter 'σ', stands for the standard deviation from the mean; 'Six Sigma'
represents six standard deviations from the mean. This implies that if a company produces
1,000,000 parts/units and its processes are at the Six Sigma level, fewer than 3.4 defects will result.
However, if the processes are at the three sigma level, the company ends up with as many as 66,807
defects for every 1,000,000 parts/units produced.

The table below shows the number of defects observed for every 1,000,000 parts produced (also
referred to as defects per million opportunities or DPMO).


Sigma Level     Defects per Million Opportunities (DPMO)
Two Sigma       308,537
Three Sigma     66,807
Four Sigma      6,210
Five Sigma      233
Six Sigma       3.4
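These DPMO figures assume the conventional 1.5 sigma long-term shift. A short sketch of how the table can be reproduced; it assumes scipy is available and ignores the negligible lower tail:

```python
# Sketch: reproduce the DPMO table assuming the conventional 1.5-sigma shift.
from scipy.stats import norm

def dpmo(sigma_level, shift=1.5):
    """Defects per million opportunities for a given short-term sigma level."""
    # One-sided tail beyond the specification limit after the long-term shift;
    # the opposite tail is negligible at these levels.
    return norm.sf(sigma_level - shift) * 1_000_000

for level in (2, 3, 4, 5, 6):
    print(f"{level} sigma: {dpmo(level):,.1f} DPMO")
# 6 sigma prints approximately 3.4 DPMO
```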

The process standard deviation (σ) should be so small that the process can fit 12σ (±6σ) within the
customer-specified limits. So, no matter how widely the process deviates from the target, it must still
deliver results that meet the customer requirements. A few terms used are

USL – It is upper specification limit for a performance standard. Any deviation beyond this is a
defect.
LSL – It is lower specification limit for a performance standard. Any deviation below this is a
defect.
Target – Ideally, this will be the middle point between USL and LSL.

The Six Sigma approach is to find out the root causes of the problem, symbolically represented by
Y = f(X). Here, Y represents the problem that occurs due to cause(s) X.

Y                          x1, x2, x3, ..., xn
Dependent                  Independent
Customer-related output    Input / process
Effect                     Cause
Symptom                    Problem
Monitor                    Control
Benefits of Six Sigma
Continuous defect reduction in products and services
Enhanced customer satisfaction
Performance dashboards and metrics
Process sustenance
Project based improvement, with visible milestones
Sustainable competitive edge
Helpful in making right decisions


Business processes
A business process or a process is a group of tasks which result in a specific service or product for
customers. It can be visualized with a flowchart or a process matrix. Business processes are
fundamental to every company’s performance and implement the business strategy. Understanding
and optimizing the business process is the crux of six sigma.

Frequently, organizations treat the symptoms of a process performance issue without truly
understanding the root cause or impact of the issue. Dissecting and truly understanding the root cause
of process performance is critical to effective process improvement, which can be accomplished
by Six Sigma. Each process has three elements (inputs, the process itself, and outputs) that affect its
function. A business process is a collection of related activities that produce something of value to
the organization, its stakeholders or its customers.

Having a standard model such as DMAIC (Define-Measure-Analyze-Improve-Control) makes


process improvement and optimization much easier by providing the teams with an easy roadmap.
This disciplined, structured, rigorous approach consists of steps which are linked logically to the
previous step and to the next step. It is not enough for organizations to treat process improvement
as one-time or periodic events. A sustaining focus on process management and continuous
improvement is the key.

Types of Processes - Processes can be classified as management processes, operational processes


and supporting processes.

Management processes - These processes administer the operation of a system. Some


examples of management processes are planning, corporate governance, etc.
Operational processes - These processes create the primary value stream for the customers.
Hence, they are also called ‘core business processes’. Some examples of operational processes
are purchasing of raw materials, manufacturing of goods, rendering of services, marketing, etc.
Supporting processes - These processes support the core business processes of the
organization. Some examples of supporting processes are accounting, technical support, etc.

These processes can be divided into many sub-processes that play their intended roles to
successfully complete the respective head processes.
Business System
A business system is a group of business processes which combine to form a single and identifiable
unit of business as a whole. It is composed of processes, which in turn are composed of sub-
processes and which are further composed of individual tasks.

A business system is a system that implements a process or a set of processes. It ensures that all the
processes operate smoothly without delays or lack of resources. Six Sigma directs business systems
to ensure that the processes, products, and services are subjected to continuous improvement, for
which the collection and analysis of data from processes is initiated.

It is important to have an appropriate business system in place and to ensure that the relevant
processes under the system are well-documented. The documentation of the processes must be done
in such a way that every task, activity, and their sequence are taken into account for proper execution
as planned for in the business system.
Process Control
Feedback received from the process is used for process control, so data collection focuses on the
inputs and outputs of the process. Every sub-process or task acts as an input to the next task or as an
output of the previous one. Optimum resource usage by a process, while keeping the output quality,
is achieved by

Applying a feedback loop to collect data from various process stages so that improvements can be applied
Re-designing the process so that data collection, analysis and improvement are part of the process.

Real-time feedback initiates improvement quickly. Tools like the control chart help in data
collection and analysis as well.
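As an illustration of that feedback loop, a simplified sketch of a ±3σ control check on collected process data follows; the measurements are hypothetical, and a formal individuals chart would estimate sigma from the moving range rather than the sample standard deviation used here for brevity:

```python
# Simplified sketch of a +/-3 sigma control check on collected process data.
import statistics

measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 10.1]  # hypothetical readings

mean = statistics.mean(measurements)
sigma = statistics.stdev(measurements)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # upper / lower control limits

out_of_control = [x for x in measurements if not (lcl <= x <= ucl)]
print(f"centre {mean:.2f}, LCL {lcl:.2f}, UCL {ucl:.2f}")
print(f"out-of-control points: {out_of_control or 'none'}")
```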
Six Sigma Green Belt's Responsibilities
A Six Sigma Green Belt has nearly identical responsibilities to a Black Belt when it comes to
projects, but works on less complex challenges or problems than Black Belt professionals.
There are usually no dedicated Green Belt practitioners in an organization; most Green Belts retain
the positions they had prior to being trained in Six Sigma and use the new skills to improve their
working environment and performance. The responsibilities of a Six Sigma Green Belt include

Project Management involving defining the project scope, marshalling resources, setting up
goals, timelines and milestones, and reporting to or updating stakeholders and executives.
Task Management involving establishing the team's Lean Sigma roadmap, leading the
implementation of Six Sigma tools, managing team meetings, and tracking and reporting team
progress.
Team Management involving selecting team members, managing the team's organizational
interfaces and ensuring the team is trained and equipped for their work.
DMAIC Methodology
The Six Sigma methodology is conceptually based on a five-phase project. Each phase has a specific
purpose and specific tools and techniques which aid in achieving the phase objectives as well as
lead the Six Sigma professional to significant conclusions. The five phases of the Six Sigma
methodology are called DMAIC: the Define, Measure, Analyze, Improve and Control phases.
All five phases are discussed below.


Define Phase - The goal of Define is to establish the project's foundation, and it is the most important
aspect of the Six Sigma project. Projects start with a current-state challenge which is articulated in a
quantifiable manner, and the goal to achieve is also determined.

After specification of the problems and goals, the remaining tasks of valuation, team, scope, project
planning, timeline, stakeholders, VOC/VOB, etc. are to be completed. Various tools used in the
Define Phase are

Project Charter
Problem Statement
Business Case
Objective
High level time line
Project Scope
Project Team
Stakeholder Assessment
Pareto Charts
SIPOC
VOC/VOB and CTQ's
High Level Process Map

Measure Phase – In this phase, baseline information is gathered about the process or product in
order to achieve the following objectives

Gather All possible x's


Analyze measurement system and Data Collection Requirements
Validate Assumptions and Improvement Goals
Determine COPQ
Refine Process Understanding
Determine Process Capability
Process Stability

This Phase involves the usage of following tools

Process Maps, Value Stream Mapping


Failure Modes and Effects Analysis (FMEA)
Cause and Effect Diagram
XY Matrix
Basic Control Charts
Six Sigma Statistics
Basic Statistics
Descriptive Statistics
Normal Distributions
Graphical Analysis
Measurement Systems Analysis
Variable Gage R&R


Attribute Gage R&R


Gage Linearity and Accuracy
Gage Stability
Process Capability (Cpk, Ppk) and Sigma
Data collection plan
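For the process capability indices listed above, a short sketch of the arithmetic for Cp and Cpk; the specification limits and process statistics are hypothetical:

```python
# Sketch: process capability indices from hypothetical process statistics.
# Cp compares the specification width to the process spread (6 sigma);
# Cpk also penalises a process that is off-centre.

def capability(mean, sigma, lsl, usl):
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return cp, cpk

cp, cpk = capability(mean=10.1, sigma=0.15, lsl=9.5, usl=10.5)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # Cp ~1.11, Cpk ~0.89 (off-centre process)
```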

Analyze Phase – It entails establishing verified drivers by using statistics and higher-order analytics
to discover the fact-based relationship between the process performance and the x's, i.e. the root
causes or drivers of the improvement effort, thus resulting in the establishment of hypotheses for
improvement. This phase establishes the transfer function Y=f(x) and validates the list of critical x's
and their impacts. The analyze phase also results in a beta improvement plan, such as a pilot plan.
This phase utilizes various tools like

Hypothesis Testing
Simple Linear Regression
Multiple Regression
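As an illustration of hypothesis testing in this phase, a sketch of a two-sample (Welch's) t-test comparing cycle times before and after a change; the samples are hypothetical and scipy is assumed to be available:

```python
# Sketch: two-sample t-test, one of the hypothesis tests used in Analyze.
from scipy import stats

before = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2]  # hypothetical cycle times
after  = [11.2, 11.5, 11.0, 11.4, 11.3, 11.6, 11.1]

t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the mean cycle times differ at the 5% significance level.")
else:
    print("Fail to reject H0: no statistically significant difference detected.")
```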

Improve Phase – This phase is aimed at making the improvement: designing, testing and
implementing the solution. It involves enlisting statistically proven results from an active study or
pilot, creating the improvement plan, updating the stakeholder assessment, revising the business
case with investment ROI and risk assessment, and adding the new process capability.
This phase uses tools like

Design of Experiment (DOE)


Implementation Plan
Change Plan
Communication Plan

Control Phase – It is the last phase of the Six Sigma methodology, which establishes automated
and managed mechanisms to maintain and sustain improvements in the process. A successful
control plan also results in a reaction and mitigation plan with an accountability structure. It
involves tools like the control plan, training plans, poka-yoke and/or audit plans. The Six Sigma
methodology is a complete system with built-in tools and techniques which enable the Six Sigma
practitioner to achieve success.


Cost of Quality (COQ)


Cost of quality is the sum of various costs: appraisal costs, prevention costs, external
failure costs, and internal failure costs. It is generally believed that investing in the prevention of
failure will decrease the cost of quality, as failure costs and appraisal costs will be reduced.
Understanding the cost of quality helps organizations to develop quality conformance as a useful
strategic business tool that improves their products, services and brand image. This is vital in
achieving the objectives of a successful organisation.

COQ is primarily used to understand, analyze and improve quality performance. COQ can be
used by shop-floor personnel as well as a management measure. It can also be used as a standard
measure to study an organization's performance vis-à-vis another similar organisation, and can be
used as a benchmarking index.

The various costs which constitute cost of quality are

Appraisal cost is the cost incurred because of inspecting the processes. The cost associated with
checking and testing to find out whether it has been done first time right.
Prevention cost is the cost incurred because of carrying out activities to prevent failures. The
cost associated with planning and training associated with doing it first time right.
External failure cost is the cost incurred because of the failure that occurred when the
customer used the product.
Internal failure cost is the cost incurred because of the failures within the organization.

Examples of the various costs are

Prevention - Training Programme, Preventive Maintenance


Appraisal - Depreciation of Test/ Measuring Equipment, Inspection Contracts
Internal Failure - Scrap, Rework, Downtime, Overtime
External Failure - Warranty, Allowances, Customer Returns, Customer Complaints, Product
Liability, Lawsuits, Lost Sales
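A minimal sketch of rolling such items up into a single COQ figure and expressing it against revenue; every amount below is hypothetical:

```python
# Sketch: aggregating hypothetical cost-of-quality figures by category.
costs = {
    "prevention":       {"training": 40_000, "preventive maintenance": 25_000},
    "appraisal":        {"inspection contracts": 30_000, "test equipment depreciation": 15_000},
    "internal failure": {"scrap": 80_000, "rework": 60_000, "downtime": 20_000},
    "external failure": {"warranty": 90_000, "customer returns": 35_000},
}
revenue = 5_000_000  # hypothetical annual revenue

total_coq = sum(sum(items.values()) for items in costs.values())
for category, items in costs.items():
    print(f"{category:16s}: {sum(items.values()):>8,d}")
print(f"Total COQ: {total_coq:,d} ({total_coq / revenue:.1%} of revenue)")
```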

Identifying COQ can have several benefits, as

It provides a standard measure across the organisation & also inter-organisation


It builds awareness of the importance of quality
It identifies improvement opportunities
Being a cost measure, it is useful at shop floor as well as at management level


Organizational Drivers and Metrics


Key Drivers – Performance measurement and analysis is the primary way to reduce wastage and
maintain higher-quality products or services. Various internal and external entities act as the key
drivers for improvement. Internal key drivers include operational, workforce, governance and
compliance performance, and the external key drivers include customer, service, competitive and
financial performance.

Various performance measures exist, but only those performance metrics need to be considered
which represent the factors for improvement in the selected performance areas, such as financial
or customer performance.

Voice Of the Customer (VOC) - It is the term used to describe the stated and unstated needs or
requirements of the customer. It helps in listing the relative importance of features and benefits
associated with the product or service, thus showing the expectations and promises that are both
fulfilled and unfulfilled by the product or service. Voice of the Customer (VOC) describes
customers' feedback about their experiences with, and expectations for, the products or services.

Gathering VOC information can be done by


Direct interviews of customers like site intercepts, personal interviews, focus groups, customer
feedback forms, or structured online surveys.
Indirect interviews with representatives like sales people or customer service representatives,
who interface with the customer and report on their needs.

Conducting VOC helps by


Customize products, services, add-ons and features to meet the needs and wants of customers
No one becomes an industry leader without listening to the customer. Quality (customer
perceived) is the leading driver of business success
Maximize company’s profit. Higher market share companies have higher profits

The Balanced Scorecard - It is the most widely used business performance measurement
framework, introduced by Robert S. Kaplan and David P. Norton in 1992. Balanced scorecards
were initially focused on finding a way to report on leading indicators of a business's health; they
were later refocused onto measures that directly relate to the firm's strategy. Usually the
balanced scorecard is broken down into four sections, called perspectives, as

The financial perspective - The strategy for growth, profitability and risk from the shareholder’s
perspective. It focuses on the ability to provide financial profitability and stability for private
organizations or cost-efficiency/effectiveness for public organizations.
The customer perspective - The strategy for creating value and differentiation from the
perspective of the customer. It focuses on the ability to provide quality goods and services,
delivery effectiveness, and customer satisfaction
The internal business perspective - The strategic priorities for various business processes that
create customer and shareholder satisfaction. It aims for internal processes that lead to
“financial” goals


The learning and growth perspective - The priorities to create a climate that supports
organizational change, innovation and growth. It targets the ability of employees, technology
tools and effects of change to support organizational goals.

The Balanced Scorecard is needed due to various factors, as

Focus on traditional financial accounting measures such as ROA, ROE, EPS gives misleading
signals to executives with regards to quality and innovation. It is important to look at the means
used to achieve outcomes such as ROA, not just focus on the outcomes themselves.
Executive performance needs to be judged on success at meeting a mix of both financial and
non-financial measures to effectively operate a business.
Some non-financial measures are drivers of financial outcome measures which give managers
more control to take corrective actions quickly.
Too many measures, such as hundreds of possible cost accounting index measures, can
confuse and distract an executive from focusing on important strategic priorities. The balanced
scorecard disciplines an executive to focus on several important measures that drive the
strategy.
Organizational Goals
Before a Six Sigma project can be executed, organizational strategic planning goals and objectives
must be defined. Determining the selection of appropriate projects and choosing an effective
improvement model are crucial tasks that help to ensure the company is pointed in the right direction.

The broad objectives of the organization must be aligned with its long term strategies. One of the
techniques that an organization can use to align its objectives with long term strategies is ‘hoshin
planning’. Hoshin planning helps an organization to develop its business plan and deploy the same
across the organization in order to reach the set goals.

Project selection is a testimony to a leader’s role in successfully aligning the broad objectives of the
organization with its long term strategies. A project selection committee or group can be formed to
screen and select projects. It can include Champions, Master Black Belts, Black Belts, and
important executive supporters.

The project selection committee sets the criteria to select the projects. The project selection criteria
are framed on the basis of the key factors that define the business case and business need of an
organization. After selecting the projects, the project selection committee matches the projects
selected with teams assigned to execute them.

1.2. Lean Principles


Lean manufacturing focuses on lean philosophy which is about elimination of waste in all forms at
the workplace. Specific lean methods include just-in-time inventory management, Kanban
scheduling systems and 5S workplace organization.

Many of these concepts were developed in the 1940s by Toyota, a Japanese automobile
manufacturer, and they became widespread for removing waste, graduating into best practices in
many industries beyond automotive companies. Applying these principles to production has the
potential for both improved profitability and increased complexity.
Origins
Lean Manufacturing has evolved over time. In the 1890s, Frederick W. Taylor began to look at
individual workers and work methods. Frank Gilbreth added motion study and invented process
charting. Lillian Gilbreth introduced psychology by studying the motivations of workers and how
attitudes affected the outcome of a process. These ideas led to waste elimination, a key component
of JIT and Lean Manufacturing.

In 1910, Henry Ford developed and implemented the first comprehensive Manufacturing Strategy
by arranging all the elements of a manufacturing system like people, machines, tooling and
products, in a continuous system or an assembly line for manufacturing the Model T automobile.
Toyota Production System
Between 1949 and 1975, at Toyota Motor Company, Taiichi Ohno and Shigeo Shingo began to
incorporate Ford production and other techniques into an approach called the Toyota Production
System, or Just In Time. However, they found flaws in the Ford system, especially in its treatment
of employees, as Ford used employees only for muscle power.

The Toyota Production System (TPS) focuses on muri and muda. Muri focuses on the
preparation and planning of the process, or what work can be eliminated in the design process.
Muda are those waste steps and processes that add cost. Muri is used in new product design and
muda is used to improve existing operations.
Concept and Tools
Lean manufacturing is not just the usage of a few techniques or processes but a journey in itself, which
takes a holistic view of the organization and involves various phases that make use of various
techniques and processes. The process for lean manufacturing involves the following steps

Define value from the customer’s perspective


Map the value stream
Create flow by removing causes of waste
Create pull if flow is difficult to achieve
Measure and validate
Practice continuous improvements

Mudas - Muda is a Japanese term meaning "waste"; as lean manufacturing is a Japanese
management philosophy, Japanese terms and concepts are used extensively. There are 7
mudas, or seven types of waste, found in a manufacturing process, which are

Overproduction - Producing more than the customer requires is waste causing other wastes like
inventory costs, manpower and conveyance to deal with excess product.
Needless Inventory - Inventory at any point is a no value-add as it ties up financial resources of
the company and is exposed to the risk of damage, obsolescence, spoilage, and quality issues.
It also needs space and other resources for proper management and tracking.


Defects - Defects and broken equipment result in defective products and subsequently
customer dissatisfaction, which need more resources to resolve.
Non-value Processing – Also called over-processing. Any processing that does not add value to the
product is waste, such as in-process protective packaging required due to extra manufacturing steps,
along with the extra resources, movement and time it consumes.
Excess Motion - Unnecessary motion due to poor workflow, poor layout, housekeeping,
inconsistent work methods or lack of standardized procedures, is a waste.
Transport and Handling – Waste in moving material, including shipping damage, pallets not being
properly stretch-wrapped (wasted material), or a truck not loaded to use floor space efficiently.
Waiting - These are wastages in time, due to broken machinery, lack of trained staff, shortages
of materials, inefficient planning and waiting for material.

Waste Elimination Techniques - Various waste elimination techniques which are used in lean
manufacturing are listed, as

Pull System – It is the technique of producing parts according to the customer's demand. The
opposite is a Push System, i.e. building products to stock as per a sales forecast, without firm
customer orders.
Kanban – It is a method for maintaining an orderly flow of material. Kanban cards are used to
indicate material order points, how much material is needed, from where the material is
ordered, and to where it should be delivered.
Total Quality Management – It is a management system for continuous improvement in all
areas of a company's operation. It is applicable to every operation of the organization and
involves employees.
Quick Changeover (or SMED - Single Minute Exchange of Dies) – It is the technique for
reducing changeover time, i.e. the time taken to change a process from manufacturing one specific
product to another. It enables flexibility in final product offerings and also addresses smaller batch sizes.
5S or Workplace Organization – It is a systematic method for organizing and standardizing the
workplace and is applicable to every function in an organization.
Total Productive Maintenance – It focuses on proactive and progressive maintenance of
equipment by utilizing the knowledge of operators, equipment vendors, engineering and
support persons to optimize machine performance, thus drastically reducing breakdowns and
unscheduled and scheduled downtime, which results in improved utilization, higher
throughput and better product quality.
Takt time is a measure of customer demand expressed in units of time and is calculated as
Takt time = Available time per shift / Demand per shift, or Cycle time / Number of people
(a small calculation sketch follows this list).
Visual Controls – They provide an immediate understanding (usually thirty seconds) of a
condition or situation like what’s happening with regards to production schedule, backlog,
workflow, inventory levels, resource utilization, and quality. It includes kanban cards, lights,
color-coded tools, lines delineating work areas and product flow, etc.
Poka Yoke or Mistake Proofing - Poka yoke is a quality management concept developed by the
Japanese industrial engineer Shigeo Shingo to prevent human errors from occurring in the
production line, since extensive automation and computerization is expensive. Poka yoke is
implemented by using simple objects like fixtures, jigs, gadgets, warning devices,
paper systems, and the like to prevent people from committing mistakes.
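The takt time calculation mentioned above, sketched with hypothetical shift data:

```python
# Sketch: takt time from hypothetical shift data.
available_time_per_shift = 7.5 * 60 * 60  # 7.5 productive hours, in seconds
demand_per_shift = 450                    # units the customer requires per shift

takt_time = available_time_per_shift / demand_per_shift
print(f"Takt time: {takt_time:.0f} seconds per unit")  # 60 seconds per unit
```

Producing faster than takt builds overproduction waste; producing slower misses customer demand.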


Value-Added and Non-Value-Added Activities
Value refers to an activity for which the customer will pay, or which is valued by the customer; the
rest are non-value activities. Value stream refers to the sequence of activities involved from the
customer's request to its fulfillment, and VSM records these activities as icons or symbols.

Value Stream Mapping (VSM) is a visualization tool oriented toward understanding and streamlining
work processes, using icons and symbols to depict various elements and improve the flow of material
and information. It helps in identifying and decreasing waste, or non-value addition, in the process. It
can also be used as a strategic planning tool and a change management tool, in addition to being a
communication tool.

A few icons used for mapping and development of the VSM include

Inventory – This is a material queue of products that are not being processed. It represents storage
of raw materials as well as finished goods. The time period may be listed below the icon.
Supermarket – This is an inventory "supermarket" that contains some inventory available to
downstream customers, enabling them to select what they need. The next process or customer
would pull from this inventory.
Go See Scheduling – Glasses represent collecting information visually. It can also indicate informal
scheduling.
Kanban Post – This represents a location for kanban signal pickup.

Developing the VSM - VSM mapping involves step-by-step development of the VSM state map,
whether a present-state or a future-state map, and involves the following steps

Draw customer, supplier and production control icons.


Enter customer requirements and calculate daily production required.
Draw outbound shipping icon and truck with delivery frequency.
Draw inbound shipping icon, truck and delivery frequency.
Add process boxes, in sequence, left to right, and data boxes below.
Add communication arrows with methods and frequencies.
Obtain process attributes and add data boxes.
Add operator symbols, inventory locations and levels in days of demand graph at bottom.
Add push, pull and FIFO icons.
Add working hours, cycle times (CT) and lead times.
Calculate total cycle lead time.


5S
5S is a discipline for creating and maintaining a clutter-free, clean, organized, safe and high-performance
workplace in 5 steps, which are seiri, seiton, seiso, seiketsu and shitsuke.

Seiri - Sorting out: Clean out the work area, keeping what is necessary in the work area,
relocating or discarding what is not
Seiton - Systematic arrangement / Set limits and Locations: Arrange needed items so they are
easy to find, use and return
Seiso - Shine and Sweep: Clean and care for equipment area
Seiketsu - Standardization: Make all work areas similar
Shitsuke - Self-Discipline / Sustain: Make these rules natural and instinctual
Theory of Constraints (TOC)
It is a methodology for identifying the most important limiting factor (i.e. constraint) that stands in
the way of achieving a goal and then systematically improving that constraint until it is no longer the
limiting factor. It was first published in The Goal by Eliyahu M. Goldratt and Jeff Cox in 1984.
TOC conceptually models the manufacturing system as a chain, and advocates focusing on its
weakest link. Goldratt defines a five-step process that a change agent can use to strengthen the
weakest link, or links, which includes

Identify the System Constraint - The part of a system that constitutes its weakest link can be
either physical or a policy.
Decide How to Exploit the Constraint - Goldratt instructs the change agent to obtain as much
capability as possible from a constraining component, without undergoing expensive changes.
Subordinate Everything Else - The non-constraint components of the system must be adjusted
to a "setting" that will enable the constraint to operate at maximum effectiveness. Once this has
been done, the overall system is evaluated to determine if the constraint has shifted to another
component. If the constraint has been eliminated, the change agent jumps to step five.


Elevate the Constraint - "Elevating" the constraint refers to taking whatever action is necessary to
eliminate the constraint. This step is only considered if steps two and three have not been
successful. Major changes to the existing system are considered at this step.
Return to Step One, But Beware of "Inertia"

1.3. Design for Six Sigma (DFSS)


Design for Six Sigma can be seen as a subset of Six Sigma focusing on preventing problems by
going upstream to recognize that decisions made during the design phase profoundly affect the
quality and cost of all subsequent activities to build and deliver the product. Early investments of
time and effort pay off in getting the product right the first time. DFSS adds a new, more predictive
front end to Six Sigma. It describes the application of Six Sigma tools to product development and
process design efforts with the goal of “designing in” Six Sigma performance capability. The
intention of DFSS is to bring such new products and/or services to market with a process
performance of around 4.5 sigma or better, for every customer requirement.
Quality Function Deployment (QFD)
Quality Function Deployment is a method for prioritizing and translating customer inputs into
designs and specifications for a product, service, and/or process. While the detail of the work
involved in QFD can be both complex and exhaustive, the essentials of the QFD method are
based on common-sense ideas and tools. QFD is a planning tool that relates a list of delights,
wants, and needs of customers to design technical functional requirements.

With the application of QFD, possible relationships are explored between quality characteristics as
expressed by customers and substitute quality requirements expressed in engineering terms. In the
context of DFSS, these requirements are called critical-to characteristics, which include subsets such as
critical-to-quality (CTQ) and critical-to-delivery (CTD). In the QFD methodology, customers
define the product using their own expressions, which rarely carry any significant technical
terminology. The voice of the customer can be decomposed into a list of needs used later as input to
a relationship diagram, which is called QFD's house of quality.

One major advantage of QFD is the attainment of the shortest development cycle, which is gained by
companies with the ability and desire to satisfy customer expectations. The other significant
advantage is the improvement gained in the design family of the company, resulting in increased
customer satisfaction. QFD is a robust method with many variations in application, such as

Prioritize and select improvement projects based on customer needs and current performance
Assess a process’s or product’s performance versus competitors
Translate customer requirements into performance measures
Design, test, and refine new processes, products, and services

QFD uses various other methods, from Voice of the Customer input to Design of Experiments, to
work well. A special multidimensional matrix, also called the "House of Quality," is the best-
known element of the QFD method. A full QFD product design project will involve a series of
these matrices, translating from customer and competitive needs to detailed process specifications.
QFD involves two core concepts, which are


The QFD Cycle - An iterative effort to develop operational designs and plans in four phases of

Translate customer input and competitor analysis into product or service features.
Translate product/service features into product/service specifications and measures.
Translate product/service specifications and measures into process design features.
Translate process design features into process performance specifications and measures.

QFD is accomplished by multidisciplinary DFSS teams using a series of charts to deploy critical
customer attributes throughout the phases of design development. QFD is usually deployed over
four phases: phase 1 (CTS planning), phase 2 (functional requirements), phase 3 (design parameters
planning), and phase 4 (process variables planning).

Prioritization and Correlation - Detailed analysis of the relationships among specific needs,
features, requirements, and measures. Matrices like the House of Quality or the simple L-Matrix
keep this analysis organized and document the rationale behind the design effort.

The QFD Cycle develops the links from downstream Ys (Customer Requirements and Product
Specifications) back to upstream Xs (Process Specifications) in the design process itself. With an
existing process or product, it can be used to clarify and document those relationships if they have
never been investigated before. Another benefit of the House of Quality is the "diagonal"
relationship test afforded by the matrix, testing combinations that may not have been considered
by our standard human "linear" thought processes.


QFD analysis is conducted in six steps as

It starts with the articulation of customer requirements. Techniques used could be interviewing,
observation, prototyping, conceptual modeling, etc. The data from marketing research are also
used. These requirements are also known as the "What's".
In the second step, the company's current product is ranked against the competitors.
Next, the team looks at Product/Process Characteristics, in other words, the "How's" of meeting
the customer requirements. Candidate CCRs are listed across the top, and the relevance of each is
considered and ranked as to how well it will address customer needs.
Then, the team relates customer and technical requirements with ratings such as "high",
"moderate", "low", and "no" correlation. The team evaluates the degree to which customer wants
and needs are addressed by the product or process characteristics.
In the fifth step, the roof of the "House" focuses on relationships among product/process
characteristics. It shows whether the "How’s" reinforce or conflict with one another.
Lastly, the team summarizes the key conclusions. It ranks the relevance of product or process
characteristics to the attainment of customers' wants or needs.
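A sketch of the prioritization arithmetic behind such a matrix: each technical characteristic's score is the customer-importance-weighted sum of its relationship ratings. The requirements, characteristics and ratings below are hypothetical, using the commonly seen 9/3/1 scale for strong/moderate/weak relationships:

```python
# Sketch: technical importance scores from a hypothetical House of Quality.
# Rows are customer requirements (with importance weights); columns are
# technical characteristics; cells use a 9/3/1 relationship scale (0 = none).
customer_importance = {"easy to open": 5, "keeps contents fresh": 4, "low cost": 3}

relationships = {
    "easy to open":         {"seal strength": 3, "material thickness": 1, "unit material cost": 0},
    "keeps contents fresh": {"seal strength": 9, "material thickness": 3, "unit material cost": 0},
    "low cost":             {"seal strength": 1, "material thickness": 3, "unit material cost": 9},
}

characteristics = ["seal strength", "material thickness", "unit material cost"]
scores = {
    t: sum(customer_importance[c] * relationships[c][t] for c in customer_importance)
    for t in characteristics
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s}: {score}")
```

The highest-scoring characteristics are the ones most worth engineering effort, which is exactly the ranking described in the last step above.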
Design And Process Failure Mode And Effects Analysis (DFMEA and PFMEA)
FMEA is a systematic, proactive method for evaluating a process to identify where and how it
might fail and to assess the relative impact of different failures, in order to identify the parts of the
process that are most in need of change. FMEA includes review of the following

Steps in the process


Failure modes (What could go wrong?)
Failure causes (Why would the failure happen?)
Failure effects (What would be the consequences of each failure?)

FMEA evaluates processes for possible failures and prevents them by correcting the processes
proactively rather than reacting to adverse events after failures have occurred. FMEA is also useful
in evaluating a new process prior to implementation and in assessing the impact of changes to an
existing process. FMEA usually involves the following steps

Select a process to evaluate with FMEA - Evaluation using FMEA works best on processes that
do not have too many sub-processes, instead of doing an FMEA on a large and complex
process.
Recruit a multidisciplinary team - Be sure to include everyone who is involved at any point in
the process.
Have the team meet together to list all of the steps in the process - Number every step of the
process, and be as specific as possible. It may take several meetings for the team to complete
this part of the FMEA, depending on the number of steps and the complexity of the process.
Flowcharting can be a helpful tool for outlining the steps. When finished, be sure to obtain
consensus from the group. The team should agree that the steps enumerated in the FMEA
accurately describe the process.
Have the team list failure modes and causes - For each step in the process, list all possible
failure modes, anything that could go wrong, including minor and rare problems. Then, for
each failure mode listed, identify all possible causes.


For each failure mode, have the team assign a numeric value (known as the Risk Priority
Number, or RPN) for likelihood of occurrence, likelihood of detection, and severity. Assigning
RPNs helps the team prioritize areas to focus on and can also help in assessing opportunities
for improvement. For every failure mode identified, the team should answer as a group with
consensus on all values assigned to the following questions
Likelihood of occurrence: How likely is it that this failure mode will occur? - Assign a score
between 1 and 10, with 1 meaning “very unlikely to occur” and 10 meaning “very likely to occur.”
Likelihood of detection: If this failure mode occurs, how likely is it that the failure will be
detected? - Assign a score between 1 and 10, with 1 meaning “very likely to be detected”
and 10 meaning “very unlikely to be detected.”
Severity: If this failure mode occurs, how likely is it that harm will occur? - Assign a score
between 1 and 10, with 1 meaning “very unlikely that harm will occur” and 10 meaning
“very likely that severe harm will occur.” In patient care examples, a score of 10 for harm
often denotes death.
Evaluate the results - To calculate the Risk Priority Number (RPN) for each failure mode,
multiply the three scores obtained (the 1 to 10 score for each of likelihood of occurrence,
detection, and severity). The lowest possible score will be 1 and the highest 1,000. Identify the
failure modes with the top 10 highest RPNs. These are the ones the team should consider first
as improvement opportunities. To calculate the RPN for the entire process, simply add up all
of the individual RPNs for each failure mode.
Use RPNs to plan improvement efforts - Failure modes with high RPNs are probably the most
important parts of the process on which to focus improvement efforts. Failure modes with very
low RPNs are not likely to affect the overall process very much, even if eliminated completely,
and they should therefore be at the bottom of the list of priorities.


Failure Mode: What could go wrong?
Failure Causes: Why would the failure happen?
Failure Effects: What would be the consequences of failure?
Likelihood of Occurrence: 1–10, 10 = very likely to occur
Likelihood of Detection: 1–10, 10 = very unlikely to detect
Severity: 1–10, 10 = most severe effect
Risk Priority Number (RPN): Likelihood of Occurrence × Likelihood of Detection × Severity
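
A quick computational sketch of the RPN arithmetic is given below. The failure modes and scores are hypothetical illustrations, not data from any standard FMEA template.

# Illustrative sketch: compute the Risk Priority Number (RPN) for each failure
# mode and rank the modes so the team knows where to focus first.
# The failure modes and scores below are hypothetical.

failure_modes = [
    # (description, occurrence 1-10, detection 1-10, severity 1-10)
    ("Wrong part picked", 6, 4, 7),
    ("Label missing", 3, 8, 5),
    ("Torque out of spec", 5, 6, 9),
]

def rpn(occurrence, detection, severity):
    """RPN = likelihood of occurrence x likelihood of detection x severity (1 to 1,000)."""
    return occurrence * detection * severity

ranked = sorted(failure_modes, key=lambda m: rpn(*m[1:]), reverse=True)
for description, occ, det, sev in ranked:
    print(f"{description}: RPN = {rpn(occ, det, sev)}")

# The overall process RPN, as described above, is the sum of the individual RPNs.
process_rpn = sum(rpn(*m[1:]) for m in failure_modes)
print("Process RPN:", process_rpn)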

Design FMEA (DFMEA) – It is used to analyze designs before they are released to production. In
the DFSS algorithm, a DFMEA should always be completed well in advance of a prototype build.
The input to DFMEA is the array of functional requirements. The outputs are
List of actions to prevent causes or to detect failure modes and
History of actions taken and future activity.

The DFMEA helps the DFSS team in

Estimating the effects on all customer segments


Assessing and selecting design alternatives
Developing an efficient validation phase within the DFSS algorithm
Inputting the needed information for Design for X (DFMA, DFS, DFR, DFE, etc.)
Prioritizing the list of corrective actions using strategies such as mitigation, transferring,
ignoring, or preventing the failure modes
Identifying the potential special design parameters (DPs) in terms of failure
Documenting the findings for future reference
Process FMEA (PFMEA) – It is used to analyze manufacturing, assembly, or any other processes
such as those identified as transactional DFSS projects. The focus is on process inputs. Software
FMEA documents and addresses failure modes associated with software functions. The PFMEA is
a valuable tool available to the concurrent DFSS team to help them in

Identifying potential manufacturing/assembly or production process causes in order to place
controls on either increasing detection, reducing occurrence, or both
Prioritizing the list of corrective actions using strategies such as mitigation, transferring,
ignoring, or preventing the failure modes
Documenting the results of their processes
Identifying the special potential process variables (PVs), from a failure standpoint, which need
special controls
DFSS Roadmap
IDOV and DMADV are two roadmaps used to implement and extend DFSS. Both IDOV and DMADV are
discussed below.

IDOV - IDOV stands for Identify, Design, Optimize and Verify. It is a variant of DFSS (Design
For Six Sigma) but is different from DMAIC (define, measure, analyze, improve and control). It
consists of four different phases, as


Identify Phase - It identifies specific customer needs, based on which a product or business
process will be designed. It is essential for launching a new product or service and involves
various activities such as defining VOC, developing a team and team charter, performing
competitive analysis and identifying CTQs. Other crucial steps in this phase involve the
identification of customer and product requirements, establishment of an appropriate business
model, identification of technical requirements such as CTQs, allocation of roles and
responsibilities. Some of the tools used are QFD, FMEA, target costing and benchmarking.
Design Phase - It focuses on functional requirements, development of alternate business
processes, evaluation of available options and selection of the most appropriate business
process based on CTQs identified earlier. It includes the formulation of concept design,
identification of probable risk elements, and identification of design parameters by utilizing
advanced simulation tools and formulation of procurement plans and manufacturing plans.
Tools used in this phase include risk assessment, FMEA, engineering analysis and Design of
experiments.
Optimize Phase - This phase uses CTQs for calculating the tolerance level of a selected
business process by simulation tools. It predicts the performance capability of a business
process, optimizing existing design and developing alternative design elements. This phase may
involve assessment of process capabilities, optimization of design parameters, development of
design for robust performance and reliability, error proofing and establishment of tolerance
measurement objectives. Tools usually used are manufacturing database and flow back tools,
design for manufacturability, process capability models, Monte Carlo methods, tolerance
measurement tools and Six Sigma tools.
Verify (Validate) Phase – Being the last phase, it focuses on testing and validating the selected design. Any
changes to the design can be made in this phase. This phase involves prototype test and
validation, assessment of performance, failure modes, reliability and risks, design iteration and
final phase review.

DMADV – DMADV refers to Define, Measure, Analyze, Design and Verify. DMADV is one
aspect of Design for Six Sigma (DFSS), which has evolved from the earlier approaches of
continuous quality improvement and the Six Sigma approach to reduce variation. A key component of
the DMADV approach is an active ‘toll gate’ check sheet review of the outcomes of each of the
five steps of DMADV.

The application of DMADV is aimed at creating a high-quality product, keeping
customer requirements in mind at every stage. In general, the phases of DMADV are

Define phase – In this phase, wants and needs believed to be most important to customers are
identified by historical information, customer feedback and other information sources. Teams
are assembled to drive the process. Metrics and other tests are developed in alignment with
customer information. The key deliverables are team charter, project plan, project team,
critical customer requirements and design goals.
Measure phase - The defined metrics are used to collect data and record specifications for the
remaining process. All the processes needed to successfully manufacture the product or service
are assigned metrics for later evaluation. Technology teams test metrics and then apply them.
The key deliverables are qualified measurement systems, data collection plan, capability
analysis, refined metrics and functional requirements.
Analyze phase - The result of the manufacturing process (i.e. finished product or service) is
tested by internal teams to create a baseline for improvement. Leaders use data to identify
areas of adjustment within the processes that will deliver improvement to either the quality or
manufacturing process of a finished product or service. Teams set final processes in place and
make adjustments as needed. The deliverables are data analysis, initial models developed,
prioritized X's, variability quantified, CTQ flow-down and documented design alternatives.
Design phase - The results of internal tests are compared with customer wants and needs. Any
additional adjustments needed are made. The improved manufacturing process is tested and
test groups of customers provide feedback before the final product or service is widely
released. The deliverables include validated and refined models, feasible solutions, trade-offs
quantified, tolerances set and predicted impact.
Verify phase - The last stage in the methodology is ongoing. While the product or service is
being released and customer reviews are coming in, the processes may be adjusted. Metrics are
further developed to keep track of on-going customer feedback on the product or service. New
data may lead to other changes that need to be addressed so the initial process may lead to new
applications of DMADV in subsequent areas. The key deliverables are detailed design,
validated predictions, pilot / prototype, FMEAs, capability flow-up and standards and
procedures.

The applications of these methodologies are generally rolled out over the course of many months,
or even years. The end result is a product or service that is completely aligned with customer requirements.


2. DEFINE
In define phase of a Six Sigma DMAIC project, the project leaders are responsible for clarifying
the purpose and scope of the project or the process to be improved and for knowing the quality
expectations of the customer. This phase also involves establishing realistic estimates for timeline
and costs, thus ensuring that stakeholders and the project team are on the same page about the project's
implementation, evaluation, progress and success.

A project needs to be assessed for suitability for DMAIC, which involves answering the following
questions

Is data available or easy to obtain?


Does leadership support exist for improving this process?
Is DMAIC really needed or is this a “just do it”: a problem with a known solution that should
just be implemented?
Is the team trying to boil the ocean or is the scope reasonable for chartering as a DMAIC
project?
Is the process directly related to a key outcome such as profitability, customer satisfaction, or
employee satisfaction?

The define phase achieves a number of purposes which include assessing the current project
against the strategic objectives and ensuring its potential. The phase also results in identification of
the project scope, objectives, sponsors, schedule, deliverables and team members along with team
formation. Before initiating, availability of resources is paramount.

2.1. Process Management


Business Process Basics
A business process is a group of tasks which result in a specific service or product for customers. It
can be visualized with a flowchart or a process matrix. Business processes are fundamental to every
company’s performance and implement the business strategy. Understanding and optimizing the
business process is the crux of six sigma.


Dissecting and truly understanding the root causes of process performance is critical to effective
process improvement, which can be accomplished by Six Sigma. Each process has the three
elements of inputs, process and outputs that affect its function. A business process is a collection of
related activities that produce something of value to the organization, its stakeholders or its
customers.
Process Elements – Every process has a start, the state or resources it needs, and an end, the state the
process needs to reach. The intermediate between the two is the process logic which makes it
possible.

Process Identification – The process to be improved or optimized needs to be identified by its process
boundaries, which indicate the influence and involvement of a process and its resources. SIPOC
diagrams are usually used for process identification as they provide a top-level view of the process.
SIPOC stands for Suppliers, Inputs, Process, Outputs and Customers. SIPOC enables the team to
quickly develop a common understanding of the process and its key customers and suppliers.

The steps to create a SIPOC are


Naming the process.
Defining the starting point and the ending point of the process as listed in the scope section of
the team charter.
Enlist the key outputs of the process.
Identify the entity receiving those outputs whether internal or external.
State the top-level process steps without any decision points or feedback loops.
Identify the inputs to process and the entities supplying those inputs.

Systems Thinking - Systems thinking entails observing the system as a whole. The term system is
defined as a whole consisting of parts, each of which can affect the other's properties. The
performance of the system is determined by how the parts interrelate; for a business, the manner in
which sales, procurement, manufacturing and distribution relate to each other determines the
business performance, rather than their individual performance. Systems thinking can be applied in
various ways in the Six Sigma project, as

Systems thinking can be used to launch a high-impact initiative for real root cause areas instead
of the symptoms of high level problems.
It can be used to map out the system dynamics around a mission critical Big Y to optimize, and
then identify the various high-leverage daughter projects.
During the define phase, it identifies the possible negative consequences of optimizing the
project Y. Thus, the project team can put avoidance or elimination strategies in place.
During the measure or analyze phases the system dynamics of the critical Xs can be identified
that affect the project Y that the team has been tasked to optimize.

Six Sigma programs can avoid irrelevant issues and address the real issues by using the systems
thinking. It helps in integrating successful management processes into a single management system
which wisely uses resources while focusing on what is important for customers, shareholders and
employees.


Owners and Stakeholders


Stakeholders are the entities which have an interest in the process or the business, and they include the
suppliers, customers, employees and investors. Similarly, the process stakeholders include the
process operators, executives, managers, suppliers, customers and supporting staff like logistics
personnel. The interest of stakeholders may also vary with time.
Process owners are the individuals within an organization, responsible for coordinating and
managing the workflow and activities at every stage of a process. They are also responsible for the
performance of a process against the listed goals as measured by key process indicators. They
have the authority to make necessary changes to the process and its stages in achieving the listed
goals.

Project teams having stakeholders and process owners are more effective in achieving the results.
Stakeholder involvement is very helpful as they have the detailed knowledge about the process
thus, they come out with innovative and impactful process improvement whilst considering the
consequences and feasibility of the same.
Customer Identification
Customer identification is a crucial task of any Six Sigma project. Various tools like brainstorming,
SIPOC and marketing analysis data are useful for the purpose. Customer identification should be
carried out even if customers are known so as, to be better aware of the customers and reveal any
hidden customers.

Customers can be categorized as internal or external, or on the basis of location, demography, sex, etc.
The criteria of classification depend upon achieving the desired results.
Customer Data Collection and Analysis
Capturing customer data which have been identified in earlier steps can be accomplished by
various tools like VOC, survey, etc.

Customer data analysis is the next step after customer data collection. Analysis helps in prioritizing
and understanding customer needs. Various analysis tools are used like Pareto diagrams, FMEA,
affinity diagrams, interrelationship digraphs, matrix diagrams and priority matrices for issue
identification and addressing.
Customer Requirement Mapping
Customer requirement mapping involves identification of processes for improvements as needed
against customer requirements. Quality Function Deployment (QFD) is an effective tool for the
purpose as, QFD is a structured method to identify and prioritize customer’s expectations.

2.2. Project Management


Project Management refers to the process of getting the project completed within the available
resources and designated timeframe effectively and efficiently. It includes various crucial entities
which are

Project Charter and Plan – Project charter is a statement of objectives of a project which also sets
out detailed project goals, roles and responsibilities. It also identifies the main stakeholders. Project
charter henceforth consists of the problem statement for which the project is initiated, the purpose
outlining the goals to be achieved by the project, the scope of the project on enlisting the resource
requirement and the results to achieve in quantifiable terms. Project charter also contains the likely
benefits to the stakeholders for taking up the project and justifies the feasibility for same.

Project plan development involves setting up timelines and milestones to achieve as the project
processes. It acts as the basis on which resource requirements are computed. Various project
planning tools are used for the purpose like Gantt charts, CPM/PERT charts, project schedules,
etc.

Project Risk Analysis is conducted during project planning to work out feasibility of the project as
well develop counter-measures to mitigate risks involved and their impact. Usually aspects of
project which are analyzed are safety, reliability, serviceability, etc. Risk analysis involves
identification and mitigation of risks. Various analysis tools are used like

SWOT (Strengths, Weaknesses, Opportunities and Threats) Matrix - It involves a scan of the
internal and external environment to classify internal as strengths (S) or weaknesses (W), and
those external to the firm can be classified as opportunities (O) or threats (T).
Risk Priority Number - The Risk Priority Number (RPN) measures risk by assigning values
ranging from 1 (absolute best) to 1,000 (absolute worst) in order to identify critical failure modes
within the project.
Failure modes and effects analysis (FMEA) - It identifies failures in a project by studying the
impact of all possible failures which are prioritized according to severity, frequency and
identification.

Risk mitigation involves continuous review of risk identification and mitigation plans because, as the
project progresses, environmental changes and new risks may be identified or a step may change
midway; thus, a risk management system is embedded during project planning.

Project Scope – After defining the project charter and planning, the project scope is finalized thus,
defining the resource requirement and listing the affected departments during the project
execution. Project managers utilize various tools during this step like SIPOC, Pareto charts and
brainstorming to define and document the project scope.

Project Metrics – They are the essential component of project management which shows the status
of the project. Their selection and updating are necessary for proper monitoring of the project’s
progress. Project metrics are tactical and used by project manager to adapt project work flow and
technical activities i.e. guide adjustments to work schedule to avoid delays and assess product
quality on an ongoing basis. Project metrics usually applied measure consumption of time, budget,
other resources and quality of output.

Project Documentation – It involves documenting all objectives, milestones, activities, processes and
blueprints of the project (in short, all documents from the project being conceived to its implementation)
so as to provide an accurate measure of project success. Large projects need more detailed
documentation to cover all aspects of the project. Various graphical tools and techniques are used,
like state mapping and storyboards. Six Sigma projects implement the DMAIC methodology, so
documentation is done accordingly, with figures and charts showing the activity at each stage.


Project Closure – It is the last phase of the project, which confirms achievement of the stated objectives
along with completion of the required documentation. It also involves discussion with project
sponsors to agree on project completion, which involves comparison with the project charter.

2.3. Management and Planning Tools


Various management and planning tools are used which are
Flowchart
It is used to develop a process map. A process map is a graphical representation of a process
which displays the sequence of tasks using flowcharting symbols.

It shows the inputs, actions and outputs of a given system. Inputs are the factors of production like
land, materials, labor, equipment, and management. Actions are the way in which the inputs are
processed and value is added to the product like procedures, handling, storage, transportation, and
processing. Outputs are the finished goods or delivered services given to the customer, but output
also includes unplanned and undesirable entities like scrap, rework, pollution, etc. Flowchart
symbols are standardized by ANSI and common symbols used are

Symbol              Function
Arrow               Process Flow
Rounded rectangle   Terminator or start/stop of process
Diamond             Decision or branching
Parallelogram       Data Input or Output
Rectangle           Process or Action step

The flowchart shows a high-level view of a process and supports its capability analysis. The flowchart
can be made either more or less complex, as needed.
Check Sheets
They consist of lists of items and indicate how often each item on the list occurs. They are also
called confirmation check sheets. They make the data collection process easier by providing pre-written
descriptions of events likely to occur, such as ‘‘Have all inspections been performed?’’ ‘‘How often
does a particular problem occur?’’ ‘‘Are problems more common with part X than with part Y?’’

It is a simple tool for process improvement and problem solving. It can also highlight items of
importance during data collection. They are an effective tool for quality improvement when used
with histograms and Pareto analysis. It is not a checklist, which is used to ensure that all important
steps or actions have been taken; rather, a check sheet is a tally sheet to collect data on the frequency of
occurrence of defects or errors. It is of two types


Location or concentration diagram - In it, the marking is done on a diagram; for example, before
submitting a car to a service center, a car diagram is used to list the defects present by marking and
writing on the diagram. Online application forms that highlight the erroneous section before
submission are also an example of this type.
Graphical or distribution check sheet - It is commonly used for collecting frequency counts by
marking, in order to visualize the distribution of the data.
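
A distribution check sheet is essentially a tally. A minimal sketch of such a tally in Python is shown below; the defect observations are hypothetical.

# Minimal sketch of a distribution check sheet: tally how often each defect
# category is observed. The defect observations below are hypothetical.
from collections import Counter

observations = ["scratch", "dent", "scratch", "missing label", "scratch", "dent"]
tally = Counter(observations)

for defect, count in tally.most_common():
    print(f"{defect:15s} {'|' * count} ({count})")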

Pareto charts
It is a type of bar chart in which the horizontal axis represents categories which are usually defects,
errors or sources (causes) of defects/errors. The height of the bars can represent a count or
percent of errors/defects or their impact in terms of delays, rework, cost, etc.

By arranging the bars from largest to smallest, a Pareto chart shows which
categories will yield the biggest gains if addressed, and which are only minor contributors to the
problem. Pareto analysis is the process of ranking opportunities to determine which of many potential
opportunities should be pursued first. It is used at various stages in a quality improvement program
to determine which step to take next.

Pareto Chart Development – It involves the following steps

Collect data on different types or categories of problems.


Tabulate the scores.
Determine the total number of problems observed and/or the total impact. Also determine the
counts or impact for each category.
For small or infrequent problems, add them together into an "other" category
Sort the problems by frequency or by level of impact.
Draw a vertical axis and divide it into increments equal to the total number observed. Do not
make the vertical axis only as tall as the tallest bar, since this can overemphasize the importance of the
tall bars and lead to false conclusions
Draw bars for each category, starting with the largest and working down.
The "other" category always goes last even if it is not the shortest bar


Cause and Effect Diagram


It helps teams uncover potential root causes by providing structure to the cause identification effort. It
is also called a fishbone or Ishikawa diagram. It helps ensure that, while new ideas are being generated
during brainstorming, no major possible cause is overlooked.
It should be used for cause identification after clearly defining the problem. It is also useful as a
cause-prevention tool, by brainstorming ways to maintain or prevent future problems.

Developing Cause and Effect Diagram – It involves the following steps

Name the problem or effect of interest. Be as specific as possible.


Write the problem at the head of a fishbone "skeleton"
Decide the major categories for causes and create the basic diagram on a flip chart or
whiteboard.
Typical categories include the manpower, machines, materials, methods, measurements and
environment
Brainstorm for more detailed causes and create the diagram either by working through each
category or open brainstorming for any new input.
Write suggestions onto self-stick notes and arrange in the fishbone format, placing each idea
under the appropriate categories.
Review the diagram for completeness.
Eliminate causes that do not apply
Brainstorm for more ideas in categories that contain fewer items
Discuss the final diagram. Identify causes which are most critical for follow-up investigation.


Tree Diagram
It is similar to the cause and effect diagram, but a tree diagram breaks the problem down
progressively in detail by partitioning a bigger problem into smaller ones. This partitioning reaches a
level at which the problem seems easy to solve. It is made by starting from the right and going towards the
left. It is used by quality improvement programs. Sometimes goals are placed on the left and resources
on the right, and the two are then linked for achievement of the goal.

It starts with a single entity which branches into two or more, each of which branches into two or more,
and so on. It looks like a tree, with a trunk and multiple branches. It is used for known issues whose
specific details are to be addressed for achieving an objective. It also assists in listing other solution,
detailing processes and probing the root cause of a problem. It is also known as systematic diagram
or tree analysis or analytical tree or hierarchy diagram.

Affinity Diagram
The word affinity means a ‘‘natural attraction’’ or kinship. The affinity diagram organizes ideas into
meaningful categories by recognizing their underlying similarity. It reduces data by organizing large
inputs into a smaller number of major dimensions, constructs or categories. It organizes facts,
opinions and issues into natural groups to help diagnose a complex situation or find themes.

It helps to organize a lot of ideas and identify central themes in them. It is useful when information
about a problem is not well organized and solution beyond traditional thinking is needed. It
organizes ideas from a brainstorming session in any phase of DMAIC and can find themes and
messages in customer statements gleaned from interviews, surveys, or focus groups.


Developing Affinity Diagram

Gather inputs from brainstorming session or customer feedbacks.


Write each input on cards and place them randomly.
Allow people to silently start grouping the cards.
When the clustering is done, create a "header" label (on a note or card) for each group.
Write the theme on a larger self-stick note or card (the "Header") and place it at top of cluster.
Continue until all clusters are labeled
Complete the diagram and discuss the results.

Matrix Diagram
It is also known as matrix or matrix chart as it uses a matrix to display information. The matrix
diagram displays relationship amongst two, three or four groups of information like the strength of
relationship amongst the group, the roles played by various groups, etc. It helps in analyzing the
correlations between groups of information. It enables systematic analysis of correlations. Six
different matrix-shaped diagrams are possible: L, T, Y, X, C and roof-shaped, depending on how
many groups must be compared.

An L-shaped or roof-shaped matrix shows the relationship amongst two groups of entities.
T-shaped, Y-shaped and C-shaped matrices are used to show relationships amongst three groups, and an
X-shaped matrix is used for four groups.


Interrelationship Digraph
Interrelationship digraphs help in organizing disparate information, usually ideas generated during
brainstorming sessions. They define the ways in which ideas influence one another instead of
arranging ideas into groups as done by affinity diagrams.

Similar to affinity diagram, interrelationship digraphs are developed by writing down the ideas or
information on paper like Post-it notes which are then placed on a large sheet of paper and arrows
are drawn between related ideas. An idea that has arrows leaving it but none entering is a root idea.
By evaluating the relationships between ideas the functioning is made clear and usually the root
idea is the key to improving the system.

Benchmarking
Benchmarks are measures (of quality, time, or cost) that have already been achieved by others. They
indicate what level of goal is achievable, so that goals can be set for one's own operations. Benchmarking
is also helpful for bringing new ideas into the process, even though they are borrowed from others.

Usually the benchmarking data is sourced from surveys or interviews with industry experts, trade
or professional organizations, published articles, company tours, prior experience of current staff
or conversations.

Types of Benchmarks
Internal/Company - It establishes a baseline for external benchmarking, identifies differences
within the company and provides rapid and easy-to-adapt improvements, though opportunities
for improvement are limited to the company's practices.
Direct Competition - It prioritizes areas of improvement according to competition and is of
interest to most companies but often involves a limited pool of participants thus, opportunities
for improvement are limited to "known" competitive practices and may lead to potential
antitrust issues.
Industry - It provides industry trend information and is a conventional basis for quantitative
and process-based comparison though opportunities for improvement may be limited by
industry paradigms


Best-in-Class - It examines multiple industries to provide the best opportunity for identifying
radically innovative practices and processes by building a brand new perspective but, usually
difficult to identify best-in-class companies and get them to participate.
Prioritization Matrix
To prioritize is to arrange or deal with in order of importance. A prioritization matrix is a
combination of a tree diagram and a matrix chart and is used to help decision makers determine the
order of importance of activities. It narrows down options by systematically comparing choices
through the selection, weighing, and application of criteria.

It quickly surfaces basic disagreements, forces the team to narrow down from all
solutions to the best solutions, limits "hidden agendas" by bringing decision criteria to the forefront
of a choice and increases follow-through by asking for consensus after each step of the process.

Developing a prioritization matrix - It involves five simple steps, as

Determine criteria and rating scale - Determine the factors to assess the importance of each
entity. Choose factors that will clearly differentiate important from unimportant entities; these are the
criteria, such as the value the entity brings to the customer. Then, for each criterion, establish a rating
scale to use in assessing how well a particular entity satisfies that criterion.
Establish criteria weight - Place criteria in descending order of importance and assign a weight.
Create the matrix - List criteria down the left column and the weight and names of potential
entities across the top in an L-shaped matrix to judge the relative importance of each criterion.
Work in teams to score entities - Review each entity and rate the entity on each of the criteria.
Next, multiply the rating for each criterion by its weight and record the weighted value. After
evaluating the entity against all of the criteria, add up the weighted values to determine the
entity’s total score.
Discuss results and prioritize list - After entities have been scored, undertake a discussion to
compare notes on results and develop a master list of prioritized entities that everyone agrees
upon.
An example rating scale for a cost criterion in a prioritization matrix: 10 is much less expensive, 5 is
less expensive, 1 is the same cost, 0.2 is more expensive and 0.1 is much more expensive.
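
A minimal sketch of the weighted-scoring arithmetic behind a prioritization matrix is shown below. The criteria, weights, options and ratings are hypothetical.

# Minimal sketch of prioritization-matrix scoring: multiply each rating by the
# criterion weight and sum the weighted values per option.
# Criteria, weights, options and ratings below are hypothetical.
criteria_weights = {"Customer value": 0.5, "Cost": 0.3, "Ease of implementation": 0.2}

ratings = {  # option -> {criterion: rating on the agreed scale}
    "Option A": {"Customer value": 9, "Cost": 5, "Ease of implementation": 1},
    "Option B": {"Customer value": 5, "Cost": 10, "Ease of implementation": 5},
}

totals = {
    option: sum(criteria_weights[c] * score for c, score in scores.items())
    for option, scores in ratings.items()
}

for option, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option}: weighted score = {total:.1f}")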


Focus Group
They are facilitated discussion sessions with customers that help an organization understand the
Voice of the Customer (VOC). Usually they are 1-3 hour sessions with a maximum of 20 customers.
It facilitates better understanding of the voice of the customer and organizes the gathered data. It also
enables evaluation of the feedback and channelizes it for further action.

Usually two types of focus groups are applied, the first being the explorative focus group, which
explores the collective needs of customers and develops and evaluates concepts for new product
development as sensed or demanded by the voice of the customer. The second, the experiential focus
group, observes the usage of products in the market and studies what the customers feel and
experience about the products, learning their reasons and motivations to use the product.

Online focus groups have gained importance in recent times due to wider access to the internet; the
discussion takes place on the internet instead of at an interview site. Online focus groups are more
suited for younger age groups.
Gantt Chart
It is a graphical chart, showing the relationships amongst the project tasks, along with time
constraints. The horizontal axis of a Gantt chart shows the units of time (days, weeks, months,
etc.). The vertical axis shows the activities to be completed. Bars show the estimated start time and
duration of the various activities. A Gantt chart shows what has to be done (the activities) and when
(the schedule) as shown in the figure below

Milestone Charts - Gantt charts are often modified in a variety of ways to provide additional
information. One common variation is milestone charts. The milestone symbol represents an
event rather than an activity; it does not consume time or resources.
CPM/PERT Chart
CPM or "Critical Path Method" - It is a tool to analyze project and determine duration, based on
identification of "critical path" through an activity network. The knowledge of the critical path can
permit project managers to change duration. It is a project modeling technique developed in 1950s
and is used with all forms of projects. It displays activities as nodes or circles with known activity
times.

CPM is a diagram showing every step of the project, as letters with lines to each letter representing
the sequence in which the project steps take place. A list of activities is required to complete the
project and the time (duration) that each activity will take to complete, along with the sequence and
dependencies between activities. CPM lays out the longest path of planned activities to the end of
the project as well as the earliest and latest that each activity can start and finish without delaying
other steps in the project. The project manager can then, determine which activities in the project
need to be completed before others and how long those activities can take before they delay other
parts of the project. They also get to know which set of activities is likely to take the longest, also
called as the critical path which is also the shortest possible time period in which the project can be
completed.

PERT Chart - A PERT chart (program evaluation review technique) is a form of diagram for CPM
that shows activity on an arrow diagram. PERT charts are simpler than CPM charts
because they simply show the timing of each step of the project and the sequence of the activities.
In PERT, estimates are uncertain, so ranges of duration and the probability that an activity's duration
will fall into that range are used, whereas CPM is deterministic.

A PERT chart is a graphic representation of a project’s schedule, showing the sequence of tasks,
which tasks can be performed simultaneously, and the critical path of tasks that must be completed
on time in order for the project to meet its completion deadline. The chart can be constructed with
a variety of attributes, such as earliest and latest start dates for each task, earliest and latest finish
dates for each task, and slack time between tasks. A PERT chart can document an entire project or
a key phase of a project. The chart allows a team to avoid unrealistic timetables and schedule
expectations, to help identify and shorten tasks that are bottlenecks, and to focus attention on most
critical tasks. It is most useful for planning and tracking entire projects or for scheduling and
tracking the implementation phase of a planning or improvement effort.

Developing PERT Chart


Identify all tasks or project components - Ensure the team has knowledge of the project so that
during the brainstorming session all component tasks needed to complete the project are
captured. Document the tasks on small note cards.
Identify the first task that must be completed - Place the appropriate card at the extreme left of
the working surface.
Identify any other tasks that can be started simultaneously with task #1 - Align these tasks either
above or below task #1 on the working surface.
Identify the next task that must be completed - Select a task that must wait to begin until task
#1(or a task that starts simultaneously with task #1) is completed. Place the appropriate card to
the right of the card showing the preceding task.
Identify any other tasks that can be started simultaneously with task #2 - Align these tasks either
above or below task #2 on the working surface.
Continue this process until all component tasks are sequenced.


Identify task durations - Reach a consensus on the most likely amount of time each task will
require for completion. Duration time is usually considered to be elapsed time for the task,
rather than actual number of hours/days spent doing the work. Document this duration time
on the appropriate task cards.
Construct the PERT chart - Number each task, draw connecting arrows, and add task
characteristics such as duration, anticipated start date, and anticipated end date.
Determine critical path - The project’s critical path includes those tasks that must start or finish
on time to avoid delays to the total project. Critical paths are typically displayed in red.
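
The scheduling arithmetic behind CPM/PERT can be sketched with a simple forward pass, as below. The tasks, durations and dependencies are hypothetical, and a complete tool would also perform a backward pass to compute slack and flag the critical activities.

# Minimal sketch of a CPM-style forward pass: compute the earliest finish for
# each task and the project duration. Task names, durations and dependencies
# are hypothetical; a full CPM tool would also do a backward pass to get slack.
tasks = {  # task: (duration, list of predecessor tasks)
    "A": (3, []),
    "B": (2, ["A"]),
    "C": (4, ["A"]),
    "D": (1, ["B", "C"]),
}

earliest_finish = {}

def finish(task):
    """Earliest finish time of a task = max predecessor finish + own duration."""
    if task not in earliest_finish:
        duration, preds = tasks[task]
        earliest_finish[task] = max((finish(p) for p in preds), default=0) + duration
    return earliest_finish[task]

project_duration = max(finish(t) for t in tasks)
print("Earliest finish per task:", earliest_finish)
print("Project duration (length of the critical path):", project_duration)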

Activity Network Diagram


It charts the flow of activity between separate tasks and graphically displays interdependent
relationships between groups, steps, and tasks as they all impact a project. Bubbles, boxes, and
arrows are used to depict these activities and the links between them. It shows the sequential
relationships of activities using arrows and nodes to identify a project’s critical path. It is similar to
the CPM/ PERT and also called as arrow diagram.

Developing Activity Network Diagram - Development starts with compiling a list of tasks essential
for completion of the project. These tasks are then arranged in chronological order, considering
inter-task dependencies. All tasks are placed along a progressing line; tasks that can be done
simultaneously are placed on parallel paths, whereas dependent jobs are placed in a chronological
line. Apply a realistic time estimate to each task and then identify the critical path.

2.4. Business Results


Business results are the outcomes which are measured and were identified during the planning stage to
show the impact of the project on the organization. They involve performance measures for the business
and the process involved.
Business Performance
It is the crucial performance measure and the balanced scorecard is used for it. The balanced scorecard
was developed by Robert S. Kaplan and David P. Norton and focuses on four perspectives,
which are

Financial - It focuses on relevant high-level financial measures and involves measuring cash flow,
sales growth, operating income and return on equity.
Customer - It identifies measures which are customer facing like percent of sales from new
products, on time delivery, share of important customers’ purchases, ranking by important
customers.
Internal business processes – These measures answers the question "What must we excel at?"
and include cycle time, unit cost, yield, new product introductions.
Learning and growth - It looks at the ability to continually improve, create value and innovate, and thus
involves measures like time to develop a new generation of products, life cycle to product maturity, and
time to market versus competition.

Project Performance
It usually includes performance indexes on cost, schedule, defects per project, response time, etc.
Two common measures are

Cost Performance Index - It is a measure of the efficiency of expenses spent on a project. It
measures the relationship between the budgeted cost of work performed (BCWP) and the actual
cost of work performed (ACWP) as a ratio, CPI = BCWP/ACWP.
Schedule Performance Index - SPI measures the success of project management to complete
work on time. It is expressed as the ratio of the budgeted cost of work performed (BCWP) to
the budgeted cost of work scheduled (BCWS).
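
A minimal sketch of these two earned-value indices is given below; the BCWP, ACWP and BCWS figures are hypothetical.

# Minimal sketch of earned-value indices. BCWP, ACWP and BCWS figures below
# are hypothetical; values above 1.0 indicate better-than-planned performance.
bcwp = 80_000   # budgeted cost of work performed (earned value)
acwp = 100_000  # actual cost of work performed
bcws = 90_000   # budgeted cost of work scheduled (planned value)

cpi = bcwp / acwp  # cost efficiency: 0.80 of value earned per unit of cost spent
spi = bcwp / bcws  # schedule efficiency: below 1.0 means work is behind schedule

print(f"CPI = {cpi:.2f}, SPI = {spi:.2f}")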
Process Performance
It is a measure of an organization's activities and performance and includes metrics as

Percentage Defective - This is defined as the (Total number of defective parts)/(Total number
of parts) X 100. So if there are 1,000 parts and 10 of those are defective, the percentage of
defective parts is (10/1000) X 100 = 1%
PPM – It is the same ratio defined in percentage defective, but multiplied by 1,000,000; the PPM
for the above example is 10,000. It indicates only the presence of one or more defects in a unit.


Defects per Unit (DPU) – It finds the average number of defects per unit which also needs
categorization of the units into number of defects from 0, 1, 2, up to the maximum number. As
an example, the below chart shows the defect count for 100 units with a maximum of 5 defects.

Defects      0    1    2    3    4    5
# of Units   70   20   5    4    0    1

The average number of defects is DPU = [Sum of all (D * U)]/100 =
[(0 * 70) + (1 * 20) + (2 * 5) + (3 * 4) + (4 * 0) + (5 * 1)]/100 = 47/100 = 0.47
Defects per Opportunity (DPO) – It focuses on the number of ways a defect can occur, the defect
“opportunity”, similar to a failure mode in FMEA. As an example from the previous data,
considering that each unit can have a defect occurrence in one of 6 possible ways, the
number of opportunities for a defect in each unit is 6 and DPO = DPU/O = 0.47/6 = 0.078333
Defects per Million Opportunities (DPMO) - It is obtained by multiplying DPO by 1,000,000
as DPMO = DPO * 1,000,000 = 0.078333 * 1,000,000 = 78,333
Rolled Through Yield (RTY) - A yield measures the probability of a unit passing a step defect-
free, and the rolled throughput yield (RTY) measures the probability of a unit passing a set of
processes defect-free. This takes the percentage of units that pass through several sub-processes
of an entire process without a defect. The number of units without a defect is equal to the
number of units that enter a process minus the number of defective units. For illustration, if the
number of units given as input to a process is P and the number of defective units is D, then the
first-pass yield for each sub-process (FPY) is equal to (P – D)/P. After getting the FPY for each
sub-process, multiply them all together to obtain the RTY; if the yields of 4 sub-processes are
0.994, 0.987, 0.951 and 0.990, then the RTY = (0.994)(0.987)(0.951)(0.990) = 0.924 or 92.4%. A short
computational sketch of these process metrics follows the capability indices below.
Sigma Level - A Six Sigma process, expressed in normalized form, has a mean of 0 and a standard
deviation of 1, with the upper specification limit (USL) and lower specification limit (LSL) set at
+6 and -6. However, there is also the matter of the 1.5-sigma shift which occurs over the long
term. After computing DPMO and RTY, the sigma level can also be computed as

If yield is    DPMO is     Sigma Level is
30.9%          690,000     1.0
62.9%          308,000     2.0
93.3%          66,800      3.0
99.4%          6,210       4.0
99.98%         320         5.0
99.9997%       3.4         6.0

Cost of poor quality - It is also called as the cost of nonconformance, which includes the cost of
all defects as

Internal defects - Before product leaves the organization and includes scrapping, repairing,
or reworking the parts.
External defects - After product leaves the organization and includes costs of warranty,
returned merchandise, or product liability claims and lawsuits.

It is difficult to calculate because the external costs can be delayed by months or even years after
the products are sold; thus, usually the internal costs of poor quality are computed.


Process Capability - It compares the output of an in-control process to the specification limits
by using capability indices. The comparison is made by forming the ratio of the spread
between the process specifications (the specification "width") to the spread of the process
values, as measured by 6 process standard deviation units (the process "width"). It is used to
compare the output of a stable process with the process specifications and make a statement
about how well the process meets specification. There are several statistical measures that are
used to measure the capability of a process, such as Cp, Cpk and Cr.

Index  Description
Cp     Estimates what the process is capable of producing if the process mean were to be
       centered between the specification limits, Cp = (USL - LSL)/(6σ). Assumes process
       output is approximately normally distributed.
Cpk    Estimates what the process is capable of producing, considering that the process mean
       may not be centered between the specification limits,
       Cpk = min[(USL - μ)/(3σ), (μ - LSL)/(3σ)].
Cr     The capability ratio; it is 1 divided by Cp.
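
The sketch below reworks the DPU, DPO, DPMO and RTY figures from the examples above and adds an approximate sigma-level conversion plus the capability indices. The capability data (USL, LSL, mean and standard deviation) are hypothetical, and the sigma-level conversion uses the conventional 1.5-sigma shift, so it reproduces the published table only approximately.

# Minimal sketch of the process performance metrics worked above. The unit and
# opportunity counts mirror the DPU/DPO example; the sigma-level conversion uses
# the standard normal distribution with the conventional 1.5-sigma shift.
from statistics import NormalDist

# DPU: 100 units with the defect distribution from the example above
defect_counts = {0: 70, 1: 20, 2: 5, 3: 4, 4: 0, 5: 1}
units = sum(defect_counts.values())
dpu = sum(d * n for d, n in defect_counts.items()) / units          # 0.47

opportunities_per_unit = 6
dpo = dpu / opportunities_per_unit                                   # 0.078333
dpmo = dpo * 1_000_000                                               # 78,333

# RTY: product of the first-pass yields of the sub-processes
fpys = [0.994, 0.987, 0.951, 0.990]
rty = 1.0
for fpy in fpys:
    rty *= fpy                                                       # ~0.924

# Approximate short-term sigma level from DPMO (1.5-sigma shift added back)
sigma_level = NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

print(f"DPU={dpu:.2f}  DPO={dpo:.6f}  DPMO={dpmo:,.0f}  RTY={rty:.3f}  sigma~{sigma_level:.2f}")

# Capability indices for a hypothetical process with mean mu and std dev s
usl, lsl, mu, s = 10.0, 4.0, 7.5, 0.5
cp = (usl - lsl) / (6 * s)
cpk = min((usl - mu) / (3 * s), (mu - lsl) / (3 * s))
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}")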

FMEA and RPN


Failure Mode and Effects Analysis (FMEA) is used for failure analysis and also involves
Risk Priority Number (RPN) computation. Both have been explained earlier.

2.5. Team Dynamics and Performance


A team is a group of people but every group is not a team. A team is different from a group in the
sense that it is usually small and exists for relatively long period of time till the objective for which it
is formed is accomplished. A team must, ideally, consist of members who possess multifarious
skills to efficiently handle various types of tasks as per job responsibilities and tasks that are to be
carried out. The purpose of forming a team is to improve the internal and external efficiencies of
the company. This is done through the efforts of the team members to improve quality, methods,
and productivity. Management supports the team process by

Ensuring a constancy of purpose


Reinforcing positive results,
Sharing business results
Giving people a sense of mission
Developing a realistic and integrated plan
Providing direction and support
Team Types
A team can generally be classified as ‘formal’ or ‘informal’.

Formal team – It is a team formed to accomplish a particular objective or a particular set of
objectives. The objective of the team formation is called as ‘mission’ or ‘statement of purpose’.
It may consist of a charter, list of team members, letter of authorization and support from the
management.
Informal team – This type of team will not have the documents that a formal team will have.
But an informal team has a versatile membership, as the members in it can be changed as per
the requirements of the task at hand.

A team can also be classified into the following types, depending on a given situation and constraints
that prohibit the formation of either formal or informal teams, as

Virtual team - A virtual team is usually formed to overcome the constraint of geographical locations
which separate members. Some of the characteristics of a virtual team are as follows -

It consists of members who live in different places and who may never meet one another
during the course of accomplishment of the goal of the team.
In a virtual team, the members make use of different technologies like telephone, internet, etc.
to coordinate within the team for the achievement of the common goal.

Process improvement team - It is formed to discover the modifications required in a particular
process in order to improve it. It consists of members who belong to various groups that will be
affected by the proposed changes, thus making it cross functional in nature.

Self-directed and work group teams - It has wide-ranging goals that are ongoing and repetitive. This
necessitates the team to carry out activities on a daily basis. They are usually formed to make
decisions on matters such as safety, personnel, maintenance, quality, etc.
Team Roles and Member Selection
A team performs optimally when all the members are assigned appropriate roles and they
understand their roles in terms of the overall functioning of the team. Some of the major team
roles and responsibilities are as

Champion
Sets and maintains broad goals for improvement projects in area of responsibility
Owns the process
Coaches and approves changes, if needed, in direction or scope of a project
Finds (and negotiates) resources for projects
Represents the team to the Leadership group and serves as its advocate
Helps smooth out issues and overlaps
Works with Process Owners to ensure a smooth handoff at the conclusion of the project
Regular reviews with Process Owner on key process inputs and outputs
Uses DMAIC tools in everyday problem solving
Process Owner
Maximizes high level process performance
Launches and sponsors improvement efforts
Tracks financial benefit of project
Understands key process inputs and outputs and their relationship to other processes
Key driver to achieve Six Sigma levels of quality, efficiency and flexibility for this process


Uses DMAIC tools in everyday problem solving


Participates on GB/BB teams
Team Member
Participates with project leader (GB or BB)
Provides expertise on the process being addressed
Performs action items and tasks as identified
Uses DMAIC tools in everyday problem solving
Subject matter expert (SME)
Green Belt (GB)
Leads and/or participates on Six Sigma project teams
Identifies project opportunities within their organization
Knows and applies Six Sigma methodologies and tools appropriately
Black Belt (BB)
Proficient in Six Sigma tools and their application
Leads/supports high impact projects to bottom line full-time
Directly supports MBB’s culture change activities
Mentors and coaches Green Belts to optimize functioning of Six Sigma teams
Facilitates, communicates, and teaches
Looks for applicability of tools and methods to areas outside of current focus
Supports Process Owners and Champions

Master Black Belt (MBB)


Owns Six Sigma deployment plan and project results for their organization
Responsible for BB certification
Supervisor for DMAIC BBs; may be supervisor for DFSS BBs
Influences senior management and Champions to support organizational engagement
Leads culture change – communicates Six Sigma methodology and tools
Supports Champions in managing project and project prioritization
Ensures that project progress check, gate review, and closing processes meet corporate
requirements and meet division needs
Communicates, teaches, and coaches
Coach
Some businesses have coaches who support the GBs and others coach the BBs.
Trains Green Belts with help from BBs and MBB
Coaches BBs and GBs in proper use of tools for project success
Is a consulting resource for project teams
Team Stages
Most teams go through four development stages before they become productive - forming,
storming, norming, and performing. Bruce W. Tuckman first identified the four development
stages, which are

Forming - Expectations are unclear. When a team forms, its members typically start out by
exploring the boundaries of acceptable group behavior, with the leader directing the team. Members
please each other and take pride in being part of the new team. This period is also called the
honeymoon period.


Storming - Consists of conflict and resistance to the group’s task and structure. Conflict often
occurs and disagreements slow down the team as every team member asserts his or her position.
However, if dealt with appropriately, these stumbling blocks can be turned into performance
later. This is the most difficult stage for any team to work through.
Norming - A sense of group cohesion develops and team members resolve conflicts by
agreeing on mutually agreeable ideas. Team members use more energy on data collection and
analysis as they begin to test theories and identify root causes. The team develops a routine and
trust amongst members.
Performing - The team begins to work effectively and cohesively as each team member is
independent with responsibility and function.
Transitioning – In this last phase, the team is split up as the project ends. If the project’s scope is
increased then, as per the scope, selected team members continue and the rest go back to other
work.

Various conflicts arise amongst members of the team or between members and the leader during
the team formation, relating to the group objectives, structure, or procedures. Several ways to
resolve them include

Do not tighten control or try to force members to conform to the procedures or rules
established in the earlier stage.
If disputes over procedures crop up, opt for a group consensus.
Investigate the reasons behind the conflict and negotiate acceptable solution.
In inter-member conflict, act as a mediator between team members.
Dissuade any counter-productive behavior.
Focus on working together efficiently.
Group norms are enforced on the group by the group itself.

Common problems faced by teams include

Floundering - It can be resolved by reviewing the plan and developing a new plan for
movement.
Reluctant or dominating participants - The resolution is to structure each member's participation
and balance it so that it is not tilted towards a few members of the team. The leader also acts as a
gate-keeper.
Feuds - It is resolved by talking to offending parties in private and developing ground rules for
engagement and behavior.
Team Tools
Various tools are used by team members and leaders during team formation and its different
phases, which include

Brainstorming - The brainstorming technique was introduced by Alex Faickney Osborn in his
book Applied Imagination (1953). It is used as a tool to create ideas about a particular topic and
to find creative solutions to a problem.


Brainstorming Procedure - The first and foremost procedure in conducting brainstorming is to


review the rules and regulations of brainstorming. Some of the rules are: all the
ideas should be recorded, and there should be no criticism, evaluation or discussion of ideas while they are being generated.

The second procedure is to examine the problem that has to be discussed. Ensure that all the team
members understand the theme of brainstorming. Give enough time (i.e., one or two minutes) for
the team members to think about the problem. Ask the team members to think creatively to
generate as many ideas as possible. Record the ideas generated by the members so that everyone
can review those ideas. Proper care has to be taken to ensure that there is no criticism of any of the
ideas and everyone is allowed to be creative.

Brainstorming Rules – Rules to be followed for brainstorming are

Ensure that all the team members participate in the brainstorming session because the more
the ideas that are produced, the greater will be the effect of the solution.
As the brainstorming session is a discussion among various people, no distinction should be
made between them. The ideas generated by other people should not be condemned.
While building on people's ideas, treat every person's ideas as valuable, because an idea
generated by one individual may prove superior to another's.
While generating ideas, emphasize the quantity of ideas rather than their quality at this stage.
As a facilitator, tally the generated ideas against the team's output.

Nominal Group Technique (NGT) - The nominal group technique was introduced by Delbecq,
Van de Ven, and Gustafson in 1971. It is a kind of brainstorming that encourages every participant
to express his/her views. This technique is used to create a ranked list of ideas. In this technique,
all the participants are requested to write their ideas anonymously and the moderator collects the
written ideas and each is voted on by the group. It helps in decision-making and organizational
planning where creative solutions are sought. It is generally carried out on a Six Sigma project to
get feedback from the team members.

NGT Procedure - All the members of the team are asked to create ideas and write them down
without discussing with others. The inputs from all members are openly displayed and each person
is asked to give more explanation about his/her feedback. Each idea is then discussed to get
clarification and evaluation. This is usually a repetitive process. Each person is allowed to vote
individually on the priority of ideas and a group decision is made based on these ratings.
Multi-voting - Multivoting, which is also called NGT voting or nominal prioritization, is a simple
technique used by teams to choose the most significant or highest priority item from a list with
limited discussion and difficulty. Generally it follows the brainstorming technique.

Multivoting is used when the group has a lengthy list of possibilities and wants to narrow it down
to a small list for later analysis and discussion. It is applied after brainstorming for the purpose of
selecting ideas.

Multivoting Procedure – The procedure to be followed for conducting Multivoting, is


Conduct a brainstorming process to create a list of ideas and record the ideas that are created
during this process. After completing this, clarify the ideas and combine them so that everyone
can easily understand. The group should not discuss the ideas at this time.


Participants will vote for the ideas that are eligible for more discussion. Here the participants
are given freedom to vote for as many ideas as they desire. Tally the vote for each item. If any
item gets the majority of votes, it is placed for the next round.
In the next level of voting, the participants can cast their vote for the remaining items in the list.
Participants will continue their voting till they get a proper number of ideas for the group to
examine as a part of the decision-making or problem solving process. When the group holds a
discussion about pros and cons of the project, the remaining ideas are discussed.
This discussion may be completed by a group as a whole.
Continue proper actions by creating a choice of the best option or discovering the top
priorities.
Team Communication
Communication is the exchange of information, ideas and knowledge between sender and receiver
through an accepted code of symbols. It is a two-way process. The process consists of
an information source or sender, which produces a message
a transmitter or encoder, which encodes the message into signals
a channel, to which signals are adapted for transmission
a receiver or decoder, which decodes the message from the signal
a destination or receiver, where the message arrives
noise, which is any interference with the message traveling along the channel
Communication Types – Communication is either verbal or non-verbal.

Verbal communication - It uses verbal medium like words, speeches, presentations etc. and the
sender shares his/her thoughts in the form of words. The tone of the speaker, the pitch and the
quality of words play a crucial role in verbal communication.
The speaker has to be loud and clear and the content has to be properly defined. While speaking
the pitch ought to be high and clear for everyone to understand and the content must be designed
keeping the target audience in mind. In verbal communication it is the responsibility of the sender
to cross check with the receiver whether he has got the correct information or not and the sender
must give the required response.
Non verbal communication - It involves facial expressions, gestures, hand and hair movements and
body postures for non verbal communication. Any communication made between two people
without words and simply through facial movements, gestures or hand movements is called as non
verbal communication. In other words, it is a speechless communication where content is not put
into words but simply expressed through expressions. If one has a headache, one would put his
hand on his forehead to communicate his discomfort - a form of non verbal communication. Non
verbal communications are vital in offices and meetings.

Communication Barriers – Barriers block communication, due to which the information
communicated is not absorbed correctly by the audience. Various barriers which affect
communications are
Noise – Physical noise present during communication, for example loud music playing at an event
Cultural barriers – Differences in culture act as a barrier, for example when dealing with a foreigner
Emotions – The receiver is emotionally charged, for example grief due to the death of a near one
Poor retention – The receiver is unable to recall or remember the information


Poor Timing - A last moment communication with deadline may put too much pressure on the
receiver and may result in resentment.
Inappropriate Channel - Poor choice of channel of communication can also contribute to
misunderstanding of the message.
Network Breakdown - Sometimes staff may forget to forward a letter or there may be
professional jealousy, resulting in a closed channel.

Barrier Removal can be done by taking effective steps as per the barrier type. Different barrier
solving steps usually include
Effective Listening
Convey emotional contents of the message
Use appropriate language
Use proper channel
Encourage open communication
Ensure two-way communication
Make best use of body language

Communication techniques for project success - The environment in which conflict is managed is
important. It is essential to manage communications to overcome the barriers and foster a
supportive climate, marked by emphasis on

Presenting ideas or opinions.


Problem orientation - focusing attention on the task.
Spontaneity - communicating openly and honestly.
Empathy - understanding another person's thoughts.
Equality - asking for opinions.
A willingness to listen to the ideas of others.


3. MEASURE
The measure phase has the following objectives:
Defining and identifying the specific processes under investigation.
Defining metrics for measurement of the processes against project objectives.
Establishing the process baseline for validating the present outcomes against defined business
needs and to demonstrate improvement results.
Evaluating the measurement system to validate the reliability of data for drawing meaningful
conclusions.

3.1. Process Analysis and Documentation


A process is a group of resources and activities which converts inputs into outputs through value
addition, by executing repeatable tasks in a specific order. These activities and resource inputs, at
least for crucial processes, should be documented and controlled.
Process Modeling
Flow charts, process maps, written procedures and work instructions are tools used for process
modeling and documentation.

Flow Charts - A flow chart or process map is a simple graphical tool for documenting the process
flow which is comprehensible to users as it depicts the process sequence. A flow chart examines
each step in detail as each task is represented by a symbol. ANSI standards are present which lists
various symbol types used for representation in a flowchart. A flowchart helps in improvement
identification and can compare the present process to the desired process. Diamond-shaped
symbols are used for decisions with only two outcomes (yes or no); other common flow chart
symbols include terminators for start and end points, rectangles for process steps and arrows for
flow direction.

Process Mapping – It depicts a process in schematic format thus providing the ability to visualize
the process under review. Process mapping is similar to flow charting, as it describes a process with
symbols, arrows and words thus avoiding explanations. Process maps are used to outline new
procedures and review old procedures for improvements. Many symbols used are standardized
under ANSI Y15.3. Process maps are usually used to analyze and document top-level processes.


Process mapping consists of different kinds of maps

Relationship Maps show the overall view. They show the departments of an organization and
how they interact with suppliers and customers.
Cross-functional Maps or Swim Lane Charts show which department performs each step and
the inputs and outputs of each step. These maps have more detail than a relationship map but
less than a flowchart.

(Figure: Relationship Map and Cross-Functional Map)

Written Procedures – Written procedures helps in standardizing the processes thus also enabling
improvement avenues for the process. Written procedures are developed by process owners or
those responsible for the process. Written procedures are used for explaining complex or lengthy
processes, or routine tasks where consistency is crucial. Written procedures should be developed
so that they are comprehensible to the user.

Documenting the process in the form of a procedure facilitates consistency in the process and
avoids mistakes. Written procedure describes the process at a general level whereas, written
instructions are more specific.

Work Instructions – Work instructions are a list or sequence of steps to be undertaken, usually by
the operational staff, for accomplishing a specific task. These instructions are specific to a task
which is usually routine. They enlist a step-by-step sequence of activities. Flow charts may also be used
with work instructions to show relationships of process steps. Controlled copies of work
instructions are kept in the area where the activities are performed.
Process Inputs and Outputs
Before a process can be improved, it must first be measured. This is accomplished by identifying
process input variables and process output variables, and documenting their relationships through
cause and effect diagrams, SIPOC and other similar tools. Inputs (Xs) are causes, i.e., independent
variables, which result in specific outputs (Ys) or effects, i.e., dependent variables. Thus, process
maps are expanded by SIPOC to cover customers and suppliers so that measurable data can be
gathered from all.

Cause and Effect (Fishbone) Diagrams – They are also called Ishikawa diagrams and have been
discussed earlier. They break problems down into small-sized pieces and display possible causes in
a graphical manner. They display how various causes interact with each other and use


brainstorming rules when generating ideas. A fishbone diagram development consists of
brainstorming, prioritizing and development of an action plan.

It highlights potential causes of a particular problem or the effect. During measure phase, it is used
to brainstorm potential x data. The selected CTQ or CTP are placed in the head of the fish. Each
bone is labeled with categories which usually are people, machine, materials, environment and
methods. Each category is reviewed by the team by brainstorming for input and process data to
collect. The development of fishbone diagram in measure phase involves, review of process maps
developed in define phase, review of categories, putting CTQ or CTP in the head of the fish,
brainstorming and reviewing the diagram with revision as needed.

SIPOC - SIPOC stands for Suppliers-Inputs-Process-Outputs-Customers and has been discussed
earlier. SIPOC addresses issues regarding the input, output, supplier and customers like output
being produced by the process, who provide inputs to this process, what are the inputs, what
resources does this process use, which steps add value, etc. These issues apply to all processes and
SIPOC addresses by putting in place a standard format.

SIPOC development is initiated with persons having knowledge of the process and then
conducting a brainstorming session to describe the problems and garner consensus for resolution.

Development of SIPOC involves identifying the process steps then, identifying the outputs of the
process followed by the customers receiving the outputs of the process, then the inputs and the
supplies of the required inputs.

Relational Matrices - A relational matrix is used to assess the effect of each input (X)
against each output (Y) in a process. It helps the team to identify and agree upon outputs critical to the


product and/or customer with level of importance also assigned to each output variable by a
numerical rating. It also highlights the relationship between inputs and outputs (Y=f(x)) and the
relative importance of inputs is also computed.

The procedure of development of relational matrices is to review the process map, list the output
variables (Ys) along the horizontal axis, rate each output in terms of its overall importance (like a
scale of 1 for low importance to 5 for high importance), identify potential inputs (Xs) which
influence the outputs (Ys), rate the effect of each X on each Y, the customer importance rating (Y)
is taken as a weighted response (which is multiplied by the association rating (X) for each
relationship) and the weighted ratings are then added together to compute the importance score.
After which prioritizing is done to focus on specific parts.
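
The weighted-score calculation described above can be sketched in a few lines of Python. This is a
minimal illustration, not part of the original material; the output names, importance ratings and
association scores are hypothetical examples, not values from a real project.

# Minimal sketch of a relational (cause-and-effect) matrix calculation.
# Outputs, importance ratings and association scores are hypothetical.
outputs = ["On-time delivery", "Surface finish", "Dimension in spec"]
importance = [5, 3, 4]  # customer importance rating per output (1 = low, 5 = high)

inputs = {
    # association rating of each input (X) against each output (Y), same order as `outputs`
    "Oven temperature":   [1, 9, 3],
    "Operator training":  [3, 3, 9],
    "Raw material batch": [0, 9, 9],
}

# Importance score per input = sum over outputs of (importance x association)
scores = {
    x: sum(w * a for w, a in zip(importance, assoc))
    for x, assoc in inputs.items()
}

# Rank inputs so the team can prioritize which Xs to measure first
for x, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{x}: {score}")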

3.2. Statistics and Probability


Drawing valid statistical conclusions
Drawing statistical conclusions involves usage of enumerative and analytical studies, which are

Enumerative or descriptive studies describe data using math and graphs and focus on the current
situation. For example, a tailor taking a measurement of length is obtaining quantifiable information,
which is an enumerative approach. Enumerative data is data that can be counted. These studies are
used to explain data, usually sample data, in terms of central tendency (median, mean and mode),
variation (range and variance) and graphs of data (histograms, box plots and dot plots). Measures
calculated from a sample are called statistics; the corresponding measures that describe a
population are called parameters.

A statistic is a quantity derived from a sample of data for forming an opinion of a specified
parameter about the target population. A sample is used as data on every member of population is
impossible or too costly. A population is an entire group of objects that contains characteristic of
interest. A population parameter is a constant or coefficient that describes some characteristic of a
target population like mean or variance.

Analytical (Inferential) Studies - The objective of statistical inference is to draw conclusions about
population characteristics based on the information contained in a sample. It uses sample data to
predict or estimate what a population will do in the future like a doctor taking a measurement like
blood pressure or heart beat to obtain a causal explanation for some observed phenomenon which
is an analytic approach.

It entails defining the problem objective precisely, deciding if it will be evaluated by a one or two tail
test, formulating a null and an alternate hypothesis, selecting a test distribution and critical value of
the test statistic reflecting the degree of uncertainty that can be tolerated (the alpha and beta risks),
calculating a test statistic value from the sample and comparing the calculated value to the critical
value to determine if the null hypothesis is to be accepted or rejected. If the null is rejected, the
alternate must be accepted. Thus, it involves testing hypotheses to determine the differences in
population means, medians or variances between two or more groups of data and a standard and
calculating confidence intervals or prediction intervals.


Statistical Basic Terms – Various statistics terminologies which are used extensively are

Data - facts, observations, and information that come from investigations.


Measurement data sometimes called quantitative data -- the result of using some instrument
to measure something (e.g., test score, weight);
Categorical data also referred to as frequency or qualitative data. Things are grouped
according to some common property(ies) and the number of members of the group are
recorded (e.g., males/females, vehicle type).
Variable - property of an object or event that can take on different values. For example, college
major is a variable that takes on values like mathematics, computer science, etc.
Discrete Variable - a variable with a limited number of values (e.g., gender: male/female).
Continuous Variable – It is a variable that can take on many different values, in theory, any
value between the lowest and highest points on the measurement scale.
Independent Variable - a variable that is manipulated, measured, or selected by the user as
an antecedent condition to an observed behavior. In a hypothesized cause-and-effect
relationship, the independent variable is the cause and the dependent variable is the effect.
Dependent Variable - a variable that is not under the user's control. It is the variable that is
observed and measured in response to the independent variable.
Descriptive Statistics
Central Tendencies - Central tendency is a measure that characterizes the central value of a
collection of data that tends to cluster somewhere between the high and low values in the data. It
refers to measurements like mean, median and mode. It is also called measures of center. It
involves plotting data in a frequency distribution which shows the general shape of the distribution
and gives a general sense of how the numbers are grouped. Several statistics can be used to
represent the "center" of the distribution.

Mean - The mean is the most common measure of central tendency. It is the ratio of the sum
of the scores to the number of the scores. For ungrouped data which has not been grouped in
intervals, the arithmetic mean is the sum of all the values in that population divided by the
number of values in the population as

µ = (∑ Xi) / N

where, µ is the arithmetic mean of the population, Xi is the ith value observed, N is the
number of items in the observed population and ∑ is the sum of the values. For example, the
production of an item for 5 days is 500, 750, 600, 450 and 775 then the arithmetic mean is µ =
(500 + 750 + 600 + 450 + 775) / 5 = 615. It gives the distribution's arithmetic average and
provides a reference point for relating all other data points. For grouped data, an
approximation is done using the midpoints of the intervals and the frequency of the
distribution as µ ≈ ∑(fi × mi) / ∑fi, where fi is the frequency and mi is the midpoint of the
ith interval.

Median – It divides the distribution into halves; half are above it and half are below it when the
data are arranged in numerical order. It is also called as the score at the 50th percentile in the


distribution. The median location of N numbers can be found by the formula (N + 1) / 2.


When N is an odd number, the formula yields an integer that represents the value in a
numerically ordered distribution corresponding to the median location. For example, in the
distribution of numbers (3 1 5 4 9 9 8) the median location is (7 + 1) / 2 = 4; when applied to
the ordered distribution (1 3 4 5 8 9 9), the value 5 is the median. If there were only 6 values (1
3 4 5 8 9), the median location is (6 + 1) / 2 = 3.5, hence the median is half-way between the 3rd
and 4th scores (4 and 5), or 4.5. The median is the distribution's center point or middle value,
with an equal number of data points on either side. It is especially useful when the data set
has extreme high or low values and is used with non-normal data.
Mode – It is the most frequent or common score in the distribution or the point or value of X
that corresponds to the highest point on the distribution. If the highest frequency is shared by
more than one value, the distribution is said to be multimodal and with two, it is bimodal or
peaks in scoring at two different points in the distribution. For example in the measurements
75, 60, 65, 75, 80, 90, 75, 80, 67, the value 75 appears most frequently, thus it is the mode.
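
As a quick illustration, the measures of center above can be computed with Python's standard
statistics module. This sketch is not part of the original material; it simply reuses the small
numeric examples from the text.

# Minimal sketch of mean, median and mode using the standard library.
import statistics

production = [500, 750, 600, 450, 775]               # daily production example from the text
measurements = [75, 60, 65, 75, 80, 90, 75, 80, 67]  # mode example from the text

print(statistics.mean(production))    # 615.0  (arithmetic mean)
print(statistics.median(production))  # 600    (middle value when sorted)
print(statistics.mode(measurements))  # 75     (most frequent value)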

Measures of Spread - Although the average value in a distribution is informative about how scores
are centered in the distribution, the mean, median, and mode lack context for interpreting those
statistics. Measures of variability provide information about the degree to which individual scores
are clustered about or deviate from the average value in a distribution.

Range - The simplest measure of variability to compute and understand is the range. The range
is the difference between the highest and lowest score in a distribution. Although it is easy to
compute, it is not often used as the sole measure of variability due to its instability. Because it is
based solely on the most extreme scores in the distribution and does not fully reflect the
pattern of variation within a distribution, the range is a very limited measure of variability.
Inter-quartile Range (IQR) - Provides a measure of the spread of the middle 50% of the scores.
The IQR is defined as the 75th percentile - the 25th percentile. The interquartile range plays
an important role in the graphical method known as the boxplot. The advantage of using the
IQR is that it is easy to compute and extreme scores in the distribution have much less impact
but its strength is also a weakness in that it suffers as a measure of variability because it discards
too much data. Researchers want to study variability while eliminating scores that are likely to
be accidents. The boxplot allows for this distinction and is an important tool for
exploring data.
Variance (σ2) - The variance is a measure based on the deviations of individual scores from the
mean. As simply summing the deviations will result in a value of 0, the variance is based
on squared deviations of scores about the mean. When the deviations are squared, the rank
order and relative distance of scores in the distribution is preserved while negative values are


eliminated. Then to control for the number of subjects in the distribution, the sum of the
squared deviations, is divided by N (population) or by N - 1 (sample). The result is the average
of the sum of the squared deviations and it is called the variance. The variance is not only a
high number but it is also difficult to interpret because it is the square of a value.

Standard deviation (σ) - The standard deviation is defined as the positive square root of the
variance and is a measure of variability expressed in the same units as the data. The standard
deviation is very much like a mean or an "average" of these deviations. In a normal (symmetric
and mound-shaped) distribution, about two-thirds of the scores fall between +1 and -1 standard
deviations from the mean and the standard deviation is approximately 1/4 of the range in small
samples (N < 30) and 1/5 to 1/6 of the range in large samples (N > 100).

Coefficient of variation (cv) - Measures of variability can not be compared like the standard
deviation of the production of bolts to the availability of parts. If the standard deviation for bolt
production is 5 and for availability of parts is 7 for a given time frame, it cannot be concluded
that, because the standard deviation of the availability of parts is greater than that of the
production of bolts, variability is greater with the parts. Hence, a relative measure called the
coefficient of variation is used. The coefficient of variation is the ratio of the standard deviation
to the mean. It is cv = σ / µ for a population and cv = s / x̄ for a sample.
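
A minimal sketch of these spread measures in Python, assuming NumPy is available; the data
values are illustrative only and are not from the original material.

# Minimal sketch of range, IQR, variance, standard deviation and coefficient of variation.
import numpy as np

data = np.array([500, 750, 600, 450, 775], dtype=float)

data_range = data.max() - data.min()                      # range
iqr = np.percentile(data, 75) - np.percentile(data, 25)   # inter-quartile range
var_pop = data.var()             # population variance (divides by N)
var_sample = data.var(ddof=1)    # sample variance (divides by N - 1)
std_pop = data.std()             # population standard deviation
cv = std_pop / data.mean()       # coefficient of variation (sigma / mu)

print(data_range, iqr, var_pop, var_sample, std_pop, cv)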

Measures of Shape - For distributions summarizing data from continuous measurement scales,
statistics can be used to describe how the distribution rises and drops.

Symmetric - Distributions that have the same shape on both sides of the center are called
symmetric; symmetric distributions with only one peak (bell-shaped) are referred to as normal distributions.
Skewness – It refers to the degree of asymmetry in a distribution. Asymmetry often reflects
extreme scores in a distribution. Positively skewed is when it has a tail extending out to the right
(larger numbers) so, the mean is greater than the median and the mean is sensitive to each
score in the distribution and is subject to large shifts when the sample is small and contains
extreme scores. Negatively skewed has an extended tail pointing to the left (smaller numbers)
and reflects bunching of numbers in the upper part of the distribution with fewer scores at the
lower end of the measurement scale.

Measures of Association – They provide information about the relatedness between variables so as
to help estimate the existence of a relationship between variables and its strength. They are


Covariance - It shows how the variable y reacts to a variation of the variable x. Its formula is for
a population cov( X, Y ) = ∑( xi − µx) (yi − µy) / N
Correlation coefficient (r) - It is a number that ranges between −1 and +1. The sign of r will be
the same as the sign of the covariance. When r equals −1, there is a perfect negative
relationship between the variations of x and y; thus, an increase in x will lead to a proportional
decrease in y. Similarly, when r equals +1, there is a perfect positive relationship, and the changes
in x and the changes in y are in the same direction and in the same proportion. If r is zero, there is
no relation between the variations of both. Any other value of r determines the relationship as
per how r is close to −1, 0, or +1. The formula for the correlation coefficient for population is
ρ = Cov( X, Y ) /σx σy
Coefficient of determination (r2) - It measures the proportion of changes of the dependent
variable y as explained by the independent variable x. It is the square of the correlation
coefficient r and is thus always positive, with values between zero and one. If it is zero, the
variations of y are not explained by the variations of x; if it is one, the changes in y are
explained fully by the changes in x; other values of r2 are interpreted according to their closeness to
zero or one.
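
A minimal sketch of covariance, correlation coefficient and coefficient of determination with
NumPy; this is not part of the original material, and x and y are illustrative paired observations.

# Minimal sketch of the measures of association described above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # population covariance
r = cov_xy / (x.std() * y.std())                    # correlation coefficient
r_squared = r ** 2                                  # coefficient of determination

print(cov_xy, r, r_squared)
# np.corrcoef(x, y)[0, 1] returns the same r using the built-in helper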

Frequency Distributions - A distribution is the amount of potential variation in the outputs of a
process, usually expressed by its shape, mean or variance. A frequency distribution graphically
summarizes and displays the distribution of a process data set. The shape is visualized against how
closely it resembles the bell curve shape or whether it is flatter or skewed to the right or left. The
frequency distribution's centrality shows the degree to which the data center on a specific value and
the amount of variation in range or variance from the center.

A frequency distribution groups data into certain categories, each category representing a subset of
the total range of the data population or sample. Frequency distributions are usually displayed in a
histogram. Size is shown on the horizontal axis (x-axis) and the frequency of each size is shown on
the vertical axis (y-axis) as a bar graph. The length of the bars is proportional to the relative
frequencies of the data falling into each category, and the width is the range of the category. It is
used to ascertain information about data like distribution type of the data.

It is developed by segmenting the range of the data into equal-sized groups or bars, computing
and labeling the frequency (vertical) axis with the number of counts for each bar, and labeling the
horizontal axis with the range of the response variable. Finally, the number of data points that
reside within each bar is determined and the histogram is constructed.
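
The grouping step can be sketched in Python with NumPy, which counts how many data points
fall into each equal-width category. This illustration is not from the original material; the
generated measurement data and the choice of 10 bins are assumptions for demonstration only.

# Minimal sketch of building a frequency distribution (histogram counts).
import numpy as np

data = np.random.default_rng(seed=1).normal(loc=50, scale=5, size=200)

counts, bin_edges = np.histogram(data, bins=10)   # equal-width categories
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} - {right:6.2f}: {'#' * count}")  # simple text histogram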

Cumulative Frequency Distribution - It is created from a frequency distribution by adding an
additional column to the table called cumulative frequency; thus, for each value, the cumulative


frequency for that value is the frequency up to and including the frequency for that value. It shows
the number of data points at or below a particular value.

The cumulative distribution function, F(x), denotes the area beneath the probability density
function to the left of x.

Central limit theorem and sampling distribution of the mean


The central limit theorem is the basis of many statistical procedures. The theorem states that for
sufficiently large sample sizes ( n ≥ 30), regardless of the shape of the population distribution, if
samples of size n are randomly drawn from a population that has a mean µ and a standard
deviation σ , the samples’ means X are approximately normally distributed. If the populations are
normally distributed, the sample's means are normally distributed regardless of the sample sizes.
Hence, for sufficiently large populations, the normal distribution can be used to analyze samples
drawn from populations that are not normally distributed, or whose distribution characteristics are
unknown. The theorem states that this distribution of sample means will have the same mean as
the original distribution, the variability will be smaller than the original distribution, and it will tend
to be normally distributed.

When means are used as estimators to make inferences about a population’s parameters and n ≥
30, the estimator will be approximately normally distributed in repeated sampling. The mean and
standard deviation of that sampling distribution are given as µx = µ and σx = σ/√n. The theorem is
applicable for controlled or predictable processes. Most points on the chart tend to be near the
average, the curve's shape is bell-shaped, and the sides tend to be symmetrical. Using ± 3
sigma control limits, the central limit theorem is the basis of the prediction that, if the process has
not changed, a sample mean falls outside the control limits an average of only 0.27% of the time.
The theorem enables the use of smaller sample averages to evaluate any process because
distributions of sample means tend to form a normal distribution.
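
The theorem is easy to see in a small simulation. The sketch below, assuming NumPy is
available, draws samples of size n = 30 from a clearly non-normal (exponential) population; the
sample means still cluster around the population mean with spread close to σ/√n. The population
mean and sample counts are illustrative choices, not values from the text.

# Minimal simulation sketch of the central limit theorem.
import numpy as np

rng = np.random.default_rng(seed=42)
population_mean = 10.0
n = 30                        # sample size
num_samples = 5000

sample_means = rng.exponential(scale=population_mean, size=(num_samples, n)).mean(axis=1)

print(sample_means.mean())        # close to mu = 10
print(sample_means.std(ddof=1))   # close to sigma / sqrt(n) = 10 / sqrt(30), about 1.83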


Basic Probability
Basic probability concepts and terminology are discussed below

Probability - It is the chance that something will occur. It is expressed as a decimal fraction or a
percentage. It is the ratio of the chances favoring an event to the total number of chances for
and against the event. The probability of getting a 4 on rolling a die is 1 (count of 4 on a
die) / 6 = 0.1667. Probability then can be the number of successes divided by the total
number of possible occurrences. Pr(A) is the probability of event A. The probability of any
event (E) varies between 0 (no probability) and 1 (perfect probability).
Sample Space - It is the set of possible outcomes of an experiment or the set of conditions.
The sample space is often denoted by the capital letter S. Sample space outcomes are denoted
using lower-case letters (a, b, c . . .) or the actual values like for a dice, S={1,2,3,4,5,6}
Event - An event is a subset of a sample space. It is denoted by a capital letter such as A, B, C,
etc. Events have outcomes, which are denoted by lower-case letters (a, b, c . . .) or the actual
values if given like in rolling of dice, S={1,2,3,4,5,6}, then for event A if rolled dice shows 5 so,
A ={5}. The sum of the probabilities of all possible events (multiple E’s) in total sample space
(S) is equal to 1.
Independent Events - Each event is not affected by any other events for example tossing a coin
three times and it comes up "Heads" each time, the chance that the next toss will also be a
"Head" is still 1/2 as every toss is independent of earlier one.
Dependent Events - They are events which are affected by previous events. For example, when
drawing 2 cards from a deck, taking one card reduces the population for the second card and
hence its probability: the probability of getting a King on the 1st draw is 4 out of 52, but on the
2nd draw (given a King was drawn first) it is 3 out of 51.
Simple Events - An event that cannot be decomposed is a simple event (E). The set of all
sample points for an experiment is called the sample space (S).
Compound Events - Compound events are formed by a composition of two or more events.
The two most important probability theorems are the additive and multiplicative laws.
Union of events - The union of two events is that event consisting of all outcomes contained in
either of the two events. The union is denoted by the symbol U placed between the letters
indicating the two events. For example, for event A={1,2} and event B={2,3}, i.e., the outcome of
event A can be either 1 or 2 and of event B can be 2 or 3, AUB = {1,2,3}.
Intersection of events - The intersection of two events is that event consisting of all outcomes
that the two events have in common. The intersection of two events can also be referred to as
the joint occurrence of events. The intersection is denoted by the symbol ∩ placed between
the letters indicating the two events like for event A={1,2} and event B={2,3} then, A∩B = {2}


Complement - The complement of an event is the set of outcomes in the sample space that are
not in the event itself. The complement is shown by the symbol ` placed after the letter
indicating the event like for event A={1,2} and Sample space S={1,2,3,4,5,6} then A`={3,4,5,6}
Mutually Exclusive - Mutually exclusive events have no outcomes in common like the
intersection of an event and its complement contains no outcomes or it is an empty set, Ø for
example if A={1,2} and B={3,4} and A ∩ B= Ø.
Equally Likely Outcomes - When a sample space consists of N possible outcomes, all equally
likely to occur, then the probability of each outcome is 1/N like the sample space of all the
possible outcomes in rolling a die is S = {1, 2, 3, 4, 5, 6}, all equally likely, each outcome has a
probability of 1/6 of occurring but, the probability of getting a 3, 4, or 6 is 3/6 = 0.5.
Probabilities for Independent Events or multiplication rule - Independent events occurrence
does not depend on other events of sample space then the probability of two events A and B
occurring both is P(A ∩ B) = P(A) x P(B) and similarly for many events the independence rule
is extended as P(A∩B∩C∩. . .) = P(A) x P(B) x P(C) . . . This rule is also called as the
multiplication rule. For example, the probability of getting a 6 three times in three rolls of a die is
1/6 x 1/6 x 1/6 = 0.00463
Probabilities for Mutually Exclusive Events or Addition Rule - Mutually exclusive events do not
occur at the same time or in the same sample space and do not have any outcomes in
common. Thus, for two mutually exclusive events, A and B, the event A∩B = Ø, and the
probability of events A and B both occurring is zero, i.e., P(A∩B) = 0. More generally, for any
events A and B, the probability of either or both of the events occurring is P(AUB) = P(A) + P(B)
– P(A∩B), also called the addition rule. For example, let P(A) = 0.2, P(B) = 0.4, and P(A∩B) =
0.1; then P(AUB) = P(A) + P(B) - P(A∩B) = 0.2 + 0.4 - 0.1 = 0.5
Conditional probability - It is the probability of an event given that another event has occurred
or given a restricted sample space. The conditional probability of an event (the probability of
event A occurring given that event B has already occurred) can be found as

P(A|B) = P(A∩B) / P(B)

For example, in a sample set of 100 items, supplier 1 supplied 60 items (of which 4 were rejects)
and supplier 2 supplied 40 items. Let event A be that an item is rejected and event B be that the
item is from supplier 1. Then the probability of a rejected item from supplier 1 is P(A|B) =
P(A∩B) / P(B), where P(A∩B) = 4/100 and P(B) = 60/100, so P(A|B) = (4/100) / (60/100) = 4/60 = 1/15.
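
The three rules above can be checked with a few lines of Python. This sketch is not part of the
original material; the 0.2/0.4/0.1 values are the illustrative figures used in the addition-rule
example, and the supplier figures mirror the conditional-probability example.

# Minimal sketch of the multiplication rule, addition rule and conditional probability.
p_three_sixes = (1/6) ** 3                     # independent events: P(6) * P(6) * P(6)

p_a, p_b, p_a_and_b = 0.2, 0.4, 0.1            # illustrative values
p_a_or_b = p_a + p_b - p_a_and_b               # addition rule

p_reject_and_supplier1 = 4 / 100               # P(A ∩ B) from the supplier example
p_supplier1 = 60 / 100                         # P(B)
p_reject_given_supplier1 = p_reject_and_supplier1 / p_supplier1   # P(A|B) = 1/15

print(round(p_three_sixes, 5), p_a_or_b, round(p_reject_given_supplier1, 4))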

3.3. Collecting and Summarizing Data


Process improvement requires the process to be measurable; data collection is therefore critical
for any improvement effort.
Types of data and Measurement scales
Data is information that is objective and types of data and measurement scales are discussed next

Types of data – They are of two types, discrete and continuous.

Attribute or discrete data - It is based on counting like the number of processing errors, the
count of customer complaints, etc. Discrete data values can only be non-negative integers such


as 1, 2, 3, etc. and can be expressed as a proportion or percent (e.g., percent of x, percent
good, percent bad). It includes
Count or percentage – It counts of errors or % of output with errors.
Binomial data - Data can have only one of two values like yes/no or pass/fail.
Attribute-Nominal - The "data" are names or labels. Like in a company, Dept A, Dept B,
Dept C or in a shop: Machine 1, Machine 2, Machine 3
Attribute-Ordinal - The names or labels represent some value inherent in the object or
item (so there is an order to the labels) like on performance - excellent, very good, good,
fair, poor or tastes - mild, hot, very hot
Variable or continuous data - They are measured on a continuum or scale. Data values for
continuous data can be any real number: 2, 3.4691, -14.21, etc. Continuous data can be
recorded at many different points and are typically physical measurements like volume, length,
size, width, time, temperature, cost, etc. It is more powerful than attribute as it is more precise
due to decimal places which indicate accuracy levels and specificity. It is any variable measured
on a continuum or scale that can be infinitely divided.

Data are said to be discrete when they take on only a finite number of points that can be
represented by the non-negative integers. An example of discrete data is the number of defects in a
sample. Data are said to be continuous when they exist on an interval, or on several intervals. An
example of continuous data is the measurement of pH. Quality methods exist based on probability
functions for both discrete and continuous data.

Data could easily be presented as variables data like 10 scratches could be reported as total scratch
length of 8.37 inches. The ultimate purpose for the data collection and the type of data are the
most significant factors in the decision to collect attribute or variables data.

Converting Data Types - Continuous data tend to be more precise due to decimal places but may
sometimes need to be converted into discrete data. As continuous data contain more information
than discrete data, there is a loss of information during conversion to discrete data.

Discrete data cannot be converted to continuous data; instead of measuring how much deviation
from a standard exists, the user may have chosen to record only the discrete data because it is easier to use.
Converting variable data to attribute data may assist in a quicker assessment, but the risk is that
information will be lost when the conversion is made.

Measurement - A measurement is the assignment of a numerical value to something, usually
continuous elements. Measurement is a mapping from an empirical system to a selected numerical system.
The numerical system is manipulated and the results of the manipulation are studied to help the
manager better understand the empirical system. Measured data is regarded as being better than
counted data. It is more precise and contains more information. Sometimes, data will only occur
as counted data. If the information can be obtained as either attribute or variables data, it is
generally preferable to collect variables data.

The information content of a number is dependent on the scale of measurement used which also
determines the types of statistical analyses. Hence, validity of analysis is also dependent upon the
scale of measurement. The four measurement scales employed are nominal, ordinal, interval, and
ratio and are summarized as


Scale – Nominal
Definition – Only the presence/absence of an attribute; it can only count items. Data consists of
names or categories only and no ordering scheme is possible. It has central location at the mode
and information only for dispersion.
Example – go/no-go, success/fail, accept/reject
Statistics – percent, proportion, chi-square tests

Scale – Ordinal
Definition – Data is arranged in some order but differences between values cannot be determined
or are meaningless. It can say that one item has more or less of an attribute than another item and
can order a set of items. It has central location at the median and percentages for dispersion.
Example – taste, attractiveness
Statistics – rank-order correlation, sign or run test

Scale – Interval
Definition – Data is arranged in order and differences can be found; however, there is no inherent
starting point and ratios are meaningless. The difference between any two successive points is
equal; often treated as a ratio scale even if the assumption of equal intervals is incorrect. It can
add, subtract and order objects. It has central location at the arithmetic mean and the standard
deviation for dispersion.
Example – calendar time, temperature
Statistics – correlations, t-tests, F-tests, multiple regression

Scale – Ratio
Definition – An extension of the interval level that includes an inherent zero starting point. Both
differences and ratios are meaningful, and a true zero point indicates absence of an attribute. It
can add, subtract, multiply and divide. It has central location at the geometric mean and percent
variation for dispersion.
Example – elapsed time, distance, weight
Statistics – t-test, F-test, correlations, multiple regression
Data collection methods
Data collection is based on crucial aspects of what to know, from whom to know and what to do
with the data. Factors which ensure that data is relevant to the project includes

Person collecting data like team member, associate, subject matter expert, etc.
Type of Data to collect like cost, errors, ratings etc.
Time Duration like hourly, daily, batch-wise etc.
Data source like reports, observations, surveys etc.
Cost of collection

Few types of data collection methods includes

Check sheets - It is a structured, well-prepared form for collecting and analyzing data consisting
of a list of items and some indication of how often each item occurs. There are several types of
check sheets like confirmation check sheets for confirming whether all steps in a process have


been completed, process check sheets to record the frequency of observations with a range of
measurement, defect check sheets to record the observed frequency of defects and stratified
check sheets to record observed frequency of defects by defect type and one other criterion. It
is easy to use, provides a choice of observations and good for determining frequency over time.
It should be used to collect observable data when the collection is managed by the same
person or at the same location from a process.
Coded data - It is used when too many digits would otherwise have to be recorded into small
blocks, when capturing large sequences of digits from a single observation, or when rounding-off
errors are observed while recording large-digit numbers. It is also used if numeric data is used
to represent attribute data or the data quantity is not enough for statistical significance of the
sample size. Various types of coded data collection are
Truncation coding for storing only 3,2 or 9 for 1.0003, 1.0002, and 1.0009
Substitution coding – It stores fractional observation, as integers like expressing the number
32 for 32-3/8 inches with 1/8 inch as base.
Category coding - Using a code for category like "S" for scratch
Adding/subtracting a constant or multiplying/dividing by a factor – It is usually used for
encoding or decoding
Automatic measurements - In it a computer or electronic equipment performs data gathering
without human intervention like radioactive level in a nuclear reactor. The equipment observes
and records data for analysis and action.
Techniques for Assuring Data Accuracy and Integrity
Data integrity and accuracy have a crucial role in the data collection process as they ensure the
usefulness of data being collected. Data integrity determines whether the information being
measured truly represents the desired attribute and data accuracy determines the degree to which
individual or average measurements agree with an accepted standard or reference value.

Data integrity is doubtful if the data collected does not fulfill the purpose; for example, if data on
finished goods departures is supposed to be gathered at truck departure but is instead recorded on
a computing device inside the warehouse, then integrity is doubtful. Similarly, data accuracy is
doubtful if the measurement device does not conform to the laid-down device standards.
Bad data can be avoided by following few precautions like avoiding emotional bias relative to
tolerances, avoiding unnecessary rounding and screening data to detect and remove data entry
errors.

Sampling - Practically all items of population cannot be measured due to cost or being impractical
hence, sampling is used to get a representative group of items to measure. Various sampling
strategies are

Random Sampling - The use of a sampling plan requires randomness in sample selection and
requires giving every part an equal chance of being selected for the sample. The sampling
sequence must be based on an independent random plan. It is the least biased of all sampling
techniques, there is no subjectivity as each member of the total population has an equal chance
of being selected and can also be obtained using random number tables.
Sequential or Systematic Sampling – In it, every nth record is selected from a list of the
population. Usually, these plans are ended after the number inspected has exceeded the


sample size of a sampling plan. It is used for costly or destructive testing. If the list does not
contain any hidden order, this strategy is just as random as random sampling.
Stratified Sampling – It selects random samples from each group or process that is different. If
the population has identifiable categories, or strata, that have a common characteristic, random
sampling is used to select a sufficient number of units from each strata. Stratified sampling is
often used to reduce sampling error. The resulting mix of samples can be biased if the
proportion of the samples does not reflect the relative frequency of the groups.
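
A minimal sketch of the three sampling strategies described above, using Python's standard
random module; this is not from the original material, and the population of 100 item identifiers
and the two hypothetical strata are assumptions for illustration.

# Minimal sketch of random, systematic and stratified sampling.
import random

random.seed(0)
population = list(range(1, 101))          # 100 item identifiers

# Random sampling: every item has an equal chance of selection
random_sample = random.sample(population, 10)

# Systematic sampling: every nth item after a random start
n = 10
start = random.randrange(n)
systematic_sample = population[start::n]

# Stratified sampling: random samples drawn from each identifiable stratum
strata = {"shift_A": population[:60], "shift_B": population[60:]}
stratified_sample = {name: random.sample(items, 5) for name, items in strata.items()}

print(random_sample, systematic_sample, stratified_sample, sep="\n")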

Sample Homogeneity - It occurs when the data chosen for a sample have similar characteristics. It
focuses on how similar the data are in a given sample. If data are from a variety of sources, such as
several production streams or several geographical areas then, the results will reflect these
combined sources. The aim is for homogeneous data, relating the data to a single source as much
as possible, so as to evaluate and determine the influence of an input of concern on the data.
Non-homogeneous data result in errors. A deficiency of homogeneity in data will hide the
sources and make root cause analysis difficult.

Sampling Distribution of Means - If the means of all possible samples are obtained and organized,
we could derive the sampling distribution of the means. The mean of the sampling distribution of
the mean is the mean of the population from which the scores were sampled. Therefore, if a
population has a mean µ, then the mean of the sampling distribution of the mean is also µ.

Sampling Error - The sample statistics may not always be exactly the same as their corresponding
population parameters. The difference is known as the sampling error.
Graphical methods
Graphical methods are effective tools for the visual evaluation of data; a graph shows the
relationship between variables. They also provide a visual image of the data, thus complementing
numerical methods for identifying patterns in the data. They include box plots, stem and leaf plots,
scatter diagrams, pattern and trend analysis, histograms, normal probability distributions and
Weibull distributions.

Box plot - It is also called a box-and-whisker plot or “five number summary”. It has five points of
interest, which are the quartiles, the median, and the highest and lowest values and shows how the
data are scattered within those ranges. It shows location, spread and shape of the data. It is used for
graphically showing the variation between multiple variables and the variations within the ranges. In
it, the upper and lower quartiles of the data form the ends of the box, the median forms the
centerline of the box which is also dividing the box and the minimum and maximum data points
are drawn as end points to lines that extend from the box (the whiskers). Outlier data are
represented by asterisks or diamonds outside of the minimum or maximum points. Notches
indicate variability of the median, and widths are proportional to the log of the sample size.


It is used when comparing two or more sets of data or determining significance of an apparent
difference. It is useful with a large number of data sets by providing a graphic summary of a data
set as it visually shows the center, the spread, the overall range and indicates skewness of the
distribution. It is usually used in the early stages of data analysis.

Developing Box plot involves


Listing the data in numerical order and computing the median.
Identifying the lower and upper quartiles (the medians of the lower and upper halves of the data).
Computing the inter-quartile range and plotting the 5 points on a number line (the three medians,
lowest and highest values).
Draw a box through the upper and lower quartiles points and a vertical line through the
median point.
Draw the whiskers from each end of the box to the smallest and largest values.
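
The five-number summary behind a box plot can be computed with NumPy as sketched below;
this is not from the original material and the data values are illustrative only.

# Minimal sketch of the five-number summary used to draw a box plot.
import numpy as np

data = np.array([52, 57, 60, 63, 65, 66, 68, 70, 74, 79, 85])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
five_number_summary = (data.min(), q1, median, q3, data.max())
print(five_number_summary, iqr)

# With matplotlib installed, plt.boxplot(data) draws the box and whiskers directly.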

Stem and Leaf Plot - It separates each number into a stem (all numbers but the last digit) and a leaf
(the last digit) like, for the numbers 45, and 59, the stems are 4 and 5, while the leaves are 5 and 9.
It is easy to make and shows shape and distribution quickly. It is a compact depiction of data
showing both variable and categorical data sets. It resembles a histogram and is used to visualize
the spread of a distribution and indicate around what values the data are mainly concentrated. It is
essentially composed of two parts, the stem on the left side of the graph and the leaf on the right.
Data can be read directly from the diagram. It is useful for classifying and organizing data as it
is collected, but all numbers should be whole numbers or of the same precision. For example, a
stem of 7 with many leaves would indicate that most of the data lie between 70 and 79.

Developing Stem and Leaf Plot

Sort the given data in numerical order (ascending).


Separate the numbers into stems and leaves.


Group the numbers with the same stems.

Histograms - It shows frequencies in data as adjacent rectangles, erected over intervals with an area
proportional to the frequency of the observations in the interval. They are frequency column
graphs that display a static picture of process behavior and require a minimum of 50-100 data
points. It is characterized by the number of data points that fall within a given bar or interval or
frequency. It enables the user to visualize how the data points spread, skew and detect the presence
of outliers. A stable, predictable process usually shows a histogram with a bell-shaped curve, which
is not the case for an unstable process, although a stable process may also follow other shapes such
as exponential, lognormal, gamma, beta, Poisson, binomial or geometric.

The construction of a histogram starts with the division of a frequency distribution into equal
classes, and then each class is represented by a vertical bar. They are used to plot the density of
data especially of continuous data like weight or height.

Run Charts - It displays how a process performs over time as data points are plotted in
chronological order and connected as a line graph. It is useful for detecting variation, problem
trends or patterns, as a shift is evident on the run chart when it occurs; that is why it is also called
a trend chart. It displays sequential data for spotting patterns and abnormalities and is used for
monitoring and communicating process performance, usually for displaying performance data
over time or for showing tabulations.

Trends observable on the run chart might not signify deviation, as they might be within normal
limits, but usually they indicate a trend, shift or cycle. When a run chart exhibits seven or
eight points successively up or down, a trend is clearly present in the data.

Developing Run Chart


Sequence the input data against time and order the data from lowest to highest.
Calculate the median and the range.
Make the Y-axis scale 1.5 to 2 times the range of the data, and make the X-axis 2 to 3 times as long as the Y-axis.
Depict the median by a dotted line.


Plot the points and connect them to form a line graph.
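
A minimal sketch of a run chart, assuming matplotlib is installed; the weekly defect counts are
illustrative values, not data from the original material.

# Minimal sketch of a run chart: data in time order with the median as a reference line.
import matplotlib.pyplot as plt
from statistics import median

observations = [12, 14, 11, 15, 13, 17, 16, 18, 19, 21]   # in time order
center = median(observations)

plt.plot(range(1, len(observations) + 1), observations, marker="o")
plt.axhline(center, linestyle="--", label=f"median = {center}")
plt.xlabel("Observation (time order)")
plt.ylabel("Defect count")
plt.legend()
plt.show()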

Scatter Diagram - It displays multiple XY coordinate data points to represent the relationship
between two different variables on the X and Y axes. It depicts the strength of the relationship
between an independent variable on the horizontal axis and a dependent variable on the vertical
axis, and enables strategizing on how to control the effect of the relationship on the process. It is
also called a scatter plot, X-Y graph or correlation chart.

It graphs pairs of continuous data, with one variable on each axis, showing what happens to one
variable when the other variable changes. If the relationship is understood, then the dependent
variable may be controlled. The relationship may show a correlation between the two variables
though correlation does not always refer to a cause and effect relationship. The correlation may be
positive due to one variable moving in one direction and the second variable in the same direction
but, for negative correlation both move in opposite directions. Presence of correlation is due to a
cause-effect relationship, a relationship between one cause and another cause or due to a
relationship between one cause and two or more other causes.

It is used when two variables are related or evaluating paired continuous data. It is also helpful to
identify potential root causes of a problem by relating two variables. The tighter the data points
along the line, the stronger the relationship amongst them and the direction of the line indicates


whether the relationship is positive or negative. The degree of association between the two
variables is calculated by the correlation coefficient. If the points show no significant clustering,
there is probably no correlation.

Developing Scatter Diagram

Collect data for both variables.


Draw a graph with the independent variable on the horizontal axis (x) and the dependent
variable on the vertical axis (y).
For each pair of data, plot a dot (or symbol) where the x-axis value intersects the y-axis value.

Normal Probability Plots - It is used to detect the presence of normal bell curve or Gaussian
distribution in the process data. The plot is defined by mean and variance. For normally
distributed data, the mean and median are very close and may be identical. The normal probability
plot shows whether or not the data are distributed as a standard normal distribution. Normal
distributions will follow a linear pattern. It is also called as normal test plots.

It is used when making predictions or taking decisions based on the data distribution and to test
the assumption of normality. In it most of the data concentrate around or on the centerline which
divides the curve into two equal halves. The data is plotted against a theoretical normal distribution
in such a way that the points should form an approximate straight line. Departures from this
straight line indicate departures from normality.
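
A minimal sketch of producing a normal probability plot with SciPy and matplotlib; this is not
part of the original material, and the sample data are randomly generated purely for illustration.

# Minimal sketch of a normal probability plot (normal test plot).
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

data = np.random.default_rng(seed=7).normal(loc=100, scale=10, size=50)

stats.probplot(data, dist="norm", plot=plt)   # points near a straight line suggest normality
plt.show()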

Weibull Plots - It is usually used to estimate the cumulative probability that a given sample will fail
under certain conditions. The data can be used to determine a point at which a certain number of
samples will fail. Once it is known, this information can help design a process such that no part of
the sample approaches the stress limitations. It provides reasonably accurate failure analysis and
forecasts with extremely small samples by providing a simple and useful graphical plot of the
failure data.


The Weibull plot has special scales designed so that the data points will be almost linear if they
follow a Weibull distribution. The Weibull distribution has three parameters, but only two need to be
used if the third is assumed to be known:

α is the shape parameter
θ is the scale parameter
γ is the location parameter

Weibull plots usually chart data on the probable life of a product or process, measured in hours,
miles, or any other metric that describes the time-to-failure. If complete data is available, the exact
time-to-failure is known. For suspended (right-censored) data, the unit operates successfully for a
known period of time and could have continued for an additional, unknown period. For interval
(left-censored) data, the time-to-failure is known only within a certain range of time.

3.4. Probability Distributions


Distribution - Prediction and decision-making require fitting data to distributions (such as the
normal, binomial, or Poisson). A probability distribution identifies the probability that a value will
occur within a given range, the probability that a value less than or greater than x will occur, or the
probability that a value between x and y will occur.

A distribution is the amount of variation in the outputs of a process, expressed by shape
(symmetry, skewness and kurtosis), average and standard deviation. For symmetrical distributions
the mean represents the central tendency of the data, but for skewed distributions the median is the
better indicator. The standard deviation provides a measure of variation from the mean. Skewness is
a measure of the location of the mode relative to the mean: if the mode is to the mean's left the
skewness is negative, otherwise it is positive, and for a symmetrical distribution the skewness is
zero. Kurtosis measures the peakedness or relative flatness of the distribution; kurtosis is higher for
a higher and narrower peak.
Probability Distribution
It is a mathematical formula relating the values of a characteristic or attribute with their probability
of occurrence in the population. It depicts the possible events and the associated probability for
each of these events to occur. Probability distributions are divided as follows

Discrete data describe a finite set of possible occurrences for the data, like rolling a die where
the random variable can take the value 1, 2, 3, 4, 5 or 6. The most used discrete probability
distributions are the binomial, the Poisson, the geometric, and the hypergeometric distribution.
Continuous data describe a continuum of possible occurrences that is unbroken; for example, the
distribution of body weight is a random variable with an infinite number of possible data points.
Probability Density Function
Probability distributions for continuous variables use probability density functions (PDF), which
mathematically model the probability density shown in a histogram; discrete variables instead have
a probability mass function. PDFs employ integrals as the summation of the area between two points
when used in an equation. If a histogram shows the relative frequencies of a series of output ranges
of a random variable, then the histogram also depicts the shape of the probability density for the
random variable; hence, the shape of the probability density function is also described as the shape
of the distribution. An example illustrates this.

Example: A fast-food chain advertises a burger weighing a quarter-kg but, it is not exactly 0.25 kg.
One randomly selected burger might weigh 0.23 kg or 0.27 kg. What is the probability that a
randomly selected burger weighs between 0.20 and 0.30 kg? That is, if we let X denote the weight
of a randomly selected quarter-kg burger in kg, what is P(0.20 < X < 0.30)?

This problem is solved by using the probability density function. Imagine randomly selecting 100
burgers advertised to weigh a quarter-kg. If the 100 burgers were weighed and a density histogram
of the resulting weights created, the histogram might show

In this case, the histogram illustrates that most of the sampled burgers do indeed weigh close to
0.25 kg, but some are a bit more and some a bit less. Now, what if we decreased the length of the
class interval on that density histogram? Then it would appear as


Now, if this is pushed further and the interval is decreased again, the intervals would eventually get
so small that we could represent the probability distribution of X, not as a density histogram, but
rather as a curve (by connecting the "dots" at the tops of the tiny rectangles).

Such a curve is denoted f(x) and is called a (continuous) probability density function. A density
histogram is defined so that the area of each rectangle equals the relative frequency of the
corresponding class, and the area of the entire histogram equals 1. Thus, finding the probability
that a continuous random variable X falls in some interval of values involves finding the area under
the curve f(x) sandwiched by the endpoints of the interval. In the case of this example, the
probability that a randomly selected burger weighs between 0.20 and 0.30 kg is this area.
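
A minimal numerical sketch: assuming the burger weights are roughly normal with a mean of 0.25 kg and a hypothetical standard deviation of 0.02 kg (both values assumed for illustration, not given in the text), the area between 0.20 and 0.30 kg can be computed from the normal CDF:

from scipy import stats

# Assumed (hypothetical) parameters for the weight distribution
mu, sigma = 0.25, 0.02

# P(0.20 < X < 0.30) = F(0.30) - F(0.20), the area under the density curve between the endpoints
p = stats.norm.cdf(0.30, loc=mu, scale=sigma) - stats.norm.cdf(0.20, loc=mu, scale=sigma)
print(f"P(0.20 < X < 0.30) ≈ {p:.4f}")  # ≈ 0.9876 for these assumed parameters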

Distributions Types
Various distributions are
Binomial - It is used in finite sampling problems when each observation has only one of two
possible outcomes, such as pass/fail.
Poisson - It is used for attribute situations in which each sample can have multiple defects or
failures.
Normal - It is characterized by the traditional "bell-shaped" curve, the normal distribution is
applied to many situations with continuous data that is roughly symmetrical around the mean.


Chi-square - It is used in many situations when an inference is drawn on a single variance or
when testing for goodness of fit or independence. Examples of use of this distribution include
determining the confidence interval for the standard deviation of a population or comparing
the frequency of variables.
Student's t - It is used in many situations when inferences are drawn without a variance known
in the case of a single mean or the comparison of two means.
F - It is used in situations when inferences are drawn from two variances such as whether two
population variances are different in magnitude.
Hypergeometric - It is the "true" distribution. It is used in a similar manner to the binomial
distribution except that the sample size is larger relative to the population. This distribution
should be considered whenever the sample size is larger than 10% of the population. The
hypergeometric distribution is the appropriate probability model for selecting a random sample
of n items from a population without replacement and is useful in the design of acceptance-
sampling plans.
Bivariate - It is created with the joint frequency distributions of modeled variables.
Exponential - It is used for instances of examining the time between failures.
Lognormal - It is used when raw data is skewed and the log of the data follows a normal
distribution. This distribution is often used for understanding failure rates or repair times.
Weibull - It is used when modeling failure rates particularly when the response of interest is
percent of failures as a function of usage (time).
Binomial Distribution
It is used to model discrete data having only two possible outcomes, like pass or fail or yes or no,
which are mutually exclusive. It may be used to find the proportion of defective units produced by a
process and is used when the population is large (N > 50) with a small sample size compared to the
population. The ideal situation is when the sample size (n) is less than 10% of the population (N), or
n < 0.1N. The binomial distribution is useful for finding the number of defective products if the
product either passes or fails a given test. The mean, variance, and standard deviation for a
binomial distribution are µ = np, σ² = npq and σ = √(npq). The essential conditions for a binomial
random variable are a fixed number of observations (n) which are independent of each other, every
trial resulting in either of the two possible outcomes, and the probability of a success being p and
the probability of a failure being 1 − p.

The binomial probability distribution equation will show the probability p (the probability of
defective) of getting x defectives (number of defectives or occurrences) in a sample of n units (or
sample size) as
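
P(x) = [n! / (x!(n − x)!)] · p^x · (1 − p)^(n−x)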

As an example, if a product with a 1% defect rate is tested with ten sample units from the process,
then n = 10, x = 0 and p = 0.01, and the probability that there will be 0 defective products is
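
Working this out, P(0) = [10!/(0!·10!)] · (0.01)^0 · (0.99)^10 = (0.99)^10 ≈ 0.904, i.e. there is roughly a
90.4% chance that none of the ten sampled units is defective.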


Poisson Distribution
It estimates the number of instances a condition of interest occurs in a process or population. It
focuses on the probability of a number of events occurring over some interval or continuum where
µ, the average rate of occurrence, is known; for example, a project team may want to know the
probability of finding a defective part on a manufactured circuit board. Most frequently, this
distribution is used when the condition may occur multiple times in one sample unit and the user is
interested in knowing the number of individual characteristics found; for example, a critical
attribute of a manufactured part is measured in a random sampling of the production process with
non-conforming conditions being recorded for each sample, and the collective number of failures
from the sampling may be modeled using the Poisson distribution. It can also be used to project the
number of accidents for the following year and their probable locations. The essential conditions
for a random variable to follow a Poisson distribution are that counts are independent of each other
and that the probability of a count occurring in an interval is the same for all intervals. The mean
and the variance of the Poisson distribution are the same, and the standard deviation is the square
root of the mean; hence, µ = σ² and σ = √µ = √σ².

The Poisson distribution can be an approximation to the binomial when p is equal to or less than
0.1, and the sample size n is fairly large (generally, n >= 16) by using np as the mean of the Poisson
distribution. Considering f(x) as the probability of x occurrences in the sample/interval, λ as the
mean number of counts in an interval (where λ > 0), x as the number of defects/counts in the
sample/interval and e as a constant approximately equal to 2.71828 then the equation for the
Poisson distribution is as
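
f(x) = (λ^x · e^(−λ)) / x!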

Normal Distribution
A distribution is said to be normal when most of the observations are clustered around the mean.
It charts a data set of which most of the data points are concentrated around the average (mean) in
a symmetrical manner, thus forming a bell-shaped curve. The normal distribution’s shape is
unique in that the most frequently occurring value is in the middle of the range and other
probabilities tail off symmetrically in both directions. The normal distribution is used for
continuous (measurement) data that is symmetric about the mean. The graph of the normal
distribution depends on the mean and the variance. When the variance is large, the curve is short
and wide and when the variance is small, the curve is tall and narrow.

The normal distribution is also called the Gaussian distribution or the standard bell distribution. For
the standard normal distribution the population mean µ is zero and the population variance σ²
equals one, as in the figure, and σ is the standard deviation. The normal probability density function is
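
f(x) = [1 / (σ√(2π))] · e^(−(x − µ)² / (2σ²))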


For a normal distribution, approximately 68% of the area under the curve lies between µ − σ and µ + σ.

Z-transformation - The shape of the normal distribution depends on two factors, the mean and
the standard deviation. Every combination of µ and σ represents a unique shape of a normal
distribution. Based on the mean and the standard deviation, the complexity involved in the normal
distribution can be simplified and it can be converted into the simpler z-distribution. This process
leads to the standardized normal distribution, Z = (X − µ)/σ. Because of the complexity of the
normal distribution, the standardized normal distribution is often used instead.
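
A short worked example (the numbers here are hypothetical): if a process has µ = 50 and σ = 4, an
observation of X = 58 standardizes to Z = (58 − 50)/4 = 2.0, so its tail probability can be read directly
from a standard normal table rather than computed from a custom normal curve.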
Chi-Square Distribution
The chi-square (χ2) distribution is used when testing a population variance against a known or
assumed value of the population variance. It is skewed to the right or with a long tail toward the
large values of the distribution. The overall shape of the distribution will depend on the number of
degrees of freedom in a given problem. The degrees of freedom are 1 less than the sample size. It
is formed by adding the squares of standard normal random variables. For example, if z is a
standard normal random variable, then the following is a chi-square random variable (statistic) with
n degrees of freedom
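
χ² = z1² + z2² + ··· + zn²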

The chi-square probability density function, where v is the degrees of freedom and Γ(·) is the
gamma function, is
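
f(x) = [x^(v/2 − 1) · e^(−x/2)] / [2^(v/2) · Γ(v/2)], for x > 0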

An example is a χ² distribution with 6 degrees of freedom.

Student t Distribution
It was developed by W.S. Gosset. The t distribution is used to determine the confidence interval
of the population mean and confidence statistics when comparing the means of sample
populations, but the degrees of freedom for the problem must be known. The degrees of freedom
are 1 less than the sample size.

The student’s t distribution is a symmetrical continuous distribution and similar to the normal
distribution, but the extreme tail probabilities are larger than for the normal distribution for sample
sizes of less than 31. The shape and area of the t distribution approaches towards the normal
distribution as the sample size increases. The t distribution can be used whenever samples are
drawn from populations possessing a normal, bell-shaped distribution. There is a family of curves,
one for each sample size from n =2 to n = 31.

F Distribution
The F distribution or F-test is a tool used for assessing the ratio of independent variances or
equality of variances from two normal populations. It is used in the Analysis of Variance
(ANOVA, a technique frequently used in the Design of Experiments to test for significant
differences in variance within and between test runs).

If U and V are the variances of independent random samples of size n and m taken from normally
distributed populations with variances of w and z, then
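
F = (U/w) / (V/z)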

which is a random variable with an F distribution with v1 = n-1 and v2 = m - 1. The F-distribution is
represented by
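
F = (s1)² / (s2)²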


where (s1)² is the variance of the first sample (n1 − 1 degrees of freedom in the numerator) and (s2)²
is the variance of the second sample (n2 − 1 degrees of freedom in the denominator), given two
random samples drawn from a normal distribution.

The shape of the F distribution is non-symmetrical and will depend on the number of degrees of
freedom associated with (s1)² and (s2)². The distribution for the ratio of sample variances is
skewed to the right (toward the large values).

Geometric Distribution
It addresses the number of trials necessary before the first success. If the trials are repeated k times
until the first success, we would have k − 1 failures. If p is the probability of a success and q the
probability of a failure, the probability of the first success occurring at the kth trial is
P(k, p) = p · q^(k−1), with mean and standard deviation µ = 1/p and σ = (√q)/p.
Hypergeometric Distribution
The hypergeometric distribution applies when the sample (n) is a relatively large proportion of the
population (n >0.1N). The hypergeometric distribution is used when items are drawn from a
population without replacement. That is, the items are not returned to the population before the
next item is drawn out. The items must fall into one of two categories, such as good/bad or
conforming/nonconforming.

The hypergeometric distribution is similar in nature to the binomial distribution, except the sample
size is large compared to the population. The hypergeometric distribution determines the
probability of exactly x number of defects when n items are sampled from a population of N items
containing D defects. The equation is
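
P(x) = [C(D, x) · C(N − D, n − x)] / C(N, n)

where C(a, b) denotes the number of combinations of a items taken b at a time.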

Here x is the number of nonconforming units in the sample (r is sometimes used here if dealing
with occurrences), D is the number of nonconforming units in the population, N is the finite
population size and n is the sample size.


Bivariate Distribution
When two variables are distributed jointly the resulting distribution is a bivariate distribution.
Bivariate distributions may be used with either discrete or continuous data. The variables may be
completely independent or a covariance may exist between them.

The bivariate normal distribution is a commonly used version of the bivariate distribution which
may be used when there are two random variables. This equation was developed by Freund in
1962 as
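
In its standard form (restated here from the usual statistical references), the bivariate normal density is

f(x, y) = [1 / (2π·σ1·σ2·√(1 − ρ²))] · exp{ −1/(2(1 − ρ²)) · [((x − µ1)/σ1)² − 2ρ((x − µ1)/σ1)((y − µ2)/σ2) + ((y − µ2)/σ2)²] }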

With
−∞ < x < ∞, −∞ < y < ∞
−∞ < µ1 < ∞, −∞ < µ2 < ∞
σ1 > 0, σ2 > 0
µ1 and µ2 are the two population means
σ1² and σ2² are the two variances
ρ is the correlation coefficient of the random variables
Exponential Distribution
It is used to analyze reliability and to model items with a constant failure rate. The exponential
distribution is related to the Poisson distribution and is used to determine the average time between
failures or the average time between a number of occurrences. The mean and the standard deviation
are µ = 1/λ and σ = 1/λ.

For example, if there is an average of 0.50 failures per hour (discrete data - Poisson distribution),
then the mean time between failure (MTBF) is 1 / 0.50 = 2 hours (continuous data - exponential
distribution). If a random variable x is distributed exponentially, then its reciprocal y =1/x follows a
Poisson distribution. The opposite is also true. If x follows a Poisson distribution, then the
reciprocal y = 1/x is exponentially distributed. The exponential distribution equation is
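
f(x) = λ · e^(−λx), for x ≥ 0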

where µ is the mean (also sometimes referred to as θ), λ is the failure rate (which is the same as 1/µ)
and x is the x-axis value. When this equation is integrated, it results in the cumulative probability
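
P(X ≤ x) = 1 − e^(−λx)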


Lognormal Distribution
The most common transformation is made by taking the natural logarithm, but any base logarithm,
such as base 10 or base 2 may be used. It is used to model various situations such as response
time, time-to-failure data, and time-to-repair data. Lognormal distribution is a skewed-right
distribution (with most data in the left tail), and consists of the distribution of the random variable
whose logarithm follows the normal distribution.

The lognormal distribution assumes only positive values. When the data follows a lognormal
distribution, a transformation of data can be done to make the data follow a normal distribution.
Then probabilities, confidence intervals and tests of hypothesis can be conducted (if the data
follows a normal distribution). The lognormal probability density function is
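
f(x) = [1 / (xσ√(2π))] · e^(−(ln x − µ)² / (2σ²)), for x > 0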

where µ is the location parameter or log mean and σ is the scale (or shape) parameter or standard
deviation of the natural logarithms of the individual values.

Lognormal Distribution Plotting Natural Logarithm

Weibull Distribution
The Weibull distribution is a widely used distribution for understanding reliability and is similar in
appearance to the lognormal. It can be used to measure time to fail, time to repair, and material
strength. The shape and dispersion of the Weibull distribution depend on two parameters: β, the
shape parameter, and θ, the scale parameter; both parameters are greater than zero.

The Weibull distribution is one of the most widely used distributions in reliability and statistical
applications. The two- and three-parameter Weibull are common versions; the difference is that the
three-parameter Weibull distribution has a location parameter, used when there is some non-zero
time to first failure. In general, the probabilities from a Weibull distribution can be found from the
cumulative Weibull function as
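
F(x) = P(X ≤ x) = 1 − e^(−(x/θ)^β), for x ≥ 0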

where X is a random variable and x is an actual observation. The shape parameter (β) provides the
Weibull distribution with its flexibility:

If β = 1, the Weibull distribution is identical to the exponential distribution.
If β = 2, the Weibull distribution is identical to the Rayleigh distribution.
If 3 < β < 4, then the Weibull distribution approximates a normal distribution.

3.5. Measurement System Analysis


Measurement
Attribute Screens - Attribute screens use two categories for determining data outcomes, acceptable
or not acceptable, go or no go, pass or fail. This screen is typically used when the percentage of
nonconforming material is high or not known. A screen should evaluate the attributes that are
most helpful in identifying major problems with a product or process.

Gauge Blocks - Gauge blocks are used in manufacturing to set a length dimension for transfer or
for tool calibration. Sets of these blocks usually come in groups of eight to eighty-one. Gauge
blocks are accurate to within a few millionths of an inch.
Measuring Tools
Various measurement tools are
Calipers – They measure distance, depth, height, or length from either an inside or outside
perspective. Most calipers capture physical measurements which are transferred to a scale to
determine the reading. Types of calipers include
Spring calipers – The two sides are connected by a spring to measure difficult-to-reach
areas. They are accurate to a tenth of an inch and use a steel ruler to transfer the measurement.
Vernier calipers - They use a vernier scale and are accurate to one thousandth of an inch.
Digital calipers - They use an electronic readout and are accurate to five thousandths of an
inch.
Optical Comparators – They compare a part to a form that represents the desired dimensions by
projecting a beam of light so that a shadow of the object is magnified by a lens for checking
tolerance levels.
Micrometers – Also called "mics", they are handheld measuring devices with a C-frame, with
the measurement occurring between a fixed anvil and a movable spindle. A micrometer is similar
to a caliper with a finely threaded screw whose head shows the amount of screw movement with
use. It measures items by a combination of readings on a barrel and thimble, with accuracy to
one thousandth of an inch.


Measurement System
In order to ensure a measurement method is accurate and produces quality results, a method
must be defined to test the measurement process as well as to ensure that the process yields data
that is statistically stable.

Measurement Systems Analysis (MSA) refers to the analysis of precision and accuracy of
measurement methods. It is an experimental and mathematical method of determining how much
the variation within the measurement process contributes to overall process variability.
The following characteristics contribute to the effectiveness of a measurement method

Accuracy - It is the nearness of the measured result to the reference value; an unbiased true
value is what is normally reported. It has different components as
Bias - It is the systematic difference between the average measured value and a reference
value. The reference value is an agreed standard, such as a standard traceable to a national
standards body. When applied to attribute inspection, bias refers to the ability of the
attribute inspection system to produce agreement on inspection standards. Bias is
controlled by calibration, which is the process of comparing measurements to standards.
Linearity – It is the difference in bias through measurements. How does the size of the part
affect the accuracy of the measurement method?
Stability – It is the change of bias over time and usage. How accurately does the
measurement method perform over time?
Sensitivity - The gage should be sensitive enough to detect differences in measurement as slight
as one-tenth of the total tolerance specification or process spread.
Precision - It is the ability to repeat the same measurement by the same operator at or near the
same time, i.e. the nearness of any random measurement to the others. Its components are
Reproducibility - The reproducibility of a single gage is customarily checked by comparing
the results of different operators taken at different times. It is the variation in the average of
the measurements made by different appraisers using the same measuring instrument when
measuring the identical characteristic on the same part.
Repeatability - It is the variation in measurements obtained with one measurement
instrument when used several times by one appraiser, while measuring the identical
characteristic on the same part. Variation obtained when the measurement system is
applied repeatedly under the same conditions is usually caused by conditions inherent in
the measurement system.

Repeatability serves as the foundation that must be present in order to achieve reproducibility.
Reproducibility must be present before achieving accuracy. Precision requires that the same
measurement results are achieved for the condition of interest with the selected measurement
method.

A measurement method must first be repeatable. A user of the method must be able to repeat the
same results given multiple opportunities with the same conditions. The method must then be
reproducible. Several different users must be able to use it and achieve the same measurement
results. Finally, the measurement method must be accurate. The results the method produces
must hold up to an external standard or a true value given the condition of interest.


Gauge R and R Studies - Assuming that a gauge is determined to be accurate (that is, the
measurements generated by the gauge are the same as those of a recognized standard), the
measurements produced must be repeatable and reproducible. A study must be conducted to
understand how much variance (if any) observed in the process is due to variation in the
measurement system. The most widely used methods to quantify measurement errors are

Range Method - The range method is a simple way to quantify the combined repeatability and
reproducibility of a measurement system.
Average and Range Method - The average and range method computes the total measurement
system variability, and allows the total measurement system variability to be separated into
repeatability, reproducibility, and part variation. It is outlined by AIAG and is a control chart
model using averages and range to study variability in measurement methods. This model
requires two or three replications (r), by two or three appraisers (k), on 10 parts (n). The
average range value is computed as
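
In general form (assuming one range R is computed for each appraiser-part combination, so there are n × k ranges in all), this is

R̄ = (R1 + R2 + ··· + R(n·k)) / (n · k)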

The average range value is proportionate to the standard deviation of the process. The average
range provides another source of understanding the variation using a specific measurement
method.
Analysis of Variance Method - ANOVA is the most accurate method for quantifying
repeatability and reproducibility and allows the variability of the interaction between the
appraisers and the parts to be determined. It separates the total variability found within a data
set into random and systematic factors. The random factors do not have any statistical
influence on the given data set, while the systematic factors do. It is used mainly to compare
the means of two or more samples though, estimates of variance are the key intermediate
statistics calculated.

For example, to check the quality of packing boxes being manufactured at different factories with
the same manufacturing setup and output of a company, the box samples must be inspected before
they reach the customer. Though there is low variation in the size or other characteristics of
the boxes, ANOVA answers the question of whether the differences (variance) in the boxes
made within each factory are "large" compared to the differences (variance) in the means for
the boxes made at the different factories. Hence, an ANOVA computation compares the
variances among the means to the variances within the samples. What it takes to be "large
enough" for the difference to be statistically significant depends on the sample sizes and the
amount of certainty that is needed in testing.

ANOVA can also report on the interaction between those involved in looking at the
measurement method and the attributes/parts themselves. ANOVA partitions the total
variation into components for the parts, the appraisers, their interaction, and replication error.
The steps of an ANOVA gauge study are
Choose a small number of parts (usually ten or fewer) in a random manner.
Select a characteristic to be measured.

Number the parts to identify each part specifically.
Select a few technicians or inspectors - usually five or fewer.
Require technicians or inspectors to measure the parts using the same measuring device.
Repeat the above step to obtain two complete sets of data.
Conduct an ANOVA analysis beginning with the construction of an ANOVA table.

The observed value using an ANOVA study is


Observed Value = Part Mean + Bias + Part Effect + Appraiser Effect + Replication Error
or Observed Value = Reference Value + Deviation
and in equation format

where y(ijm) denotes the mth measurement taken by appraiser j on the ith part. Assuming that all of
the measurements are independent and normally distributed with mean µ and variance σ², the total
variance is given by

σ²(total) = σ²(part) + σ²(appraiser) + σ²(error)

where σ²(part), σ²(appraiser) and σ²(error) are the variances due to the part effect, the appraiser
effect, and the replication error.
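
A minimal sketch of the ANOVA computation in Python, assuming the study data have been collected into a long-format table with columns named part, appraiser and measurement (the column names, the layout and the sample values are assumptions for illustration, not part of the original text):

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format gauge study data: each row is one measurement
df = pd.DataFrame({
    "part":        [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "appraiser":   ["A", "A", "B", "B"] * 3,
    "measurement": [10.1, 10.2, 10.0, 10.1, 9.8, 9.9, 9.7, 9.9, 10.4, 10.5, 10.3, 10.4],
})

# Two-way ANOVA with interaction: partitions total variation into
# part effect, appraiser effect, part x appraiser interaction and replication error
model = ols("measurement ~ C(part) + C(appraiser) + C(part):C(appraiser)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)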
Measurement Correlation
It means the correlation or comparison of the measurement values from one measurement system
with the corresponding values reported by one or more different measurement systems. A
measurement system or device can be used to compare values against a known standard. The
measurement system or device may also be compared against the mean and standard deviation of
multiple other similar devices, all reporting measurements of the same or similar artifacts, often
referred to as proficiency testing or round robin testing.

Measurement correlation can also mean comparison of values obtained using different
measurement methods used to measure different properties. Examples are correlation of hardness
and strength of a metal, temperature and linear expansion of an item being heated, and weight and
piece count of small parts. It may also identify issues with the measuring device that can be
corrected. Besides repeatability and reproducibility, other components whose combined effect
explains measurement correlation are bias, linearity and P/T variation.
Bias
It is often due to human error. Whether intentional or not, bias can cause inaccurate or misleading
results. In other words, bias causes a difference between the output of the measurement method
and the true value. Types of bias include

Participants tend to remember their previous assessments, so collect assessment sheets
immediately after each trial, change the order of the inputs, transactions or questions, and
include an adequate waiting period after the initial trial to make remembering details of the
trial less likely.


Participants spend extra time when they know they are being evaluated, so give specific time
frames.
When equipment is set wrong.

If an instrument underestimates, the bias is negative. If an instrument overestimates, the bias is
positive. The equation for bias is

Bias = (ΣXi / n) − T

where n is the number of times the standard is measured, Xi is the ith measurement and T is the
value of the standard.
Linearity
It is the variation between a known standard throughout the operating range of the gauge. The
purpose of measurement linearity is to determine the reliability of a measuring instrument by
indicating any linearity error or change in the accuracy of the measuring instrument.
When measuring linearity, draw a line through the data points to view a slope (b). The slope is a
"best fit" line that runs through the data points. Linearity is equal to the slope multiplied by the
process variation Vp (tolerance or spread). Typically, the lower the absolute value of the slope, the
better the linearity. The percent linearity is equal to the slope, b, of the best-fit straight line
through the data points, and the linearity is equal to the slope multiplied by the process variation:

Linearity = |b| × Vp

The bias at any point can be estimated from the slope b and the y-intercept a of the best-fit line as
bias = a + b × (reference value).


If gauge linearity error is relatively high, it may be because the gauge is not calibrated properly at
both the lower and upper ends of its operating range, there are errors in the minimum or
maximum master, the gauge is worn, or the gauge has faulty internal design characteristics.
Percent Agreement
Percent agreement between the measurement system and either reference values or the true value
of a variable being measured, can be estimated using a correlation coefficient, “r”. If r = ±1.0, then
there is 100 percent agreement and if r = 0, then there is 0 percent agreement between the
measurement system variables and the reference or true values.
Precision-Tolerance Ratio
Precision/Tolerance (P/T) is the ratio between the estimated measurement error (precision) and
the tolerance of the characteristic being measured. Where σE is the standard deviation of the
measurement system variability,

P/T = 6σE / (USL − LSL)

The P/T ratio needs to be small to minimize the effect of measurement error. As the P/T ratio
becomes larger, the measurement method loses its ability to indicate a real change in the process.
Values of the estimated P/T ratio of 0.1 or less are often taken to imply adequate gauge capability.
This is based on the generally used rule that requires a measurement device to be calibrated in
units one-tenth as large as the accuracy required in the final measurement. This rule is not
applicable every time; the gauge must be sufficiently capable of measuring product accurately
enough and precisely enough so that the analyst can make the correct decision.

The formula for P/T ratio assumes that measurement errors are independent, measurement errors
are normally distributed and measurement error is independent of the magnitude of the
measurement.
Metrology
It is the science of measurement. The word metrology derives from two Greek words: metron
(meaning measure) and logos (meaning logic). Metrology involves the following

The establishment of measurement standards that are both internationally accepted and
definable
The use of measuring equipment to correlate the extent that product and process data
conforms to specification
The regular calibration of measuring equipment, traceable to established international
standards
Measurement Error
Measurement error is the degree to which the measuring instrument differs from a true value. The
error of a measuring instrument is the indication of the measuring instrument minus the true value:

Error = indicated value − true value


Measurement error is due to factors such as


Operator variation - This occurs when the same operator realizes variation when using the
same equipment with the same standards.
Operator to operator variation - This occurs when two or more operators realize variation in
results while using the same equipment with the same standards.
Equipment variation - The equipment exhibits erratic measurement results.
Process variation - This occurs when there are two or more methods for using measurement
equipment and those methods yield different results.
Other variation – It includes material variation, software variation, etc.

The confidence interval for the mean of measurements is reduced by obtaining multiple readings;
according to the central limit theorem, the standard deviation of the mean of n readings is

σ(x̄) = σ / √n

Total Product Variability

The total variability in a product includes the variability of the measurement process:

σ²(total) = σ²(product) + σ²(measurement)
Calibration
Calibration is the comparison of a measurement standard or instrument of known accuracy with
another standard or instrument to detect, correlate, report or eliminate by adjustment any
variation in the accuracy of the item being compared. The elimination of measurement error is the
primary goal of calibration systems. The calibration system

Ensures that products and services meet the tolerance range and quality specifications. A well-
maintained calibration system has a positive impact on the quality of products and services
offered to the customer
Ensures that measuring equipment is recalled from use when it is time to be recalibrated.
Periodic recalibration of measuring and test equipment is necessary for measurement accuracy
Ensures that measuring equipment is removed from use when it is incapable of performing its
function with an agreed level of accuracy

Calibration achieves the following goals, as

Reduce quality costs through the early detection of nonconforming products and processes
with the use of measuring equipment of known accuracy
Provide customers with an indication of a supplier’s calibration capabilities


Calibration Schedule - Measuring equipment should be calibrated before initial use and
periodically recalibrated as often as necessary to maintain prescribed accuracies. When production
is continuous, a frequency (or interval) is usually established. When production is sporadic,
calibration is often done on a “prior to use” basis. The recalibration interval will depend on
variables such as historical information, stability, purpose, extent of use, tendency to wear or drift,
how critical the measurement is, the cost of an inaccurate measurement, the environment in which
it is used, etc. Measuring and test equipment should be traceable to records that indicate the date
of the last calibration, by whom it was calibrated and when the next calibration is due. Coding is
sometimes used. It is generally accepted that the interval of calibration of measuring equipment be
based on stability, purpose and degree of usage.

The stability of a measurement instrument refers to the ability of the instrument to consistently
maintain its metrological characteristics over time.
The purpose is important; in general, critical applications will increase frequency and minor
applications will decrease frequency.
The degree of usage refers to how often an instrument is utilized and to what environmental
conditions it is exposed.

Calibration Standards - In the SI system, most of the fundamental units are defined in terms of
natural phenomena that are unchangeable. This recognized true value is called the standard.

Primary reference standards consist of copies of the international kilogram plus measuring systems
which are responsive to the definitions of the fundamental units and to the derived units of the SI
table. National standards are taken as the central authority for measurement accuracy, and all levels
of working standards are traceable to this “grand” standard.

3.6. Control Chart


Control charts can either be univariate when they monitor a single CTQ characteristic of a product
or service or be multivariate when they monitor more than one CTQ. The univariate control
charts are further classified according to whether they monitor attribute data or variable data.

A typical control chart plots sample statistics and is made up of a minimum of four lines: a vertical
line to measure the levels of the samples' means, the two outermost horizontal lines for the UCL and
the LCL, and the center line, which represents the mean of the process. If all of the points plot
between the UCL and the LCL in a random manner, the process is considered to be in control,
which means that the variations are random but are not outside the control limits; thus, the process
trends can be predicted because the variations are strictly due to common causes.

Control charts help prevent the process from going out of control by detecting the assignable
causes of variation in time, and they dissuade making unnecessary adjustments when they are not
needed. They also determine the natural range (control limits) of a process so that the range can be
compared to the specified limits. Control charts inform about the process capabilities and stability
as well. A control chart is a tool for constant process monitoring and thus facilitates the planning of
production resource allocation. Control limits on a control chart are readjusted every time a
significant shift in the process occurs. As per the Western Electric (WECO) rules, a process is said
to be out of control if one of the following occurs


A single point falls outside the 3σ limit
Two out of three successive points fall beyond the 2σ limits
Four out of five successive points fall beyond 1σ from the mean
Eight successive points fall on one side of the center line

Attribute Data univariate chart - Its characteristics resemble binary data; they can only take one
of two given forms, like conforming or not conforming, good or bad, etc. Attribute data must be
transformed into discrete data to be meaningful. The types of charts used for attribute data are

The p–chart - The p-chart is used when dealing with ratios, proportions, or percentages of
conforming or nonconforming parts in a given sample. A good example for a p-chart is the
inspection of products on a production line. They are either conforming or nonconforming.
The probability distribution used in this context is the binomial distribution with p for the
nonconforming proportion and q (which is equal to 1 − p) for the proportion of conforming
items. Because the products are only inspected once, the experiments are independent from
one another. The first step when creating a p-chart is to calculate the proportion of
nonconformity for each sample as p =m/b where, m represents the number of nonconforming
items, b is the number of items in the sample, and p is the proportion of nonconformity. The
mean proportion is computed as
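
p̄ = (p1 + p2 + ··· + pk) / k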

where, k is the number of samples audited and pk is the kth proportion obtained. The control
limits of a p-chart are
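
UCL = p̄ + 3√[p̄(1 − p̄)/n], CL = p̄, LCL = p̄ − 3√[p̄(1 − p̄)/n]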

The benefit of the p-chart is that the variations of the process change with the sizes of the
samples or the defects found on each sample.

The np-chart - The np-chart is one of the easiest to build. While the p-chart tracks the
proportion of nonconformities per sample, the np-chart plots the number of nonconforming
items per sample. The audit process of the samples follows a binomial distribution—in other
words, the expected outcome is “good” or “bad,” and therefore the mean number of successes
is np. The control limits for an np-chart are
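
UCL = np̄ + 3√[np̄(1 − p̄)], CL = np̄, LCL = np̄ − 3√[np̄(1 − p̄)]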

The c-chart - The c-chart monitors the process variations due to the fluctuations of defects per
item or group of items. The c-chart is useful for the process engineer to know not just how
many items are not conforming but how many defects there are per item. Knowing how many
defects there are on a given part produced on a line might in some cases be as important as
knowing how many parts are defective. Here, non-conformance must be distinguished from
defective items because there can be several nonconformities on a single defective item.


The probability for a nonconformity to be found on an item in this case follows a Poisson
distribution. If the sample size does not change and the defects on the items are fairly easy to
count, the c-chart becomes an effective tool to monitor the quality of the production process. If
c is the average nonconformity on a sample, the UCL and the LCL limits will be given as
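
UCL = c̄ + 3√c̄, CL = c̄, LCL = c̄ − 3√c̄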

The u-chart - One of the premises for a c-chart is that the sample sizes had to be the same. The
sample sizes can vary when a u-chart is being used to monitor the quality of the production
process, and the u-chart does not require any limit to the number of potential defects. Further,
for a p-chart or an np-chart the number of nonconformities cannot exceed the number of
items on a sample, but for a u-chart it is conceivable because what is being addressed is not the
number of defective items but the number of defects on the sample. The first step in creating a
u-chart is to calculate the number of defects per unit for each sample as u = c/ n. where u
represents the average defect per sample, c is the total number of defects, and n is the sample
size. Once all the averages are determined, a distribution of the means is created and then the
mean of the distribution is to be computed as
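
ū = (u1 + u2 + ··· + uk) / k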

where k is the number of samples. The control limits are determined based on u and the mean
of the samples, n as
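
UCL = ū + 3√(ū/n), CL = ū, LCL = ū − 3√(ū/n)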

Variable control charts - Control charts monitor not only the means of the samples for CTQ
characteristics but also the variability of those characteristics. When the characteristics are
measured as variable data (length, weight, diameter, and so on), the X -charts, S-charts, and R-
charts are used. These control charts are used more often and they are more efficient in providing
feedback about the process performance. The principle underlying the building of the control
charts for variables is the same as that of the attribute control charts. The whole idea is to
determine the mean, the standard deviation, and the distance between the mean and the control
limits based on the standard deviation.

X̄-charts and R-charts – They are similar to attribute control charts, but quantitative measurements
are considered for the CTQ characteristics instead of qualitative attributes. X̄- and R-charts
combined observe the sample means and the variations through their spread. Samples are taken,
and measurements of the means X̄ and the ranges R for each sample are derived and plotted on
two separate charts. The center line is determined by averaging the X̄s as X̿ = (X̄1 + X̄2 + ··· + X̄n)/n,
where n is the number of samples and X̿ is the grand average. The UCL and the LCL are
UCL = X̿ + 3σ, CL = X̿ and LCL = X̿ − 3σ. The mean range and the standard deviation for normally
distributed data are linked as σ = R̄/d2, where the constant d2 is a function of n.


Standard error-based X̄-chart - Based on the Central Limit Theorem, the standard deviation used
for the control limits is the standard deviation of the process divided by the square root of the
sample size:

σ(x̄) = σ / √n, so UCL = X̿ + 3σ/√n and LCL = X̿ − 3σ/√n

Mean range-based X̄-chart - With sample sizes n ≤ 10, the variations are small, so the range can
be used in place of the standard deviation when constructing a control chart. The relative range,
W = R/σ, has expected value d2, and the mean range is R̄ = (R1 + R2 + ··· + Rk)/k, where Rk is
the range of the kth sample. Therefore, the estimator of σ is σ̂ = R̄/d2. The formulas for
the control limits are
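
UCL = X̿ + A2·R̄, CL = X̿, LCL = X̿ − A2·R̄, where A2 = 3/(d2√n) is a tabulated constant that
depends on the sample size.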

R-chart - Here the center line is R̄, and the estimator of the standard deviation of the range is
σR = d3·σ. The control limits are
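
UCL = D4·R̄, CL = R̄, LCL = D3·R̄, where D3 and D4 are tabulated constants that depend on the
sample size.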

3.7. Process Capability and Performance


Process capability is a predictable pattern of statistically stable behavior where the chance causes of
variation are compared to the engineering specifications. A capable process is a process whose
spread on the bell-shaped curve is narrower than the tolerance range.
Process Capability Studies
A process capability study attempts to quantify whether a process can consistently meet the
standards set by internal or external customers. Since this study yields a prediction, and predictions
should be made from relatively stable processes, a process capability study should only be used in
a relatively controlled and stable process environment.

Measuring capability can be challenging because it is, by definition, a point estimate. Every process
has unpredictable instability, which creates an inherent risk of estimate errors. Since no confidence
interval is computed for the mean and standard deviation, there is no confidence interval for
capability, and therefore the risk cannot be quantified; the user must accept the risk of variability
related to instability. If the variation is due to a common cause, the output will still form a
distribution that is relatively stable as the variation is constant, and in this case a process capability
study may be completed. But if the variation is a result of a special cause, then the output is not as
stable and not as predictable, and a process capability study may have problems with its accuracy.


The objective of a process capability study is to establish a state of control over the manufacturing
process and then to maintain that state of control through time.

Study Procedure – It includes various steps, as

Select a process to study which is critical; it can be selected using several techniques like a
Pareto analysis or a cause-and-effect diagram.
Verify or define the process parameters. Verify what the process entails and its boundaries,
and gain agreement on the process's definition. Many of these steps are completed when
developing a process map.
Conduct a measurement systems analysis to ensure that the measurement methods produce
sound data.
Select a process capability analysis method like Cpk, Cp, Ppk and Pp.
Obtain the data and conduct an analysis.
Develop an estimate of the process capability. This estimate can be compared to the standards
set by internal or external customers.

After completing a process capability study, address any special causes of variation that can be
isolated. If able, eliminate the special causes that are not desirable. In some cases, a special cause
of variation may be desirable if it produces a better product or output. In that circumstance, if
possible, attempt to make the special cause a common cause to ensure the benefit is achieved
equally on all output.

Identifying Characteristics - Characteristics selected to be part of a process capability study should
meet certain requirements, as

The characteristic should be important relative to the quality of the product or process. A
process may have 15 characteristics, but only one or two should be selected for inclusion in the
process capability study.
The characteristics are Ys or outcomes to process steps that meet customer requirements. The
Ys are changed by changing the Xs or inputs.
The characteristic’s value should be adjustable.
The operating parameters that influence the characteristic should be able to be determined
and controlled.
Sometimes, the characteristic selected has a history of being the most difficult item to control.


Identifying Specifications/Tolerances
The process specifications or tolerances are determined either by customer requirements, industry
standards, or the organization’s engineering department.

Developing Sampling Plans
If the process fits a normal distribution and is in statistical control, then the standard deviation can
be estimated from
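
σ̂ = R̄ / d2 (the average range divided by the tabulated constant d2)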

For new processes, for example for a project proposal, a pilot run may be used to estimate the
process capability.

Specification Limits - Specification limits are set by the customer, and result from either customer
requirements or industry standards. The amount of variance (process spread) the customer is
willing to accept sets the specification limits. A customer wants a supplier to produce 12-inch
rulers. Specifications call for an acceptable variation of +/- 0.03 inches on each side of the target
(12.00 inches). The customer is saying acceptable rulers will be from 11.97 to 12.03 inches. If the
process is not meeting the customer's specification limits, two choices exist to correct the situation:

Change the process's behavior.
Change the customer's specification (requires customer approval).

Examples of Specification Limits - Specification limits are commonly found in

Blueprints
Engineering drawings and specs
Industry standards
Self-imposed standards within a shop
Federally mandated standards (e.g., emissions controls)

Verifying Stability and Normality - If only common causes of variation are present in a process,
then the output of the process forms a distribution that is stable over time and is predictable. If
special causes of variation are present, the process output is not stable over time.

If the process is currently capable but not stable, stability may need to be improved to assure
continued capability. If the process is stable but not capable, we can be reasonably sure that the
estimated lack of capability is reasonably correct, and the process must be improved to become
capable. If the process is not stable, the lack of stability makes it difficult to estimate the level of
capability with any certainty; first, variation must be reduced and special causes of variation
removed to improve stability, so that there are reasonable estimates of the centering of the process.
Following that, the process may need to be re-centered and/or process variation further reduced.
Process performance vs. specification
The performance metric indices establish a controlled process, and then maintain that process
over time. Numbered values are a shortcut method indicating the quality level of a process in parts
per million (ppm). Once the status of the process is determined, the causes of variation (based on
statistical significance) may be identified. Courses of action might be to

Do nothing.
Change the specifications.
Center the process.
Reduce the variation in the Six Sigma process spread.
Accept the losses.

Process Limits - A stable process can be monitored to determine if changes that occur are due to
factors other than random variation. Such observation determines whether changes are necessary
and if any corrective actions are required. Process limits are the voice of the process based on the
variation of the products produced. The supplier collects data over time to determine the variation
in the units against the customer's specification. These data points collected over time establish the
process curve.

Having a predictable process producing 100 percent conformances is the ideal state. Day-to-day
control charts help identify assignable causes to any variations that occur. Control charts are special
types of time series charts in which control limits are calculated around the central location, or
mean, of the variable being plotted.

A process capability diagram displays both the voice of the process and the voice of the customer.
To draw one of these diagrams

Locate the mean of the distribution (X) and draw a normal curve that reflects the upper and
lower process limits (UPL, LPL) to the data.
Draw the customer specifications with the upper and lower limits for those specifications as
appropriate (USL, LSL). Note that a customer may only have a lower limit or just an upper
limit.

Process Performance Metric - It is a measure of an organization's activities and performance and
includes metrics like percentage defective, which is defined as (Total number of defective
parts)/(Total number of parts) × 100. So if there are 1,000 parts and 10 of those are defective, the
percentage of defective parts is (10/1000) × 100 = 1%. Other metrics have been discussed earlier
and are summarized as


Performance Metric - Description
Percentage Defective - What percentage of parts contain one or more defects?
Parts per Million (PPM) - What is the average number of defective parts per million? This is the percentage defective (metric 1 above) multiplied by 1,000,000.
Defects per Unit (DPU) - What is the average number of defects per unit?
Defects per Opportunity (DPO) - What is the average number of defects per opportunity? (where opportunity = number of different ways a defect can occur in a single part)
Defects per Million Opportunities (DPMO) - The defects per opportunity figure multiplied by 1,000,000.
Rolled Throughput Yield (RTY) - The yield stated as the percentage of parts that go through a multi-stage process without a defect.
Process Sigma - The sigma level associated with either the DPMO or PPM level found in metric 2 or 5 above.
Cost of Poor Quality - The cost of defects: either internal (rework/scrap) or external (warranty/product).
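
The arithmetic behind these metrics can be checked with a short calculation. The Python sketch below uses made-up inspection counts and step yields purely for illustration.

# Hypothetical inspection counts for a two-step process
units = 1000                 # units inspected
defective_units = 10         # units with one or more defects
defects = 25                 # total defects found
opportunities_per_unit = 5   # ways a defect can occur in a single unit

percent_defective = defective_units / units * 100      # 1.0 %
ppm = percent_defective / 100 * 1_000_000              # 10,000 ppm
dpu = defects / units                                  # 0.025 defects per unit
dpo = defects / (units * opportunities_per_unit)       # 0.005 defects per opportunity
dpmo = dpo * 1_000_000                                 # 5,000

# Rolled throughput yield: product of the first-pass yields of each process step
step_yields = [0.98, 0.97]
rty = 1.0
for y in step_yields:
    rty *= y                                           # 0.9506, i.e. about 95.1 %

print(percent_defective, ppm, dpu, dpo, dpmo, rty)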

Process capability indices


Process capability indices include Cp and Cpk, which identify the current state of the process and provide statistical evidence for comparing after-adjustment results to the starting point.

Cp – It measures the ratio between the specification tolerance (USL - LSL) and the process spread. A normally distributed process centered exactly mid-way between the specification limits yields a Cp of 1 if the spread is +/- 3 standard deviations. The usual accepted minimum value for Cp is 1.33. Its major limitations are that it requires both an upper and a lower specification and that it should be used only after the process has been centered. It is computed as
Cp = (USL - LSL) / 6σ

It is used to identify the process's current state and measures the capability of a process to operate within customer-defined specification limits, so it should be used when the data set comes from a controlled, continuous process. It requires the standard deviation (sigma) along with the USL and LSL specifications. Cp indicates the amount of variation in the process, but nothing about the process's ability to align with the target.


Cpk – It measures the absolute distance of the mean to the nearest specification limit. Usually a Cpk value of at least 1, and preferably 1.33 or more, is desired. Its data requirements are similar to those for Cp. Along with Cp, Cpk provides a common measurement for assigning an initial process capability relative to the specification limits. It is computed as
Cpk = min[(USL - µ), (µ - LSL)] / 3σ
Cp measures "can it fit" while Cpk measures "does it fit." If Cp = Cpk, then the process is centered.
Cpm - It is also referred to as the Taguchi index. It is considered more accurate and reliable than the other indices. It focuses on reducing the variation from a target value (T). Variation from the target T is expressed as the process variability σ² plus the process centering (µ - T)², where µ is the process average. Cpm provides a common measurement for assigning an initial process capability to a process when aligning the mean of the sample to the target. It is computed as
Cpm = (USL - LSL) / (6 × sqrt(σ² + (µ - T)²))
where T is the target value, µ is the expected value and σ is the standard deviation. It is applied when the target is not the center or mean of USL - LSL, or when establishing an initial process capability during the Measure phase. The higher the Cpm value, the more likely the output of the process meets the specifications and the target.
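
A minimal sketch of these three indices, assuming normally distributed measurements; the data, specification limits and target below are made up for illustration.

import statistics

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 10.3, 9.9]
usl, lsl, target = 10.5, 9.5, 10.0

mean = statistics.mean(data)
sigma = statistics.stdev(data)                # sample standard deviation

cp  = (usl - lsl) / (6 * sigma)                                      # "can it fit"
cpk = min(usl - mean, mean - lsl) / (3 * sigma)                      # "does it fit"
cpm = (usl - lsl) / (6 * (sigma**2 + (mean - target)**2) ** 0.5)     # Taguchi index

print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  Cpm={cpm:.2f}")

In practice Cp and Cpk are computed with the within-subgroup (short-term) standard deviation, while Pp and Ppk described below use the overall (long-term) standard deviation; in this sketch a single sample standard deviation stands in for both.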

Sigma and Process Capability - The standard deviation (symbolized by the Greek letter σ) is the most common way to describe how the data in a sample vary from their mean, even when means and variances wander over time.

A Six Sigma goal is to have 99.99966% error-free work (reducing the defects to 3.4 per million). By computing sigma and relating it to a process capability index such as Ppk, the number of non-conformances (or the failure rate) produced by the process can be determined. To compute sigma (σ) for a population, use the following equation
σ = sqrt( Σ(x - µ)² / N )
where N is the number of items in the population, µ is the mean of the population data and x is each data point.
Process Performance Indices
The most commonly used process performance indices are Pp, Ppk and Cpm, which depict the present status of the process and also act as an important tool for process improvement. These metrics serve the same purpose as the process capability indices but differ in their approach.

Pp – It measures the ratio between the specification tolerance and the overall process spread. It helps to measure improvement over time, as it signals where the process is in comparison to the customer's specifications. It is computed as
Pp = (USL - LSL) / 6s
where s is the overall (long-term) standard deviation. It is used with continuous data when the process is not in statistical control. It depicts the amount of variation but not the alignment to the target; for a process to be in control, it must show only common causes of variation for each of the data points (no data points existing beyond the UCL or LCL).
Ppk – It measures the absolute distance of the mean to the nearest specification limit. It provides an initial measurement for centering on the specification limits, and it reflects variation both within and between subgroups. It is computed as
Ppk = min[(USL - µ), (µ - LSL)] / 3s
where s is the overall (long-term) standard deviation. It is used with continuous data when the process is not in statistical control. It indicates alignment to the USL and LSL but not the amount of variation.
Short-term vs. long-term capability
Short-term capability is measured over a very short time period since it focuses on the machine's
ability based on design and quality of construction. By focusing on one machine with one operator
during one shift, it limits the influence of other outside long-term factors, including operator,
environmental conditions such as temperature and humidity, machine wear and different material
lots.

Thus, short-term capability can measure the machine's ability to produce parts with a specific
variability based on the customer's requirements. Short-term capability uses a limited amount of
data relative to a short time and the number of pieces produced to remove the effects of long-term
components. If the machines are not capable of meeting the customer's requirements, changes
may have a limited impact on the machine's ability to produce acceptable parts. Remember,
though, that short-term capability only provides a snapshot of the situation. Since short-term data
does not contain any special cause variation (such as that found in long-term data), short-term
capability is typically rated higher.

When a process capability is determined using one operator on one shift, with one piece of
equipment, the process variation is relatively small. Control limits based on a short-term process
evaluation are closer together than control limits based on the long-term process.
A modified X-bar and R chart can be used for short runs, based on an initial 3 to 10 pieces, using a calculated value compared with a critical value. Inflated D4 and A2 values are used to establish control limits. Control limits are recalculated after additional subgroups are run.


The X and MR chart can also be used for small runs, with a limited amount of data. The X
represents individual data values, and the MR is the moving range, a measure of piece to piece
variability.

Process capability or Cpk values determined from either of these methods must be considered
preliminary information. As the number of data points increases, the calculated process capability
will approach the true capability.
Process capability for attributes data
The control chart represents the process capability, once special causes have been identified and
removed from the process. For attribute charts, capability is defined as the average proportion or
rate of nonconforming product.

For p charts, the process capability is the process average nonconforming, p̄. The proportion conforming to specification, 1 - p̄, may be used.
For np charts, the process capability is the process average nonconforming, np̄.
For c charts, the process capability is the average number of nonconformities, c̄, in a sample of fixed size n.
For u charts, the process capability is the average number of nonconformities per reporting unit, ū.

The average proportion of nonconforming may be reported on a defects per million opportunities scale by multiplying p̄ by 1,000,000.


4. ANALYZE
This phase is the start of the statistical analysis of the problem. It statistically reviews the families of variation to determine which are the significant contributors to the output. The statistical analysis begins with the development of a theory, the null hypothesis, and the analysis will either "fail to reject" or "reject" the theory. The families of variation and their contributions are quantified, and relationships between variables are shown graphically and numerically to give the team direction for improvements. The main objectives of this phase are

Reduce the number of inputs (X’s) to a manageable number


Determine the presence of noise variables through Multi-Vari Studies
Plan first improvement activities

4.1. Exploratory Data Analysis


Exploratory data analysis, or EDA, is an important first step in analyzing the data from an experiment, as it is used for

Detection of mistakes
Checking of assumptions
Preliminary selection of appropriate models
Determining relationships among the explanatory variables, and
Assessing the direction and rough size of relationships between explanatory and outcome
variables.

EDA does not include any formal statistical modeling and inference. The four types of EDA are
univariate non-graphical, multivariate non-graphical, univariate graphical, and multivariate
graphical.
Multi-vari studies
Variation may exist within a piece, and the source of this variation is different from piece-to-piece and time-to-time variation. The multi-vari chart is a very useful tool for analyzing all three types of
variation. Multi-vari charts are used to investigate the stability or consistency of a process. The
chart consists of a series of vertical lines, or other appropriate schematics, along a time scale. The
length of each line or schematic shape represents the range of values found in each sample set.
The multi-vari chart presents an analysis of the variation in a process, thereby differentiating between three main sources

Intra-piece, the variation within a piece, batch, lot, etc


Inter-piece, the additional variation between pieces.
Temporal variation, variation which is related to time.

Data can be grouped in terms of sources of variation to help define the way measurements are
partitioned. These sources describe characteristics of populations, and the few common types are
classifications (by category), geography (of a distribution center or a plant), geometry (chapters of a
book or locations within buildings), people (tenure, job function or education) and time (deadlines,
cycle time or delivery time).


We can stratify the data to help us understand the way our processes work by categorizing the
individual measurements. This helps us understand the variation of the components as it relates to
the whole process. For example, errors are being tracked in a process. The variation could be
within a subgroup (within a certain batch), between subgroups (from one batch to another batch)
or over time (time of day, day of week, shift or even season of the year). Interpretation of the chart
is apparent once the values are plotted. The advantages of multi-vari charts are

It can dramatize the variation within the piece (positional).


It can dramatize the variation from piece to piece (cyclical).
It helps to track any time related changes (temporal).
It helps minimize variation by identifying areas to look for excessive variation. It also identifies
areas not to look for excessive variation.

Sources of variation in multi-vari analysis can be


Within Individual Sample - Variation is present upon repeat measurements within same
sample.
Piece to Piece - Variation is present upon measurements of different samples collected within a
short time frame.
Time to Time - Variation is present upon measurements collected with a significant amount of
time between samples.

Multi-vari analysis is applicable to either product or service as it can control variation for both as

Within Individual Sample variations like Measurement Accuracy, Out of Round, Irregularities
in Part, Measurement Accuracy and Line Item Complexity
Piece to Piece variations like Machine fixturing, Mold cavity differences, Customer
Differences, Order Editor, Sales Office and Sales Rep
Time to Time variations like Material Changes, Setup Differences, Tool Wear, Calibration
Drift, Operator Influence, Seasonal Variation, Management Changes, Economic Shifts and
Interest Rate


Steps to develop multi-vari chart


Plot the first sample range with a point for the maximum reading obtained, and a point for the
minimum reading. Connect the points and plot a third point at the average of the within
sample readings

Plot the sample ranges for the remaining “piece to piece” data. Connect the averages of the
within sample readings.

Plot the “time to time” groups similarly.

Interpreting the multi-vari chart


Within Piece - It is characterized by large variation in readings taken of the same single sample, often from different positions within the sample.

Piece to Piece - It is characterized by large variation in readings taken between samples collected within a short time frame.


Time to Time - It is characterized by large variation in readings taken between samples collected in groups with a significant amount of time elapsed between groups.

Simple linear correlation and regression


Correlation
Correlation is a tool used when both x and y are continuous. The Pearson correlation coefficient (r) measures the linear relationship between x and y, as discussed earlier. Causation is different from correlation: correlation is the mutual relation that exists between two or more things, while causation is the fact that one thing causes an effect. A correlation between two variables does not imply that one is the result of the other. The correlation value ranges from -1 to 1. A value close to 1 signifies a positive relationship, with x and y moving in the same direction; a value near -1 means they move in opposite directions; and a value of zero means there is no linear relationship between x and y.

Confidence in a relationship is assessed using both the correlation coefficient and the number of pairs in the data. If there are very few pairs, the coefficient needs to be very close to 1 or -1 to be deemed 'statistically significant', but if there are many pairs, a coefficient closer to 0 can still be considered 'highly significant'. The standard method used to measure the 'significance' of the analysis is the p-value.


For example, to determine whether the relationship between the height and intelligence of people is significant, we start with the 'null hypothesis', the statement that 'height and intelligence of people are unrelated'. The p-value is a number between 0 and 1 representing the probability that this data would have arisen if the null hypothesis were true. In medical trials the null hypothesis is typically of the form that the use of drug X to treat disease Y is no better than not using any drug. The p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The project team usually "rejects the null hypothesis" when the p-value turns out to be less than a certain significance level, often 0.05. For Pearson's correlation coefficient (r), the p-value is obtained from the test statistic t = r / sqrt((1 - r²)/(N - 2)), which follows a t distribution with N - 2 degrees of freedom.
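
scipy.stats.pearsonr returns both the coefficient and its p-value, so the link between r, the number of pairs and significance can be checked directly; the height and score data below are purely illustrative.

from scipy import stats

height = [160, 165, 170, 172, 168, 175, 180, 178, 162, 171]
score  = [98, 102, 101, 105, 99, 104, 108, 103, 97, 100]

r, p_value = stats.pearsonr(height, score)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# reject the null hypothesis of "no linear relationship" only if p <= alpha (e.g. 0.05)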
Linear Regression
When the input and output variables are both continuous, regression and correlation are used to examine the relationship between the two variables. Determining how the predicted or dependent variable (the response variable, the variable to be estimated) reacts to variations of the predictor or independent variable (the variable that explains the change) first involves determining whether any relationship exists between them and how important it is. Regression analysis builds a mathematical model that helps make predictions about the impact of variable variations. Usually there is more than one independent variable causing variation in a dependent variable; for example, changes in the volume of cars sold depend on the price of the cars, the gas mileage, the warranty, etc. But the importance of these factors in the variation of the dependent variable (the number of cars sold) is not equal, so the project team should concentrate on the important factors instead of analyzing all the competing factors.

In simple linear regression, prediction of scores on one variable is done from the scores on a
second variable. The variable to predict is called the criterion variable and is referred to as Y. The
variable to base predictions on is called the predictor variable and is referred to as X. When there
is only one predictor variable, the prediction method is called simple regression. In simple linear
regression, the predictions of Y when plotted as a function of X form a straight line.

As an example, data for X and Y are listed below; there is a positive relationship between X and Y. When predicting Y from X, the higher the value of X, the higher the prediction of Y.

X Y
1.00 1.00
2.00 2.00
3.00 1.30
4.00 3.75
5.00 2.25


Linear regression consists of finding the best-fitting straight line through the points. The best-fitting
line is called a regression line. The diagonal line in the figure is the regression line and consists of
the predicted score on Y for each possible value of X. The vertical lines from the points to the
regression line represent the errors of prediction. The point with Y = 1.00 lies very near the regression line, so its error of prediction is small, while the point with Y = 3.75 lies well above the regression line, so its error of prediction is large.

The error of prediction for a point is the value of the point minus the predicted value (the value on
the line). The table below shows the predicted values (Y') and the errors of prediction (Y - Y'); for example, the first point has a Y of 1.00 and a predicted Y (called Y') of 1.21, so its error of prediction is -0.21.

X      Y      Y'      Y-Y'     (Y-Y')2
1.00 1.00 1.210 -0.210 0.044
2.00 2.00 1.635 0.365 0.133
3.00 1.30 2.060 -0.760 0.578
4.00 3.75 2.485 1.265 1.600
5.00 2.25 2.910 -0.660 0.436

The most commonly-used criterion for the best-fitting line is the line that minimizes the sum of the
squared errors of prediction. That is the criterion that was used to find the line in the figure. The
last column in the above table shows the squared errors of prediction. The sum of the squared
errors of prediction shown in the above table is lower than it would be for any other regression
line.

The regression equation is calculated with the mathematical equation for a straight line, y = b0 + b1X, where b0 is the y intercept (the value of y when X = 0) and b1 is the slope of the line, with the assumption that for any given value of X, the observed value of Y varies in a random manner and possesses a normal probability distribution. The calculations are based on the following statistics: MX is the mean of X, MY is the mean of Y, sX is the standard deviation of X, sY is the standard deviation of Y, and r is the correlation between X and Y. Sample values for the data are

MX MY sX sY r
3 2.06 1.581 1.072 0.627

The slope (b) can be calculated as b = r sY/sX and the intercept (A) as A = MY - bMX. For the
above data, b = (0.627)(1.072)/1.581 = 0.425 and A = 2.06 - (0.425)(3) = 0.785. The calculations
have all been shown in terms of sample statistics rather than population parameters. The formulas
are the same for the population case but use the parameter values for the means, standard deviations, and the correlation.

Least Squares Method – In this method, for computing the values of b1 and b0, the vertical distance between each point and the line, called the error of prediction, is used. The line that generates the smallest sum of squared errors of prediction is the least squares regression line. The values of b1 and b0 are computed as
b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²   and   b0 = ȳ - b1x̄
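
The slope and intercept derived above (b = 0.425, A = 0.785) can be reproduced with a short least-squares calculation on the example data:

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 1.3, 3.75, 2.25]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# least squares estimates of the slope and intercept
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.3f}")   # 0.425 and 0.785

# predicted values and errors of prediction, as in the table above
for xi, yi in zip(x, y):
    y_hat = b0 + b1 * xi
    print(xi, yi, round(y_hat, 3), round(yi - y_hat, 3))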


The P-value is determined by referring to a t-distribution with n-2 degrees of freedom.


Simple Linear Regression Hypothesis Testing
Hypothesis tests can be applied to determine whether the independent variable (x) is useful as a
predictor for the dependent variable (y). The following are the steps using the cost per transaction
example for hypothesis testing in simple regression

Determine if the conditions for the application of the test are met. There is a population regression equation Y = β0 + β1x, so that for a given value of x the prediction equation is ŷ = b0 + b1x. Given a particular value for x, the distribution of y-values is normal, the distributions of y-values have equal standard deviations, and the y-values are independent.

Establish hypotheses.
Ho:b1= 0 (the equation is not useful as a predictor of y - cost per transaction)
Ha:b1≠ 0 (the equation is useful as a predictor of y - cost per transaction)

Decide on a value of alpha.


Find the critical t values. Use the t-table and find the critical values +/- tα/2 with n - 2 df.
Calculate the value of the test statistic t = b1 / SE(b1), the estimated slope divided by its standard error.

Interpret the results. If the test statistic is beyond one of the critical values (greater than tα/2 or less than -tα/2), reject the null hypothesis; otherwise, do not reject.
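
The same hypothesis test can be run in one call with scipy.stats.linregress, which reports the slope and the p-value for Ho: b1 = 0; the cost-per-transaction data below is hypothetical.

from scipy import stats

x = [120, 150, 170, 200, 230, 260, 300]     # transactions per day (made-up)
y = [4.1, 3.8, 3.6, 3.2, 3.0, 2.8, 2.5]     # cost per transaction (made-up)

result = stats.linregress(x, y)
print(f"slope = {result.slope:.4f}, p-value = {result.pvalue:.4f}")

alpha = 0.05
if result.pvalue < alpha:
    print("Reject Ho: x is useful as a predictor of y")
else:
    print("Fail to reject Ho")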
Multiple Linear Regression
Multiple linear regression expands the simple linear regression model to allow for more than one independent or predictor variable. The general form of the equation is y = b0 + b1x1 + b2x2 + ... + bnxn + e, where b0, b1, b2, ... are the coefficients, referred to as partial regression coefficients. Each coefficient may be interpreted as the amount of change in y for each unit increase in its x (variable) when all other xs are held constant. The hypotheses for multiple regression are Ho: b1 = b2 = ... = bn = 0 and Ha: bi ≠ 0 for at least one i.

It is an extension of linear regression to more than one independent variable, so a higher proportion of the variation in Y may be explained. The first-order linear model is
Y = β0 + β1X1 + β2X2 + ... + βkXk + ε


The second-order linear model adds squared and interaction terms, for example
Y = β0 + β1X1 + β2X2 + β11X1² + β22X2² + β12X1X2 + ε
R², the multiple coefficient of determination, has values in the interval 0 ≤ R² ≤ 1.

Source        DF          SS          MS
Regression    k           SSR         MSR = SSR/k
Error         n-(k+1)     SSE         MSE = SSE/[n-(k+1)]
Total         n-1         Total SS

Where k is the number of predictor variables.


Coefficient of Determination
Coefficients are estimated by minimizing the sum of squares (SS) of the residuals. The coefficients follow a t-distribution, which allows us to use t-tests to assess their significance. The coefficient of determination, R², or multiple regression coefficient, is the proportion of variation in Y that can be explained by the regression model and is the square of r. In multiple regression, R²adj (the adjusted value) represents the percent of explained variation when the model is adjusted for the number of terms in it. Ideally, R² should be equal to 1, indicating that all of the variation is explained by the regression model; in general 0 ≤ R² ≤ 1. Related to the coefficient of determination is the correlation coefficient, which ranges from -1 ≤ r ≤ 1 and determines whether there is a positive or negative correlation in the regression analysis, where r is the coefficient of correlation determined from sample data and an estimate of ρ (rho), the population parameter.
Coefficient of Determination Equations
R² = SSregression / SStotal = (SStotal – SSerror) / SStotal = 1 - [SSerror / SStotal]
where SS is the sum of squares.
R²adj = 1 - [SSerror / (n – p)] / [SStotal / (n - 1)]

Where
n = number of data points
p = number of terms in the model including the constant

Unlike R2, R2 adj can become smaller when added terms provide little new information and as the
number of model terms gets closer to the total sample size. Ideally, R2 adj should be maximized
and as close to R2 as possible
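
A quick arithmetic check of the two equations, using made-up sums of squares for a model with p terms fitted to n data points:

ss_total = 100.0    # total sum of squares (hypothetical)
ss_error = 20.0     # residual (error) sum of squares (hypothetical)
n, p = 30, 4        # data points and model terms, including the constant

r2 = 1 - ss_error / ss_total                               # 0.80
r2_adj = 1 - (ss_error / (n - p)) / (ss_total / (n - 1))   # about 0.777
print(round(r2, 3), round(r2_adj, 3))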

4.2. Hypothesis Testing


A hypothesis is a theory about the relationships between variables. Statistical analysis is used to
determine if the observed differences between two or more samples are due to random chance or
to true differences in the samples.


Basics
A hypothesis is a value judgment, a statement based on an opinion about a population. It is
developed to make an inference about that population. Based on experience, a design engineer
can make a hypothesis about the performance or qualities of the products she is about to produce,
but the validity of that hypothesis must be ascertained to confirm that the products are produced to
the customer’s specifications. A test must be conducted to determine if the empirical evidence
does support the hypothesis.

Practical Significance - Practical significance is the amount of difference, change or improvement that will add practical, economic or technical value to an organization.

Statistical Significance - Statistical significance is the magnitude of difference or change required to distinguish between a true difference, change or improvement and one that could have occurred by chance. The larger the sample size, the more likely the observed difference is close to the actual difference.

For a project to succeed, both practical and statistical improvements are required. It is possible for a difference to be statistically significant but not practically significant. Because of limitations of cost, risk, timing, etc., the project team cannot implement practical solutions for all statistically significant Xs. Determining practical significance in a Six Sigma project is not the responsibility of the Black Belt alone; the project team needs to collaborate with others, such as the project sponsor and finance manager, to help determine the return on investment (ROI) associated with the project objective.

Null Hypothesis - The first step consists of stating the hypothesis, denoted H0 and read "H sub zero." For example, if 20 percent of defects are believed to be traced to the CPU socket, the statement is written as H0: µ = 20%. A null hypothesis assumes that no difference exists between or among the parameters being tested and is often the opposite of what is hoped to be proven through the testing. The null hypothesis is typically represented by the symbol Ho.

Alternate Hypothesis - If the hypothesis is not rejected, exactly 20 percent of the defects will
actually be traced to the CPU socket. But if enough evidence is statistically provided that the null
hypothesis is untrue, an alternate hypothesis should be assumed to be true. That alternate
hypothesis, denoted H1, tells what should be concluded if H0 is rejected. H1 : µ ≠ 20%.
An alternate hypothesis assumes that at least one difference exists between or among the
parameters being tested. This hypothesis is typically represented by the symbol Ha.

Test Statistic - The decision on whether to reject H0 or fail to reject it depends on the information provided by the sample taken from the population being studied. The objective is to generate a single number that will be compared to a critical value for the rejection decision. That number is called the test statistic. To test the mean µ, the Z formula is used when the sample size is greater than 30,
Z = (x̄ - µ0) / (σ / √n)
and the t formula is used when the samples are smaller,
t = (x̄ - µ0) / (s / √n)


The level of risk - It addresses the risk of failing to reject a hypothesis when it is actually false, or
rejecting a hypothesis when it is actually true.

Type I error (False Positive) – It occurs when one rejects the null hypothesis when it is true. The probability of a type I error is the level of significance of the test of hypothesis and is denoted by alpha (α). Usually a one-tailed test of hypothesis is used when one talks about type I error or alpha error.

Type II error (False Negative) – It occurs when one fails to reject the null hypothesis when the alternative hypothesis is true. The probability of a type II error is denoted by beta (β). One cannot evaluate the probability of a type II error when the alternative hypothesis is of the form µ > 180, but when the alternative hypothesis is a specific competing hypothesis (for example, that the mean of the alternative population is 300 with a standard deviation of 30), the probability of a type II error can be calculated.

Decision Rule Determination - The decision rule determines the conditions under which the null
hypothesis is rejected or not. The critical value is the dividing point between the area where H0 is
rejected and the area where it is assumed to be true.

Decision Making - Only two decisions are considered, either the null hypothesis is rejected or it is
not. The decision to reject a null hypothesis or not depends on the level of significance. This level
often varies between 0.01 and 0.10. Even when we fail to reject the null hypothesis, we never say
“we accept the null hypothesis” because failing to reject the null hypothesis that was assumed true
does not equate proving its validity.

Testing for a Population Mean - When the sample size is greater than 30 and σ is known, the Z formula can be used to test a null hypothesis about the mean:
Z = (x̄ - µ0) / (σ / √n)

Phrasing - In hypothesis testing, the phrase “to accept” the null hypothesis is not typically used. In
statistical terms, the Six Sigma Black Belt can reject the null hypothesis, thus accepting the alternate
hypothesis, or fail to reject the null hypothesis. This phrasing is similar to jury's stating that the
defendant is not guilty, not that the defendant is innocent.


One Tail Test - In a one-tailed t-test, all the area associated with α is placed in one tail or the other. The choice of tail depends upon the direction (+ or -) in which the results are expected to move if the experiment comes out as expected. The selection of the tail must be made before the experiment is conducted and analyzed.

Two Tail Test - If a null hypothesis is established to test whether a population shift has occurred,
in either direction, then a two tail test is required. The allowable α error is generally divided into
two equal parts.

Sample Size - It has been assumed that the sample size (n) for hypothesis testing has been given
and that the critical value of the test statistic will be determined based on the α error that can be
tolerated. The sample size (n) needed for hypothesis testing depends on the desired type I (α) and
type II (β ) risk, the minimum value to be detected between the population means (µ - µ0) and
the variation in the characteristic being measured (S or σ).

The steps for performing hypothesis testing are

Define the practical problem. From the Define and Measure phases, we have used tools such
as the cause-and-effect diagram, process mapping, matrix diagrams, FMEA and graphical data
analysis to identify potential Xs. Now statistical testing is needed to determine significance.
Define the practical objective. Define logical categorizations where differences might exist so
that meaningful action can be taken. Determine what to prove (i.e., what questions will the
hypothesis test answer?)
Establish hypotheses to answer the practical objective. The following is an example of
hypotheses using a test of means where the mean of each shift is equal against the alternative
where they are not equal Null Hypothesis is Ho: µ1st shift= µ2nd shift= µ3rd shift then,
alternate hypothesis is Ha: At least one mean is different
Select the appropriate statistical test. Based on the data that has been collected and the
hypothesis test established to answer the practical objective, refer to the Hypothesis Testing
Road Map to select statistical tests. The roadmap is a very important tool to use with each
hypothesis test.
Define the alpha (α) risk. The Alpha Risk (i.e., Type I Error or Producer’s Risk) is the probability of rejecting the null hypothesis when it is true (i.e., rejecting a good product when it meets the acceptable quality level). Typically, α = 0.05 or 5%. This means there is a 95% (1 - α) probability of failing to reject the null hypothesis when it is true (the correct decision).


Define the beta (β) risk. The Beta Risk (i.e., Type II Error or Consumer’s Risk) is the
probability of failing to reject the null hypothesis when there is significant difference (i.e., a
product is passed on as meeting the acceptable quality level when in fact the product is bad).
Typically, (β) = 0.10 or 10%.
Establish delta (δ). Delta (δ) is the practically significant difference that the hypothesis test should be able to detect between the groups being compared.

Determine the sample size (n). Sample size depends on the statistical test, the type of data and the alpha and beta risks. Statistical software packages can calculate sample size, and it is also possible to calculate it manually using a sample size table. For continuous data the formula n = (Zα/2 × σ / E)² may be used, where Z = 1.96 at a confidence level of 95% (α/2 = 0.025, two-tail test), σ is the standard deviation, and E is the margin of error (the range of values around the estimate that probably contains the true value).

Conduct the statistical tests. Use the Hypothesis Testing Road Map to point in the right direction for the type of data collected.
Collect the data. Data collection is based on the sampling plan and method.
Develop statistical conclusions. The p–value is the smallest level of significance that would lead
to rejection of the null hypothesis (Ho). Usually, if α = 0.05 and the p-value ≤ 0.05, then reject
the null hypothesis and conclude that there is a significant difference. But, if α = 0.05 and the
p-value > 0.05, then fail to reject the null hypothesis and conclude that there is not a significant
difference.
Determine the practical conclusions. Restate the practical conclusion to reflect the impact in
terms of cost, return on investment, technical, etc. Remember, statistical significance does not
imply practical significance.


Tests for means, variances, and proportions
Confidence Intervals for the Mean - The confidence interval for the mean for continuous data with large samples is
x̄ ± Zα/2 × σ / √n
where Zα/2 is the normal distribution value for the desired confidence level. If a relatively small sample (< 30) is used, the t distribution must be used instead. The confidence interval for the mean for continuous data with small samples is
x̄ ± tα/2 × s / √n
where the t distribution value for the desired confidence level, tα/2, uses (n - 1) degrees of freedom.

Confidence Intervals for Variation - The confidence interval for the variance is based on the Chi-Square distribution. The formula is
(n - 1)S² / χ²α/2 ≤ σ² ≤ (n - 1)S² / χ²1-α/2
where S² is the point estimate of the variance and χ²α/2, χ²1-α/2 are the chi-square table values for (n - 1) degrees of freedom.

Confidence Intervals for Proportion - For large sample sizes, with np and n(1-p) greater than or equal to 4 or 5, the normal distribution can be used to calculate a confidence interval for a proportion. The following formula is used
p̂ ± Zα/2 × sqrt( p̂(1 - p̂) / n )
where Zα/2 is the appropriate value from the Z table for the desired confidence level.

Population Variance - The confidence interval equation is the chi-square interval given above,
(n - 1)S² / χ²α/2 ≤ σ² ≤ (n - 1)S² / χ²1-α/2
Population Standard Deviation - The equation is the square root of the variance interval,
sqrt[(n - 1)S² / χ²α/2] ≤ σ ≤ sqrt[(n - 1)S² / χ²1-α/2]
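
A short sketch of the small-sample interval for the mean and the chi-square interval for the variance, using scipy for the table values; the data below is illustrative.

from scipy import stats
import statistics

data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7]
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)
alpha = 0.05

# mean: x-bar +/- t(alpha/2, n-1) * s / sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
half_width = t_crit * s / n ** 0.5
print("mean CI:", mean - half_width, mean + half_width)

# variance: (n-1)s^2 / chi2(alpha/2) <= sigma^2 <= (n-1)s^2 / chi2(1-alpha/2)
chi2_upper = stats.chi2.ppf(1 - alpha / 2, n - 1)
chi2_lower = stats.chi2.ppf(alpha / 2, n - 1)
print("variance CI:", (n - 1) * s**2 / chi2_upper, (n - 1) * s**2 / chi2_lower)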


The statistical tests for means usually are


One-sample Z-test: for population mean
Two-sample Z-test: for population mean
One-sample T-test: single mean (one sample versus historic mean or target value)
Two-sample T-test : multiple means (sample from each of the two categories)

One-Sample Z-Test for Population Mean - The One-sample Z-test for a population mean is used when a large sample (n ≥ 30) is taken from a population and we want to compare the mean of the population to some claimed value. This test assumes the population standard deviation is known, or can be reasonably estimated by the sample standard deviation, and uses the Z distribution. The null hypothesis is Ho: µ = µ0, where µ0 is the claim value compared to the sample.

Two-Sample Z-Test for Population Mean - The Two-sample Z-test for population means is used after taking two large samples (n ≥ 30) from two different populations in order to compare them. This test uses the Z table and assumes the population standard deviations are known, or estimated by the sample standard deviations. The null hypothesis is Ho: µ1 = µ2.

One-Sample T-test - The One-sample T-test is used when a small sample (n < 30) is taken from a population and the mean of the population is to be compared to some claimed value. This test assumes the population standard deviation is unknown and uses the t distribution. The null hypothesis is Ho: µ = µ0, where µ0 is the claim value compared to the sample. The test statistic is
t = (x̄ - µ0) / (s / √n)
where x̄ is the sample mean, s is the sample standard deviation and n is the sample size.
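
scipy provides this test directly; the sample below is hypothetical and the claimed mean is 10.0.

from scipy import stats

sample = [10.3, 9.9, 10.4, 10.6, 10.1, 10.5, 10.2, 9.8, 10.7, 10.3]
t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# reject Ho: mu = 10.0 at the 0.05 level only if p <= 0.05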

Two-Sample T-test - The Two-sample T-test is used when two small samples (n < 30) are taken from two different populations and compared. There are two forms of this test: assumption of equal variances and assumption of unequal variances. The null hypothesis is Ho: µ1 = µ2. The test statistic with the assumption of equal variances is
t = (x̄1 - x̄2) / (sp × sqrt(1/n1 + 1/n2))
where the pooled variance is
sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
With the assumption of unequal variances, the test statistic is
t = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2)
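
Both forms of the two-sample test are available through the equal_var argument of scipy.stats.ttest_ind; the two samples below are hypothetical.

from scipy import stats

sample1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
sample2 = [11.2, 11.6, 11.4, 11.1, 11.5, 11.3]

t_pooled, p_pooled = stats.ttest_ind(sample1, sample2, equal_var=True)    # equal variances assumed
t_welch,  p_welch  = stats.ttest_ind(sample1, sample2, equal_var=False)   # unequal variances (Welch)
print(p_pooled, p_welch)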


Paired-comparison tests
They are powerful ways to compare data sets by determining if the means of the paired samples
are equal. Making both measurements on each unit in a sample allows testing on the paired
differences. An example of a paired comparison is two different types of hardness tests conducted
on the same sample.

Once paired, a test of significance attempts to determine if the observed difference indicates
whether the characteristics of the two groups are the same or different. A paired comparison
experiment is an effective way to reduce the natural variability that exists among subjects and tests
the null hypothesis that the average of the differences between a series of paired observations is
zero.

2 Mean, Equal Variance, t Test - Tests the difference between two sample means (x̄1 vs x̄2) when σ1 and σ2 are unknown but considered equal.

Paired t Test - In a paired t-test for two population means, each paired sample consists of a member of one population and that member's corresponding member in the other population. It tests the difference between two sample means; data are taken in pairs, with the difference calculated for each pair. H0: µ1 = µ2, H1: µ1 ≠ µ2 and DF = n - 1. A paired t test is always a two-tail test, with d̄ = the average of the differences of the pairs of data.

The paired method (dependent samples), compared to treating the data as two independent
samples, will often show a more significant difference because the standard deviation of the d’s
(Sd) includes no sample to sample variation. This comparatively more frequent significance occurs
despite the fact that “n - 1” represents fewer degrees of freedom than “n1 + n2 -2.” The paired t
test is a more sensitive test than the comparison of two independent samples.

The steps are


Establish the hypotheses - The established hypothesis test is a two-tail test when Ha is a
statement of does not equal (≠), left-tail test when Ha has the < sign and right-tail test when Ha
has the > sign
Calculate the test statistic, t = d̄ / (sd / √n), where d̄ is the average of the paired differences and sd is their standard deviation.


Determine the critical value - Find the critical value using n - 1 degrees of freedom.
Draw the statistical conclusion, if tcalc > tcritical or tcalc < - tcritical, reject the null hypothesis.
Otherwise, do not reject the null hypothesis.
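
The paired test works on the per-unit differences; scipy.stats.ttest_rel does exactly that. The two sets of readings below represent hypothetical hardness tests made on the same ten samples.

from scipy import stats

test_a = [52, 48, 55, 50, 49, 53, 51, 47, 54, 50]   # first hardness test
test_b = [50, 47, 53, 49, 48, 52, 50, 46, 52, 49]   # second test on the same samples

t_stat, p_value = stats.ttest_rel(test_a, test_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# equivalent to a one-sample t-test of the pairwise differences against a mean of zero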

Paired F Test - If independent random samples are drawn from two normal populations with equal variances, the ratio (s1)²/(s2)² creates a sampling distribution known as the F distribution. The hypothesis tests for comparing a population variance σ1² with another population variance σ2² are given as
Ho: σ1² = σ2² and Ha: σ1² ≠ σ2²
The numbers of degrees of freedom associated with s1² and s2² are represented by v1 and v2, where v1 is the DF in the numerator. The F statistic is the ratio of the two sample variances (two chi-square distributions) and is
F = s1² / s2²
where s1² and s2² are the sample variances.

Goodness of Fit - GOF (Goodness of Fit) tests are part of a class of procedures that are structured in cells. In each cell there is an observed frequency (O). In a goodness-of-fit test, an observed (O) frequency distribution is compared to an expected (E) frequency distribution. The relationship is statistically described by a hypothesis test

Ho: The random variable is distributed as a specific distribution with given parameters.
Ha: The random variable does not have the specific distribution with given parameters.

The expected or theoretical frequency (Fe) can be calculated for each cell, and the chi-square test statistic for this one-tail test is then summed across all cells as
χ² = Σ (O - E)² / E

Single-factor analysis of variance (ANOVA)
Sometimes it is essential to compare three or more population means at once, under the assumptions that the variance is the same for all factor treatments or levels, that the individual measurements within each treatment are normally distributed, and that the error term is a normally and independently distributed random effect. With analysis of variance, the variations in the response measurement are partitioned into components that reflect the effects of one or more independent variables. The variability of a set of measurements is proportional to the sum of squares of deviations used to calculate the variance.

Analysis of variance partitions the sum of squares of deviations of individual measurements from
the grand mean (called the total sum of squares) into parts: the sum of squares of treatment means
plus a remainder which is termed the experimental or random error.

ANOVA is a technique to determine if there are statistically significant differences among group
means by analyzing group variances. An ANOVA is an analysis technique that evaluates the
importance of several factors of a set of data by subdividing the variation into component parts.
ANOVA tests determine whether the means are different, not which of the means are different. The hypotheses are Ho: µ1 = µ2 = µ3 and Ha: at least one of the group means is different from the others.

ANOVA extends the Two-sample t-test for testing the equality of two population means to a more
general null hypothesis of comparing the equality of more than two means, versus them not all
being equal.

One-Way ANOVA

Terms used in ANOVA


Degrees of Freedom (df) - The number of independent conclusions that can be drawn from
the data.
SSFactor - It measures the variation of each group mean to the overall mean across all groups.
SSError - It measures the variation of each observation within each factor level to the mean of
the level.
Mean Square Error (MSE) - It is SSError/ df and is also the variance.
F-test statistic - The ratio of the variance between treatments to the variance within treatments, F = MSFactor/MSE. If F is near 1, then the treatment means are not different (the p-value is large).
P-value - It is the smallest level of significance that would lead to rejection of the null
hypothesis (Ho). If α = 0.05 and the p-value ≤ 0.05, then reject the null hypothesis and
conclude that there is a significant difference and if α = 0.05 and the p-value > 0.05, then fail to
reject the null hypothesis and conclude that there is not a significant difference.

One-way ANOVA is used to determine whether data from three or more populations formed by
treatment options from a single factor designed experiment indicate the population means are
different. The assumptions for using one-way ANOVA are that all samples are random samples from their respective populations and are independent, that the distributions of outputs for all treatment levels follow the normal distribution, and that the variances are equal (homogeneity of variances).

Steps for computing one-way ANOVA are

Establish the hypotheses. Ho: µ1= µ2= µ3 and Ha: At least one of the group means is different
from the others.
Calculate the test statistic. Calculate the average of each call center (group) and the average of
the samples.


Calculate SSFactor as SSFactor = Σ ni(x̄i - x̄grand)², the variation of each group mean about the grand mean.
Calculate SSError as SSError = Σ Σ (xij - x̄i)², the variation of the observations about their own group means.
Calculate SSTotal as SSTotal = SSFactor + SSError = Σ Σ (xij - x̄grand)².
Construct the ANOVA table, in which degrees of freedom (df) are calculated for the group, error and total sums of squares.
Determine the critical value. Fcritical is taken from the F distribution table.
Draw the statistical conclusion. If Fcalc< Fcritical fail to reject the null hypothesis and if Fcalc >
Fcritical, reject the null hypothesis.
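
The whole procedure is condensed in scipy.stats.f_oneway, which returns the F statistic and p-value for Ho: all group means are equal; the three call-center samples below are hypothetical.

from scipy import stats

center_a = [4.2, 3.9, 4.5, 4.1, 4.3]
center_b = [4.8, 5.1, 4.9, 5.0, 4.7]
center_c = [4.0, 4.2, 3.8, 4.1, 4.3]

f_stat, p_value = stats.f_oneway(center_a, center_b, center_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# if p <= 0.05, reject Ho: at least one call-center mean differs from the others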

Chi square
Often the objective of the project team is not to find the mean of a population but rather to determine the level of variation of the output, for example to know how much variation the production process exhibits about the target and what adjustments are needed to reach a defect-free process. If the means of all possible samples are obtained and organized, the sampling distribution of the means can be derived; similarly, the sampling distribution of the variances can be derived. While the distribution of the means follows a normal distribution when the population is normally distributed or when the samples are larger than 30, the distribution of the variance follows a Chi square (χ²) distribution. The sample variance is computed as
s² = Σ(x - x̄)² / (n - 1)
and the χ² statistic for a single variance is given as
χ² = (n - 1)s² / σ²
The shape of the χ² distribution resembles the normal curve but is not symmetrical, and its shape depends on the degrees of freedom. The χ² formula can be rearranged to find σ². The value of σ², with n - 1 degrees of freedom, will be within the interval
(n - 1)s² / χ²α/2 ≤ σ² ≤ (n - 1)s² / χ²1-α/2


The chi-square test compares the observed values to the expected values to determine if they are statistically different when the data being analyzed do not satisfy the t-test assumptions. The chi-square goodness-of-fit test is a non-parametric test which compares the expected frequencies to the actual or observed frequencies. The formula for the test is
χ² = Σ (fa - fe)² / fe
with fe as the expected frequency and fa as the actual frequency. The degrees of freedom are given as df = k - 1.

Chi-square cannot be negative because it is the square of a number. If it is equal to zero, all the
compared categories would be identical, therefore chi-square is a one-tailed distribution. The null
and alternate hypotheses will be H0: The distribution of quality of the products after the parts were
changed is the same as before the parts were changed. H1: The distribution of the quality of the
products after the parts were changed is different than it was before they were changed.
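
scipy.stats.chisquare compares observed and expected frequencies directly; the counts below are hypothetical quality categories, with the expected counts taken from the distribution seen before the parts were changed.

from scipy import stats

observed = [42, 33, 15, 10]   # counts after the parts were changed (made-up)
expected = [50, 30, 12, 8]    # counts expected from the "before" distribution (same total)

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")   # df = k - 1 = 3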


5. IMPROVE AND CONTROL
This phase uses tools and methods for determining and verifying the sources of variation (the input variables, x) through well-designed experiments that vary the input variables to demonstrate the Y = f(x) relationship, where Y is a dependent variable and a function of x. Using a design of experiments (DOE) approach produces information on the relationships between factors so that the project team can improve the process.

5.1. Design of Experiments (DOE)


It is a method of varying a number of input factors simultaneously in a planned manner so that their individual and combined effects on the output can be identified. Whereas most experiments address only one factor at a time, the Design of Experiments (DOE) method focuses on multiple factors at once and develops well-designed efforts to identify which process changes yield the best possible results for sustained improvement. It provides data that illustrate the significance to the output of the input variables acting alone or interacting with one another. DOE advantages include the evaluation of multiple factors simultaneously, the control of input factors to make the output insensitive to noise factors, experiments that highlight the important factors, and confidence in the conclusions drawn; the factors can easily be set at the optimum levels, and quality and reliability can be improved without cost increase, or cost savings can be achieved.
Basic terms
Basic DOE terms are

Factor - A predictor variable that is varied with the intent of assessing its effect on a response
variable. Most often referred to as an "input variable."
Factor Level - It is a specific setting for a factor. In DOE, levels are frequently set as high and low for each factor. A level is a potential setting, value or assignment of a factor, i.e. of the value of the predictor variable; for example, if the factor is time, then the low level may be 10 minutes and the high level may be 30 minutes.
Response variable - A variable representing the outcome of an experiment. The response is
often referred to as the output or dependent variable.
Treatment - The specific setting of factor levels for an experimental unit. For example, a level
of temperature at 65° C and a level of time at 45 minutes describe a treatment as it relates to an
output of yield.
Experimental error - An error from an experiment reveals variation in the outcome of identical
tests. The variation in the response variable beyond that accounted for by the factors, blocks,
or other assignable sources while conducting an experiment.
Experimental run - A single performance of an experiment for a specific set of treatment
conditions.
Experimental unit - The smallest entity receiving a particular treatment, subsequently yielding a
value of the response variable.
Predictor Variable - A variable that can contribute to the explanation of the outcome of an
experiment. Also known as an independent variable.
Repeated Measures - The measurement of a response variable more than once under similar
conditions. Repeated measures allow one to determine the inherent variability in the
measurement system. Repeated measures are known as "duplication" or 'repetition."


Replicate - A single repetition of the experiment.


Replication - Performance of an experiment more than once for a given set of predictor
variables. Each of the repetitions of the experiment is called a "replicate." Replication differs
from repeated measures in that it is a repeat of the entire experiment for a given set of
predictor variables, not just repeat of measurements of the same experiment.
Replication increases the precision of the estimates of the effects in an experiment. Replication
is more effective when all elements contributing to the experimental error are included. In
some cases replication may be limited to repeated measures under essentially the same
conditions. In other cases, replication may be deliberately different, though similar, in order to
make the results more general.
Repetition - When an experiment is conducted more than once, repetition describes this event
when the factors are not reset. Subsequent test trials are run again but not necessarily under the
same conditions.
Blocking - When structuring fractional factorial experimental test trials, blocking is used to
account for variables that the experimenter wishes to avoid. A block may be a dummy factor
which doesn’t interact with the real factors.
Box-Behnken - When full second-order polynomial models are to be used in response surface
studies of three or more factors, Box- Behnken designs are often very efficient. They are highly
fractional, three-level factorial designs.
Confounded - When the effects of two factors are not separable. For example, in a half fraction of a two-level, three-factor design, A is confounded with BC, B with AC, and C with AB.


Correlation Coefficient (r) - A number between -1 and 1 that indicates the degree of linear
relationship between two sets of numbers. Zero (0) indicates no linear relationship.
Covariates - Things which change during an experiment which had not been planned to
change, such as temperature or humidity. Randomize the test order to alleviate this problem.
Record the value of the covariate for possible use in regression analysis.
Degrees of Freedom - The term used is DOF, DF, df or v. The number of measurements that
are independently available for estimating a population parameter.
EVOP - It stands for evolutionary operation, a term that describes the way sequential
experimental designs can be made to adapt to system behavior by learning from present results
and predicting future treatments for better response.
First-order - It refers to the power to which a factor appears in a model. If “X1” represents a factor and “B” is its factor effect, then the model Y = B0 + B1X1 + B2X2 is first-order in both X1 and X2.
Fractional - An adjective that means fewer experiments than the full design.
Full Factorial - It describes experimental designs which contain all combinations of all levels of
all factors. No possible treatment combinations are omitted.
Interaction - It occurs when the effect of one input factor on the output depends upon the level
of another input factor.


Level - It is a given factor or a specific setting of an input factor like three levels of a heat
treatment may be 100°C, 120°C and 150°C.
Main Effect- An estimate of the effect of a factor independent of any other factors.
Mixture Experiments - They are experiments in which the variables are expressed as
proportions of the whole and sum to 1.0.
Nested Experiments - An experimental design in which all trials are not fully randomized.
Optimization - It involves finding the treatment combinations that gives the most desired
response. Optimization can be maximization or minimization
Orthogonal - A design is orthogonal if the main and interaction effects in a given design can be estimated without confounding the other main effects or interactions.
Paired Comparison - The basis of a technique for treating data so as to ignore sample-to-
sample variability and focus more clearly on variability caused by a specific factor effect. Only
differences in response for each sample are tested because sample-to-sample differences are
irrelevant.
Fixed Effects Model - If the treatment levels are specifically chosen by the experimenter, then
conclusions reached will only apply to those levels.
Random Effects Model – If the treatment levels are randomly chosen from a population of
many possible treatment levels, then conclusions reached can be extended to all treatment
levels in the population.
Residual Error (ε) or (E) - The difference between the observed and the predicted value for that result, based on an empirically determined model. It can be the variation in outcomes of virtually identical test conditions.
Residuals - The difference between experimental responses and predicted model values.
Resolution - A fractional factorial design in which no main effects are confounded with each
other but the main effects and two factor interaction effects are confounded.
Response Surface Methodology (RSM) - The graph of a system response plotted against
system factors. RSM employs experimental design to discover the “shape” of the response
surface and uses geometric concepts to take advantage of the relationships.
Experiments can be designed to meet a wide variety of experimental objectives as

Fixed-effects model - An experimental model where all possible factor levels are studied. For
example, if there are three different materials, all three are included in the experiment.
Random-effects model - An experimental model where the levels of factors evaluated by the
experiment represent a sample of all possible levels. For example, if we have three different
materials but only use two materials in the experiment.
Mixed model - An experimental model with both fixed and random effects.


Completely randomized design - An experimental plan where the order in which the
experiment is performed is completely random.

The steps for conducting DOE are

Gain complete knowledge of inputs and outputs with a process flow diagram or process map.
Finalize the output measure; usually a variable measure is taken and attribute measures are
avoided, with a stable and repeatable measurement system.
Develop design matrix for the factors under investigation showing all possible combinations of
high and low levels for each input factor.
Obtain extreme high and low levels for every input to investigate.
Enter the factors and levels for the experiment into the design matrix.
Perform each experiment and record the results.
Compute the effect of a factor by averaging the data collected at the low level and subtracting it
from the average of the data collected at the high level.
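As a minimal illustration of this last step, the Python sketch below computes main effects for a hypothetical two-level, two-factor experiment; the design matrix and response values are invented for illustration only and are not from the text.

import numpy as np

# Hypothetical 2^2 design matrix (-1 = low, +1 = high) and illustrative responses
design = np.array([
    [-1, -1],
    [+1, -1],
    [-1, +1],
    [+1, +1],
])
response = np.array([45.0, 52.0, 48.0, 60.0])

# Main effect of a factor = mean(response at high level) - mean(response at low level)
for j, name in enumerate(["A", "B"]):
    high = response[design[:, j] == +1].mean()
    low = response[design[:, j] == -1].mean()
    print(f"Main effect of factor {name}: {high - low:.2f}")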

ANOVA is a basic step in the DOE that is a formidable tool for decision-making based on data
analysis. The types of ANOVA that are more commonly used are
The completely randomized experimental design, or one-way ANOVA. One-way ANOVA
compares several (usually more than two) samples’ means to determine if there is a significant
difference between them.
The factorial design, or two-way ANOVA, which takes into account the effect of noise factors.

Design Guidelines for DOE are as

Number of Factors   Comparative Objective                   Screening Objective                       Response Surface Objective
1                   1-factor completely randomized design   -                                         -
2-4                 Randomized block design                 Full or fractional factorial              Central composite or Box-Behnken
5 or more           Randomized block design                 Fractional factorial or Plackett-Burman   Screen first to reduce number of factors
Randomized Block Plans - When each homogeneous group in the experiment contains exactly
one measurement on every treatment, the experimental plan is called a randomized block plan.
For example, an experimental scheme may take several days to complete. If we expect some
biasing differences among days, we might plan to measure each item on each day, or to conduct
one test per day on each item. A day would then represent a block. A randomized Incomplete
block (tension response) design is shown below.


Latin Square Designs - A Latin square plan is useful to allow for two sources of non-homogeneity
in the conditions affecting test results. A third variable, the experimental treatment, is then applied
to the source variables in a balanced fashion. A Latin square design is essentially a fractional
factorial experiment, restricted by two conditions

The number of rows, columns and treatments must be the same.


There should be no expected interactions between row and column factors.

For example, 5 automobiles, 5 carburetors are used to evaluate gas mileage by five drivers in the 5
x 5 Latin square

Full and Fractional Factorial - A full factorial is an experimental design which contains all levels of
all factors. No possible treatments are omitted. A fractional factorial is a balanced experimental
design which contains fewer than all combinations of all levels of all factors. Listed below are full
and half fractional factorial designs for 3 factors at two levels

The half fractional factorial also requires an equal number of plus and minus signs in each column.
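To make the construction concrete, the following Python sketch (illustrative only, not from the text) generates the 2^3 full factorial and derives a half fraction using the defining relation I = ABC (equivalently C = A·B), which produces the balanced pattern of plus and minus signs described above.

from itertools import product

# Full factorial: all 8 combinations of three factors at two levels (-1, +1)
full = [dict(zip("ABC", run)) for run in product([-1, +1], repeat=3)]

# Half fraction: keep only runs where C equals A*B (defining relation I = ABC)
half = [run for run in full if run["C"] == run["A"] * run["B"]]

print("Full factorial (8 runs):")
for run in full:
    print(run)
print("Half fraction (4 runs, I = ABC):")
for run in half:
    print(run)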


Taguchi Designs - The Taguchi philosophy emphasizes two tenets of reducing the variation of a
product or process which reduces the loss to society and using a proper development strategy to
intentionally reduce variation.

Orthogonal Arrays Degrees of Freedom - Let df = Degrees of Freedom, k = number of factor


levels then for factor A, dfA = kA - 1 and for factor B, dfB = kB – 1. For A x B interaction, dfAB =
dfA x dfB. dfmin= ∑df all factors + ∑df all interactions of interest.

The simplest orthogonal array (OA) is an L4 (four trial runs). Factors A and B can be assigned to
any two of the three columns. The remaining column is the interaction column. The effects of
factors A, B, AxB are determined for each column. Then calculate SST, SSA, SSB, SSAXB and
SSe. A standard ANOVA table can now be set up to determine factor significance at a selected
alpha value.

If the two factors are assigned to columns 1 and 2, the interaction will be in column 3. The L4
triangular Table shows that if the two factors are put in columns 1 and 3, the other point of the
triangle for the interaction is in column 2.

Design resolution is the degree of confounding in two-level fractional screening designs - that is,
the degree to which factor effects are entangled so that one cannot separate them. Resolution II
designs have some main effects confounded with other main effects. Resolution III designs do
not confound main effects with each other, but do confound main effects with two-factor
interactions. Resolution IV designs do not confound main effects with two-factor interactions,
but do confound two-factor interactions with other two-factor interactions. Resolution V designs
do not confound main effects or two-factor interactions with each other, but do confound
two-factor interactions with three-factor interactions.

A Three Factor, Three Level Experiment - Often a three factor experiment is required after
screening a larger number of variables. These experiments may be full or fractional factorial.
Shown below is a 1/3 fractional factorial design. Generally the (-) and (+) levels in two level designs
are expressed as 0 and 1 in most design catalogues. Three level designs are often represented as 0,
1 and 2.


The Factorial Design with Two Factors - It tests several treatments simultaneously, with every
level of each treatment tested in combination with all levels of the other treatment. For example,
the heat generated by a generator depends on its RPM and on the time it is operating. Samples
taken while two generators are running for four hours are summarized in the table below.

Hours   500 RPM   550 RPM   600 RPM
1       65, 65    80, 81    84, 85
2       75, 80    83, 85    85, 86
3       80, 85    86, 87    90, 90
4       85, 88    89, 90    92, 92

With the two-way ANOVA, all the RPMs for every time-frame are observed as well as the
timeframes (hours) for every RPM. The row effects and the column effects are called the main
effects, and the combined effects of the rows and columns is called the interaction effect.

In this example, a table's cell is the intersection between an RPM (column) and a length of time
(row) and has two observations. Cell 1 is comprised of observations (65, 65). So we have three cells
per row and four rows, which make a total of 12 cells. As the two-way ANOVA is also a hypothesis
test but with three hypotheses

The first hypothesis will stipulate that there is no difference between the means of the RPM
treatments: H0: µr1 = µr2 = µr3, where µr1, µr2, and µr3 are the means of the RPM treatments.
The second hypothesis will stipulate that the number of hours that the generators operate does
not make any difference on the heat: H0: µh1 = µh2 = µh3 = µh4, where µh1 through µh4 are the
means for the hours that the generators were operating.
The third stipulation will be that the effect of the interaction of the two main effects (RPM and
time) is zero. If the interaction effect is significant, a change in one treatment will have an effect
on the other treatment. If the interaction is very important, we say that the two treatments are
confounded.

Now, the formulas are used and the problem is solved mathematically, with the results being
verified. Rejection of the null hypothesis involves assessing the sources of the variations from
the means. As for the example, all the observations are not identical; they range between 65 and 92
degrees. The means of the different main factors (the different RPMs and the different
timeframes) are not identical, either. For a confidence level of 95 percent (an α level of 0.05),
ANOVA seeks to determine the sources of the variations between the main factors. If the sources
of the variations are solely within the treatments (in this case, within the columns or rows), we
would not be able to reject the null hypothesis. If the sources of variations are between the
treatments, we reject the null hypothesis.

The formulas for the sums of squares (or deviations from means) to solve a two-way ANOVA with
interaction are given as

where lSS is the sum of squares for the rows, tSS is the sum of squares for the treatments, ISS is
the sum of squares for the interactions, SSE is the error sum of squares, TSS is the total sum
of squares, n is the number of observed data in a cell (n = 2), t is the number of treatments, l is the
number of row treatments, i is the number of treatment levels, j is the column treatment levels, k is
the number of cells, Xijk is any observation, Xij is the cell mean, Xi is the level mean, Xj is the
treatment mean, and X is the mean of all the observations. The table, with the row means
added, is as

Hours   500 RPM   550 RPM   600 RPM   Row mean
1       65, 65    80, 81    84, 85    76.66667
2       75, 80    83, 85    85, 86    82.33333
3       80, 85    86, 87    90, 90    86.33333
4       85, 88    89, 90    92, 92    89.33333

The F-table with F-statistic is as

Sources of variation       Sums of squares   Degrees of freedom   Mean square   F-statistic   F-critical
RPM                        435.45            2                    217.79        76.86         3.89
Time                       540               3                    180           63.53         3.49
Interaction (Time x RPM)   147.75            6                    24.63         8.69          3.00
Error                      34                12                   2.833
Total                      1157.333          23

To reject the null hypothesis, compare the F-statistic to the F-critical value. If the F-critical value
(the one from the F-table) is greater than the calculated F-statistic, do not reject the null
hypothesis; otherwise, reject it. In this case, the F-statistics for all the main factors and the
interaction are greater than their corresponding F-critical values, so reject the null hypotheses.
The length of time the generators are operating, the RPM variations, and the interaction of RPM
and time all have an impact on the heat that the generators produce. But after determining that
the interaction between the two main factors is significant, it is unnecessary to investigate the
main factors further.
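As a sketch of how this two-way ANOVA could be reproduced in software (assuming the pandas and statsmodels libraries are available), the generator data above can be analyzed as follows; the sums of squares and F-statistics should closely match the table.

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Generator heat readings: two observations per hours/RPM cell (from the table above)
data = {
    (1, 500): [65, 65], (1, 550): [80, 81], (1, 600): [84, 85],
    (2, 500): [75, 80], (2, 550): [83, 85], (2, 600): [85, 86],
    (3, 500): [80, 85], (3, 550): [86, 87], (3, 600): [90, 90],
    (4, 500): [85, 88], (4, 550): [89, 90], (4, 600): [92, 92],
}
rows = [{"hours": h, "rpm": r, "heat": y}
        for (h, r), ys in data.items() for y in ys]
df = pd.DataFrame(rows)

# Two-way ANOVA with interaction: heat ~ RPM + hours + RPM:hours
model = ols("heat ~ C(rpm) * C(hours)", data=df).fit()
print(anova_lm(model))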

Determination of the significance of results is based on the P-value. In this example, all the P-values
are infinitesimal, much lower than 0.05, which confirms the conclusion made earlier, i.e. the
RPMs, the time, and the interaction have an effect on the heat produced by the generators. But after
knowing that the interaction between the two main factors is significant, it is unnecessary to
investigate the main factors.

Factorial (2k) Designs - They are experiments involving several factors ( k = # of factors) where it is
necessary to study the joint effect of these factors on a specific response. Each of the factors are set
at two levels (a “low” level and a “high” level) which may be qualitative (machine A/machine B, fan
on/fan off) or quantitative (temperature 800/temperature 900, line speed 4000 per hour/line speed
5000 per hour).

Factors are assumed to be fixed (fixed effects model) and designs are completely randomized
(experimental trials are run in a random order, etc.). The usual normality assumptions are
satisfied. These designs are particularly useful in the early stages of experimental work, when
many factors are likely to be investigated, because they minimize the number of treatment
combinations (sample size) while still studying all k factors in a complete factorial arrangement
(the experiment collects data at all possible combinations of factor levels). As k gets large, the
sample size increases exponentially, and if the experiment is replicated, the number of runs
increases again.
k    Number of runs (2^k)
2    4
3    8
4    16
5    32
Two factors set at two levels (normally referred to as low and high) would result in the following
design where each level of factor A is paired with each level of factor B.

Generalized Settings                           Orthogonal Settings

RUN   Factor A   Factor B   Response          RUN   Factor A   Factor B   Response
1     low        low        y1                1     -1         -1         y1
2     high       low        y2                2     +1         -1         y2
3     low        high       y3                3     -1         +1         y3
4     high       high       y4                4     +1         +1         y4

Main effects are estimated as the change in the response associated with moving each factor
from its low to its high level. This is the estimated effect on the response variable associated with
changing factor A or B from their low to high values.

If neither factor A nor factor B has an effect on the response variable.


If factor A has an effect on the response variable, but factor B does not.

If factor A and factor B have an effect on the response variable.

If factor B has an effect on the response variable, but only if factor A is set at the “High” level. This
is called interaction, and it means that the effect one factor has on the response depends on the
level at which the other factors are set. Interactions can cause major problems in a DOE if they
are not accounted for when designing the experiment.


5.2. Statistical Process Control (SPC)


Statistical process control (SPC) is a technique for applying statistical analysis to measure, monitor,
and control processes.
Objectives and benefits
The major component of SPC is the use of control charting methods. Pioneered by Walter
Shewhart in the 1920s and later enhanced by W. Edwards Deming, statistical process control
(SPC) is a statistical method for measuring, monitoring, controlling, and improving a process.
The basic rule of SPC is to leave the variations from common causes to chance, but to identify
and eliminate special causes. Since all processes are subject to variation, SPC relies on statistical
evidence instead of on intuition.

SPC focuses on optimizing continuous improvement by using statistical tools for analyzing data,
making inferences about process behavior, and then making appropriate decisions. Variation is
defined as "a change in the process data; a characteristic or a function that results from some
cause." Statistical process control begins with the recognition that all processes contain variation.
No matter how consistent the production appears to be, measurement of the process data will
indicate a level of dispersion or variability in the data. The management and improvement of
variation are at the very heart of the strategy of statistical process control.

The basic assumption made in SPC is that all processes are subject to variation. This variation may
be classified as one of two types, random or chance cause variation and assignable cause variation.
Benefits of statistical process control include the ability to monitor a stable process and identify if
changes occur that are due to factors other than random variation. When assignable cause
variation does occur, the statistical analysis facilitates identification of the source so that it may be
eliminated. The objectives of statistical process control are to determine process capability,
monitor processes and identify whether the process is operating as expected or whether the
process has changed and corrective action is required.

The objectives of SPC are


Using the data generated by the process, called the “voice of the process,” to inform the Six
Sigma Black Belt and team members when intervention is or is not required.
Reducing variation, increase knowledge about a process and steer the process in the desired
way.
Detecting quickly the occurrence of special causes of process shifts so that investigation of the
process and corrective action may be undertaken before many nonconforming (defective) units
are manufactured.

SPC accrues various benefits as it helps to


Monitor processes for maintaining control
Detect special causes
Serve as decision-making aids
Reduce the need for inspection
Increase product consistency
Improve product quality
Decrease scrap and rework


Increase production output


Streamline processes

Interpretation of control charts may be used as a predictive tool to indicate when changes are
required prior to production of out of tolerance material. As an example, in a machining
operation, tool wear can cause gradual increases or decreases in a part dimension. Observation of
a trend in the affected dimension allows the operator to replace the worn tool before defective
parts are manufactured. An additional benefit of control charts is to monitor continuous
improvement efforts. When process changes are made which reduce variation, the control chart
can be used to determine if the changes were effective. Costs associated with SPC include the
selection of the variable(s) or attribute(s) to monitor, setting up the control charts and data
collection system, training of operators, and investigation and correction when data values fall
outside control limits.

The basic rule of SPC is that variation from common causes (controlled) should be left to chance,
but special causes (uncontrolled) should be identified and eliminated. Shewhart called the causes
“common” and “assignable” respectively however, the terms common and special are more
frequently used today.

Selection of Variable - The risk of charting many parameters is that the operator will spend so
much time and effort completing the charts, that the actual process becomes secondary. When a
change does occur, it will most likely be overlooked. In the ideal case, one process parameter is
the most critical, and is indicative of the process as a whole. Some specifications identify this as a
critical to quality (CTQ) characteristic. CTQ may also be identified as a key characteristic.

Key process input variables (KPIVs) may be analyzed to determine the degree of their effect on a
process. For some processes, an input variable such as temperature may be so significant that
control charting is mandated. Key process output variables (KPOVs) are candidates both for
determining process capability and process monitoring using control charting.
Design of experiments (DOE) and analysis of variance (ANOVA) methods may also be used to
identify variable(s) that are most significant to process control.

As a result of the Improve phase of the DMAIC process, the project team has implemented
improvements to the variables or inputs (Xs) in the process causing variation in the output (Y).
Once these improvements are in place, it is important to monitor the process. Select statistically
and practically significant variables for monitoring that are critical to quality (CTQ) when
establishing control charts. It is possible to monitor multiple variables using separate control charts.

Common causes - Common causes are sources of process variation that are inherent in a process
over time. A process that has only common causes operating is said to be in statistical control. A
common cause is sometimes referred to as a "chance cause" or "random cause". Examples include
variation in raw material, variation in ambient temperature and humidity, variation in electrical or
pneumatic sources, variation within equipment (worn bearings) or variation in the input data.


Special causes - Special causes or assignable causes are sources of process variation (other than
inherent process variation) periodically disrupting the process. A process that has special causes
operating is said to lack statistical control. Examples include tool wear, large changes in raw
materials or broken equipment.

Type I SPC Error - It occurs when we treat a behavior as a special cause when no change has
occurred in the process. It is also referred to as "over control".

Type II SPC Error - Occurs when we do not treat a behavior as a special cause when in fact it is a
special cause. It is also referred to as under control.

Defect - An undesirable result on a product; also known as "a nonconformity".

Defective - An entire unit failing to meet specifications; also known as "a nonconformance".

Rational sub-grouping
A rational subgroup is a subset of data defined by a specific factor. By sampling so that variation
within each subgroup arises only from conditions producing random effects, rational sub-grouping
identifies and separates variation due to special causes. Rational subgroups are our attempt to be
sure that we are asking the right questions about the data. Selecting the appropriate control chart
to use depends on the subgroups.

A control chart provides a statistical test to determine if the variation from sample to sample is
consistent with the average variation within the sample. Generally, subgroups are selected in a way
that makes each subgroup as homogeneous as possible. This provides the maximum opportunity
for estimating expected variation from one subgroup to another. In production control charting, it
is very important to maintain the order of production. Data from a charted process which shows
out-of-control conditions may be mixed to create new X-R charts which demonstrate remarkable
control. By mixing, chance causes are substituted for the original assignable causes as a basis for
the differences among subgroups.

Sub-grouping Schemes - Where order of production is used as a basis for sub-grouping, two
fundamentally different approaches are possible. The subgroup consists of product all produced as
nearly as possible at one time. The subgroup consists of product intended to be representative of
all the production over a given period of time.

The second method is sometimes preferred where one of the purposes of the control chart is to
influence decisions on acceptance of product. In most cases, more useful information will be
obtained from five subgroups of 5 than from one subgroup of 25. In large subgroups, such as 25,
there is likely to be too much opportunity for a process change within the sub-group.

The steps for sub-grouping are

Select the measurement


Identify the best data to track.
Focus on the vital few, not the trivial many.
Select the best data for a few charts.
Produce elements of the subgroup in closely similar identical ways.


Identify number of subgroups


Establishing rational subgroups is important for dividing observations.
Compute statistics for each subgroup separately before plotting on the control chart.
Desire a minimal chance for variations within each subgroup

Sources of Variability - The long term variation in a product will, for convenience, be termed the
product (or process) spread. One of the objectives of control charting is to markedly reduce the
lot-to-lot variability. The distribution of products flowing from different streams may produce
variabilities greater than those of individual streams. It may be necessary to analyze each stream-to-
stream entity separately. Another main objective of control charting is to reduce time-to-time
variation.

Physical inspection measurements taken at different points on a given unit are referred to as
within-piece variability. Another source of variability is the piece-to piece variation. Often, the
inherent error of measurement is significant. This error consists of both human and equipment
components. The remaining variability is referred to as the inherent process capability.

For example the project team desires to monitor a process that manufactures PET (plastic) bottles
for the beverage industry. The bottles are injection-molded on a multi-cavity carousel. The
particular carousel contains 4 cavities and the team initially decides to take 3 bottles from each
cavity each hour and measure a critical characteristic.

Option 1 - Every hour, take 3 samples (subgroups) of 4 bottles (n= 4) at random. Plot the
process (on one chart).
Option 2 - Every hour, take 3 samples (subgroups) of 4 bottles (n= 4) or one bottle from each
cavity. Plot chart for process on 1 chart.
Option 3 - Every hour, take 4 samples (subgroups) and 3 bottles (n= 3) with each sample from
a different cavity. Plot each cavity on separate charts.
Selection and application of control charts
Control charts are the most powerful tools to analyze variation in most processes - either
manufacturing or administrative. Control charts were originated by Walter Shewhart in 1931 with a
publication called Economic Control of Quality of Manufactured Product.

Originated by Walter Shewhart, control charts are a type of graph for studying how a process
changes over time. By comparing data points to a central line average, with an upper control limit
(UCL) and lower control limit (LCL), users can note variation, track common causes, and seek
special causes. Alternative names are "statistical process control charts" and "Shewhart charts". Run
charts display data measures over time without the central line average and the limits.

Control charts using variables data are line graphs that display a dynamic picture of process
behavior. Control charts for attributes data require 25 or more subgroups to calculate the control
limits. A process which is under statistical control is characterized by plot points that do not exceed
the upper or lower control limits.When a process is in control, it is predictable.


Control Chart have various benefits as the addition of calculated control limits facilitates the ability
to detect special or assignable causes of variation, the current process is displayed and compared to
the improved process by identifying shifts in either average or variation and since every process
varies within predictable limits, identifying assignable causes and addressing them will save money.

Control charts are used to control ongoing processes by finding and correcting problems as they
occur, to predict the expected range of outcomes from a process, determine if a process is in
statistical control, differentiate variation from non-routine events or common causes and determine
whether the quality improvement should aim to prevent specific problems or make fundamental
process changes.

Types of control charts - Different types of control charts exist depending on the measurement
used and two basic categories are

Variable charts - It is constructed from variable data (data that consists of measurements like
weight, length, etc.). Variable data contains more information than data that simply qualifies or
counts something. Consequently, variable charts are some of the most powerful tools in quality
improvement. In it, samples are taken in subgroups of 2 to 10 at predetermined intervals, with the
statistic (mean, range, or standard deviation) calculated and recorded on the chart. Various
types of variable charts are
X - R Charts (when data is readily available)
Run Charts (limited single-point data)
M X - MR Charts (moving average/moving range)
X - MR Charts (I - MR, individual moving range)
X - S Charts (when sigma is readily available)
Median Charts
Short Run Charts
Attribute charts - They use attribute data (data that counts items, such as the number of rejects or
the number of errors). Control charts based on attribute data are generally less powerful and
sometimes more difficult to interpret than variable charts. Samples are taken from lots of
material where the number of defective units in the sample are counted (for p and np-charts)
or the number of individual defects are counted for a defined unit (c and u-charts). Various
types of attribute charts are
p Charts (for defectives - sample size varies)
np Charts (for defectives - sample size fixed)
c Charts (for defects - sample size fixed)
u Charts (for defects - sample size varies)

The structure of both types of control charts is similar, but the statistical construction of the control
limits is quite different due to the differences in the distributions in each.


X - R Chart – The X and R (average and range chart) is most widely used by many companies as
they implement statistical process control. These charts are very useful because they are sensitive
enough to detect early signals of process drift or target shift. Its main advantages are that it is easy
to construct and interpret, that it provides the information needed to perform process capability
studies, that it is suitable when a process can be sufficiently monitored by collecting variable data
in small subgroups, and that it is sensitive to process changes and provides early warning, giving
the opportunity to act before the situation worsens. But it has the disadvantage of only being
usable when data are available to collect in subgroups.

The CL is determined by averaging the X̄ values: X̿ = (X̄1 + X̄2 + ... + X̄n)/n, where n is the number of
samples. The UCL and the LCL are UCL = X̿ + 3σ, CL = X̿ and LCL = X̿ - 3σ. The mean
range and the standard deviation for normally distributed data are linked as σ = R̄/d2, where the
constant d2 is a function of n. Various terms are used in this type of chart, which include

n - Sample size (subgroup size)


X -A reading (the data)
X - Average of readings in a sample
X - Average of all the X s. It is the value of the central line on the chart.
R - The range.The difference between the largest and smallest value in each sample.
R - Average of all the Rs.It is the value of the central line on the R chart.
UCL/LCL - Upper and lower control limits - The control boundaries for 99.73% of the
population. They are not specification limits.

Steps for constructing X - R Charts


Determine the sample size (n = 3, 4, or 5) and the frequency of sampling.
Collect 20 to 25 sets of time - sequenced samples.
Calculate the average = X for each set of samples.
Calculate the range = R for each set of samples.
Calculate X (the average of all the X ’s). This is the center line of the chart.
Calculate R (the average of all the R’s). This is the center line of the R chart.
Calculate the control limits as

Plot the data and interpret the chart for special or assignable causes.
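A minimal Python sketch of these calculations follows, assuming subgroups of size n = 5 and the commonly tabulated constants A2 = 0.577, D3 = 0 and D4 = 2.114 for that subgroup size; the measurement values are invented for illustration.

import numpy as np

# Illustrative subgroups of size n = 5 (each row is one time-ordered sample)
subgroups = np.array([
    [10.2, 10.1, 10.3, 10.0, 10.2],
    [10.4, 10.2, 10.1, 10.3, 10.2],
    [10.0, 10.1, 10.2, 10.1, 10.3],
    [10.3, 10.2, 10.4, 10.1, 10.2],
])

xbars = subgroups.mean(axis=1)                          # subgroup averages
ranges = subgroups.max(axis=1) - subgroups.min(axis=1)  # subgroup ranges
xbarbar, rbar = xbars.mean(), ranges.mean()

A2, D3, D4 = 0.577, 0.0, 2.114   # table constants for n = 5
print("X-bar chart: CL =", xbarbar,
      "UCL =", xbarbar + A2 * rbar, "LCL =", xbarbar - A2 * rbar)
print("R chart:     CL =", rbar, "UCL =", D4 * rbar, "LCL =", D3 * rbar)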


X-Bar and Sigma Charts - X-bar ( X ) and sigma (S) charts are often used for increased sensitivity
to variation (especially when larger sample sizes are used).The sample standard deviation (S)
formula is

The X chart is constructed in the same way as described earlier,except that sigma ( S ) is used for
the control limit calculations via the following formulas

The estimated standard deviation (σ̂), called sigma hat, can be calculated by

The A3, B3, B4 and C4 factors are based on sample size and are obtained from tables. The X
and s (average and standard deviation) chart is complex and not used extensively. Its advantage is
that when the subgroup sizes are fairly large (greater than 10), using the range as the measure of
dispersion may not yield a good estimate of process variability, so the average and standard
deviation chart is often the better choice. It may also be used when more sensitivity in detecting a
process shift is desired, as in the case where the product being manufactured is quite expensive
and any change in the process could either cause quality problems or add unnecessary costs. Its
disadvantages are that it may issue false signals at a much higher rate than other types of control
charts and that it is complex to construct and use.


Median (X-tilde and R) Chart - The median control chart, or X~ and R chart, is calculated using the
same formulas as the X and R chart. The median control chart is different from the average and
range chart in that it is easier to use and requires fewer calculations, because the median is plotted
rather than the average of the sample. Typically, the ease of arithmetic is the advantage of using a
median chart. It is easy to use and shows the process variation, the median and the spread, but it
has the disadvantages of being less efficient, exhibiting more variation than the X and R chart,
and making it difficult to detect trends and other anomalies in the range.


Moving Range - Depending on the type of data available and the situation, various control charts
may be applicable. Given the unknowns of future projects and situations, the project team may
prefer to use the individual and moving range (X-MR, I-MR) control chart. The project team often
use this chart with limited data, such as when production rates are slow, testing costs are very high,
or there is a high level of uncertainty relative to future projects. It has also found use where data
are plentiful, such as in the case of automatic testing of every unit where no basis exists for
establishing subgroups.

M X-MR (moving average-moving range) charts are a variation of the X-R chart used where data is
less readily available. There are several construction techniques; the one most sensitive to change is
n = 3. Control limits are calculated using the X-R formulas and factors.

X-MR Chart - The individual and moving range chart (X-MR, I-MR) is applicable when the
sample size used for process monitoring is n= 1. X-MR charts are used in various applications like
the early stages of a process when one is not quite sure of the structure of the process data, when
analyzing every unit, slow production rates with long intervals between observations, when
differences in measurements are too small to create an objective difference, when measurements
differ only because of laboratory or analysis error or when taking multiple measurements on the
same unit (as thickness measurements on different places of a sheet).

Control charts plotting individual readings and a moving range may be used for short runs and in
the case of destructive testing. X-MR charts are also known as I-MR, individual moving range
charts. The formulas are as

The control limits for the range chart are calculated exactly as for the X-R chart. The X-MR chart
is the only control chart which may have specification limits shown. M X-MR charts with n = 3 are
recommended by the authors when information is limited.

An X-MR (individuals and moving range) chart is useful because it is made with individual
measures (when the subgroup size is one). The X-MR chart is applicable to many different
situations, since there are many scenarios when the most obvious subgroup size is one (monthly
data, etc.).

It has the advantages of being useful even in a situation with small amounts of data, being easy to
construct, and being useful in the early stages of a new process when not much is known about the
structure of the data, but it has the disadvantage that it cannot discern between common cause and
special cause variation.
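A brief sketch of the X-MR (I-MR) limit calculation in Python is given below, using the commonly quoted constants 2.66 (for the individuals chart) and 3.267 (for the moving range chart) that apply to moving ranges of span two; the data values are hypothetical.

import numpy as np

x = np.array([25.1, 24.8, 25.4, 25.0, 24.7, 25.3, 25.2, 24.9])  # individual readings
mr = np.abs(np.diff(x))          # moving ranges between consecutive readings
xbar, mrbar = x.mean(), mr.mean()

# Individuals chart limits (2.66 = 3/d2 for n = 2) and moving range chart limits
print("X chart:  CL =", xbar, "UCL =", xbar + 2.66 * mrbar, "LCL =", xbar - 2.66 * mrbar)
print("MR chart: CL =", mrbar, "UCL =", 3.267 * mrbar, "LCL = 0")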


Attribute Charts - Attributes are discrete, counted data, such as defects or no defects.Only one
chart is plotted for attributes.

Chart Records Subgroup size


p Fraction Defective Varies
np Number of Defectives Constant
c Number of Defects Constant
u Number of defects per unit Varies

Normally the subgroup size is greater than 50 (for p charts). The average number of
defects/defectives is equal to or greater than 4 or 5. The most sensitive attribute chart is the p chart.
The most sensitive and expensive chart is the X -R.

p-Charts - The p-chart is one of the most-used types of attribute charts. It shows the proportion of
defective items in successive samples of equal or varying size. Consider the proportion as the
number of defectives divided by the number in the sample. To develop the control limits for a p-
chart, consider the case where we are inspecting a variable sample size and recording the number
of nonconforming items in each sample.

The p-chart is used when dealing with ratios, proportions, or percentages of conforming or
nonconforming parts in a given sample. A good example for a p-chart is the inspection of products
on a production line. They are either conforming or nonconforming.The probability distribution
used in this context is the binomial distribution with p for the nonconforming proportion and q
(which is equal to 1 − p) for the proportion of conforming items. Because the products are only
inspected once, the experiments are independent from one another. The first step when creating a
p-chart is to calculate the proportion of nonconformity for each sample as p =m/b where, m

www.vskills.in Page 136


Certified Six Sigma - Green Belt Professional

represents the number of nonconforming items, b is the number of items in the sample, and p is
the proportion of nonconformity. The mean proportion is computed as

where, k is the number of samples audited and pk is the kth proportion obtained. The control
limits of a p-chart are

The benefit of the p-chart is that the variations of the process change with the sizes of the samples
or the defects found on each sample.
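The calculation just described can be sketched in Python as below; the sample sizes and defective counts are invented, and the limits are recomputed for each sample because the sample size varies.

import numpy as np

defectives = np.array([6, 4, 9, 5, 7])           # nonconforming items per sample (illustrative)
sizes      = np.array([98, 102, 110, 95, 100])   # varying sample sizes

p = defectives / sizes                 # proportion nonconforming per sample
pbar = defectives.sum() / sizes.sum()  # overall mean proportion

for n_i, p_i in zip(sizes, p):
    sigma = np.sqrt(pbar * (1 - pbar) / n_i)
    ucl = pbar + 3 * sigma
    lcl = max(0.0, pbar - 3 * sigma)
    print(f"n={n_i}: p={p_i:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")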

np-Charts - The np-chart, number of defective units, is related to the p-chart. The np-chart is a
control chart of the counts of nonconforming items (defectives) in successive samples of constant
size. The np-chart can be used in place of the p-chart to plot the counts of non-conforming items
(defectives) when there is a constant sample size. In effect, using np-charts involves converting from
proportions to a plot of the actual counts.

The np-chart is one of the easiest to build. While the p-chart tracks the proportion of
nonconformities per sample, the np-chart plots the number of nonconforming items per sample.
The audit process of the samples follows a binomial distribution—in other words, the expected
outcome is “good” or “bad,” and therefore the mean number of successes is np. The control limits
for an np-chart are

C-Chart - The c-chart is based on the Poisson distribution and works with the count of individual
defects rather than the number of defective units. Its formulas assume counting the number of
defects in the same area of opportunity. The c in the formulas is the number of defects found in
the defined inspection unit, and that is what is plotted on the chart.

The c-chart monitors the process variations due to the fluctuations of defects per item or group of
items. The c-chart is useful for the process engineer to know not just how many items are not
conforming but how many defects there are per item. Knowing how many defects there are on a
given part produced on a line might in some cases be as important as knowing how many parts are
defective. Here, non-conformance must be distinguished from defective items because there can
be several nonconformities on a single defective item.

The probability for a nonconformity to be found on an item in this case follows a Poisson
distribution. If the sample size does not change and the defects on the items are fairly easy to
count, the c-chart becomes an effective tool to monitor the quality of the production process. If c is
the average nonconformity on a sample, the UCL and the LCL limits will be given as


U-Chart - It is also based on the Poisson distribution and works with the count of individual defects
rather than the number of defective units. With a u-chart, the number of inspection units may vary.
The u-chart requires an additional calculation with each sample to determine the average number
of defects per inspection unit. The n in the formulas is the number of inspection units in the
sample.

The sample sizes can vary when a u-chart is being used to monitor the quality of the production
process, and the u-chart does not require any limit to the number of potential defects. Further, for
a p-chart or an np-chart the number of nonconformities cannot exceed the number of items on a
sample, but for a u-chart it is conceivable because what is being addressed is not the number of
defective items but the number of defects on the sample. The first step in creating a u-chart is to
calculate the number of defects per unit for each sample as u = c/n, where u represents the
defects per unit for the sample, c is the total number of defects in the sample, and n is the sample
size. Once all the
averages are determined, a distribution of the means is created and then the mean of the
distribution is to be computed as

where k is the number of samples. The control limits are determined based on the mean defects
per unit and the sample size n.
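As a quick numeric illustration of the c-chart and u-chart limits just described, the Python sketch below uses the standard Poisson-based formulas (c-bar ± 3·sqrt(c-bar) for the c-chart and u-bar ± 3·sqrt(u-bar/n) for the u-chart); the defect counts and unit counts are illustrative assumptions.

import numpy as np

# c-chart: defects counted in a fixed inspection unit (illustrative counts)
c = np.array([4, 7, 3, 6, 5, 8, 4])
cbar = c.mean()
print("c-chart: CL =", cbar,
      "UCL =", cbar + 3 * np.sqrt(cbar),
      "LCL =", max(0.0, cbar - 3 * np.sqrt(cbar)))

# u-chart: defects per unit when the number of inspection units varies
defects = np.array([12, 18, 9, 15])
units   = np.array([ 5,  8, 4,  6])
ubar = defects.sum() / units.sum()
for n_i in units:
    sigma = np.sqrt(ubar / n_i)
    print(f"n={n_i}: UCL={ubar + 3*sigma:.3f}  LCL={max(0.0, ubar - 3*sigma):.3f}")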

Analysis of control charts


Interpreting control charts is a learned behavior based upon increased process knowledge. No
shortcuts exist to becoming competent at the skill of interpreting control charts and it is most
certainly not a skill learned without practice. The distinction between common and special causes
is critical in statistical process control. For Shewhart and Deming, this distinction is the distinction
between a process surrounded by "noise" and one sending a "signal."

Improving the process is the central goal of using control charts. Control charts provide a "voice of
the process" that enables a Black Belt to identify special causes of variation and remove them, thus
allowing for a stable and more consistent process. A control chart becomes a useful tool after initial
development. After establishing and basing the control limits on a stable, in-control process, charts
put in the work area allow operational personnel to monitor the process by collecting data and
plotting points on a regular basis. Personnel can act upon the signals from the chart when
conditions indicate the process is moving or has gone out of control.

Basic rules for control chart interpretation are


Specials are any points above the UCL or below the LCL.
A run violation is seven or more consecutive points on one side of the centerline.
A 1-in-20 violation is more than one point in twenty consecutive points close to control limits.
A trend violation is any upward or downward movement of 5 or more consecutive points or
drifts of 7 or more points.

Process Stability - Before taking appropriate action, a Six Sigma Black Belt (SSBB) must identify
the state of the process. A process can occupy one of 4 states as


Ideal state - A predictable process fully meeting the requirements.


Threshold state - A predictable process that is not always meeting the requirements.
Brink of chaos - An unpredictable process currently meeting the requirements.
State of chaos - An unpredictable process that is currently not meeting the requirements.

Out-
Out-of-
of-control - If a process is “out-of-control,” then special causes of variation are present in
either the average chart or range chart, or both. These special causes must be found and
eliminated in order to achieve an in-control process. A process out-of-control is detected on a
control chart either by having any points outside the control limits or by unnatural patterns of
variability.

Usually the following conditions are based on Western Electric Rules for out of control, though
the lists of conditions may vary depending on the resource used.

1 point more than 3σ from the center line (either side)


9 points in a row on the same side of the center line
6 points in a row, all increasing or decreasing
14 points in a row, alternating up and down
2 out of 3 points more than 2σ from the center line (same side)
4 out of 5 points more than 1σ from the center line (same side)
15 points in a row within 1σ from the center line (either side)
8 points in a row more than 1σ from the center line (either side)
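As a rough sketch of how the first two of these rules could be checked programmatically (assuming the centerline and sigma are already estimated, and using illustrative data):

import numpy as np

points = np.array([0.2, 1.1, -0.4, 3.4, 0.5, 0.6, 0.7, 0.2, 0.3, 0.1, 0.4, 0.2])
center, sigma = 0.0, 1.0   # assumed known centerline and standard deviation

# Rule: any point more than 3 sigma from the centerline
rule1 = np.where(np.abs(points - center) > 3 * sigma)[0]

# Rule: nine points in a row on the same side of the centerline
rule2 = [i for i in range(8, len(points))
         if all(points[i - 8:i + 1] > center) or all(points[i - 8:i + 1] < center)]

print("Points beyond 3 sigma at indices:", rule1.tolist())
print("Runs of nine on one side ending at indices:", rule2)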

The Pre-control Technique - Pre-control was developed by a group of consultants (including
Dorian Shainin) in an attempt to replace the control chart. Pre-control is most successful with
processes which are inherently stable and not subject to rapid process drifts once they are set up. It
can be shown that 86% of the parts will be inside the P-C lines with 7% in each of the outer
sections, if the process is normally distributed and Cpk= 1.

The chance that two parts in a row will fall outside either P-C line is 1/7 times 1/7, or 1/49.This
means that only once in every 49 pieces can we expect to get two pieces in a row outside the P-C
lines just due to chance.


Pre-control is a simple algorithm based on tolerances which is used for controlling a process. Pre-
control is a method of detecting and preventing failures and assumes the process is producing a
measurable product, with varying characteristics according to some distribution. Pre-control lines
are drawn halfway between the target and each specification limit. Each zone between the lines has
colors resembling a traffic signal: green (acceptable), yellow (alert), and red (unacceptable).

The Pre-control utilizes process capability limits instead of specification limits to set the green,
yellow, and red zones and is therefore considered more robust than the traditional use of pre-
control charts. The limits of each zone are calculated based on the distribution of the characteristic
measured, not on the tolerances. Units that fall in the yellow or red zones trigger an alarm before
defects are produced. Pre-control rules are as follows

Rule 1 - If two parts are in the green zone, take no action – continue to run.
Rule 2 - If the first part is in the green or yellow zones, then check the second part. If second
part is in the green zone, then continue to run. If first part is in the yellow zone and the second
part is also in the yellow zone on the same side, adjust the process. If first part is in the yellow
zone and the second part is also in the yellow zone on the opposite side, stop and investigate
the process.
Rule 3 - If any part is in the red zone, then stop. Investigate, adjust, or reset the process. Re-
qualify the process and begin again with Rule 1.
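A small Python sketch of the zone classification behind these rules is shown below; the specification limits, target and measurements are hypothetical, and the pre-control lines are drawn halfway between the target and each specification limit as described above.

LSL, USL = 9.0, 11.0           # hypothetical specification limits
target = (LSL + USL) / 2       # 10.0
lower_pc = (LSL + target) / 2  # lower pre-control line (9.5)
upper_pc = (target + USL) / 2  # upper pre-control line (10.5)

def zone(x):
    """Classify a measurement into the pre-control traffic-light zones."""
    if x < LSL or x > USL:
        return "red"
    if lower_pc <= x <= upper_pc:
        return "green"
    return "yellow"

for measurement in [10.1, 10.6, 9.4, 8.8]:
    print(measurement, "->", zone(measurement))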

It has the advantages of being easy to implement and interpret, being usable in initial setup
operations to determine if the product is centered between the tolerances, making it easy to detect
shifts in process centering or increases in process spread, and serving as a set-up plan for short
production runs. But it has the disadvantages of lacking information about how to reduce
variability or how to return the process to control, being too limited to use for a process with a
capability ratio greater than 1.0, and having a small sample size that limits the ability of the chart
to detect moderate to large shifts.

Runs Test for Randomness - A run is a sequence of data that exhibit the same characteristic. Time
sequence analysis can apply to both variable and attribute data. As an example, results of surveys of
individuals who prefer Diet Pepsi or Diet Coca Cola


Test I PPPPPPPPPCCCCCCCCC
Test II PCPCPCPCPCPCPCPCPC

In both examples, eighteen samples were taken. In Test I, there were only 2 runs. In Test II, there
were 18 runs. Both examples suggest non-random behavior. To perform a runs test

Determine the value of n1 and n2(either the total of two attributes or the readings above and
below the center line on a run or control chart).
Determine the number of runs (R).
Consult a critical value table or calculate a test statistic.

Consult the Critical Value Table for the expected numbers of runs.The expected number of runs
can be approximated by adding the smallest and largest values together and dividing by two.

Short-Run SPC - Short-run or low-volume production is common in manufacturing systems and
includes manufacturing processes that produce built-to-order products or quick turnaround
production. The short-run control chart can also be used in other industries such as general
services and healthcare when data are collected infrequently. These processes often are so short
that not enough data can be collected to construct standard control charts.

Statistical process control techniques have been developed to accommodate short-run production
for both variables data and attributes data. Examples of control charts for both situations are
presented. If possible, collect approximately 20 samples before constructing the control charts for
short production runs. In the examples presented in this subtopic, ten samples will be used for
illustration purposes.

Short run charting may be desirable when the production lot size is extremely small (10-20 pieces)
or when the sample size, under typical operating conditions, is small. Two limited data charts may
be used: X-MR charts and M X-MR charts.

The emphasis has been on short runs and multiple variables per chart. Consider a part which has
four key dimensions. Each dimension has a different target but similar expected variances. The
readings for each variable are coded by subtracting the target value. Calculating centerlines is done as

Exponentially Weighted Moving Average (EWMA) - The exponentially weighted moving average
(EWMA) is a statistic for monitoring a process by averaging the data in a way that gives less and
less weight to data as they are further removed in time. By the choice of a weighting factor, λ, the
EWMA control procedure can be made sensitive to a small or gradual drift in the process. The
statistic that is calculated is EWMAt = λYt + (1 − λ)EWMAt-1, for t = 1, 2, ..., n,


where EWMA0 is the mean of historical data (the target), Yt is the observation at time t, n is the
number of observations to be monitored including EWMA0, and 0 < λ <= 1 is a constant that
determines the depth of memory of the EWMA. It is a variable control chart where each new
result is averaged with the previous average value using an experimentally determined weighting
factor, λ (lambda).

The parameter λ determines the rate at which “older” data enter into the calculation of the
EWMA statistic. A large value of λ gives more weight to recent data and a small value of λ gives
more weight to older data. The value of λ is usually set between 0.2 and 0.3, although this choice is
somewhat arbitrary. The estimated variance of the EWMA statistic is approximately

when t is not small, and where s is the standard deviation calculated from the historical data. The
center line for the control chart is the target value, or EWMA0. The control limits are

where the factor k is either set equal to 3 or chosen using the Lucas and Saccucci tables. The data
are assumed to be independent, and these tables also assume a normal population.

Usually only the averages are plotted and the range is omitted; the action signal is a single point out
of limits.
It is also known as the Geometric Moving Average (GMA) chart and used extensively in time-
series modeling and in forecasting. It allows the user to detect smaller shifts in the process than
with traditional control charts and is ideal to use with individual observations.
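The EWMA recursion and control limits can be sketched in Python as below, assuming a target (EWMA0), historical standard deviation s, λ = 0.2 and k = 3; the observations are illustrative.

import numpy as np

y = np.array([50.2, 49.8, 50.5, 51.0, 50.7, 51.4, 51.9, 52.3])  # observations
target, s = 50.0, 0.8      # historical mean (EWMA0) and standard deviation
lam, k = 0.2, 3.0          # weighting factor lambda and control-limit factor

ewma = target
for t, yt in enumerate(y, start=1):
    ewma = lam * yt + (1 - lam) * ewma                           # EWMA recursion
    var = (s**2) * (lam / (2 - lam)) * (1 - (1 - lam)**(2 * t))  # variance of the EWMA at time t
    ucl, lcl = target + k * np.sqrt(var), target - k * np.sqrt(var)
    flag = "OUT" if not (lcl <= ewma <= ucl) else ""
    print(f"t={t}: EWMA={ewma:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f} {flag}")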

CUSUM Charts - The cumulative sum control chart (CUSUM) is used with variable data and
calculates the cumulative sum of the deviations from target to detect shifts in the level of the
measurement. It may be suitable when it is necessary to detect small process shifts faster than with
a comparable Shewhart control chart.

The chart is effective with samples of size n= 1 where rational subgroups are frequently of size one.
Examples of utilization are in the chemical and process industries and in discrete parts
manufacturing. The CUSUM chart can be graphical (V-mask) or tabular (algorithmic) and unlike
standard charts, all previous measurements for CUSUM charts are included in the calculation for
the latest plot. But, establishing and maintaining the CUSUM chart is complicated.

V-mask - A V-mask resembles a sideways V. The chart is used to determine whether each plotted
point falls within the boundaries of the V-mark. Points falling outside are considered to signal a
shift in the process mean. Each time a point is plotted, the V-mask is shifted to the right. The
geometry associated with the construction of the V-mask is based on a combination of specified
and computed values. The graph below shows how the formulas relate.


The behavior of the V-Mask is determined by the distance k (which is the slope of the lower arm)
and the rise distance h. The team could also specify d and the vertex angle (or, as is more common
in the literature, θ = 1/2 the vertex angle). For an alpha and beta design approach, we must specify

α, the probability of concluding that a shift in the process has occurred, when in fact it did not.
β, the probability of not detecting that a shift in the process mean has, in fact, occurred.
δ(delta), the detection level for a shift in the process mean, expressed as a multiple of the
standard deviation of the data points.

These charts have been shown to be more efficient in detecting small shifts in the mean of a
process than Shewhart charts. They are better at detecting shifts of 2 sigma or less in the mean. To
create a CuSum chart, collect m sample groups, each of size n, and compute the mean x of each
sample. Determine Sm or S'm from the following equations

where µ0 is the estimate of the in-control mean and σx is the known (or estimated) standard
deviation of the sample means. The CuSum control chart is formed by plotting Sm or S'm as a
function of m. If the process remains in control, centered at µ0, the CuSum plot will show variation
in a random pattern centered about zero.
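As a small sketch of this cumulative sum in Python (with illustrative subgroup means and µ0 assumed known):

import numpy as np

xbar = np.array([5.01, 4.98, 5.03, 5.00, 5.06, 5.09, 5.12, 5.15])  # subgroup means
mu0 = 5.00                      # estimate of the in-control mean

Sm = np.cumsum(xbar - mu0)      # cumulative sum of deviations from target
for m, s in enumerate(Sm, start=1):
    print(f"m={m}: S_m={s:.3f}")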

A visual procedure proposed by Barnard, known as the V-Mask, may be used to determine
whether a process is out of control. A V-Mask is an overlay V shape that is superimposed on the
graph of the cumulative sums. As long as all the previous points lie between the sides of the V, the
process is in control.


For example, a process has an estimated mean of 5.000 with h set at 2 and k at 0.5. As h and k are
set to smaller values, the V-Mask becomes sensitive to smaller changes in the process average.
Consider the following 16 data points, each of which is the average of 4 samples (m=16, n=4).
The CuSum control chart with 16 data groups shows the process to be in control.

If data collection is continued until there are 20 data points (m=20, n=4), the CuSum control chart
shows the process shifted upward, as indicated by data points 16, 17 and 18 below the lower arm
of the V-Mask.

5.3. Implement and Validate


The final phase, validate or verify, is about validating/verifying whether the design suits the
requirements of customers and the business. Before implementing the design, there are some quick
steps or pre-requisites to be done to validate that the design is appropriate in normal working
conditions, which are

Pilot Run / Simulation - Even though the design is now complete and a pilot run is completed
in the experimental set up, when it is implemented live on the floor there could be some
changes in the output. So, before going live, a simulation in the usual factory set up is done.
The team should try to use a factory set up (Ex - Layout of machinery) with raw materials,
working conditions (Ex - Temperature), machineries and people (Skill level & knowledge) as in
real time. All minute changes in the output and any hiccups faced during the process should be
recorded. A detailed RCA (Root cause Analysis) should be done and the root causes for
variation should be arrived at.
Make Corrections - Along with the Control recommendations that we arrived at in FMEA in
the design phase, the recommendations from this RCA should be added in the FMEA and
implemented. It is better to use check sheets, with the list of CTQs and other key things that
should be recorded during the simulation. This will help the team to ensure that all aspects of
the design are performing as intended and expected. No single aspect will be left out from
being monitored, however insignificant it might be. Corrections should be done to the process
wherever necessary.
Integrate Related Processes - In case the upstream and downstream processes need
modification, to cater to the requirements of the new design, it should be done and a fresh
Process flow chart should be created.
Document the Process - The process should be documented in the form of Standard
Operating Procedures, Process Manuals, Work Instructions, Supplier Manual, Quality Check
Manual, Process flow charts, etc.
Train the Employees & Communicate Change - The implementation plan that was drawn up
during the Design phase is now executed. Employees who will be deployed in the new process,
upstream and downstream process owners, and the quality and training teams should be trained
on the new process to be followed. One of the most critical tasks in this step is change
management, which is a challenge faced by many organizations. Depending on the organization's
culture and employees' profile, organizations choose their methodology for adapting to the
change. The new design is now implemented at full scale.
Draw Control Plan - A control plan is drawn up to observe any variations in the output. The
control plan will have the list of CTQ metrics to be monitored, a list of metrics that could bring
out the standard and quality of the input materials and variables, and mid-process performance
metrics. Review of these metrics should happen very frequently in the initial phase to identify
and fix gaps immediately. The project team should meet at a defined frequency to monitor
the progress.
Go Live - The process now goes live and is handed over to the operations lead (the process
owner). The transition should be smooth (as described in the implementation plan). The process
owner should be provided with the complete background of the design and the FAQs. This is
why the process manuals and work instructions should be prepared in full detail.

Various improvement methods are used, including the following.

Brainstorming - Brainstorming is a technique to systematically generate ideas from a group of
people, usually to handle a challenging situation, by nurturing free thinking. There are several such
opportunities in any organisation, e.g. improving productivity, increasing sales, finding new
business development areas, launching new products or defining new processes.

While there may be well-defined techniques or processes to handle these situations, brainstorming
is a critical activity in all of them. Techniques such as Affinity diagrams, Nominal group technique,
Cause and Effect diagrams, Failure mode and effects analysis, 5 whys, Fault tree analysis, Decision
matrix, and Risk analysis require brainstorming as an integral part of their execution. The list is
endless!

Brainstorming required for generating inputs for the above techniques is more complex than the
free-flow ideation that one usually associates with the term brainstorming. An example of the
kind of brainstorming required here can be observed in a 5-why analysis, where brainstorming
occurs for every why in a hierarchical manner until a root cause is discovered.


A brainstorming session must be orchestrated by a facilitator. The number of participants in a
session must be limited to a manageable number - typically between 5 and 15. There are a few
rules for successful brainstorming, which should be enforced by the facilitator. These rules are
listed below.

Focus on generating a large number of ideas
Active involvement of every participant in the process
Encourage out-of-the-box thinking and creativity
Promote a criticism-free environment - encourage all types of ideas, including wild or seemingly
ridiculous ideas, while keeping the purpose of the brainstorming in mind
Combine ideas to create newer ideas
Set up a reasonable time limit based on the challenge in hand

The process to conduct a brainstorming session is as follows.

Select and block a (lively) room free from interruptions and distractions for brainstorming.
Identify and invite the participants. The invite must clearly state the purpose of brainstorming.
Before the start, ensure that the room is equipped with basic essentials such as a blackboard,
flipcharts, pens, and large-size post-its.
Initiate the session by clearly explaining the purpose, possibly already written and highlighted
on the board. Also set the basic rules for the session. Set some time towards the end of the
session for organizing the ideas generated.
Invite people to come up with ideas. One of the participants may be designated to record each
idea, or alternatively each participant may be requested to pen his/her idea on a post-it to speed
up the process. Maintain a lively environment; monotony must be avoided at all costs.
Ensure that the rules of successful brainstorming are followed properly.
Towards the end, focus on organizing ideas and eliminating the duplicate ones. If the number
of ideas generated is sufficiently large, an affinity diagram may be used to organize the ideas.
Close the session with a note collectively appreciating each one's contribution.

Multi-vari studies - Multi-Vari Analysis is a tool that graphically displays patterns of variation.
These studies are used to identify possible X's and/or families of variation. These families of
variation can frequently hide within a subgroup, between groups or over time.

It is a technique for viewing multiple sources of process variation. Different sources of variation are
categorized into families of related causes and quantified to reveal the largest causes. Multi-vari
analysis provides a breakdown showing, for example, that machine B on shift 1 is causing the most
variation. It does not quantify the variation; it just shows where it is. Multi-Vari is the perfect tool to
determine where the variability is coming from in a process (lot-to-lot, shift-to-shift, machine-to-
machine, etc.), because it does not require manipulating the independent variables (or process
parameters) as in design of experiments. Because it enables analyzing the effects of multiple
factors, multi-vari analysis is widely used in Six Sigma projects.
Also, the effect of categorical inputs on a response can be displayed on a multi-vari chart. It is
one of the tools used to reduce the trivial many inputs to the vital few. In other words, it is used to
identify possible Xs or families of variation, such as variation within a subgroup, between
subgroups, or over time. Multi-vari charts are useful for quickly identifying positional, temporal
and cyclical variation in processes.
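As a rough numerical illustration of separating families of variation, the sketch below (with hypothetical data, machine and shift labels) compares within-subgroup spread, machine-to-machine spread and shift-to-shift spread using simple ranges; a real multi-vari study would normally display these graphically rather than print them.

# Hypothetical multi-vari breakdown: within-subgroup, machine-to-machine, shift-to-shift.
# All data values and labels are illustrative only.
import statistics as stats

# measurements[shift][machine] -> list of readings within one subgroup
measurements = {
    "shift1": {"A": [10.1, 10.2, 10.0], "B": [10.8, 11.0, 10.9]},
    "shift2": {"A": [10.2, 10.1, 10.3], "B": [10.7, 10.9, 10.8]},
}

# Within-subgroup (positional) variation: average range inside each machine/shift cell
within = stats.mean(max(v) - min(v)
                    for by_machine in measurements.values()
                    for v in by_machine.values())

# Machine-to-machine (cyclical) variation: spread of machine means within each shift
machine_means = {s: {m: stats.mean(v) for m, v in d.items()}
                 for s, d in measurements.items()}
machine_to_machine = stats.mean(max(m.values()) - min(m.values())
                                for m in machine_means.values())

# Shift-to-shift (temporal) variation: spread of overall shift means
shift_means = [stats.mean(x for m in d.values() for x in m)
               for d in measurements.values()]
shift_to_shift = max(shift_means) - min(shift_means)

print(f"within-subgroup range    : {within:.3f}")
print(f"machine-to-machine range : {machine_to_machine:.3f}")
print(f"shift-to-shift range     : {shift_to_shift:.3f}")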


FMEA - The acronym FMEA stands for "Failure Modes and Effects Analysis". It represents a
technique aimed at averting future issues in project processes and eliminating risks that may
hamper a solution.

It identifies and evaluates defects which could potentially reduce the quality of a product. Defects
within the methodology are defined as anything that reduces the speed or quality at which a
product or service is delivered to customers. FMEA is used to discover and prioritize
aspects of the process that demand improvement and also to statistically analyze the success of a
preemptive solution. The various types of FMEA are

System FMEA - Used to analyze complete systems and/or sub-systems during the concept or
design stage.
Design FMEA - Used to analyze a product design before it is released to manufacturing.
Process FMEA - Used to analyze manufacturing and/or assembly processes.

The steps to create an FMEA are


List the key process steps in the first column usually from the Cause & Effect Matrix.
List the potential failure mode for each process step.
List the effects of this failure mode.
Rate how severe this effect is, with 1 being not severe at all and 10 being extremely severe.
Ensure the team understands the scale.
Identify the causes of the failure mode/effect and rate them in the occurrence column; as the
name implies, this score reflects how likely the cause is to occur.
Identify the controls in place to detect the issue and rank their effectiveness in the detection
column. Here a score of 1 means excellent controls and 10 means no controls or extremely
weak controls.
Multiply the severity, occurrence, and detection numbers and store this value in the RPN (risk
priority number) column. This is the key number that will be used to identify where the team
should focus first.
Sort by RPN number and identify most critical issues.
Assign specific actions with responsible persons.
Once actions have been completed, re-score the occurrence and detection.
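The RPN arithmetic in the steps above is straightforward to script. The sketch below uses hypothetical process steps, failure modes and ratings to multiply severity, occurrence and detection and sort the results so the highest-risk items appear first.

# Minimal FMEA RPN calculation -- failure modes and ratings are hypothetical.
failure_modes = [
    # (process step, failure mode, severity, occurrence, detection)
    ("Mixing",   "Wrong ratio",   8, 4, 3),
    ("Filling",  "Under-fill",    6, 5, 2),
    ("Labeling", "Missing label", 4, 3, 7),
]

# RPN = severity x occurrence x detection; higher RPN = higher priority
ranked = sorted(
    ((step, mode, sev * occ * det) for step, mode, sev, occ, det in failure_modes),
    key=lambda row: row[2],
    reverse=True,
)

for step, mode, rpn in ranked:
    print(f"RPN {rpn:3d}  {step:10s} {mode}")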

Measurement system capability re-analysis - It is a method to re-identify the components of
variation in the measurement system. It is used to re-quantify the impact of measurement errors
and to ensure the integrity of the data used for analysis.

Just as a process has inherent variations, the process of measurement has variations too. Therefore,
when making decisions that rely on data, it is important to ensure that the systems that collect that
data are accurate and precise. Although it may not be possible to totally eliminate measurement
errors, the objective is to ensure that measurement variance is relatively much smaller than the
observed variance. It uses scientific tools to determine how much of the total variation comes from
the measurement system.
The areas of measurement error that are analyzed and quantified are


Accuracy / Bias - The difference between the true value and the value from the measurement
system. Accuracy represents closeness to a defined target.
Resolution / Discrimination - The goal is to have at least 5 distinct values or categories of
readings. Lack of resolution will not allow a measurement system to detect change.
Linearity - It examines the performance of the measurement system throughout the range of
measurements.
Stability - It is analyzed using control charts, ensuring that the measurements taken by the
appraiser(s) for the process are stable and consistent over time.
Repeatability & Reproducibility (Gage R&R) -
Reproducibility is the ability of one appraiser to get the same result as another appraiser,
or the ability of all appraisers to get the same results among each other.
Repeatability is the ability of an appraiser to repeat his/her measurements each time
when analyzing the same part, unit, etc. In destructive testing (such as tensile testing) repeat
readings are not possible, and some statistical software programs have options to account
for destructive testing. A minimal sketch of estimating these two components follows this list.
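The sketch below shows one way repeatability and reproducibility might be estimated from a small crossed study (two appraisers, two parts, repeat readings); the data, appraiser and part names, and the simple variance split are illustrative assumptions, not a full ANOVA-based Gage R&R.

# Rough Gage R&R estimate from a small crossed study -- all data are hypothetical.
# Repeatability ~ variation of repeat readings by the same appraiser on the same part.
# Reproducibility ~ variation between appraiser averages on the same parts.
import statistics as stats

# readings[appraiser][part] -> repeat measurements
readings = {
    "appraiser1": {"part1": [2.01, 2.03, 2.02], "part2": [2.10, 2.12, 2.11]},
    "appraiser2": {"part1": [2.05, 2.04, 2.06], "part2": [2.14, 2.13, 2.15]},
}

# Repeatability: pooled within-cell variance (same appraiser, same part)
cell_vars = [stats.variance(v) for by_part in readings.values() for v in by_part.values()]
repeatability_var = stats.mean(cell_vars)

# Reproducibility: variance of appraiser means on each part, averaged across parts
parts = list(next(iter(readings.values())).keys())
per_part_appraiser_means = [
    [stats.mean(readings[a][p]) for a in readings] for p in parts
]
reproducibility_var = stats.mean(stats.variance(m) for m in per_part_appraiser_means)

print(f"repeatability variance   : {repeatability_var:.6f}")
print(f"reproducibility variance : {reproducibility_var:.6f}")
print(f"total gage R&R variance  : {repeatability_var + reproducibility_var:.6f}")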

Post-improvement capability analysis - Post-improvement re-measuring and re-analysis of the
capability and performance of a process under review enables organizations to numerically
represent and interpret its current state, and to report its sigma level. When done correctly,
process capability analyses enable the project team to precisely assess current performance in light
of future goals and, ultimately, to determine the need for and targets of process improvement.
Process capabilities can be determined for normal and non-normal data, and for variable
(continuous) and attribute (discrete) data alike. It is reported through the statistical measurements
of Cp, Cpk, Pp, and Ppk, which are

Cp = Process Capability. A simple and straightforward indicator of process capability.
Cpk = Process Capability Index. Adjustment of Cp for the effect of a non-centered distribution.
Pp = Process Performance. A simple and straightforward indicator of process performance.
Ppk = Process Performance Index. Adjustment of Pp for the effect of a non-centered
distribution.
It also involves validating solutions through the F-test, t-test and other similar tests.
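A minimal computation sketch follows, using the standard definitions Cp = (USL - LSL) / 6σ(within) and Cpk = min[(USL - mean) / 3σ(within), (mean - LSL) / 3σ(within)], with Pp and Ppk computed the same way but using the overall (long-term) standard deviation; the specification limits, the data and the pooled within-subgroup sigma estimate are hypothetical choices for illustration.

# Cp/Cpk (short-term) and Pp/Ppk (long-term) -- hypothetical data and spec limits.
import statistics as stats

usl, lsl = 10.5, 9.5                     # hypothetical specification limits
subgroups = [                            # hypothetical subgroups of measurements
    [10.0, 10.1, 9.9], [10.2, 10.1, 10.0], [9.9, 10.0, 10.1],
]
data = [x for g in subgroups for x in g]
mean = stats.mean(data)

# Short-term sigma: pooled within-subgroup standard deviation (one common estimate)
sigma_within = (stats.mean(stats.variance(g) for g in subgroups)) ** 0.5
# Long-term sigma: overall sample standard deviation
sigma_overall = stats.stdev(data)

cp  = (usl - lsl) / (6 * sigma_within)
cpk = min((usl - mean) / (3 * sigma_within), (mean - lsl) / (3 * sigma_within))
pp  = (usl - lsl) / (6 * sigma_overall)
ppk = min((usl - mean) / (3 * sigma_overall), (mean - lsl) / (3 * sigma_overall))

print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  Pp={pp:.2f}  Ppk={ppk:.2f}")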

5.4. Control Plan


A Control Plan is the key to sustaining the gains from a Six Sigma project. A Control Plan exists to
ensure that we consistently operate our processes so that the product always meets customer
requirements, as it ties together the elements of Six Sigma improvement activity. It allows
Champions and other stakeholders to have confidence that the process improvements made will
be robust. The Control Plan is a guide for the Process Owner to assist in tracking and correcting
the performance of the KPIVs and KPOVs.

It is used to develop a detailed but simple document to clarify the details around controlling the
key inputs and key outputs for which all the improvement efforts have been implemented. Once
the project is closed it is not necessarily over. The Control Plan is one part of ensuring the gains
are maintained. If process performance strays out of control, there are details and tools to adjust
and re-monitor to ensure there has not been an over-adjustment.


It is possible that the new performance capability warrants the calculation of new control limits. If
so, the test(s) used, evidence, and new control limits should also be a part of this document. It is
ideal to have a simple one-page document, but if appendices and attachments are needed to ensure
understanding and control, then include this information.

All relevant material and information needed to ensure the gains are sustained, and even further
improved, should be included. Oftentimes there are long-term action items, and the project
list (possibly maintained as a Gantt chart) shall be updated and followed by the Process Owner
until those actions are complete. The GB/BB will often move on to another project after the team
disbands, but follow-up is often required weeks or months later. The follow-up is done in
conjunction with the Process Owner and possibly the Controller. Adhering to the details of the
Control Plan will standardize these efforts and allow quick analysis of current performance.

A Control Plan provides a single point of reference for understanding process characteristics,
specifications, and standard operating procedures (SOPs) for the process. A control plan enables
assignment of responsibility for each activity within the process. This ensures that the process is
executed smoothly and is sustainable in the long run.

A good Control Plan needs to be based on a well-thought-out strategy. A good control plan strategy
should minimize the need for tampering with the process. It should also clearly state the actions to
be taken for out-of-control conditions. It should also raise appropriate indicators to signal the need
for Kaizen activities. A Control Strategy should describe the training requirements to ensure that
everyone on the team is familiar with the standard operating procedures. In the case of an
equipment control plan, it should also include details about maintenance schedule requirements.

The intent of an effective Control Plan Strategy is to


Operate our processes consistently on target with minimum variation
Minimize process tampering (over-adjustment)
Assure that the process improvements that have been identified and implemented become
institutionalized
Provide for adequate training in all procedures
Include required maintenance schedules

Control Plan inputs include all processes, measurement systems and resources that need to be
monitored and controlled.

The elements of a control plan include process map steps, key process output variables, targets
and specs, key and critical process input variables with appropriate working tolerances and control
limits, important noise variables (uncontrollable inputs), and short- and long-term capability
analysis results. Other elements of a control plan are the designated control methods, tools and
systems (SPC, automated process control, checklists, mistake-proofing systems and standard
operating procedures), training materials, maintenance schedules, and the reaction plan and
responsibilities. One illustrative way to capture a single control plan row as a structured record is
sketched below.
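The sketch below shows one possible way a single control plan row could be held as a structured record; the field names and the example values are illustrative assumptions rather than a prescribed format.

# One illustrative control plan row as a structured record -- fields and values are assumptions.
from dataclasses import dataclass

@dataclass
class ControlPlanRow:
    process_step: str        # step from the process map
    characteristic: str      # KPIV or KPOV being controlled
    target: float            # target / nominal value
    spec_limits: tuple       # (LSL, USL) or working tolerance
    control_method: str      # SPC chart, checklist, mistake proofing, SOP, etc.
    sample_size: int
    frequency: str           # how often the characteristic is measured
    reaction_plan: str       # what to do when out of control
    owner: str               # person responsible

example = ControlPlanRow(
    process_step="Filling",
    characteristic="Fill weight (KPOV)",
    target=10.0,
    spec_limits=(9.5, 10.5),
    control_method="X-bar/R chart (SPC)",
    sample_size=5,
    frequency="every hour",
    reaction_plan="Stop line, notify process owner, follow reaction plan",
    owner="Shift supervisor",
)
print(example)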

The plan should be developed by the project team in conjunction with those who will be
responsible for the day-to-day running of the processes. The plan should be validated and then be
subject to regular review, as part of the overall management system.

