DMAIC: Analyzing: Depicting and Analyzing Data Through Charts and Graphs

Learning objectives
• Understanding the power inherent in basic charts and graphs
• Creating and analyzing variation and distributions through
histograms or dot plots
• Comparing distributions with box and whisker plots
• Exploring variable relationships with scatter plots
• Using process behavior charts to see true performance
• The most important, and often the most powerful, tools for
analyzing and communicating data are the basic graphical
tools, called graphs or plots
• The chief purpose of plotting and charting data is to
graphically show the central tendency and the
spread of variation in a measured item of interest.
• The different types of graphs are:
• Histograms or dot plots,
• Box and whisker plots,
• Scatter plots,
• Process behavior charts.
Histograms and Dot Plots
• A dot plot shows the scatter and grouping of data
from a single characteristic using dots
• A histogram takes the data from the dot plot and
replaces the dots with bars
Creating dot plots and histograms
• Create a horizontal line that represents the scale
of measure for the characteristic
• Divide the horizontal scale of measure into equal
chunks or “buckets” along its length.
• For each observed measurement of the
characteristic, locate its value along the
horizontal scale and place a dot for it in its
corresponding bucket.
• To create a histogram, replace each of the stacks
of dots with a solid vertical bar of the same
height as its corresponding stack of dots.
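The bucketing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `bucket_counts`, its parameters, and the measurement values are all hypothetical. Each count is the height of one stack of dots, or equivalently of one histogram bar.

```python
from collections import Counter


def bucket_counts(values, start, width, n_buckets):
    """Count how many observations fall into each equal-width bucket.

    Bucket i covers [start + i*width, start + (i+1)*width). The top edge of
    the last bucket is included so the largest value is not dropped.
    """
    counts = Counter()
    for v in values:
        i = int((v - start) // width)
        if i == n_buckets and v == start + n_buckets * width:
            i = n_buckets - 1  # place the top edge in the last bucket
        if 0 <= i < n_buckets:
            counts[i] += 1
    return [counts[i] for i in range(n_buckets)]


def text_dot_plot(values, start, width, n_buckets):
    """Render one row per bucket, with one dot per observation."""
    rows = []
    for i, count in enumerate(bucket_counts(values, start, width, n_buckets)):
        lower_edge = start + i * width
        rows.append(f"{lower_edge:>6.1f} | {'.' * count}")
    return "\n".join(rows)


# Hypothetical measurements of a single characteristic.
measurements = [10.2, 10.8, 11.1, 11.4, 11.5, 11.9, 12.3, 12.6, 13.7]
counts = bucket_counts(measurements, start=10.0, width=1.0, n_buckets=4)
print(counts)
print(text_dot_plot(measurements, start=10.0, width=1.0, n_buckets=4))
```

The printed rows run sideways compared with a conventional dot plot: each row is one bucket, and the length of the row of dots plays the role of the stack height.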
Interpreting dot plots and histograms
• Plot height:
The frequency, shown by the height of the stack of dots or the bar
• Variation shape
Three basic Variations:
• Normal
• Uniform
• Skewed
• Normal, or bell-shaped
• Most of the observed values of the characteristic
are close to a central point
• Fewer values appear farther from the central point
• Uniform
• The variation is evenly spread out across a bounded interval
• A value is equally likely to be observed at either
end of the interval, or anywhere in between
• Skewed
• A variation shape that isn’t symmetrical
• One side of the distribution extends out farther than
the other side
Variation mode:
• The mode of a distribution is its most often repeated
value, or in other words, its peak
• Single peak
• Multi-modal
Variation average
• Visually estimate a characteristic’s mean, or average
Variation range
• The extent or width of variation
Outliers
• Measured observations that don’t seem to fit the grouping of
the rest of the observations
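The mode, average, and range read visually from a plot can also be checked numerically. The sketch below uses Python's standard `statistics` module. The helper name `describe_variation` and the data are hypothetical, and for finely measured data the mode is usually read from the bucketed counts rather than from the raw values.

```python
from statistics import mean, multimode


def describe_variation(values):
    """Summarize the mode(s), average, and range of a set of observations."""
    return {
        "modes": sorted(multimode(values)),  # more than one entry means multi-modal
        "mean": mean(values),
        "range": max(values) - min(values),
    }


# Hypothetical observations of a single characteristic.
cycle_times = [4, 5, 5, 6, 6, 6, 7, 7, 8]
summary = describe_variation(cycle_times)
print(summary)
```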
Box and Whisker Plots
• A box and whisker plot is made up of a box, which
represents the central mass of the variation, and
• Thin lines, called whiskers, that extend out on either
side and represent the thinning tails of the variation
Creating a box and whisker plot
1. Rank the data measurements in order from
least to greatest.
2. Determine the median of the data.
3. Find the first quartile, Q1, the 25-percent point in
the rank-ordered sequence
4. Find the third quartile, Q3, the 75-percent point in
the rank-ordered sequence
5. Draw a horizontal line, representing the
scale of measure for the characteristic.
6. Mark the median and quartile values from
Steps 2 through 4 and construct the box.
7. The next step is to draw the whiskers on the ends of the
box. Find the inter-quartile range (IQR) by subtracting
the value of the first quartile boundary from that of the
third quartile boundary: IQR = Q3 - Q1.
8. Find the smallest and largest data points that lie within the
interval [Q1 - 1.5 IQR, Q3 + 1.5 IQR] and draw the whiskers out to them
• The lower whisker ends at the smallest data point that is
greater than or equal to Q1 - 1.5 IQR
• The upper whisker ends at the largest data point that is
less than or equal to Q3 + 1.5 IQR
9. It is possible that some data points in your list will lie
outside of the ends of the whiskers, that is, outside the interval
[Q1 - 1.5 IQR, Q3 + 1.5 IQR]. These points are called outliers.
Plot any outliers separately as dots beyond the whiskers.
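The numeric part of these steps can be sketched as follows. Quartile conventions differ between textbooks and software. This sketch assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the convention intended above. The function name `box_plot_stats` and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Compute the box, whisker ends, and outliers for a box and whisker plot."""
    data = sorted(values)  # rank the data from least to greatest
    # Assumption: "inclusive" quartiles; other conventions give slightly
    # different Q1 and Q3 for small data sets.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest point >= Q1 - 1.5 IQR
        "whisker_high": inside[-1],  # largest point <= Q3 + 1.5 IQR
        "outliers": outliers,        # plotted separately as dots
    }


# Hypothetical data set with one unusually large value.
stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this hypothetical data set the value 30 lies above Q3 + 1.5 IQR, so the upper whisker stops at 9 and 30 is reported as an outlier.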
• Create a box plot for the following set of monthly sales
Making sense of box and whisker plots
• Box and whisker plots are ideal for comparing two or more
variation distributions
• Differences or similarities in location of the median
• Differences or similarities in box widths
• Differences or similarities in whisker-to-whisker spread
• Overlap or gaps between distributions
• Skewed or asymmetrical variation in distributions
• The presence of outliers
Scatter Plots
• A scattered cluster of dots on a graph
• Extremely powerful tool to explore and quantify the relationship
between two or more characteristics
• Scatter plots start to get to the root of how certain variables impact
other variables
• They are the start of getting to the fundamental Y = f(X) + ε relationship
at the heart of Six Sigma improvement
Developing a scatter plot
• Capture measurements from two characteristics
• Can be two inputs; one can be an input and the other
can be an output; or they can be two outputs
1. Form points from the collected data.
2. Create a two-axis plotting framework
3. Plot each formed point on the two-axis framework.
Drawing correlations from a scatter plot
Assessing the amount of correlation
• The amount of correlation in a scatter plot is determined by
how closely or tightly the plotted points fit a drawn line.
• Scatter plots help identify the factors or variables that can positively
influence the desired performance improvement outcome as
defined by the project objective statement; the amount of
correlation is a key indicator of this relationship.
Direction of correlation
• Direction of correlation comes in two types:
• Positive
• Negative.
• Two characteristics are positively correlated if the relationship
indicates that an increase in one characteristic translates into
an increase in the other
• Two characteristics are negatively correlated if the relationship
indicates that an increase in one characteristic translates into
a decrease in the other, and vice versa.
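The text above judges amount and direction of correlation by eye. One common way to put a number on both, used here as an assumption rather than something the text prescribes, is the Pearson correlation coefficient r. Its magnitude (0 to 1) reflects how tightly the points fit a straight line, and its sign gives the direction. The function names and data below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one characteristic never varies")
    return sxy / sqrt(sxx * syy)


def correlation_direction(r):
    """Translate the sign of r into the direction of correlation."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical paired measurements of two characteristics.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 6, 8, 10]
r = pearson_r(x_values, y_values)
print(r, correlation_direction(r))
```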
Surveying strength of effect
• Scatter plots also graphically show the strength or magnitude
of the effect one characteristic has on the other.
• Two characteristics may be strongly correlated, but
• a large change in one characteristic may still lead to only a small
change in the other, or
• a small change in one characteristic can be magnified as a large
change in the other.
• The slope of a line shows how steep it is.
• The slope is quantified mathematically by comparing how
much the line climbs to how much it runs across between two
points. This comparison is formed from a ratio of rise to run.
• For example, given two points on a line (x1, y1) and (x2, y2),
you calculate the slope with the following equation:
slope = rise / run = (y2 - y1) / (x2 - x1)
• If the calculated slope is zero, the line is horizontal or flat. A
negative slope means that the line slopes down from left to
right, and a positive slope indicates a line that slopes up from
left to right.
• As the calculated slope value gets farther away from zero
(either positively or negatively), the steepness of the line
increases. As the slope approaches positive infinity or
negative infinity, the line approaches a vertical, straight up and
down line.
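The rise-over-run calculation can be sketched as a small function. The name `slope` and the sample points are hypothetical. Returning an infinite value for a vertical line is a convention chosen here to mirror the description above; strictly speaking the ratio is undefined when the run is zero.

```python
import math


def slope(p1, p2):
    """Slope of the line through two points, computed as rise over run."""
    (x1, y1), (x2, y2) = p1, p2
    rise = y2 - y1
    run = x2 - x1
    if run == 0:
        if rise == 0:
            raise ValueError("the two points are identical; slope is undefined")
        # Convention for this sketch: report a vertical line as +/- infinity.
        return math.inf if rise > 0 else -math.inf
    return rise / run


print(slope((1, 2), (3, 8)))   # line climbing from left to right
print(slope((0, 5), (4, 5)))   # flat line
print(slope((0, 4), (2, 0)))   # line falling from left to right
```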
Process Behavior Charts
• A behavior chart graphically shows how variation
plays out over time
• Behavior charts form the foundation of detecting
and finding the root cause of non-normal behavior
Creating a characteristic or process behavior chart
• Create a horizontal scale representing time or order.
• Create a vertical axis representing the scale of measure
for the characteristic.
• Plot each observation as a dot, using its order and measured value.
• Connect the dots.
Interpreting characteristic or process behavior charts
• A behavior chart shows the normal behavior of a process
or characteristic and also detects non-normal behavior:
variation above and beyond the expected normal level.
Variation beyond expected limits
• Outliers are measurement observations that occur beyond the
limits of the normal short-term variation
• When excessive, non-normal variation is detected, the time scale
or run order of the behavior chart is used as a starting point
for investigating what conditions or factors are at fault.
• Typical causes of outliers include worker inattention,
measurement errors, and other one-time changes to the
process’s or characteristic’s environment.
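The text above does not say how the limits of normal variation are calculated. One widely used convention for individual-value (XmR) behavior charts, assumed here for illustration, places the limits at the mean plus or minus 2.66 times the average moving range. The function names and data below are hypothetical.

```python
def xmr_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart.

    Assumption: limits at mean +/- 2.66 * average moving range, a common
    XmR convention that the source text does not itself specify.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * avg_moving_range
    return center - spread, center, center + spread


def points_beyond_limits(values):
    """List (order, value) pairs that fall outside the limits; order starts at 1."""
    lower, _, upper = xmr_limits(values)
    return [(i + 1, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical observations in time order, with one unusual value.
observations = [10, 11, 10, 11, 10, 11, 10, 11, 10, 20, 10, 11]
print(xmr_limits(observations))
print(points_beyond_limits(observations))
```

The order number returned with each flagged point is the starting place for investigating what was different when that observation was made.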
• A trend is a steady, gradual increase or decrease in the central
tendency of the process or characteristic as it plays out over time
• If all the conditions in the system stay constant, the level of
performance of the process or characteristic also stays level
• The presence of a trend in a graphical behavior plot is evidence
that something out of the ordinary has happened to move the
location of the process or characteristic behavior.
• A run is a sequence of consecutive observations that are
each larger, or each smaller, than the previous one
• Runs can be caused by faulty equipment, calibration
issues, and cumulative effects, among other things
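Using the definition of a run given above, the longest run in a series can be found with a single pass through the data. This is a minimal sketch. The function name `longest_run` and the data are hypothetical, and how long a run must be before it counts as a signal is left open here.

```python
def longest_run(values):
    """Return (length, direction) of the longest run in time-ordered data.

    A run is a stretch of consecutive observations that are each larger
    ("up") or each smaller ("down") than the one before. Length counts
    observations, so two rising points form a run of length 2.
    """
    if not values:
        return 0, None
    best_len, best_dir = 1, None
    cur_len, cur_dir = 1, None
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            step = "up"
        elif cur < prev:
            step = "down"
        else:
            step = None  # a repeated value breaks any run
        if step is not None and step == cur_dir:
            cur_len += 1
        elif step is not None:
            cur_len, cur_dir = 2, step
        else:
            cur_len, cur_dir = 1, None
        if cur_len > best_len:
            best_len, best_dir = cur_len, cur_dir
    return best_len, best_dir


# Hypothetical observations in time order.
print(longest_run([5, 4, 6, 7, 8, 9, 3, 2]))
```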
• Shifts are sudden jumps, up or down, in the process’s or
characteristic’s center of variation
• Something in the system changes permanently (a piece
of equipment, a new operator, a change in material, a
new procedure, or the like) to cause these shifts
• Shifts are clearly non-normal behavior
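The text above does not give a numeric test for a shift. A commonly used rule of thumb, assumed here for illustration, treats a long streak of consecutive points on the same side of the center line (often eight or more) as evidence of a shift. The function names, the threshold default, and the data are hypothetical.

```python
def longest_one_sided_streak(values, center):
    """Length of the longest streak of consecutive points on one side of center."""
    best = 0
    current = 0
    side = 0  # +1 above center, -1 below center, 0 on the center line
    for v in values:
        s = (v > center) - (v < center)
        if s != 0 and s == side:
            current += 1
        elif s != 0:
            side, current = s, 1
        else:
            side, current = 0, 0  # a point on the center line ends the streak
        best = max(best, current)
    return best


def shift_signalled(values, center, threshold=8):
    """Assumed rule of thumb: a streak of `threshold` or more suggests a shift."""
    return longest_one_sided_streak(values, center) >= threshold


# Hypothetical observations: the process center appears to jump upward.
series = [9, 11, 9, 11, 12, 13, 12, 13, 12, 13, 12, 13]
print(longest_one_sided_streak(series, center=10))
print(shift_signalled(series, center=10))
```

The center line passed in should be the one established while the process was stable, for example from an earlier baseline period.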

