Statistical Learning Methods Applied To Process Monitoring An Overview and Perspective - JQT - 2016 PDF

Statistical Learning Methods Applied to Process Monitoring: An Overview and Perspective
MARIA WEESE and WALDYN MARTINEZ
Miami University, Oxford, OH, USA

FADEL M. MEGAHED
Auburn University, Auburn, AL, USA

L. ALLISON JONES-FARMER
Miami University, Oxford, OH, USA

The increasing availability of high-volume, high-velocity data sets, often containing variables of different
data types, brings an increasing need for monitoring tools that are designed to handle these big data sets.
While the research on multivariate statistical process monitoring tools is vast, the application of these
tools for big data sets has received less attention. In this expository paper, we give an overview of the
current state of data-driven multivariate statistical process monitoring methodology. We highlight some
of the main directions involving statistical learning and dimension reduction techniques applied to control
charts in research from supply chain, engineering, computer science, and statistics. The goal of this paper
is to bring into better focus some of the monitoring and surveillance methodology informed by data mining
techniques that show promise for monitoring large and diverse data sets. We introduce an example using
Wikipedia search information and illustrate a few of the complexities of applying the available methods to
a high-dimensional monitoring scenario. Throughout, we offer advice to practitioners and some suggestions
for future research in this emerging area.

Key Words: Control Charts; Ensembles; Neural Networks; Regression; Support Vector Machines; Variable
Selection.

1. Introduction

Control charts were proposed by Walter A. Shewhart in the 1920s as a tool to distinguish between the inherent (common-cause) variation within the process and variations due to unwanted process disruptions (special cause). Control charts can be used retrospectively, in Phase I, or prospectively, in Phase II. The goal of Phase I is to understand the sources of process variability, define the in-control state of the process, and determine an in-control reference sample to design a control chart for Phase II monitoring. See, e.g., Jones-Farmer et al. (2014b) or Chakraborti et al. (2008) for a review of Phase I control charts. In Phase II, the process is monitored for departures from the in-control state. The goal of Phase II is to detect such changes as quickly as possible. From the 1920s to the present, there have been many developments in control chart methodologies; however, the applications of these newer methods in practice are limited. For example, the website used by the American Society of Quality to explain control charts is limited to a discussion of the traditional Shewhart charts with 3σ limits (Teague (2004)). Our experience indicates that, when control charts are applied in industry, the applications are typically limited to Shewhart-type charts and Shewhart-type charts with runs rules. Our view is supported by several other researchers in the field. For example, Crowder et al. (1997, p. 139) stated that “[t]here are few areas of statistical application with a wider gap between methodological development and application than is seen in SPC”. Woodall (2000, p. 346) agreed by stating that “another unfortunate fact is that some useful advances in control charting methods have not had a sufficient impact in practice”. More recently, in his Youden Memorial Address, Vijay Nair (2008) stated that “there are far too many papers developing yet another charting procedure without considering whether the problem is important and whether the method can be actually used”. We believe that this issue remains valid, and it is a primary motivator for this paper.

The goal of this paper is to bring into better focus some of the methods that rely on statistical learning and/or dimension reduction methods that show promise for monitoring large and diverse data sets. These large data sets, often termed big data, typically require more advanced statistical methods and often more computing power than smaller, more manageable data sets. Unlike the origins of statistical monitoring, these applications are not limited to manufacturing, but also include opportunities in several application areas, including social media, gaming, travel, insurance, healthcare, utility demand, and others (e.g., see Ning and Tsung (2010)).

The size of the data sets in these application areas is difficult to quantify and varies considerably by industry and application. Further, many of the methods we discuss are yet to be applied to data that would be considered big data relative to what is observed in industry. For example, Wal-Mart, the leading U.S. discount retailer, processes more than 1 million customer transactions per hour, resulting in databases estimated to be in the magnitude of 2,500 terabytes (“Data, Data Everywhere” (2010)). In light of this, we consider methods that have been developed for larger and more diverse data sets than those typically considered in the statistical process control (SPC) literature and consider the potential scalability of these methods. We do not provide an extensive review of all papers that use statistical learning and/or dimension reduction methods along with control charts, but highlight the basic directions of this research.

Our paper is focused, specifically, on how statistical learning methods have been used in developing SPC charts. Because the literature describing these methods has developed in a number of different research areas (e.g., manufacturing, operations management, statistics, and process control), SPC researchers may be unfamiliar with the developments in some of these other fields. Thus, our goal is to summarize the main research directions in several areas applying statistical learning to control charts in such a way that researchers and practitioners from different fields can understand, apply, and extend these methods for monitoring larger and more diverse data. There have been some focused reviews on data mining applications in manufacturing and quality control, and we discuss these in the appropriate sections of our paper. For example, Choudhardy et al. (2009) gave a high-level overview of data mining methods used in manufacturing, but only briefly mentioned quality control applications.

We assume that the reader is somewhat familiar with the basic concepts behind the construction and use of control charts (for detailed introductions, see Woodall and Adams (1998), Wheeler and Chambers (2010), Montgomery (2013)). On the other hand, it is assumed that the reader is somewhat unfamiliar with statistical learning methods. Accordingly, in Section 2, we define our view of the term big data and provide some background information and references on several fundamental statistical learning methods. In Sections 3 and 4, we discuss the application of unsupervised learning and supervised learning methods, respectively, to process monitoring. Throughout each section, we review major research streams, discuss how these methods can be used in big data settings, and offer some advice for practitioners, as well as suggestions for future research. In Section 5, we give an example related to monitoring Wikipedia search data to illustrate some of the complexities of applying control charts to high-dimensional data. Finally, in Section 6, we provide our concluding remarks.

Dr. Weese is Assistant Professor of Analytics in the Farmer School of Business. She is a senior member of ASQ. Her email address is weeseml@miamioh.edu.

Dr. Martinez is Assistant Professor of Analytics in the Farmer School of Business. His email address is martinwg@miamioh.edu.

Dr. Megahed is Assistant Professor of Industrial and Systems Engineering at Auburn University. He is a member of the ASQ. His email address is fmegahed@auburn.edu.

Dr. Jones-Farmer is Professor and Van Andel Chair of Analytics in the Farmer School of Business. She is a Senior Member of ASQ. Her email address is farmerl2@miamioh.edu.

Journal of Quality Technology, Vol. 48, No. 1, January 2016


2. Background Information

2.1. Big Data

The term big data is used to describe large, diverse, complex, and/or longitudinal data sets that are generated from a variety of equipment, sensors, and/or computer-based transactions. Big data is often distinguished from other data by the 3V’s, the volume, variety, and/or velocity (Megahed and Jones-Farmer (2015)). Other important characteristics often associated with big data are veracity and value, but these are important characteristics for all data, not just big data (Jones-Farmer et al. (2014a)). The methods associated with the analysis of big data are important for analyzing high-volume, varied, high-velocity process data. While many of these methods have been used in the practice of process monitoring for years (e.g., dimension reduction and regression-based methods), other methods originate from the fields of machine learning or statistical learning and are less familiar to researchers and practitioners in SPC. Many of these methods have foundations in data-driven (as opposed to model-driven) analysis and may require a paradigm shift for many in SPC research and in practice. Nonetheless, there is much that can be learned in the field of SPC from the methods that have been developed for use in solving similar big data problems in other domains.

2.2. Statistical Learning Methods

Statistical learning techniques have become very popular in the last two decades due to their versatility and power. James et al. (2013) refer to statistical learning as a vast set of tools for understanding data. These tools can be broadly classified as supervised or unsupervised learning.

Supervised learning refers to inferring a mapping between a set of input variables $x = \{x_1, x_2, \ldots, x_p\}$ and an output variable $y$, given a training sample $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of data pairs generated according to an unknown distribution $P_{xy}$ with density $p(x, y)$. The main goal of supervised learning is to estimate a function $H : X \to Y$ such that $H$ will correctly classify unseen examples $(x_i, y_i)$. The function is selected such that the generalization error $R[H]$ (also called the expected risk of the function) is minimized,

$$R[H] = \int g(y, H(x)) \, dp(x, y), \tag{1}$$

where $g(y, H(x))$ is a suitable loss function. One of the most common supervised learning methodologies is least squares regression, where $H$ is linear in its parameters and the generalization error is the sum of the squared model errors. Other common examples of supervised learning methods include logistic regression, artificial neural networks (ANNs), support vector machines (SVMs), and decision trees (DTs).

In some situations, several supervised learning models can be combined to obtain better predictive performance than one could obtain from fitting a single model. Algorithmically combining multiple models together to improve model performance is commonly referred to as ensemble modeling. Ensemble models are often used to combine learning models, such as decision trees, that are considered to be weak on their own but tend to be quite powerful when multiple trees are combined into a classifier. Boosting (Schapire (1990)) refers to a family of methods that combine sequences of individual classifiers into highly accurate ensembles. AdaBoost (Freund and Schapire (1995)) and gradient boosting (Friedman (2001)) are two common boosting algorithms in which each subsequent model is trained to emphasize the cases that were misclassified from the previous modeling instance. Other ensemble approaches include random forests (Breiman (2001a)) and bagging (Breiman (1996)). For an in-depth treatment of ensemble modeling, the reader is referred to Hastie et al. (2009, pp. 605–624).

Unsupervised learning describes an area of statistical learning that does not benefit from the availability of an outcome variable. The goal of unsupervised learning is to develop a framework or understand a pattern in the structure of the input variables $\{x_1, x_2, \ldots, x_p\}$. Examples of unsupervised learning methods include cluster analysis, principal components analysis (PCA), latent variable methods, and mixture modeling.

The choice of which statistical learning method to use often depends on the structure of the data. For example, some methods perform better when the input variables within $x$ are scaled to a similar range, whereas others perform poorly when the variables within $x$ are highly redundant/correlated. Often, some form of data preprocessing is required prior to using statistical learning methods. There are some models (e.g., SVMs, neural networks, mixture models) that may be used in either a supervised or unsupervised way. Although we chose the broad classification of supervised vs. unsupervised learning methods to organize our paper, we readily admit that the distinction for particular applications of learning methods to statistical monitoring is often unclear, even within certain methodological papers.

Statistical monitoring methods have benefited from the use of statistical learning techniques, especially through the application of statistical learning and dimension reduction methods. For example, ANNs, inductive learning, SVMs, and decision trees have all been suggested as methods to be used to build control charts for monitoring and/or pattern recognition. Unsupervised learning methods, such as cluster analysis and kernel estimation, have been suggested for Phase I analysis in both traditional and profile monitoring applications of control charts. Dimension reduction methods, such as PCA and factor analysis, have been widely applied to control charts, often in conjunction with other supervised and unsupervised methods. As statistical learning methods become more mainstream, newer methods, such as ensemble models, are being considered for application to control charts as well. In the next sections, we will discuss an overview of each of these areas (unsupervised vs. supervised learning) as they have been applied to process monitoring.

3. Unsupervised Learning Approaches to Process Monitoring

Unsupervised learning methods are applicable when little is known about the process and there is no information given as to what constitutes an out-of-control event. This makes unsupervised learning methods particularly applicable in Phase I of process monitoring. Jones-Farmer et al. (2014b) discussed some Phase I applications of unsupervised learning methods. In this section, we consider two broad classes of unsupervised learning methods: dimension reduction methods, and cluster and one-class classification methods. Although we realize that there is a difference between clustering methods and one-class classification methods, both unsupervised clustering and one-class classification methods are applied using a very similar framework in process monitoring.

3.1. Dimension Reduction Methods

In traditional statistical analyses, an observation typically refers to a certain phenomenon (e.g., a participant in a medical study) and we have a vector of values on several variables of interest (e.g., age, gender, weight, height, etc.). In such analyses, the assumption is that the number of observations, $n$, is much larger than the number of variables, $p$. However, in the age of big data, there has been an exponential increase in the number of variables and, more importantly, their types. In a medical study, for example, the variables collected on the participant are often more complex and can now include electromyography (EMG) signals, oxygen intake profile, and medical images or movies, which put the number of dimensions associated with a single patient in the thousands or even millions. In this example, $n$, the number of patients, is likely to be much smaller than $p$, the number of variables. Sall (2013) referred to this phenomenon as wide data (as opposed to tall data).

There are several approaches to reducing high-dimensional problems to lower-dimensional representations. Generally speaking, these approaches can be classified into two main groups. In the first group, the focus is on selecting a subset of important variables, $k$, and ignoring the remaining not-so-important $p - k$ variables. A classic example of this group in statistics involves the choice of predictors through variable selection methods in regression. The second group involves projecting the original set of variables into a lower-dimensional subspace. Principal components analysis (PCA), partial least squares (PLS), and factor analysis (FA) are all examples of such approaches. In this section, we consider some recent developments that are relevant to both Phase I and Phase II applications of control charts. The reader should note that, throughout this section, we use variables to denote the original/raw input variables and features to denote latent variables that are constructed from the input variables.

Prior to explaining the process monitoring applications related to the two main areas of dimension reduction, it is important to note that the choice of whether the dimension should be reduced based on selecting a subset of variables or projection to a lower dimension is application dependent. In certain applications, it may be more meaningful to maintain a subset of the original variables based on some ranking criterion if this will facilitate the monitoring, diagnosis, and decision making. If there is no need to maintain that original form of the variables, projection (or feature extraction) methods may be more suitable. Guyon and Elisseeff (2003) have constructed a heuristic-based checklist that summarizes the different steps that may be needed to approach feature-selection problems. There are several streams of SPC research that discard the original set of variables (e.g., profile monitoring, risk-adjusted control charts, and much of the image-monitoring literature). The distinction between whether the deployed method selected a subset of the variables or extracted features from these variables is often not clear.

3.1.1. Variable Selection Approaches Applied to Process Monitoring

There is an increasing number of applications where there is a need to monitor high-dimensional process data. In such applications, selecting a subset of “the most important” quality characteristics from the data may be sufficient for process monitoring. Wang and Jiang (2009) suggested that the number of simultaneously shifted variables is typically small in practice and it would be both more beneficial and practical to reduce the monitoring to a smaller subset of variables that are responsible for the out-of-control conditions. Because the shifted variables are unknown in advance, Wang and Jiang (2009) proposed a procedure that combines a forward selection method with multivariate control charting. Zou and Qiu (2009) investigated the use of the least absolute shrinkage and selection operator (LASSO) variable selection technique (Tibshirani (1996)) to create a multivariate test statistic and integrated this statistic with a multivariate EWMA control chart. Capizzi and Masarotto (2011, 2013) recommended using least angle regression (LAR) (Efron et al. (2004)) with a multivariate EWMA chart. LASSO and LAR for variable selection in multivariate SPC have also been suggested for profile monitoring (Zou et al. (2012)), diagnosing process changes (Zou et al. (2011)), and monitoring for changes in the covariance matrix (Maboudou-Tchao and Diawara (2013)).

From our perspective, the use of variable selection methods applied to the multivariate monitoring problem can be an efficient approach, but should be applied with caution. For example, one or more of the statistically “not so important variables” may suddenly become very important in a high velocity situation and, if eliminated, this change may go undetected. We recommend supplementing these approaches with expert knowledge on the process data and suggest that users continuously monitor the goodness of fit of their models. A significant change in a measure of the goodness of fit may be an indicator of an out-of-control condition. The interested reader should refer to the discussion of the Q-statistic below for an example of how a similar problem is handled in chemometrics.

3.1.2. Projection and Feature Extraction Methods Applied to Process Monitoring

The use of projection techniques and/or extracting features from the $p$-dimensional dataset has been implemented since the late 1980s in chemometrics (Wise et al. (1988), Wise and Ricker (1989), Kresta et al. (1991)). In these applications, the dimension of the data is often reduced based on a model from PCA or PLS (Kourti and MacGregor (1995), MacGregor and Kourti (1995), Ferrer (2014)). The resultant components are then monitored using a multivariate control chart, such as the Hotelling’s $T^2$ chart.

Recently, PCA, PLS, and their extensions have been applied with control charts in a number of different high-dimensional domains. For example, Megahed et al. (2011) presented a discussion of the use of projection methods with control charts in the context of multivariate image analysis. Gronskyte et al. (2013) extended the use of PCA and the Hotelling $T^2$ control chart to monitor the motion of pigs through video sequences. Yan et al. (2014) modeled the high-dimensional structure of image data with tensors and employed low-rank tensor decomposition techniques, including several extensions of PCA and a tensor rank-one decomposition approach, to extract important features that are monitored using multivariate control charts. While these recent examples have all been in the image/video-monitoring domain, these approaches are highly effective in other domains of SPC. The reader is referred to Colosimo and Pacella (2007) for an example of using PCA in the context of functional data analysis, where PCA is used to identify systematic patterns in roundness profiles of manufactured parts.

Two important points should be made based on the use of PCA/PLS with multivariate control charts. First, it is possible to move from the projection space back to the original data space. The second, and perhaps more important, point is that, in the chemometric approaches involving PCA, a control chart is applied to the squared prediction error (also known as the Q-statistic or SPE statistic) to ensure that the process variability is still well modeled by the maintained principal components. We believe that this is a very important step because it ensures that the assumption that no significant information is lost due to the projection is valid when an unknown out-of-control condition occurs.

Projection methods such as PCA can also be very useful in generating additional knowledge about the process being studied. Woodall et al. (2004) noted that PCA can be very useful in understanding process variation, an important step needed prior to moving to Phase II monitoring. Wells et al. (2012) showed that PCA can be used to describe unique geometric variation modes occurring during automotive manufacturing. We see four primary benefits of using PCA-like methods with larger data sets. The use of PCA-like methods in process monitoring: 1) can improve the analysis by removing the influence of redundant and noisy variables; 2) can make data processing easier; 3) can help to improve the interpretation and visualization of larger data sets; and 4) can help the practitioner to discover and understand the correlation structure of the data through the investigation of the component scores. These are all important aspects when monitoring larger data sets. The reader is referred to the package bigpca (http://cran.r-project.org/web/packages/bigpca/bigpca.pdf) in the statistical software R for implementations of PCA for massive datasets.

3.2. Clustering/One-Class Classification Methods

Clustering methods may be based on an a priori model, such as mixture modeling, or algorithmic methods like k-means or hierarchical agglomerative clustering. The line between model-based and algorithmic clustering methods is not clear, as many algorithmic methods have been shown to be special cases of the model-based methods under certain model conditions. An excellent overview of clustering methods from a statistical perspective is given in Fraley and Raftery (2002). There are numerous clustering methods, and many of these have been applied to control charting. For example, in the chemical industry, model-based clustering methods such as mixture modeling have been used to define the in-control state of the process or normal operating conditions. Mixture modeling is a method in which the distribution of independent variables is considered a mixture of two or more distributions that may differ in location, scale, or correlation structure. Mixture models can be used alone (to create clusters) or in conjunction with a regression model where the clusters of independent variables predict a target variable.

The use of model-based clustering with control charts usually combines a dimension reduction technique (e.g., PCA) with a mixture model approach. When estimating a mixture model, two sets of parameters are estimated: the distributional parameters ($\mu$ and $\Sigma$ in the case of a Gaussian mixture); and mixture parameters that give the fraction of observations in each cluster. The number of clusters is often determined according to some criterion, e.g., Bayesian Information Criterion (BIC). Nylund et al. (2007) and Tofighi and Enders (2008) discussed criteria for selecting the number of clusters in a mixture model. Once the number of clusters and the distributional and mixture parameters are estimated, this information is used to establish a control region for the normal operating conditions of the process. Figure 1, reprinted from Thissen et al. (2005), compares the control region for two bivariate data sets. The graphs on the left show the in-control regions established using a Hotelling’s $T^2$ approach, whereas the graphs on the right show the in-control regions established using the mixture modeling approach. The mixture modeling approach produces an irregular in-control region, whereas the $T^2$ approach gives an elliptical region. Although Figure 1 is based on simulated data, in real applications, the existence of multiple clusters within a reference sample could be an indicator of an out-of-control situation. We caution practitioners to conduct a thorough Phase I analysis of the data to understand the sources of variability and potential clustering of observations. References for those interested in the use of model-based clustering methods for defining the in-control state of a process include, e.g., Chen and Liu (1999), Doymaz et al. (2001), Choi et al. (2004), Thissen et al. (2005), Chen et al. (2006).

FIGURE 1. A Comparison of Control Regions Defined Using a $T^2$ Approach (left) and the Mixture Modeling Approach (right) for Two Different Data Sets (One Data Set per Row). Reprinted from Thissen et al. (2005) with permission.

Similar to the model-based clustering methods, the one-class classification (OCC) approaches to process monitoring reframe the monitoring problem into a classification problem that classifies observations as either in- or out-of-control. This stream of research began with the introduction of the k-chart (Sun and Tsung 2003). The k-chart is a control chart based on the support vector data description (Tax and Duin 1999, 2004) and designed for non-normal process data. Support vector data description (SVDD) is an unsupervised application of support vectors originally applied to machine fault diagnosis (Tax et al. 1999, Ypma et al. 1999). “The main idea of SVDD is to envelop the samples within a high-dimensional space with the volume as small as possible” (Sun and Tsung 2003, p. 2979). Support vector methods use hyperplanes to divide multidimensional data into groups or classes. When hyperplanes are used to separate the data into classes, several difficult-to-classify observations will lie close to the separating planes. These difficult-to-classify points, known as support vectors, are influential in determining the separating hyperplane for correctly classifying observations. The selection of the separating hyperplanes can be determined based on the number of support vectors (influential observations) as well as the number of misclassified observations in the training sample. When a training sample is available, support vector methods allow for some degree of misclassification in order to obtain a solution that is more robust to individual observations. Most applications of support vectors to process monitoring that we found used unsupervised support vector methods (e.g., SVDD) as an OCC to classify observations as either in- or out-of-control.

The shape of the boundary determined using the SVDD method differs based on the different types of kernel functions used. Kernel functions used in support vector machines allow the user to implement a nonlinear boundary to separate the two classes of data (see pages 93–121 in Cristianini and Shawe-Taylor 2000). Figure 2 compares two kinds of boundaries for enclosing the data. The first boundary is based on fitting a hypersphere and the second is based on SVDD.

FIGURE 2. Comparison of Two Kinds of Boundaries: Hypersphere (Left) and Support Vectors (Right). Figure is from Sun
and Tsung (2003). Reprinted with permission.
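
To make the idea of charting a kernel distance more concrete, the following is a minimal sketch in Python and not the procedure of Sun and Tsung (2003). It assumes a Gaussian kernel of the form exp(-||u - v||^2 / width^2) and, as a deliberate simplification, replaces the SVDD center (whose weights come from a quadratic optimization) with the uniformly weighted mean of the reference sample in feature space. Under that assumption, the squared distance from a new observation $z$ to the center is $K(z, z) - \frac{2}{n}\sum_i K(z, x_i) + \frac{1}{n^2}\sum_{i,j} K(x_i, x_j)$. The function names, the toy reference sample, and the empirical-quantile control limit are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(u, v, width):
    """Gaussian kernel exp(-||u - v||^2 / width^2); this parameterization is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / width ** 2)


def kernel_distance_sq(z, reference, width):
    """Squared feature-space distance from z to the uniform-weight kernel center.

    Simplification: SVDD would use optimized weights; here every reference
    observation gets weight 1/n.
    """
    n = len(reference)
    k_zz = gaussian_kernel(z, z, width)
    cross = sum(gaussian_kernel(z, x, width) for x in reference) / n
    within = sum(
        gaussian_kernel(xi, xj, width) for xi in reference for xj in reference
    ) / n ** 2
    return k_zz - 2.0 * cross + within


def empirical_limit(reference, width, quantile=0.95):
    """Hypothetical control limit: an empirical quantile of the reference distances."""
    dists = sorted(kernel_distance_sq(x, reference, width) for x in reference)
    index = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return dists[index]


# Toy in-control reference sample (made up for illustration).
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
limit = empirical_limit(reference, width=1.0)

for point in [(0.05, 0.02), (3.0, 3.0)]:
    d2 = kernel_distance_sq(point, reference, width=1.0)
    print(point, round(d2, 4), "signal" if d2 > limit else "in control")
```

In this sketch, a smaller `width` makes the implied boundary follow the reference sample more tightly, which mirrors the role of the kernel window width in the SVDD-based charts. A full SVDD fit would additionally identify the support vectors and their weights.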


The kernel distance-based control chart, k-chart, proposed by Sun and Tsung (2003) and updated by Ning and Tsung (2013) is designed for non-normal process data. The distance between an observation and the kernel center is the monitoring statistic used in the k-chart. The boundary for the data represents the control limit, which distinguishes in-control observations from potential out-of-control observations. Note that the right-hand boundary in Figure 2 is not a regular boundary and cannot be expressed in explicit mathematical terms. Instead, the boundary is determined by the support vectors (observations nearest to the boundary), which can be obtained through solving a quadratic optimization problem (Sun and Tsung 2003, see Equations 22–23). A practitioner can adjust the boundary limits through the window width of the Gaussian kernel function and a constraint in the optimization; in other words, the practitioner controls how tightly the boundary fits the sample data. Optimal determination of this boundary for use as a control limit is a topic open for future research and is discussed in Ning and Tsung (2013). Control charts based on SVDD have the advantage of only depending on the support vectors; therefore, they are applicable to large amounts of process data and variables. For an industrial application of the k-chart, the reader is referred to Gani et al. (2011).

Several different modifications to the k-chart have been proposed in the literature. For example, Kumar et al. (2006) and Camci et al. (2008) have proposed modifications to the k-chart by using robust SVM (RSVM) to establish the minimum volume and improve upon the sensitivity of SVM to outliers present in the reference sample. Liu and Wang (2014) have proposed an adaptive-kernel-based (AK) control chart to improve the sensitivity of the multivariate chart to small process shifts. In addition to SVDD, other OCC control charts include the K² chart, based on the k-nearest neighbors data description (kNNDD) algorithm (Sukchotrat et al. 2009, Kang and Kim 2011). A summary of the different variants of the OCC charts is presented in Table 1. For each chart, we include an original citation introducing or studying the methods, the motivation to introduce the new method, and the type of classification method used. We also include the basis for the control limit and, where a kernel method is employed, its type. Finally, we indicate whether the method was developed based on independent and identically distributed (i.i.d.) observations and how the method handles misclassification errors.

Note that all the methods summarized in Table 1 are reported to be robust to non-normality and can be applied in Phase II. Although it has been suggested that some of these methods apply to Phase I, it is not clear to us how well these methods will work in the Phase I context. The number of observations required to establish an in-control reference sample is unclear, and little advice is given as to how to obtain this reference sample. The performance of several OCC control charts, including both the k-chart and the K² chart, has been compared by Tuerhong and Kim (2014) using the average run length (ARL) metric, and their results show that increasing the number of observations improves the performance of all of the charts compared.

In quality control applications, it is generally important to maintain the time ordering of the process observations. Because many clustering/OCC methods do not preserve the time order of the data, it may be difficult to interpret signals to potential out-of-control events. There are several applications of clustering methods that attempt to preserve the sequential nature of process observations. For example, Sullivan (2002) used a clustering method to detect multiple change points in a univariate process and Ghazanfari et al. (2008) introduced a clustering approach to identify a step change in a Shewhart control chart. Zhang et al. (2010) introduced a time-based univariate clustering method for determining an in-control baseline from a historical data stream. Zhang et al. (2010) used the idea of subsequence clustering by clustering the relative frequency distribution of a moving sequence of observations. We should note that the use of subsequence clustering is controversial within the machine learning literature; thus, we recommend that practitioners consult Keogh and Lin (2005) prior to implementing this approach.

Clustering methods that aim to preserve the time ordering of the data have also been applied to the problem of multivariate outlier detection in SPC. For example, Jobe and Pokojovy (2009) introduced a computer-intensive multi-step clustering method for retrospective outlier detection in multivariate processes. Jobe and Pokojovy (2009) compared their method with the retrospective use of the T² chart with robust estimators for the covariance matrix and showed that their method was equal to or better than the robust T² approaches in most situations considered.

Clustering methods have also been suggested for use in the Phase I analysis of profiles. Chen et al.

Vol. 48, No. 1, January 2016 www.asq.org



TABLE 1. A Summary of the Different Variants of OCC Charts

k-chart (Sun and Tsung (2003))
  Motivation: Eliminate underlying distributional assumptions for multivariate SPC
  Method: SVDD
  Control limit: Kernel radius
  Kernel method: Gaussian radial-based function (RBF)
  Assumes i.i.d.: Yes
  Misclassification error: Gives information on changes in error rates with different number of support vectors

Robust k-chart (Kumar et al. (2006))
  Motivation: Reduce the sensitivity to outliers in the reference data and to reduce potential over-fitting issues that can arise with the k-chart
  Method: RSVM
  Control limit: Kernel radius
  Kernel method: Compared 4 methods and showed Gaussian RBF performed best
  Assumes i.i.d.: Yes
  Misclassification error: Gives information on changes in error rates with different number of support vectors

rk-chart (Camci et al. (2008))
  Motivation: Eliminate underlying distributional assumptions, requires only in-control data, offers methods for selecting limits based on type I and type II errors
  Method: SVDD and support vector representation and discrimination machine (SVRDM)
  Control limit: Kernel radius
  Kernel method: Gaussian RBF
  Assumes i.i.d.: Yes
  Misclassification error: Employed an iterative procedure based on the data number of support vectors to balance type I/type II errors

kNNDD/kNN/K² chart (Sukchotrat et al. (2009))
  Motivation: Computationally more efficient than k-chart methods
  Method: kNNDD
  Control limit: Bootstrap percentile procedure
  Kernel method: None
  Assumes i.i.d.: Shown to have better performance than a T² chart when data are non-i.i.d. (Kim et al. (2010))
  Misclassification error: Employed a bootstrap procedure based on process data to select a control limit with a specified misclassification rate

K-means chart (Kang and Kim (2011))
  Motivation: More quickly detects small shifts in the mean vector than the k-chart
  Method: KMDD
  Control limit: Specified distance from the individual cluster center
  Kernel method: None
  Assumes i.i.d.: Yes
  Misclassification error: Used an iterative process to determine control limit with specified misclassification rate for differing number of clusters

AK-chart (Liu and Wang (2014))
  Motivation: More quickly detects small shifts in the mean vector than the k-chart
  Method: SVDD
  Control limit: Genetic algorithm to establish action and warning regions based on variable sampling intervals
  Kernel method: Gaussian RBF
  Assumes i.i.d.: Yes
  Misclassification error: Employed a genetic algorithm based on process data and number of support vectors to determine a control limit with a specified misclassification rate
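
The bootstrap percentile control limit listed for the K² chart in Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of Sukchotrat et al. (2009): the monitoring statistic is taken here to be the average squared Euclidean distance to the k nearest reference observations, the limit is the average of the 100(1 − α)th percentiles over bootstrap resamples, and the function names, `k = 5`, `alpha = 0.05`, and the simulated bivariate reference sample are all hypothetical choices.

```python
import math
import random

def knn_statistic(z, reference, k):
    """Average squared Euclidean distance from z to its k nearest reference points."""
    sq_dists = sorted(sum((a - b) ** 2 for a, b in zip(z, x)) for x in reference)
    return sum(sq_dists[:k]) / k

def phase1_statistics(reference, k):
    """Leave-one-out statistics, so that no point counts itself as a neighbor."""
    return [knn_statistic(x, reference[:i] + reference[i + 1:], k)
            for i, x in enumerate(reference)]

def bootstrap_percentile_limit(stats, alpha, n_boot, rng):
    """Average the 100(1 - alpha)th percentile over n_boot bootstrap resamples."""
    n = len(stats)
    idx = min(n - 1, math.ceil((1 - alpha) * n) - 1)
    total = 0.0
    for _ in range(n_boot):
        resample = sorted(rng.choice(stats) for _ in range(n))
        total += resample[idx]
    return total / n_boot

# Hypothetical in-control reference sample (simulated, not data from the paper).
rng = random.Random(0)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
limit = bootstrap_percentile_limit(phase1_statistics(reference, k=5),
                                   alpha=0.05, n_boot=200, rng=rng)

def signals(z):
    """True if the statistic for a new observation z exceeds the control limit."""
    return knn_statistic(z, reference, k=5) > limit

print(signals((0.0, 0.0)), signals((6.0, 6.0)))
```

Because the limit is estimated from the reference sample itself, no distributional form is assumed for the statistic; the price is that the limit depends on how representative the reference sample is.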

(2015) suggested replacing the process observations with an estimated profile that is determined using a regression method. These profiles are then clustered and the cluster containing more than half of the data is identified as the set of in-control profiles. Chen et al. (2014) also considered the use of cluster analysis with nonparametric profiles.

The applications of clustering and one-class classification methods in quality control are diverse (see Ge et al. (2011), Grasso et al. (2015), Gani and Liman (2013)), and there remain many opportunities for research in this area, particularly in Phase I applications. For example, a noteworthy feature of many big data sets is the variety of the data, which often contain a mix of continuous, discrete, and possibly categorical variables. Future research should investigate the use of clustering, classification, and mixture modeling approaches for the Phase I analysis of data with multiple data types. We note that there may be some limitations to these approaches, such as requirements of very large sample sizes or the failure to preserve time ordering. There are also opportunities to consider the application of time series clustering methods to the analysis of both univariate and multivariate data that occur in streams. Because preserving the time order of process data is often critical, these methods may show promise in both Phase I and Phase II applications.

It should also be noted that most methods discussed in this section assume that the data within the clusters or classes can be stored in memory. This may not be feasible in big data applications. Rajaraman et al. (2014, Chapter 12.3) provide an excellent introduction to SVMs and how to develop a parallel implementation schema that is necessary when the observed data is too big to be stored/analyzed in memory. This requirement is somewhat limiting with standard desktop computers; however, such approaches can be implemented with cloud computing. Therefore, there are opportunities for exploring clustering algorithms that do not store the cluster’s entire data in memory and for evaluating their performance in SPC applications. See Zhang et al. (1996), Bradley et al. (1998), and Guha et al. (1998) for three highly cited examples of clustering algorithms that are suitable for big data sets.

4. Supervised Learning Methods

In this section, we discuss the use of supervised learning methods in SPC. First, we discuss the problem of using computational methods based on supervised learning to detect abnormal patterns on control charts. While the majority of the literature in this area pertains to univariate data (which is not big data, per se), we briefly discuss this work because the bulk of the application of DT, ANN, and SVM models in SPC has been in the control chart pattern-recognition (CCPR) literature. After discussing CCPR methods, we discuss control charts based on single supervised learning methods. Finally, we highlight the few papers that investigated using ensemble methods in SPC. As in Section 3, we offer advice to practitioners and highlight areas for future work whenever possible.

4.1. Control Chart Pattern Recognition

CCPR has its origins in the early days of SPC, starting with the Western Electric runs rules in 1956. Champ and Woodall (1987) evaluated control charts with supplemental runs rules and showed that these


charts can have a high incidence of false alarms. Like the classical approach to CCPR, the recent research on this topic is dedicated to identifying and classifying out-of-control patterns such as trends, cyclical patterns, and specific types of process shifts. In the modern CCPR literature, a supervised learner is trained to recognize specific types of process changes. The input into the statistical learning methods may be raw variables or linear or nonlinear combinations of the raw variables (i.e., features). ANN or SVM learners are used most often due to their strong predictive ability. However, ANN and SVM learners can be difficult to interpret; thus, DT methods have been suggested to provide the user with more interpretable models pertaining to process changes.

Many of the CCPR methods begin by simulating a reference data set containing in-control and out-of-control data. The in- and out-of-control data are labeled as such, and this label serves as the target or outcome variable for a statistical learner. A statistical learner (e.g., ANN) is trained to distinguish the in-control from the out-of-control data. The model that is developed is then applied to future observations, and these observations are classified as either in or out of control.

There are many simple and complex variations of the CCPR approaches to process monitoring. Although much of the literature in this area has typically considered low-dimensional data (e.g., Zorriassatine et al. (2003), Cheng and Cheng (2011)), several authors focused on higher-dimensional data (e.g., Deng et al. (2012), Dávila et al. (2011)). Deng et al. (2012) provided an overview of the major developments in this area as it relates to statistical monitoring. Deng et al. (2012) also noted that a limitation in these methods is that, most of the time, the classifiers are trained once based on a single artificial data set. They recommended a dynamic approach where the classifier is retrained with each new observation, and a statistic such as the classification error rate or class probability is monitored.

Zorriassatine and Tannock (1998) and Psarakis (2011) reviewed the literature on the use of neural networks with control charts, and many of these papers focus on the CCPR problem. Hachicha and Ghorbel (2012) provided a comprehensive review and analysis of the CCPR literature from 1991 to 2010 and highlighted several open research questions within this field. Hachicha and Ghorbel (2012) classified 122 CCPR papers according to a detailed schema that includes, e.g., the data model assumptions, the types and number of patterns studied, whether real data or simulated data were evaluated, and the performance measures used. Interestingly, the majority of the papers they reviewed (61.47%) used an ANN approach to pattern recognition. Their study revealed that only nine authors have published nearly half of the 122 CCPR papers reviewed. Additionally, only 16 out of the 122 papers reviewed considered multivariate processes, and only 5 out of the 122 papers evaluated involved applying the proposed method on real process data (Hachicha and Ghorbel (2012), pp. 210–213).

Woodall and Montgomery (2014) stated that “Despite the large number of papers on this topic [neural network control charts including those for CCPR] we have not seen much practical impact on SPC”. We believe that this lack of impact on the practice of SPC is due to several reasons. In our review of the CCPR literature, we did not find any references or discussion addressing the baseline operation of a process or a mention of a Phase I analysis. Little advice is given as to how to apply the methods in practice, including how to establish an in-control baseline sample, how large the reference sample should be for the method to work effectively, and how to distinguish among the many choices of ANN architectures. These gaps make it difficult to apply the CCPR methods in practice and provide ample opportunities for future research in the practical application of CCPR methods. More work is needed to determine if these methods are truly beneficial for monitoring high-dimensional process data. Consideration needs to be given to the robustness of these methods to the baseline training sample, including the baseline sample size. Guidelines need to be developed for practitioners as to how to select among the many types of CCPR methods. Further, we recommend that researchers study the ability of the CCPR charts to detect changes other than those for which the learners were specifically trained.

4.2. Regression-Based Methods

One approach to reducing the dimensions of a data set is through the construction of an outcome variable (or a smaller set of new features) that “summarizes” the data contained within the original p-variate vector. In addition to the unsupervised dimension reduction methods discussed earlier, supervised learning methods can be used to achieve a similar goal. In the SPC literature, there are two main streams for dimension reduction using regression methods: profile monitoring and risk-adjusted control charts for monitoring healthcare outcomes.

4.2.1. Profile Monitoring

Profile monitoring is used to describe monitoring applications when the quality of a process/product is characterized by a relationship between a response variable and one or more explanatory variables. At each time point, the observed data can be explained by fitting a profile. This can be achieved through simple linear, nonlinear, or nonparametric methods. Also, wavelets may be used if the data are projected onto a frequency domain. In such situations, instead of monitoring and maintaining the entire set of observations, it is sufficient to maintain/monitor the parameters of the fitted model. Thus, the number of dimensions is reduced significantly. Woodall and Montgomery (2014), Woodall et al. (2004), and Woodall (2007) provided detailed reviews on this topic, with explanations of several applications for using profile monitoring. Some applications in profile monitoring include high-dimensional 2D images (Wang and Tsung (2005)) and 3D surface scans (Wells et al. (2013)). Recently, Dai et al. (2014) have proposed a method for monitoring profile trajectories based on a dynamic time warping alignment for monitoring ingot growth in semiconductor manufacturing. We believe that the concept of profile trajectories can be extended to applications involving cyber security and credit card fraud, among other business transactions where an intervention may be needed prior to obtaining the full profile/signal. For a more detailed discussion on the statistical analyses of profile monitoring, we refer the reader to Noorossana et al. (2011), who provide a detailed overview and a discussion of research needs.

4.2.2. Risk-Adjusted Control Charts

Risk-adjusted control charts have been recommended for monitoring post-treatment outcomes in healthcare (Steiner et al. (2000)). Unlike many industrial processes, the monitoring of post-treatment outcomes provides the additional challenge that patients are not homogeneous and can have different risks prior to treatment of the underlying health conditions. For this reason, statistical methods used to monitor post-treatment outcomes involve some sort of risk adjustment. Steiner et al. (2000) recommended using logistic regression to compute the odds of death for an individual patient based on a score that considers the patient’s preoperative health. They further suggested using two CUSUM control charts that are designed to detect a doubling and a halving of the odds of death. There are several other control charts used for this problem. For detailed reviews, the reader is referred to Grigg and Farewell (2004), Woodall (2006), and Steiner (2014). In addition, Steiner (2014) and Fogel et al. (2015) provide excellent discussions on future research needs in this area.

4.3. Neural Networks

Most applications of neural networks in control-charting research are in the area of CCPR. In this subsection, we introduce a similar application where neural network models are used in the simultaneous detection and diagnosis of process faults. The main motivation behind these methods lies in attempting to bridge the gap between SPC research (where the focus has been on detecting an out-of-control condition) and engineering practitioners (where fault detection represents the first aspect of process monitoring). Chiang et al. (2001) identified four different stages of (engineering) process monitoring. The first step involves fault detection, where a statistical approach determines whether a fault, an out-of-control condition, has occurred. The next step, fault identification, involves identifying the subset of input variables/features that are most relevant to diagnosing the fault. This is followed by fault diagnosis, where the root cause of the observed fault is identified. The final stage involves process recovery, where the fault is fixed and the process is returned to its in-control condition. The methods described within this subsection attempt to assist practitioners with the identification and diagnosis aspects because quick detection without identification and diagnosis is not informative, especially because, in many applications, it is assumed that the process is stopped once a control chart signals until the underlying issue is identified and the process is recovered (Montgomery (2013)).

Due to their predictive properties, ANN models typically assist in both the fault detection and identification stages of process monitoring. Venkatasubramanian et al. (2003) discussed the application of ANN models in a review of what they call “process history based methods” for simultaneously detecting and identifying a process problem. Because it is not our objective to repeat the references they have reviewed, we will only highlight two of their key observations: (1) ANN models trained on historical process data are “limited in the sense of generalization” because they are trained on a sample of data to recognize certain process changes and (2) there


have been very few published papers that consider the application of ANN models to real industrial processes. It is important to note that the first observation holds true for all supervised learning methods and not just neural networks. As for the second observation, we assert that the lack of guidance for a practitioner on how to apply an ANN even on the most basic level, like how to choose a baseline sample, is likely the reason that these methods have not been widely used. In our experience, an understanding of the true root cause of a fault is often difficult. We attribute this to the complexity of most manufacturing processes and a lack of understanding of mechanisms for fault propagation.

Although most of the methods using ANN for fault detection and identification consider continuous data that are not correlated over time, there have been a few papers that considered autocorrelated processes and attribute data. For example, Chiu et al. (2003) used an ANN to detect shifts in an autocorrelated process and compared its ability to identify which observations cause the shift with that of a cumulative sum (CUSUM) and an X chart. Niaki and Abbasi (2008) proposed using ANN to detect and classify mean shifts in multi-attribute processes. They varied the counts as well as the proportions in different attributes and showed that the ANN is able to detect the shift more quickly than a multi-attribute np-chart (Mnp) while simultaneously identifying the cause of the shift. It is surprising that there is not more work that takes advantage of ANN’s ability to use either continuous or categorical data, as we did not find any mixed data applications in our search.

4.4. Support Vector Methods

Although we discussed the unsupervised use of support vectors in Section 3, there are some applications of supervised SVM approaches to process monitoring and fault detection that warrant mention. Recently, Zhang et al. (2015) recommended a robust framework for detecting location shifts in a multivariate process using an SVM combined with a multivariate exponentially weighted moving average (MEWMA) control chart. SVM has also been applied to batch process monitoring (Yao et al. (2014)) and in the use of support vector regression (SVR) as a precursor to residual-based multivariate cumulative sum (MCUSUM) charts for monitoring autocorrelated data (Issam and Mohamed (2008)). Chongfuangprinya et al. (2011) recommended monitoring the predicted probability of class membership from an SVM using bootstrap control limits. An earlier application of SVM is given by Chin et al. (2010), where SVM is integrated with independent component analysis (ICA) to improve fault detection in autocorrelated processes. Cheng et al. (2011) used SVM and ANN to estimate the magnitude of the shift in the process mean as detected by a CUSUM chart.

In addition to process monitoring and fault detection, SVM has been applied to fault identification, that is, identifying the variable or groups of process variables that have changed, either in mean or covariance structure, leading to an out-of-control situation. Moguerza et al. (2007) used SVMs for profile monitoring. Mahadevan and Shah (2009) suggested using SVM and ANN as an alternative to the T² and SPE charts for monitoring and using a residuals plot for fault identification. The authors used a one-class SVM plot for fault detection and SVM recursive feature elimination for fault identification. They applied this technique to two case studies using real process data and showed its superiority over conventional methods. Cheng and Cheng (2008) compared SVM and ANN for fault identification, where the fault is a shift in the process covariance, and found that SVM and ANN methods performed similarly. They recommended SVM because it requires fewer tuning parameters when compared with ANN. Chiang et al. (2004) compared SVM with Fisher discriminant analysis using data from the Tennessee Eastman Simulator for fault identification. In this work, a genetic algorithm was used to select key variables prior to using the fault identification methods. It is also assumed that there is prior knowledge regarding which variables are at fault and the type of process behavior exhibited by this fault.

Although they can be computationally burdensome, both supervised and unsupervised applications of SVM have potential in the future of SPC methods applied to big data. In particular, we see that SVM may hold promise in studying process data composed of a variety of data types because SVM is capable of handling both categorical and continuous data simultaneously. Interestingly, almost all the applications of SVM we found in SPC considered continuous data.

4.5. Ensemble Methods

Ensemble methods can be considered as a composite classification model, made up of different classifiers. The individual classifiers vote, and a class label prediction is returned by the ensemble method.


Ensemble methods are typically more accurate than their component classifiers (Han and Kamber (2011), p. 377). The applications of ensemble methods in process monitoring are similar to the different supervised methods.

Du and Xi (2011) developed an interesting approach for fault diagnosis in assembly systems that combines multivariate control charts, engineering knowledge, and ensemble methods. In a similar method, Alfaro et al. (2009) used a T² control chart to detect an out-of-control signal and applied boosted DT models as an alternative to a single ANN to identify which of the variables caused the signal. Jianbo et al. (2009) used an ensemble of ANN models referred to as discrete particle swarm optimization (DPSOEN), which improves on the use of a single ANN model. Similar work was done with an ensemble of SVM classifiers by Cheng and Lee (2012). Yu and Xi (2009) also used the DPSOEN algorithm applied to linear combinations of the data to simultaneously monitor and identify a fault in a multivariate process.

In combined applications of fault detection/identification/diagnosis, Li et al. (2006) used random forests (see Breiman (2001a)) to find the change point and identify the at-fault variables in a high-dimensional multivariate process and showed that this supervised learning method outperforms a multivariate exponentially weighted moving average control chart. Not only does this method show promise in the realm of big data, but it forgoes the usual distributional assumptions that can be troublesome with multivariate SPC methods. It should be noted that Li et al. (2006) considered the time order of the data, whereas many learning methods applied to process monitoring do not preserve the time ordering of the process data. Hwang et al. (2007) applied random forests and regularized least squares to identify a multivariate control region. While a control region as defined by a multivariate T² chart has a set false alarm probability under multivariate normality, their work aimed to define a region with a set false alarm probability but without the burden of a distributional assumption. Recently, Hwang and Lee (2015) proposed a new approach using random forests with artificially shifted data to improve classification of failures in situations when failures occur rarely.

The application of ensemble methods in SPC seems to show the most promise for the challenges in monitoring big data with differing variable types and large dimensions. In fact, a recent application of ensemble methods in public health surveillance by Dávila et al. (2014) illustrated the use of an ensemble of decision trees to monitor counts (or rates) of a disease. This greatly improves on current methods of public health surveillance, which typically involve only low-dimensional data and cannot take into account external data such as demographic information.

5. Example

Although the methods we discussed above rely on statistical learning methods, many of these methods have not been applied (at least not in the literature) to large, high-dimensional data sets. The purpose of this example is to illustrate some of the complexities associated with monitoring big data. It is impossible to find a scenario that concisely presents all, or even several, of the methods we describe above. Our intent is to simply give an example of one potential monitoring scenario and the strengths and limitations of applying one type of method we describe above.

One way to understand public interest that is generated by the popular press is to consider monitoring social media (e.g., Twitter, Facebook) and/or data from web searches (e.g., Google, Yahoo, Wikipedia). In our example, we consider recently generated data from Wikipedia searches related to the National Football League (NFL). In particular, we developed a dictionary of the NFL team names, coaches, managers, and all currently active players as of 09/15/2014. We downloaded the number of Wikipedia searches per hour for all terms in our dictionary between 09/01/2014 00:00 UTC (Coordinated Universal Time) and 09/15/2014 17:00 UTC. We specifically chose these data because (1) they represent modern data streams that would be considered big data by many and (2) the data are counts (not multivariate normal), contain many zero values, have a nested correlation structure, and contain evidence of some high-profile events that spurred intense public interest.

The data used for our example were gathered from http://dumps.wikimedia.org/other/pagecounts-raw/, which contains the hourly number of hits on all Wikipedia pages. Each hour of data is a compressed file of approximately 100 MB, and a week of data holds over 16 GB. The data contain all traffic for the time period on over two million Wikipedia pages. To keep to our hypothetical example (and reduce the computational burden), we consider Wikipedia hits on only those pages listed in our NFL dictionary,


FIGURE 3. Wikipedia Traffic on “New England Patriots” from 9/1/2014 to 9/5/2014 to Illustrate Cyclic Behavior of the
Wikipedia Traffic Flow.

which reduces our dimension to p = 1,916 pages, including all active players, coaches, teams, and managers. Our data set considers a two-week period beginning on 9/1/2014 and was chosen to include the first two weeks of the 2014 season. The first week is used to establish a baseline for monitoring and the second week constitutes our monitored observations. In this example, a signal to a potential out-of-control event is defined as an unusually high number of Wikipedia hits on a particular team, coach, manager, or player.

There are a number of interesting challenges in this monitoring problem. The scenario is a surveillance exercise, where the process cannot be stopped and recalibrated once a signal is observed. Shmueli and Burkom (2010) discussed many of the statistical challenges in bio-surveillance. In this application, the observed counts are cyclical, with the number of Wikipedia search hits declining late at night and peaking at specific times, especially on game days during the season (see Figure 3). Further, the data observed on each of the p = 1,916 pages are zero-inflated counts that are autocorrelated and also cross-correlated due to the natural nesting structure of players and coaches within teams. These correlations depend on the performance of the players, playing time, injuries, etc. (see, e.g., Figure 4). Further

FIGURE 4. Overlay of Wikipedia Traffic of Tom Brady and the New England Patriots over the Selected Time Period
Illustrating the Highly Correlated and Nested Structure of the Wikipedia Data.
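
The kind of autocorrelation and cross-correlation illustrated in Figures 3 and 4 can be quantified with a short sketch. The two series below are synthetic stand-ins with a 24-hour cycle, not the Wikipedia counts analyzed in this example, and the names `team`, `player`, `pearson`, and `autocorrelation` are assumptions. The lagged autocorrelation is computed here as the Pearson correlation between the series and its lagged copy, which differs slightly from the usual sample autocorrelation estimator.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def autocorrelation(x, lag):
    """Correlation between a series and itself shifted by `lag` time steps."""
    return pearson(x[:-lag], x[lag:])

# Synthetic hourly series over four days (hypothetical, for illustration only).
hours = range(96)
team = [100 + 80 * math.sin(2 * math.pi * h / 24) for h in hours]
# A nested "player" series that follows the team series and is clipped at zero,
# which produces runs of zeros during the low-traffic hours.
player = [max(0.0, 0.2 * t - 10 + 3 * math.cos(h)) for h, t in zip(hours, team)]

print(round(autocorrelation(team, 24), 3))  # same hour on the next day
print(round(autocorrelation(team, 12), 3))  # opposite phase of the daily cycle
print(round(pearson(team, player), 3))      # cross-correlation between streams
```

Strong correlation at a lag of 24 hours is what makes methods that assume independent observations questionable for data of this kind.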

Journal of Quality Technology Vol. 48, No. 1, January 2016



FIGURE 5. Wikipedia Traffic on Green Bay Packers and Seattle Seahawks over the Selected Time Period Illustrating the
Correlation Between Teams in the NFL.

correlations exist between teams, especially those paired as opponents during a game (see, e.g., Figure 5). In our example, the number of variables (team names, coaches, managers, and players) is larger than the number of observations (hourly hits). All of these data characteristics are expected for this type of internet traffic data, but constitute a challenge in the application of statistical monitoring.

We readily admit that certain events that “go viral” may not need a statistical method to detect a process anomaly. For example, consider the player, Adrian Peterson, who was indicted on child abuse charges on September 12, 2014. Figure 6 shows a dramatic increase in the number of hits on Adrian Peterson’s Wikipedia page on September 12, 2014. This is not surprising, and the “signal” is apparent without the use of statistical limits. This approach to the retrospective identification of a signal assumes that one would know to monitor Adrian Peterson to begin with, and uses only this single data stream. In other words, hindsight is 20/20, or it is relatively easy to pinpoint an event that we know has already occurred. More interesting, however, is monitoring the entire set of p = 1,916 variables to determine if there is a change in Wikipedia interest in the larger set of NFL teams, coaches, managers, and players.

The first step in defining a monitoring scheme for this data set is to define an appropriate method. To

FIGURE 6. Illustrating the Wikipedia Traffic for Adrian Peterson over the Selected Time Period and the Spike on 9/12/2014
Relating to the Alleged Child Abuse.


FIGURE 7. Phase I and II K² Charts of the NFL Wikipedia Data over the Two-Week Time Period.

do so, we use the process of elimination to eliminate those approaches that are not applicable to our scenario. Through our literature search, we found no methods that applied to high-dimensional zero-inflated counts that are both autocorrelated and cross-correlated with a natural nesting structure. This problem does not have a target or outcome variable, so it falls naturally within the realm of unsupervised methods. Thus, we focus our attention on the methods discussed in Section 3. Although the number of variables (p = 1,916) is much larger than the number of available hourly Wikipedia hits (n = 354), we did not use a dimension reduction technique. Many dimension reduction methods are unstable in high-dimensional scenarios, and the nested structure of this data may result in components or features that are not meaningful. One could make an argument to monitor features of the data that take into account the nested structure of the data (i.e., different divisions, teams, players, etc.), but methods for feature extraction in nested data have not been directly applied to statistical monitoring. Our purpose here is to illustrate the application of existing methods to high-dimensional data. Thus, we consider unsupervised methods that do not require distributional assumptions.

Although we recognize several limitations to this approach, we considered the K² (kNN) chart, an OCC control chart, which requires less computational cost than the k-chart (see Sukchotrat et al. (2009)), is more robust to the i.i.d. assumption requirement (Kim et al. (2010)), and is more suitable for this application. Because we have obvious cyclic autocorrelation in this data, we borrowed from the bio-surveillance literature and used the residuals from a Holt-Winters model, lagged by 24 hours, with seasonal and trend components on each player (see Shmueli and Fienberg (2006), Burkom et al. (2007)). We then analyzed the multivariate data set containing the p = 1,916 sets of residuals using the K² chart. The first 168 observations taken during the first week of the season were used to establish the baseline Phase I sample. The remaining 162 observations (note the residuals for the lagged 24 observations were not used) taken during the second week of the season are used for Phase II monitoring. Figure 7 shows the Phase I and Phase II K² charts. All the K² chart computations were done in Matlab using the PRTools package (Duin et al. (2007)). When applying the K² chart, the choice of k, the number of nearest neighbors, determines the plotted statistic, which is the mean of the squared distance between each observation and each of the k nearest neighbors in a reference sample. The control limit for the K² chart is determined using the percentile bootstrap method, where the means of the squared distances for each observation in the reference sample are bootstrapped and the 1 − α quantile of the bootstrapped distribution gives the control limit. Breunig et al. (2000) recommended a range of k between 10 and 50. In this example, the choice of k made little difference because there were a number of hours in the reference sample with no Wikipedia hits, and these observations formed the k nearest neighbors; thus, we selected k = 20 and α = 0.01.

As stated earlier, the literature we reviewed gives


TABLE 2. Summary of Signals for Phase I and Phase II Application of K² Chart to Wikipedia NFL Players’, Coaches’, and Team’s Page Hits

Phase      Date and time of signal     Possible assignable causes

Phase I    09/04/2014 20:00:00 UTC     Packers vs. Seahawks game. Aaron Rodgers had a poor performance
                                       and Russell Wilson had a particularly good game.
Phase I    09/07/2014 17:00:00 UTC     Sunday Football games

Phase II   09/09/2014 19:00:00 UTC     LeSean McCoy was called out by a restaurant owner for leaving a $0.20
                                       tip on $61.56 meal. This incident was highly publicized (see link 1,
                                       link 2 and link 3 for examples of media coverage)

                                       ESPN’s E:60 aired an episode on Marquise Goodwin and his sister
                                       Deja, born with cerebral palsy
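
As a rough illustration of the procedure described in this example, the following Python sketch computes additive Holt-Winters one-step-ahead residuals for each series, forms the K² statistic as the mean squared Euclidean distance to the k nearest Phase I neighbors, and sets the control limit by a percentile bootstrap. It is not the Matlab/PRTools implementation used for Figure 7. The smoothing weights, the synthetic data, the exclusion of each Phase I point from its own neighbor set, and the choice to average the 1 − α quantile over bootstrap resamples are all assumptions made for illustration.

```python
import math
import random


def holt_winters_residuals(y, period=24, level_w=0.3, trend_w=0.05, season_w=0.2):
    """One-step-ahead residuals from additive Holt-Winters smoothing.

    The first `period` values only initialise the model, so the result has
    len(y) - period entries. Requires len(y) >= 2 * period.
    The smoothing weights are illustrative assumptions, not fitted values.
    """
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level, trend = first, (second - first) / period
    season = [v - first for v in y[:period]]
    residuals = []
    for t in range(period, len(y)):
        s = season[t % period]
        residuals.append(y[t] - (level + trend + s))  # forecast error at hour t
        prev_level = level
        level = level_w * (y[t] - s) + (1 - level_w) * (level + trend)
        trend = trend_w * (level - prev_level) + (1 - trend_w) * trend
        season[t % period] = season_w * (y[t] - level) + (1 - season_w) * s
    return residuals


def k2_statistic(x, reference, k, skip_self=False):
    """Mean squared Euclidean distance from x to its k nearest reference points."""
    d2 = sorted(sum((a - b) ** 2 for a, b in zip(x, r)) for r in reference)
    if skip_self:  # assumption: x is a member of `reference`, so drop its zero self-distance
        d2 = d2[1:]
    return sum(d2[:k]) / k


def bootstrap_limit(stats, alpha=0.01, n_boot=1000, seed=0):
    """Average (1 - alpha) quantile over bootstrap resamples of the Phase I statistics."""
    rng = random.Random(seed)
    n = len(stats)
    idx = min(n - 1, int((1 - alpha) * n))
    total = 0.0
    for _ in range(n_boot):
        total += sorted(rng.choices(stats, k=n))[idx]
    return total / n_boot


# Synthetic stand-in for hourly page hits: 4 pages, 144 hours, 24-hour cycle.
rng = random.Random(1)
p, n_hours, period, n_phase1 = 4, 144, 24, 72
series = [[50 + 30 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 3)
           for t in range(n_hours)] for _ in range(p)]
spike_hour = 130
series[0][spike_hour] += 300  # injected "viral" event on page 0 (hypothetical)

# One residual series per page, then one residual vector per hour.
resid = [holt_winters_residuals(s, period) for s in series]
vectors = [list(v) for v in zip(*resid)]
phase1, phase2 = vectors[:n_phase1], vectors[n_phase1:]

k = 20
phase1_stats = [k2_statistic(x, phase1, k, skip_self=True) for x in phase1]
limit = bootstrap_limit(phase1_stats, alpha=0.01)
phase2_stats = [k2_statistic(x, phase1, k) for x in phase2]
signals = [i for i, s in enumerate(phase2_stats) if s > limit]
print("control limit:", round(limit, 2))
print("Phase II signals (hours into Phase II):", signals)
```

In this toy setting the Phase II statistic at the injected spike lies far above the bootstrap limit. Hours after the spike may also signal, because the smoothing state absorbs part of the spike before it recovers.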

no guidance on proper Phase I analysis for process data such as this. Because of the similarities with bio-surveillance, we consulted the recent review by Shmueli and Burkom (2010), who noted that Phase I implementation is extremely challenging in scenarios such as these due to the lack of sufficient Phase I data. In the circumstances of this example, we believe it would be extremely difficult (if not impossible) to conduct a proper Phase I analysis; however, we did our best to comply with basic principles of Phase I. As such, we assume the process should be operating in a typical fashion, free from anomalies or unusual sources of variability. If anomalies or unusual sources of variability are present, these are removed only if an assignable cause can be identified. In the Phase I chart in Figure 7, we notice two signals of potential out-of-control events. Further investigation into these time periods reveals possible assignable causes (see Table 2). We did not remove these signals from our analysis for two reasons: (1) we are not certain that these signals are anomalous to the process and thus chose to leave them in the sample; and (2) Sukchotrat et al. (2009) did not discuss an iterative approach to the K² chart, where assignable causes are removed and the limits recalculated. Because the K² chart uses only a small subset of the Phase I data as nearest neighbors for computation of the chart statistic, the removal of out-of-control observations has no effect on the chart statistics in our example. However, removal of the out-of-control events would change the sample on which the bootstrap limits are based. We explored removing these observations and found only minimal changes in the control limits; thus, we elected to leave these observations in the reference sample for calculation of the Phase II limits. Figure 7 also shows the Phase II application of the K² chart, in which the k = 20 nearest neighbors established in Phase I are used to determine the status of future observations. In Figure 7, we see a signal, and Table 2 gives two possible explanations for this signal.

While not a perfect analysis, the use of the K² chart in this example illustrates the need for more research on data-driven (as opposed to model-based) control charts (see Breiman (2001b) for an interesting discussion of model-based versus data-driven statistical models). The use of unsupervised learning methods may provide valuable information for very large and broadly defined “processes” involving organizations such as the NFL, separating common cause variability in public or media interest from special cause events. Charts such as these may provide insight as to what constitutes an unusual event in a process involving high-dimensional, correlated data streams. There are many open research questions with the OCC control charts, and their performance has not been well studied. This example is not intended to encompass all of the challenges present in big data monitoring but serves as one example of a few of the complexities of this type of data.

6. Concluding Remarks

We have given an overview of the main research streams that apply statistical learning methods to statistical process monitoring. Although many of these methods have not been directly studied using data sets that would be considered big data, some of these methods may be scalable to such problems.


Our view is that there is a significant need for statistical monitoring of data streams and big data for detection of process changes as well as identifying the root cause for these changes.

The ability to gather observations instantaneously, often in real time, results in observations possessing a high degree of autocorrelation. The effect of autocorrelation on traditional control charts has been studied by, e.g., Maragah and Woodall (1990), Alwan (1992), and Psarakis and Papaleonida (2007). Several authors have recommended statistical learning methods for monitoring autocorrelated data, including Arkat et al. (2007), Issam and Mohamed (2008), and Kim et al. (2012). Most of these methods apply a statistical learning method to the residuals of a time series model. There is not much literature on how to monitor autocorrelated data when the time series model is unknown; thus, we recommend more research in this area to further the development of monitoring and surveillance in high-velocity data streams.

Additionally, in traditional SPC applications, there are two often-made assumptions regarding the role of engineering/process knowledge: (a) much process knowledge and understanding is needed in transitioning from Phase I to Phase II; and (b) the identification and diagnosis of a process fault is primarily based on process knowledge informed by the output of control charting or other monitoring methodologies. As shown in the example, in big data sets, the retrospective analysis done in Phase I remains critical because it allows us to understand the behavior of the process being studied. Unlike in traditional applications, there are often no physical or engineering principles that can be used to understand this behavior. Accordingly, the goals of Phase I in big data applications include the typical goals of understanding process behavior, estimating the in-control parameter values needed for constructing Phase II methods, as well as an increased emphasis on exploratory data analysis, where statistical graphs and visual data mining approaches (see, e.g., Smith et al. (2014)) can be used to provide insight into the behavior of the process.

As for the diagnosis of signals (either in Phase I or II), it can be informed by the knowledge gained from the existing process data or from the fusion of data from multiple sources. Statistical learning methods have been suggested for identifying root causes for process changes, mostly in the field of engineering process control under the moniker “fault identification” or “fault diagnosis”. For example, Li et al. (2008) proposed a method to decompose the Hotelling’s T² statistic based on a Bayesian network. Du et al. (2012) recommended classifying mean shifts from multivariate charts using a multi-class SVM. Kim et al. (2011) proposed a decomposition of Hotelling’s T² chart using a one-class classification algorithm.

In addition to using analysis based only on the monitored variables, it may be necessary to incorporate data from other sources in order to properly identify the root cause of a process signal (e.g., Hyniewicz (2015)). For example, the potential assignable causes shown in Table 2 have been determined from analyzing news articles around these time periods. If such articles had not been found, text analytics of the play-by-play descriptions on NFL.com (which can be easily done in the Python Programming Language with the package nflgame) can assist in understanding if the signal is related to a player’s performance in an NFL game. However, if both media sources and NFL.com do not indicate that there is an on-the-field or off-the-field issue that led to a signal, then one might speculate that the signal is caused by a cyber attack on the Wikipedia website where web crawlers are increasing the load on the Wikipedia servers. This determination may then be confirmed by examining the IP addresses for the visitors of the Wikipedia pages that led to a signal. From the above discussion, we see an important opportunity for researchers in both statistical learning and industrial statistics to refine existing methods and develop new methods to monitor multiple related data streams (e.g., Twitter, Facebook, Google Search statistics, and Wikipedia data) in big data applications. The fusion of information from these multiple data streams may assist in increasing the veracity of the data, an important issue in big data surveillance, as highlighted in Megahed and Jones-Farmer (2015). The benefits of merging data sources from multiple streams (with a discussion of statistical approaches for how to do it) are explained in more detail within the context of bio-surveillance by Shmueli and Fienberg (2006, see Section 4, pp. 123–133).

We also see tremendous opportunity for developments regarding how one establishes an in-control reference sample (Phase I) for multivariate processes and especially for multivariate processes measured with mixed variable types. Further, we see the need for growth in the application of statistical learning methods to high-volume, high-dimensional, and


high-velocity processes. Because most of the applications considered in the literature pertain to lower-dimensional data, the scalability of these monitoring methods in high dimensions is not fully understood. Last, most big process data sets have a complex structure, with distinct cyclic patterns of autocorrelation (day-of-week, time-of-day, etc.), are derived from multiple streams, and often have some type of a hierarchical or nested structure. Monitoring this data with the existing techniques is challenging, and our experience suggests that the traditional model-based SPC methods are ill suited to big data monitoring. The future of statistical monitoring in big data applications is likely to become more data driven as opposed to model driven and will rely more heavily on statistical and machine learning algorithms.

Acknowledgment

This work was partially supported by an Amazon Web Services Educational Grant to Auburn University. The authors would also like to thank Huw Smith and Jordan Huff, graduate students at Auburn University, for their assistance in developing a dictionary containing all names of NFL players (based on the Python nfldb package) and in extracting the relevant data from the Wikipedia Compressed (.gz) files.

References

Alfaro, E.; Gamez, M.; and Garcia, N. (2009). “A Boosting Approach for Understanding Out-of-Control Signals in Multivariate Control Charts”. Journal of Production Research 47, pp. 6821–6831.
Alwan, L. C. (1992). “Effects of Autocorrelation on Control Chart Performance”. Communication in Statistics—Theory and Methods 21, pp. 1025–1049.
Arkat, J.; Niaki, S. T. A.; and Abbasi, B. (2007). “Artificial Neural Networks in Applying MCUSUM Residuals Charts for AR(1) Processes”. Applied Mathematics and Computation 189, pp. 1889–1901.
Bradley, P. S.; Fayyad, U.; and Reina, C. (1998). “Scaling Clustering Algorithms to Large Databases”. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD’98), pp. 9–15.
Breiman, L. (1996). “Bagging Predictors”. Machine Learning 24, pp. 123–140.
Breiman, L. (2001a). “Random Forests”. Machine Learning 45, pp. 5–32.
Breiman, L. (2001b). “Statistical Modeling: The Two Cultures”. Statistical Science 16, pp. 199–231.
Breunig, M. M.; Kriegel, H. P.; Ng, R. T.; and Sander, J. (2000). “LOF: Identifying Density-Based Local Outliers”. Proceedings of the ACM SIGMOD 2000 International Conference on Management of Data 29, pp. 93–104.
Burkom, H. S.; Murphy, S.; and Shmueli, G. (2007). “Automated Time Series Forecasting for Biosurveillance”. Statistics in Medicine 26, pp. 4202–4218.
Camci, F.; Chinnam, R. B.; and Ellis, R. D. (2008). “Robust Kernel Distance Multivariate Control Chart Using Support Vector Principles”. International Journal of Production Research 46, pp. 5075–5095.
Capizzi, G. and Masarotto, G. (2011). “A Least Angle Regression Control Chart for Multidimensional Data”. Technometrics 53, pp. 285–296.
Capizzi, G. and Masarotto, G. (2013). “Efficient Control Chart Calibration by Simulated Stochastic Approximation”. XIth International Workshop on Intelligent Statistical Quality Control.
Chakraborti, S.; Human, S. W.; and Graham, M. A. (2008). “Phase I Statistical Process Control Charts: An Overview and Some Results”. Quality Engineering 21(1), pp. 52–62.
Champ, C. W. and Woodall, W. H. (1987). “Exact Results for Shewhart Control Charts with Supplementary Runs Rules”. Technometrics 29, pp. 393–399.
Chen, J. H. and Liu, J. L. (1999). “Mixture Principal Component Analysis Models for Process Monitoring”. Industrial & Engineering Chemistry Research 38, pp. 1478–1488.
Chen, T.; Morris, J.; and Martin, E. (2006). “Probability Density Estimation Via an Infinite Gaussian Mixture Model: Application to Statistical Process Monitoring”. Journal of the Royal Statistical Society, Series C 55, pp. 699–715.
Chen, Y.; Birch, J. B.; and Woodall, W. H. (2014). “A Phase I Cluster-Based Method for Analyzing Nonparametric Profiles”. Quality and Reliability Engineering International. DOI: 10.1002/qre.1700.
Chen, Y.; Birch, J. B.; and Woodall, W. H. (2015). “Cluster-Based Profile Analysis in Phase I”. Journal of Quality Technology 47, pp. 14–29.
Cheng, C. and Lee, H. (2012). “Identifying the Out-of-Control Variables of Multivariate Control Chart Using Ensemble SVM Classifiers”. Journal of the Chinese Institute of Industrial Engineers 5, pp. 314–323.
Cheng, C. S. and Cheng, H. P. (2008). “Identifying the Source of Variance Shifts in the Multivariate Process Using Neural Networks and Support Vector Machines”. Expert Systems with Applications 35, pp. 198–206.
Cheng, C. S. and Cheng, H. P. (2011). “Using Neural Networks to Detect the Bivariate Process Variance Shifts Pattern”. Computers & Industrial Engineering 60, pp. 269–278.
Cheng, C.; Chen, P.; and Huang, K. (2011). “Estimating the Shift Size in the Process Mean with Support Vector Regression and Neural Networks”. Expert Systems with Applications 38, pp. 10624–10630.
Chiang, L. H.; Braatz, R. D.; and Russell, E. L. (2001). Fault Detection and Diagnosis in Industrial Systems. London, UK: Springer.
Chiang, L.; Kotanchek, M. E.; and Kordon, A. K. (2004). “Fault Diagnosis Based on Fisher Discriminant Analysis and Support Vector Machines”. Computers and Chemical Engineering 28, pp. 1389–1401.
Chin, H.; Chen, C.; and Long-Sheng, C. (2010). “Intelligent ICA-SVM Fault Detector for Non-Gaussian Multivariate Process Monitoring”. Expert Systems with Applications 37, pp. 3264–3273.
Chiu, C. C.; Shao, Y. J. E.; Lee, T. S.; and Lee, K. M. (2003). “Identification of Process Disturbance Using SPC/EPC and Neural Networks”. Journal of Intelligent Manufacturing 14, pp. 379–388.


Choi, S. W.; Park, J. H.; and Lee, I. B. (2004). “Process Monitoring Using a Gaussian Mixture Model Via Principal Component Analysis and Discriminant Analysis”. Computers & Chemical Engineering 28, pp. 1377–1387.
Chongfuangprinya, P.; Kim, S. B.; Park, S. K.; and Sukchotrat, T. (2011). “Integration of Support Vector Machines and Control Charts for Multivariate Process Monitoring”. Journal of Statistical Computation and Simulation 81, pp. 1157–1173.
Choudhary, A. K.; Harding, J. A.; and Tiwari, M. K. (2009). “Data Mining in Manufacturing: A Review Based on the Kind of Knowledge”. Journal of Intelligent Manufacturing 20, pp. 501–521.
Colosimo, B. M. and Pacella, M. (2007). “On the Use of Principal Component Analysis to Identify Systematic Patterns in Roundness Profiles”. Quality and Reliability Engineering International 23, pp. 707–725.
Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press.
Crowder, S. V.; Hawkins, D. M.; Reynolds, M. R.; and Yashchin, E. (1997). “Process Control and Statistical Inference”. Journal of Quality Technology 29, pp. 134–139.
Dai, C.; Wang, K.; and Jin, R. (2014). “Monitoring Profile Trajectories with Dynamic Time Warping Alignment”. Quality and Reliability Engineering International, to appear.
“Data, Data Everywhere”. (2010). The Economist 394, pp. 3–16. Retrieved August 27, 2014, from http://www.economist.com/node/15557443.
Dávila, S.; Runger, G.; and Tuv, E. (2011). “High-Dimensional Surveillance”. In Artificial Neural Networks and Machine Learning—ICANN 2011, vol. 6792, Honkela, T.; Duch, W.; Girolami, M.; and Kaski, S., eds., pp. 245–252. Berlin and Heidelberg, Germany: Springer.
Dávila, S.; Runger, G.; and Tuv, E. (2014). “Public Health Surveillance with Ensemble-Based Supervised Learning”. IIE Transactions 46, pp. 770–789.
Deng, H.; Runger, G.; and Tuv, E. (2012). “System Monitoring with Real-Time Contrasts”. Journal of Quality Technology 44, pp. 9–27.
Doymaz, F.; Chen, J.; Romagnoli, J. A.; and Palazoglu, A. (2001). “A Robust Strategy for Real-Time Process Monitoring”. Journal of Process Control 11, pp. 343–359.
Du, S. and Xi, L. (2011). “Fault Diagnosis in Assembly Processes Based on Engineering-Driven Rules and PSOSAENA Algorithm”. Computers & Industrial Engineering 60(1), pp. 77–88.
Du, S.; Jun, L.; and Xi, L. (2012). “On-line Classifying Process Mean Shifts in Multivariate Control Charts Based on Multiclass Support Vector Machines”. International Journal of Production Research 50, pp. 6288–6310.
Duin, R. P. W.; Juszczak, P.; Paclik, P.; Pekhalska, E.; de Ridder, D.; Tax, D. M. J.; and Verzakov, S. (2007). “PRTools4.1, A Matlab Toolbox for Pattern Recognition”. Delft University of Technology.
Efron, B.; Hastie, T.; Johnstone, I.; and Tibshirani, R. (2004). “Least Angle Regression”. The Annals of Statistics 32, pp. 407–499.
Ferrer, A. (2014). “Latent Structures-Based Multivariate Statistical Process Control: A Paradigm Shift”. Quality Engineering 26, pp. 72–91.
Fogel, S. L.; Steiner, S. H.; and Woodall, W. H. (2015). “The Monitoring and Improvement of Surgical Outcome Quality”. Journal of Quality Technology, to appear.
Fraley, C. and Raftery, A. E. (2002). “Model-Based Clustering, Discriminant Analysis, and Density Estimation”. Journal of the American Statistical Association 97, pp. 611–631.
Freund, Y. and Schapire, R. (1995). “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”. In Computational Learning Theory, 904, Vitányi, P., ed., pp. 23–37. Berlin and Heidelberg, Germany: Springer.
Friedman, J. H. (2001). “Greedy Function Approximation: A Gradient Boosting Machine”. Annals of Statistics 29, pp. 1189–1232.
Gani, W.; Taleb, H.; and Liman, M. (2011). “An Assessment of the Kernel-Distance-Based Multivariate Control Chart Through an Industrial Application”. Quality and Reliability Engineering International 27, pp. 391–401.
Gani, W. and Liman, M. (2013). “Performance Evaluation of One-Class Classification-Based Control Charts Through an Industrial Application”. Quality and Reliability Engineering International 29, pp. 841–854.
Ge, Z.; Gao, F.; and Song, Z. (2011). “Batch Process Monitoring Based on Support Vector Data Description Method”. Journal of Process Control 21, pp. 949–959.
Ghazanfari, M.; Alaeddini, A.; Niaki, S. T. A.; and Aryanezhad, M.-B. (2008). “A Clustering Approach to Identify the Time of a Step Change in Shewhart Control Charts”. Quality and Reliability Engineering International 24, pp. 765–778.
Grigg, O. and Farewell, V. (2004). “An Overview of Risk-Adjusted Charts”. Journal of the Royal Statistical Society, Series A 167, pp. 523–539.
Grasso, M.; Colosimo, B. M.; Semeraro, Q.; and Pacella, M. (2015). “A Comparison of Distribution-Free Multivariate SPC Methods for Multimode Data”. Quality and Reliability Engineering International 31, pp. 75–76.
Gronskyte, R.; Kulahci, M.; and Clemmensen, L. K. H. (2013). “Monitoring Motion of Pigs in Thermal Videos”. Workshop on Farm Animal and Food Quality Imaging 2013, pp. 31–36.
Guha, S.; Rastogi, R.; and Shim, K. (1998). “Cure: An Efficient Clustering Algorithm for Large Databases”. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pp. 73–84.
Guyon, I. and Elisseeff, A. (2003). “An Introduction to Variable and Feature Selection”. Journal of Machine Learning Research 3, pp. 1157–1182.
Hachicha, W. and Ghorbel, A. (2012). “A Survey of Control-Chart Pattern-Recognition Literature (1991–2010) Based on a New Conceptual Classification Scheme”. Computers & Industrial Engineering 63, pp. 204–222.
Han, J. and Kamber, M. (2011). Data Mining: Concepts and Techniques, 3rd ed. Burlington, MA: Elsevier.
Hastie, T.; Tibshirani, R.; and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. New York, NY: Springer.
Hyniewicz, O. (2015). “SPC of Processes with Predicted Data: Application of the Data Mining Process”. In Frontiers in Statistical Quality Control, Knoth, S. and Schmid, W., eds., pp. 219–235. Cham, Switzerland: Springer International Publishing.


Hwang, W. and Lee, J. S. (2015). “Shifting Artificial Data to Detect System Failures”. International Transactions in Operational Research 22, pp. 363–378.
Hwang, W.; Runger, G.; and Tuv, E. (2007). “Multivariate Statistical Process Control with Artificial Contrasts”. IIE Transactions 39, pp. 659–669.
Issam, B. K. and Mohamed, L. (2008). “Support Vector Regression Based Residual MCUSUM Control Chart for Autocorrelated Process”. Applied Mathematics and Computation 201, pp. 565–574.
James, G.; Witten, D.; Hastie, T.; and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York, NY: Springer.
Jianbo, Y.; Lifeng, X.; and Xiaojun, Z. (2009). “Identifying Source(s) of Out-of-Control Signals in Multivariate Manufacturing Processes Using Selective Neural Network Ensemble”. Engineering Applications of Artificial Intelligence 22, pp. 141–152.
Jobe, J. M. and Pokojovy, M. (2009). “A Multistep, Cluster-Based Multivariate Chart for Retrospective Monitoring of Individuals”. Journal of Quality Technology 41, pp. 323–339.
Jones-Farmer, L. A.; Ezell, J. D.; and Hazen, B. T. (2014a). “Applying Control Chart Methods to Enhance Data Quality”. Technometrics 56, pp. 29–41.
Jones-Farmer, L. A.; Woodall, W. H.; Steiner, S. H.; and Champ, C. W. (2014b). “An Overview of Phase I Analysis for Process Improvement and Monitoring”. Journal of Quality Technology 46, pp. 265–280.
Kang, J. H. and Kim, S. B. (2011). “Clustering-Algorithm-Based Control Charts for Inhomogeneously Distributed TFT-LCD Processes”. International Journal of Production Research 51, pp. 5644–5657.
Keogh, E. and Lin, J. (2005). “Clustering of Time-Series Subsequences Is Meaningless: Implications for Previous and Future Research”. Knowledge and Information Systems 8, pp. 154–177.
Kim, S. B.; Jitpitaklert, W.; Park, S. K.; and Hwang, S. J. (2012). “Data Mining Model-Based Control Charts for Multivariate and Autocorrelated Processes”. Expert Systems with Applications 39(2), pp. 2073–2081.
Kim, S. B.; Sukchotrat, T.; and Park, S. K. (2011). “A Nonparametric Fault Isolation Approach Through One-Class Classification Algorithms”. IIE Transactions 43, pp. 505–517.
Kim, S. B.; Weerawat, J.; and Sukchotrat, T. (2010). “One-Class Classification-Based Control Charts for Monitoring Autocorrelated Multivariate Processes”. Communications in Statistics—Simulation and Computation 39, pp. 461–474.
Kourti, T. and MacGregor, J. F. (1995). “Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods”. Chemometrics and Intelligent Laboratory Systems 28, pp. 3–21.
Kresta, J. V.; MacGregor, J. F.; and Marlin, T. E. (1991). “Multivariate Statistical Monitoring of Process Operating Performance”. The Canadian Journal of Chemical Engineering 69, pp. 35–47.
Kumar, S.; Choudhary, A. K.; Kumar, M.; Shankar, R.; and Tiwari, M. K. (2006). “Kernel Distance-Based Support Vector Methods and Its Application in Developing a Robust K-Chart”. International Journal of Production Research 44, pp. 77–76.
Li, J.; Jin, J.; and Shi, J. (2008). “Causation-Based T² Decomposition for Multivariate Process Monitoring and Diagnosis”. Journal of Quality Technology 40, pp. 46–58.
Li, F.; Runger, G. C.; and Tuv, E. (2006). “Supervised Learning for Change-Point Detection”. International Journal of Production Research 15, pp. 2853–2868.
Liu, C. and Wang, T. (2014). “An AK-Chart for the Non-Normal Data”. International Journal of Computer, Information, Systems and Control Engineering 8, pp. 992–997.
Maboudou-Tchao, E. M. and Diawara, N. (2013). “A LASSO Chart for Monitoring the Covariance Matrix”. Quality Technology and Quantitative Management 10, pp. 95–114.
MacGregor, J. and Kourti, T. (1995). “Statistical Process Control of Multivariate Processes”. Control Engineering Practice 3, pp. 403–414.
Mahadevan, S. and Shah, S. L. (2009). “Fault Detection and Diagnosis in Process Data Using One-Class Support Vector Machines”. Journal of Process Control 19, pp. 1627–1639.
Maragah, H. D. and Woodall, W. H. (1990). “The Effect of Autocorrelation on the Retrospective X-Chart”. Journal of Statistical Computation and Simulation 40, pp. 29–42.
Megahed, F. M.; Woodall, W. H.; and Camelio, J. A. (2011). “A Review and Perspective on Control Charting with Image Data”. Journal of Quality Technology 43, pp. 83–98.
Megahed, F. M. and Jones-Farmer, L. A. (2015). “Statistical Perspectives on Big Data”. In Frontiers in Statistical Quality Control, 11, Knoth, S. and Schmid, W., eds., pp. 29–47. Cham, Switzerland: Springer International.
Moguerza, M. J.; Munoz, A.; and Psarakis, S. (2007). “Monitoring Nonlinear Profiles Using Support Vector Machines”. Progress in Pattern Recognition, Image Analysis and Applications, pp. 574–583.
Montgomery, D. C. (2013). Introduction to Statistical Quality Control, 7th ed. Hoboken, NJ: Wiley.
Nair, V. (2008). “Industrial Statistics: The Gap Between Research and Practice, Youden Memorial Address”. ASQ Statistics Division Newsletter 27, pp. 5–7. http://rube.asq.org/statistics/2010/02/statistical-process-control-(spc)/2007-youden-address.pdf
Niaki, S. T. A. and Abbasi, B. (2008). “Detection and Classification Mean-Shifts in Multi-Attribute Processes by Artificial Neural Networks”. International Journal of Production Research 46, pp. 2945–2963.
Ning, X. and Tsung, F. (2010). “Monitoring a Process with Mixed-Type and High-Dimensional Data”. In Industrial Engineering and Engineering Management (IEEM), 2010 IEEE International Conference, pp. 1430–1432.
Ning, X. and Tsung, F. (2013). “Improved Design of Kernel-Distance-Based Charts Using Support Vector Methods”. IIE Transactions 45, pp. 464–476.
Noorossana, R.; Saghaei, A.; and Amiri, A. (2011). Statistical Analysis of Profile Monitoring. Hoboken, NJ: John Wiley & Sons.
Nylund, K. L.; Asparouhov, T.; and Muthén, B. O. (2007). “Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study”. Structural Equation Modeling: A Multidisciplinary Journal 14, pp. 535–569.
Psarakis, S. (2011). “The Use of Neural Networks in Statistical Process Control Charts”. Quality and Reliability Engineering International 27, pp. 641–650.

Psarakis, S. and Papaleonida, G. E. A. (2007). “SPC Procedures for Monitoring Autocorrelated Processes”. Quality Technology and Quantitative Management 4(4), pp. 501–540.
Rajaraman, A.; Leskovec, J.; and Ullman, J. D. (2014). Mining of Massive Datasets. New York, NY: Cambridge University Press. http://www.mmds.org/#book.
Sall, J. P. (2013). “Big Statistics Is Different”. Paper presented in Plenary Session in the 57th Annual Fall Technical Conference, San Antonio, TX, Oct 17–18.
Schapire, R. E. (1990). “The Strength of Weak Learnability”. Machine Learning 5, pp. 197–227.
Shmueli, G. and Burkom, H. (2010). “Statistical Challenges Facing Early Outbreak Detection in Biosurveillance”. Technometrics 52, pp. 39–51.
Shmueli, G. and Fienberg, S. E. (2006). “Current and Potential Statistical Methods for Monitoring Multiple Data Streams for Bio-Surveillance”. In Statistical Methods in Counter-Terrorism: Game Theory, Modeling, Syndromic Surveillance, Biometric Authentication, pp. 109–140. New York, NY: Springer.
Smith, H. D.; Megahed, F. M.; Jones-Farmer, L. A.; and Clark, M. (2014). “Using Visual Data Mining to Enhance the Simple Tools in Statistical Process Control: A Case Study”. Quality and Reliability Engineering International, to appear. DOI: 10.1002/qre.1706.
Steiner, S. H. (2014). “Risk-Adjusted Monitoring of Outcomes in Health Care”. In Statistics in Action: A Canadian Outlook, Lawless, J. F., ed., p. 225. Boca Raton, FL: CRC Press.
Steiner, S. H.; Cook, R. J.; Farewell, V. T.; and Treasure, T. (2000). “Monitoring Surgical Performance Using Risk-Adjusted Cumulative Sum Charts”. Biostatistics 1, pp. 441–452.
Sukchotrat, T.; Kim, S. B.; and Tsung, F. (2009). “One-Class Classification-Based Control Charts for Multivariate Process Monitoring”. IIE Transactions 42, pp. 107–120.
Sullivan, J. (2002). “Detection of Multiple Change Points from Clustering Individual Observations”. Journal of Quality Technology 34, pp. 371–383.
Sun, R. and Tsung, F. (2003). “A Kernel-Distance-based Multivariate Control Chart Using Support Vector Methods”. International Journal of Production Research 41(13), pp. 2975–2989.
Tax, D. M. and Duin, R. P. (1999). “Support Vector Domain Description”. Pattern Recognition Letters 20, pp. 1191–1199.
Tax, D. M. and Duin, R. P. (2004). “Support Vector Data Description”. Machine Learning 54, pp. 45–66.
Tax, D. M. J.; Ypma, A.; and Duin, R. P. W. (1999). “Pump Failure Detection Using Support Vector Data Description”. Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis, Amsterdam, Netherlands, pp. 415–425.
Teague, N. R. (2004). “Control Chart—Asq.” Retrieved August 28, 2014, from http://asq.org/learn-about-quality/data-collection-analysis-tools/overview/control-chart.html
Thissen, U.; Swierenga, H.; and de Weijer, A. P. (2005). “Multivariate Statistical Process Control Using Mixture Modelling”. Journal of Chemometrics 19, pp. 23–31.
Tibshirani, R. (1996). “Regression Shrinkage and Selection Via the Lasso”. Journal of the Royal Statistical Society, Series B 58, pp. 267–288.
Tofighi, D. and Enders, C. K. (2008). “Identifying the Correct Number of Classes in Growth Mixture Models”. In Advances in Latent Variable Mixture Models, Hancock, G. R. and Samuelsen, K. M., eds., pp. 317–341. Greenwich, CT: Information Age.
Tuerhong, G. and Kim, S. B. (2014). “Comparison of Novelty Score-Based Multivariate Control Charts”. Communications in Statistics—Simulation and Computation 44, pp. 1126–1143.
Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; and Yin, K. (2003). “A Review of Process Fault Detection and Diagnosis. Part III: Process History Based Methods”. Computers & Chemical Engineering 27, pp. 327–346.
Wang, K. and Tsung, F. (2005). “Using Profile Monitoring Techniques for a Data-Rich Environment with Huge Sample Size”. Quality and Reliability Engineering International 21, pp. 677–688.
Wang, K. B. and Jiang, W. (2009). “High-Dimensional Process Monitoring and Fault Isolation Via Variable Selection”. Journal of Quality Technology 41, pp. 247–258.
Wells, L. J.; Megahed, F. M.; Camelio, J. A.; and Woodall, W. H. (2012). “A Framework for Variation Visualization and Understanding in Complex Manufacturing Systems”. Journal of Intelligent Manufacturing 23, pp. 2025–2036.
Wells, L. J.; Megahed, F. M.; Niziolek, C. B.; Camelio, J. A.; and Woodall, W. H. (2013). “Statistical Process Monitoring Approach for High-Density Point Clouds”. Journal of Intelligent Manufacturing 24, pp. 1267–1279.
Wheeler, D. J. and Chambers, D. S. (2010). Understanding Statistical Process Control, 3rd ed. Knoxville, TN: SPC Press.
Wise, B. and Ricker, N. (1989). “Feedback Strategies in Multiple Sensor Systems”. AIChE Symposium Series 85, pp. 19–23.
Wise, B.; Veltkamp, D.; Davis, B.; Ricker, N.; and Kowalski, B. (1988). “Principal Components Analysis for Monitoring the West Valley Liquid Fed Ceramic Melter”. Waste Management 88, pp. 811–818.
Woodall, W. H. and Adams, B. M. (1998). “Statistical Process Control”. In Handbook of Statistical Methods for Engineers and Scientists, Wadsworth, H. M., ed. New York, NY: McGraw-Hill.
Woodall, W. H. (2000). “Controversies and Contradictions in Statistical Process Control”. Journal of Quality Technology 32(4), pp. 341–350.
Woodall, W. H. (2006). “The Use of Control Charts in Health-Care and Public-Health Surveillance”. Journal of Quality Technology 38, pp. 89–104.
Woodall, W. H. (2007). “Current Research on Profile Monitoring”. Produção 17, pp. 420–425.
Woodall, W. H. and Montgomery, D. C. (2014). “Some Current Directions in the Theory and Application of Statistical Process Monitoring”. Journal of Quality Technology 46, pp. 78–94.
Woodall, W. H.; Spitzner, D. J.; Montgomery, D. C.; and Gupta, S. (2004). “Using Control Charts to Monitor Process and Product Quality Profiles”. Journal of Quality Technology 36, pp. 309–320.
Yao, M.; Wang, H.; and Xu, W. (2014). “Batch Process Monitoring Based on Functional Data Analysis and Support Vector Data Description”. Journal of Process Control 24, pp. 1083–1097.

Yan, H.; Paynabar, K.; and Shi, J. (2014). “Image-Based Process Monitoring Using Low-Rank Tensor Decomposition”. IEEE Transactions on Automation Science and Engineering, pp. 1–12.
Ypma, A.; Tax, D. M. J.; and Duin, R. P. W. (1999). “Support Vector Data Description Applied to Machine Vibration Analysis”. Proceedings of the Fifth Annual Conference of the Advanced School for Computing and Imaging, Heijen, The Netherlands, pp. 398–405.
Yu, J. and Xi, L. (2009). “A Neural Network Ensemble-Based Model for On-Line Monitoring and Diagnosis of Out-of-Control Signals in Multivariate Manufacturing Processes”. Expert Systems with Applications 36, pp. 909–921.
Zhang, C.; Tsung, F.; and Zou, C. (2015). “A General Framework for Monitoring Complex Processes with Both In-Control and Out-of-Control Information”. Computers & Industrial Engineering 85, pp. 157–168.
Zhang, H.; Albin, S. L.; Wagner, S. R.; Nolet, D. A.; and Gupta, S. (2010). “Determining Statistical Process Control Baseline Periods in Long Historical Data Streams”. Journal of Quality Technology 42, pp. 21–35.
Zhang, T.; Ramakrishnan, R.; and Livny, M. (1996). “Birch: An Efficient Data Clustering Method for Very Large Databases”. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 103–114.
Zorriassatine, F. and Tannock, J. D. T. (1998). “A Review of Neural Networks for Statistical Process Control”. Journal of Intelligent Manufacturing 9, pp. 209–224.
Zorriassatine, F.; Tannock, J. D. T.; and O’Brien, C. (2003). “Using Novelty Detection to Identify Abnormalities Caused by Mean Shifts in Bivariate Processes”. Computers & Industrial Engineering 44, pp. 385–408.
Zou, C. L.; Jiang, W.; and Tsung, F. (2011). “A Lasso-Based Diagnostic Framework for Multivariate Statistical Process Control”. Technometrics 53, pp. 297–309.
Zou, C. L.; Ning, X. H.; and Tsung, F. G. (2012). “Lasso-Based Multivariate Linear Profile Monitoring”. Annals of Operations Research 192, pp. 3–19.
Zou, C. and Qiu, P. (2009). “Multivariate Statistical Process Control Using LASSO”. Journal of the American Statistical Association 104, pp. 1586–1596.

