
Available online at www.sciencedirect.com
ScienceDirect
Procedia Engineering 174 (2017) 281–288

13th Global Congress on Manufacturing and Management, GCMM 2016

Recognition of control chart pattern using improved supervised locally linear embedding and support vector machine

Chunhua Zhao a,b, Chengkang Wang b,*, Lu Hua b, Xiao Liu b, Yina Zhang b, Hengxing Hu b

a Hubei Key Laboratory of Hydroelectric Machinery Design & Maintenance, China Three Gorges University, Yichang 443002, China
b College of Mechanical and Power Engineering, China Three Gorges University, Yichang 443002, China

Abstract

Control chart patterns (CCPs) are widely used in machining process control; effective recognition of abnormal CCPs can significantly narrow the set of possible assignable causes, shorten the diagnostic process, and improve the intelligence of quality monitoring. This paper proposes a control chart classification method based on improved supervised locally linear embedding (SLLE) and a support vector machine (SVM). The method extracts a 12-dimensional set of statistical and shape features from the control chart, reduces the dimensionality of this feature set with SLLE, and estimates the neighborhood size and embedding dimension using the normalized cuts (Ncut) criterion. A genetic algorithm (GA) is used to optimize the SVM classifier by searching for the best values of the SVM parameters. On a data set generated by Monte Carlo simulation, the recognition accuracy is analyzed and compared before and after dimensionality reduction, across different dimensionality reduction algorithms, and across different classifiers. The results demonstrate that the proposed approach can recognize CCPs effectively.
© 2017 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of the 13th Global Congress on Manufacturing and Management.
Keywords: Control chart pattern; Supervised locally linear embedding; Support vector machine; Genetic algorithm

1. Introduction

The control chart, as a basic tool of statistical process control (SPC), plays an important role in quality control of various production processes. Based on the principle of statistical hypothesis testing, the control chart is used to record and monitor the fluctuation of key quality characteristics [1].

* Corresponding author. Tel.: +86-18671715751.
E-mail address: wang_ck@foxmail.com

doi:10.1016/j.proeng.2017.01.138

As research has deepened, it has been found that it is difficult to determine whether a process is about to deteriorate rapidly merely by observing the fluctuations on a control chart. Many researchers have therefore studied control chart patterns (CCPs), analyzed the causes of abnormal patterns from the perspective of 5M1E (Man, Machine, Material, Method, Measurement, Environment), and formulated a corresponding treatment plan for each abnormal pattern [2]. After a long period of research and review, most CCPs can be classified into a normal pattern (NOR) and six abnormal patterns: Up Shift (US), Down Shift (DS), Up Trend (UT), Down Trend (DT), Cyclic (CYC), and Systematic (SYS) [3]. As shown in Fig. 1, the red solid lines and blue dotted lines indicate the abnormal patterns and NOR, respectively.

Fig. 1. Six abnormal control chart patterns: (a) up shift; (b) down shift; (c) up trend; (d) down trend; (e) cyclic; (f) systematic.

With the growing demand for informatization and intelligence in product quality monitoring, many scholars have carried out extensive research on how to use machine learning to identify CCPs rapidly. At present, the main feature extraction methods include wavelet analysis [4,5], geometric feature extraction and statistical feature extraction [6,7]. Because the extracted feature sets are partially redundant, various dimensionality reduction algorithms, such as ICA (independent component analysis), PCA (principal component analysis) and NRS (neighborhood rough set), have been used to remove redundant information, simplifying the original feature set and improving recognition efficiency [8-10]. Regarding the choice of classifier, Ghomi and Cheng used neural networks to identify CCPs with some success [11,12]. Neural networks have strong self-learning ability but aim at empirical risk minimization, so their practical application is limited when training on large and complex samples [10]. Considering structural risk minimization, training accuracy and generalization ability, the support vector machine (SVM) offers a simple structure and copes well with high dimensional, small-sample problems [13]. Many scholars therefore identify CCPs with SVM and optimize the SVM parameters to improve classification accuracy using PSO (particle swarm optimization) [13], GA (genetic algorithm) [7,14] or grid search [15].
Locally linear embedding (LLE) is a non-linear manifold learning method proposed by Roweis and Saul in Science in 2000. LLE exploits the fact that a non-linear structure is locally linear: it constructs a linear mapping that preserves the local neighborhood geometry in order to find the low dimensional manifold underlying high dimensional data [16,17]. Supervised locally linear embedding (SLLE) strengthens the aggregation of samples of the same class and the mutual exclusion of samples of different classes, which enables the decoupling and classification of feature manifolds and the incremental processing of new samples; it is therefore mainly used in the field of mechanical fault diagnosis [18]. Since the neighborhood parameter and embedding dimension of the SLLE algorithm greatly affect the classification result, several methods have been proposed to estimate these parameters, such as MRRE, LCMC and Ncut [19,20].
In this study, a new control chart recognition method combining SLLE with SVM is proposed; the neighborhood parameter and embedding dimension of SLLE are estimated by the Ncut criterion, and the SVM parameters are optimized by GA.

2. Feature extraction

2.1. Statistical features

This study draws on the features extracted from control charts in the literature [6,7], including 6 statistical features and 6 shape features. The statistical features are the mean, standard deviation (SD), skewness, kurtosis, mean square value (MSV) and autocorrelation. They are calculated as follows, where $t$ denotes the position of a monitoring point in the control chart, $y(t)$ denotes the quality characteristic value at that point, and $N$ denotes the number of points in the control chart, i.e., the sequence length.
$\mathrm{Mean} = \dfrac{\sum_{t=1}^{N} y(t)}{N}$    (1)

$SD = \sqrt{\dfrac{\sum_{t=1}^{N} \bigl(y(t) - \mathrm{Mean}\bigr)^{2}}{N}}$    (2)

$\mathrm{Skewness} = \dfrac{\sum_{t=1}^{N} \bigl(y(t) - \mathrm{Mean}\bigr)^{3}}{N \cdot SD^{3}}$    (3)

$\mathrm{Kurtosis} = \dfrac{\sum_{t=1}^{N} \bigl(y(t) - \mathrm{Mean}\bigr)^{4}}{N \cdot SD^{4}}$    (4)

$MSV = \dfrac{\sum_{t=1}^{N} y(t)^{2}}{N}$    (5)

$\mathrm{Autocorr} \approx \dfrac{y(1)\,y(2) + y(2)\,y(3) + \cdots + y(N-1)\,y(N)}{N}$    (6)
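As a concrete illustration (a minimal Python sketch added here for clarity, not part of the original paper), the six statistical features of formulas (1)-(6) can be computed for a single control chart sequence y as follows:

import numpy as np

def statistical_features(y):
    # Compute the six statistical features (1)-(6) for one control chart sequence y.
    y = np.asarray(y, dtype=float)
    n = len(y)
    mean = y.sum() / n                                  # (1)
    sd = np.sqrt(((y - mean) ** 2).sum() / n)           # (2)
    skewness = ((y - mean) ** 3).sum() / (n * sd ** 3)  # (3)
    kurtosis = ((y - mean) ** 4).sum() / (n * sd ** 4)  # (4)
    msv = (y ** 2).sum() / n                            # (5)
    autocorr = (y[:-1] * y[1:]).sum() / n               # (6), lag-1 product sum
    return np.array([mean, sd, skewness, kurtosis, msv, autocorr])

Stacking this vector with the six shape features of Section 2.2 gives the 12-dimensional feature vector used later for dimensionality reduction.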

2.2. Shape features

There are six shape features: the mean of the midpoint slopes (AASL), the range of the midpoint slopes (SRANGE), the ratio of the area between the pattern and the mean line to the variance (ACLPI), the sign of the least squares line fitted to the entire pattern (SB), a ratio based on the crossings of the mean line and of the least squares line (PSMLSC), and the ratio of the mean squared error of the least squares line fitted to all data points to the average mean squared error of the least squares lines fitted to the segment combinations (REAE). They are calculated as follows.

$AASL = \dfrac{\sum_{j,k} s_{j,k}}{6}$    (7)

$SRANGE = \max(s_{j,k}) - \min(s_{j,k})$    (8)

$ACLPI = \dfrac{ACL / (N-1)}{SD^{2}}$    (9)

$SB = \operatorname{sgn}\!\left[\dfrac{\sum_{t=1}^{N} y(t)\,(t - \bar{t})}{\sum_{t=1}^{N} (t - \bar{t})^{2}}\right]$    (10)

$PSMLSC = \dfrac{\sum_{t=1}^{N-1} \bigl(o_t - o'_t\bigr)^{2}}{N}$    (11)

$REAE = \dfrac{MSE}{\sum_{j,k} MSE_{j,k} \,/\, 6}$    (12)
where the points of the control chart are divided into 4 equal sections in the calculation of AASL, SRANGE and REAE; $s_{j,k}$ denotes the slope of the line joining the midpoints of segment $j$ and segment $k$, with $j = 1,2,3$, $k = 2,3,4$ and $j < k$. $ACL$ denotes the area between the whole CCP and the mean line. $\operatorname{sgn}$ denotes a sign function returning 0 for positive arguments and 1 for negative arguments, respectively. $o_t$ denotes a sign value calculated from two consecutive sampling values and the overall mean $\bar{y}$ of $y(t)$, $o_t = \operatorname{sgn}\bigl[(y(t)-\bar{y})(y(t+1)-\bar{y})\bigr]$, and $o'_t$ denotes a sign value calculated from two consecutive sampling values and their least squares estimates $y(t)'$, $o'_t = \operatorname{sgn}\bigl[(y(t)-y(t)')(y(t+1)-y(t+1)')\bigr]$. $MSE$ denotes the mean squared error of the least squares estimate over all sampling points, and $MSE_{j,k}$ denotes the mean squared error of the least squares estimate over the reorganized segments $j$ to $k$.
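For illustration, the following sketch (an assumption of this edit, reflecting one plausible reading of formulas (7), (8) and (10); the midpoint of each quarter segment is taken as its mean position and value) computes three of the shape features; the remaining ones follow the same pattern from the definitions above.

import numpy as np
from itertools import combinations

def sgn01(x):
    # sign convention described above: 0 for positive arguments, 1 for negative ones
    return 0 if x >= 0 else 1

def shape_features_subset(y):
    # Illustrative computation of AASL (7), SRANGE (8) and SB (10).
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1, dtype=float)

    # divide the chart into 4 equal sections and take the midpoint of each section
    sections = np.array_split(np.arange(n), 4)
    mid_t = np.array([t[idx].mean() for idx in sections])
    mid_y = np.array([y[idx].mean() for idx in sections])

    # slopes s_jk between the midpoints of segment j and segment k (j < k), 6 pairs in total
    slopes = np.array([(mid_y[k] - mid_y[j]) / (mid_t[k] - mid_t[j])
                       for j, k in combinations(range(4), 2)])
    aasl = slopes.sum() / 6               # (7)
    srange = slopes.max() - slopes.min()  # (8)

    # sign of the least squares slope fitted to the entire pattern
    sb = sgn01((y * (t - t.mean())).sum() / ((t - t.mean()) ** 2).sum())  # (10)
    return aasl, srange, sb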

3. Methodology

3.1. Improved supervised locally linear embedding

3.1.1. Supervised locally linear embedding

SLLE finds the $k$ nearest neighbors of each training sample $x_i \in X$ while taking the class labels into account: the Euclidean distance between pairs of points from different classes is enlarged by adding a constant, while the distance between points of the same class is kept unchanged [18]. The modified distance is

$S' = S + \alpha \cdot \max(S)\,\bigl(1 - \delta(x_i, x_j)\bigr)$    (13)

This ensures that most points in a neighborhood belong to the same class, so that samples of different classes are mapped to the low dimensional space separately. Here $S = \lVert x_i - x_j \rVert$ is the original Euclidean distance, which does not use the class labels, and $\max(S) = \max_{ij} \lVert x_i - x_j \rVert$ is the largest distance between any two points in the training set. If $x_i$ and $x_j$ belong to different classes, $\delta(x_i, x_j) = 0$; otherwise $\delta(x_i, x_j) = 1$. The tuning parameter $\alpha \in [0, 1]$ controls how much class information is incorporated.
The $k$ nearest neighbors of each training point $x_i$ are first determined using formula (13); the following reconstruction error is then defined:

$\varepsilon(W) = \sum_{i=1}^{n} \Bigl\lVert x_i - \sum_{j=1}^{n} w_{ij}\, x_{ij} \Bigr\rVert^{2}$    (14)

The weight matrix $W$ is formed by the coefficients $w_{ij}$. If $x_{ij}$ does not belong to the set of neighbors of $x_i$, then $w_{ij} = 0$, and the constraint $\sum_{j=1}^{n} w_{ij} = 1$ must be met.

Each high dimensional sample $x_i$ is mapped to a low dimensional point $y_i$ representing its global internal coordinates on the manifold. The $d$-dimensional coordinates $y_i$ are obtained by minimizing the quadratic form in (15):

$\varepsilon(Y) = \sum_{i=1}^{n} \Bigl\lVert y_i - \sum_{j=1}^{n} w_{ij}\, y_{ij} \Bigr\rVert^{2}$    (15)

subject to the constraints $\sum_{i=1}^{n} y_i = 0$ and $\frac{1}{n}\sum_{i=1}^{n} y_i y_i^{T} = I$, where $I$ is the identity matrix. To find the optimal solution, $\varepsilon(Y)$ can be expressed as

$\varepsilon(Y) = \sum_{i=1}^{n}\sum_{j=1}^{n} M_{ij}\, y_i^{T} y_j = \operatorname{tr}(Y M Y^{T})$    (16)

where $M$ is the $n \times n$ matrix $M = (I - W)^{T}(I - W)$. The matrix $Y$, composed of the eigenvectors of $M$, is the solution of formula (15), and the column vectors of $Y$ have dimension $d$.
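A minimal sketch (not the authors' code; alpha and k are illustrative values) of how the supervised distance of formula (13) changes neighbor selection before the usual LLE steps:

import numpy as np
from scipy.spatial.distance import pdist, squareform

def slle_neighbors(X, labels, k=4, alpha=0.2):
    # Select the k nearest neighbors of each sample using the supervised distance (13).
    labels = np.asarray(labels)
    S = squareform(pdist(X))                      # original Euclidean distances
    delta = labels[:, None] == labels[None, :]    # 1 for same class, 0 otherwise
    S_sup = S + alpha * S.max() * (1 - delta)     # formula (13)
    np.fill_diagonal(S_sup, np.inf)               # a point is not its own neighbor
    return np.argsort(S_sup, axis=1)[:, :k]       # indices of the k nearest neighbors

The returned neighbor indices would then feed the standard LLE steps: solving (14) for the reconstruction weights W and taking the bottom eigenvectors of M = (I - W)^T (I - W) as the d-dimensional embedding, as in (15) and (16).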



3.1.2. Parameter estimation

In this paper, the Ncut criterion is used to estimate the neighborhood size and embedding dimension of SLLE [20,21]. Assuming the sample points can be divided into $K$ classes and $C_i$ is the sample set of class $i$, the Ncut criterion is

$\mathrm{Ncut} = \sum_{i=1}^{K} \dfrac{\sum_{x_i \in C_i,\; x_j \notin C_i} S(x_i, x_j)}{d(x_i)}$    (17)

where $d(x_i) = \sum_{x_j \in C_i} S(x_i, x_j)$. The numerator of formula (17) decreases as the distance between the boundary points of $C_i$ and the other classes increases; the closer the points of $C_i$ are to each other, the larger the weights $S$ and hence the larger the denominator of formula (17). The edge weights are

$S(x_i, x_j) = S(x_j, x_i) = e^{-\,d^{2}(x_i, x_j)/(\sigma_i \sigma_j)}$

where the Euclidean distance $d(x_i, x_j)$ measures the spatial relation between $x_i$ and $x_j$, and $\sigma_i$, $\sigma_j$ denote the average distances between $x_i$, $x_j$ and their other neighborhood points. If $(x_i, x_j) \notin E$, then $S = 0$. To improve the accuracy of feature extraction, the reconstruction error $\varepsilon$ in formula (15) should also be as small as possible.
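As an illustration only (one plausible reading of formula (17), not the authors' exact procedure), the Ncut score of a candidate neighborhood graph can be evaluated as follows; here S is the symmetric matrix of Gaussian edge weights defined above, zero for pairs that are not graph neighbors, and candidate neighborhood sizes and embedding dimensions would be compared by this score together with the reconstruction error.

import numpy as np

def ncut_score(S, labels):
    # Sum over classes of (weight leaving the class) / (total weight attached to the class).
    labels = np.asarray(labels)
    score = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        cut = S[np.ix_(in_c, ~in_c)].sum()  # weights between class c and the other classes
        assoc = S[in_c, :].sum()            # total weight attached to class c
        score += cut / assoc
    return score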

3.2. Support vector machine

SVM is a pattern recognition method based on the principle of structural risk minimization. To handle linearly non-separable samples, a non-linear mapping is usually employed to map the samples into a higher dimensional feature space [4]. For SVM classification, an appropriate kernel function, the penalty parameter $c$ and the kernel parameter $g$ must be selected. In this study, the Gaussian kernel is chosen as the kernel function of the SVM classifier, following the literature [10]. GA is used to find the optimal penalty parameter $c$ and kernel parameter $g$: the cross validation (CV) accuracy on the training set serves as the GA fitness used to optimize the SVM parameters. The SVM whose parameters are optimized by GA is therefore referred to as GA-SVM in the present study.
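The sketch below is a simplified illustration of the GA-SVM idea (selection plus Gaussian mutation stands in for a full GA with crossover; the parameter ranges, population size and random seed are assumptions, not the authors' settings). The cross validation accuracy on the training set is used as the GA fitness, as described above.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(individual, X, y):
    c, g = individual
    clf = SVC(C=c, gamma=g, kernel="rbf")            # Gaussian (RBF) kernel SVM
    return cross_val_score(clf, X, y, cv=5).mean()   # CV accuracy serves as the GA fitness

def ga_svm(X, y, pop_size=20, generations=100,
           c_range=(0.1, 100.0), g_range=(1e-3, 1e3)):
    # initial population of (c, g) pairs drawn uniformly from the assumed ranges
    pop = np.column_stack([rng.uniform(*c_range, pop_size),
                           rng.uniform(*g_range, pop_size)])
    for _ in range(generations):
        fit = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]            # keep the fitter half
        children = parents[rng.integers(len(parents),
                                        size=pop_size - len(parents))].copy()
        children *= rng.normal(1.0, 0.1, children.shape)           # Gaussian mutation
        children = np.clip(children, [c_range[0], g_range[0]],
                           [c_range[1], g_range[1]])
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return best                                                    # optimized (c, g)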

4. Data generation

In control chart recognition, the best source of data would be actual production processes. Since most enterprises cannot provide the large amount of quality data required for this study, Monte Carlo (MC) simulation is often employed to simulate control charts of different patterns. On this basis, the simulated data are generated as follows.

(1) NOR expression
$y(t) = \mu + r(t) \times \sigma$    (18)
where $\mu$ and $\sigma$ denote the process mean and standard deviation, respectively, and $r(t)$ represents the random disturbance within the normal range of the manufacturing process, following a standard normal distribution.

(2) US/DS expression
$y(t) = \mu + r(t) \times \sigma \pm k \times s$    (19)
where $k$ is 0 or 1 and indicates whether a step change has occurred at point $t$ of the control chart; the sign is positive for US and negative for DS, and $s$ denotes the step amplitude.

(3) UT/DT expression
$y(t) = \mu + r(t) \times \sigma \pm t \times g$    (20)
where $g$ denotes the slope of the trend; the sign is positive for UT and negative for DT.

(4) CYC expression
$y(t) = \mu + r(t) \times \sigma + a \times \sin(2\pi t / T)$    (21)
where $a$ denotes the amplitude and $T$ the cycle period.

(5) SYS expression
$y(t) = \mu + r(t) \times \sigma + d \times (-1)^{t}$    (22)
where $d$ represents the mean degree of deviation of the systematic pattern.

Following the above method, the relevant values are set on the basis of the literature [14]: $\mu = 0$, $\sigma = 1$, $s \in (1, 3)$, $g \in (0.1, 0.26)$, $a \in (1.5, 2.5)$, $T = 8$, $d \in (1, 3)$. Using this rule, a sample set is generated in a $7 \times 200 \times 40$ format: 7 types of control chart are generated, including the normal pattern and the six abnormal patterns; each type comprises 200 samples, 100 for the training set and the remaining 100 for the test set; and each control chart contains 40 sampling points.
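The data generation rule can be sketched in Python as follows (a minimal illustration; the onset point of the shift in (19) is an assumption, since the paper does not state where the step occurs, and the random seed is arbitrary):

import numpy as np

rng = np.random.default_rng(0)

def generate_ccp(pattern, n=40, mu=0.0, sigma=1.0):
    # Generate one simulated control chart of the given pattern, formulas (18)-(22),
    # with the parameter ranges quoted above from the literature [14].
    t = np.arange(1, n + 1)
    y = mu + rng.standard_normal(n) * sigma           # (18): common random component
    if pattern in ("US", "DS"):
        s = rng.uniform(1, 3)
        k = (t >= rng.integers(10, n)).astype(float)  # step change after a random point (assumed)
        y += s * k if pattern == "US" else -s * k     # (19)
    elif pattern in ("UT", "DT"):
        g = rng.uniform(0.1, 0.26)
        y += g * t if pattern == "UT" else -g * t     # (20)
    elif pattern == "CYC":
        a, T = rng.uniform(1.5, 2.5), 8
        y += a * np.sin(2 * np.pi * t / T)            # (21)
    elif pattern == "SYS":
        d = rng.uniform(1, 3)
        y += d * (-1.0) ** t                          # (22)
    return y                                          # "NOR" keeps only the random term

patterns = ["NOR", "US", "DS", "UT", "DT", "CYC", "SYS"]
data = np.stack([[generate_ccp(p) for _ in range(200)] for p in patterns])  # 7 x 200 x 40 sample set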

5. Simulation and results analysis

According to formulas (1)-(12), the 12 statistical and shape features are extracted from the sample set, forming a feature set of all control chart types in a $7 \times 200 \times 12$ format. The improved SLLE is employed to reduce the dimension of this feature set; the neighborhood parameter and embedding dimension of the algorithm are estimated by the Ncut criterion, yielding a neighborhood parameter $k = 4$, an intrinsic dimension $d = 6$, and a new low dimensional feature set in a $7 \times 200 \times 6$ format.
The feature set is divided into a training set and a test set, and the fitness of GA-SVM is evaluated by cross validation on the 700 training samples. With a maximum of 100 generations and a population size of 20, the optimization yields the penalty parameter $c = 34.7599$ and kernel parameter $g = 975.7995$.

5.1. Performance of the recognizer in different dimensions

The test set is classified by GA-SVM. To verify the effectiveness of dimensionality reduction of the high dimensional features by the improved SLLE, the recognition accuracy of each control chart type and the overall average accuracy are compared before and after dimensionality reduction. The results are shown in Table 1.

Table 1. Recognition accuracies in different dimensions

Dimension   Accuracy (%)
            Average    NOR   US    DS    UT    DT    CYC   SYS
12          94.7143    94    97    95    100   82    97    98
6           99.4286    99    100   99    100   100   99    99
As demonstrated in Table 1, because the neighborhood parameter and embedding dimension of the SLLE algorithm are optimized, reducing the sample features to 6 dimensions preserves the information of the original feature set to the maximum extent and greatly improves the recognition accuracy. The advantage is particularly clear in the recognition of DT, while for the other types the recognition accuracy is more stable and reliable. In addition, the large reduction in dimensionality lowers the computational complexity and improves the computational speed.

5.2. Performance of the recognizer with different dimensionality reduction algorithms

To further verify the advantages of the improved SLLE for feature extraction of control charts, SLLE, PCA (principal component analysis) and NRS (neighborhood rough set) are also employed to reduce the dimension of the original 12-dimensional feature set [10], and GA-SVM is then used to classify each low dimensional feature set. The recognition results of the 4 dimensionality reduction algorithms are shown in Table 2.

Table 2. Recognition accuracies with different dimensionality reduction algorithms

Dimensionality reduction algorithm   Dimension   Accuracy (%)
SLLE                                 8           87.8571
PCA                                  8           94.4285
NRS                                  7           95.3300
Improved SLLE                        6           99.4286
Table 2 shows that the classification accuracy of the 12-dimensional feature set is relatively low after dimensionality reduction by SLLE, PCA and NRS, because some redundant features still remain. The improved SLLE algorithm reduces the dimension of the feature set the most and contributes significantly to the increase in recognition accuracy.

5.3. Comparison of GA-SVM and BPNN

Artificial neural networks (ANN) are also widely used in pattern recognition. The present work selects the back-propagation neural network (BPNN) described in the literature [14] and compares it with GA-SVM; the feature set obtained after dimensionality reduction by the improved SLLE is used for classification. The recognition accuracy of each control chart type and the overall average recognition accuracy are shown in Table 3.

Table 3. Recognition accuracies with GA-SVM and BPNN

Classifier   Accuracy (%)
             Average    NOR   US    DS    UT    DT    CYC   SYS
BPNN         91.1429    98    86    96    78    94    97    89
GA-SVM       99.4286    99    100   99    100   100   99    99
As demonstrated in Table 3, the recognition accuracy of GA-SVM is higher than that of BPNN for every control chart type on the same feature set, especially for US, UT and SYS. Because the weights of BPNN are adjusted gradually in the direction of local improvement, they tend to converge to local minima, so the network training performance is only ordinary, and the results fluctuate considerably when the network is initialized with different weights. Consequently, GA-SVM offers better stability and reliability than BPNN.

6. Conclusion

The fast recognition of control chart patterns in the manufacturing process plays a key role in monitoring production processes and diagnosing abnormal factors. The present study proposes a control chart recognition method based on improved SLLE and SVM. The results show that dimensionality reduction of the high dimensional features by the improved SLLE effectively eliminates redundant features, reducing the complexity of the classification model while preserving the integrity of the control chart features, and that the GA-based optimization of the SVM parameters significantly improves the recognition precision of the classifier.

Acknowledgments

The authors acknowledge the support provided by the CERNET Innovation Project under Grant No. NGII20150801, the National Natural Science Foundation under Grant No. 51205230 and the Natural Science Foundation of Hubei Province of China under Grant No. 2015CFB445.

References

[1] J. W. Baik, H. W. Kang, C. W. Kang, M.Song, The Optimal Control Limit of a G –EWMAG Control Chart, Int.
J. Adv. Manuf. Technol. 56 (2011) 161-175.

[2] S. K. Gauri, S. Chakraborty, Recognition of control chart patterns using improved selection of features, Comput. Ind. Eng. 56 (2009) 1577-1588.
[3] S. Haghtalab, P. Xanthopoulos, K. Madani, A robust unsupervised consensus control chart pattern recognition
framework, Expert. Syst. Appl. 42 (2015) 6767-6776.
[4] S. C. Du, D. L. Huang, J. Lv, Recognition of concurrent control chart patterns using wavelet transform
decomposition and multiclass support vector machines, Comput. Ind. Eng. 66 (2013) 683–695.
[5] A. Cohen, T. Tiplica, A. Kobi, Design of experiments and statistical process control using wavelets analysis,
Control. Eng. Pract. 49 (2016) 129-138.
[6] W. Hachicha, A. Ghorbel, A survey of control-chart pattern-recognition literature (1991–2010) based on a new
conceptual classification scheme, Comput. Ind. Eng. 63 (2012) 204-222.
[7] V. Ranaee, A. Ebrahimzadeh, Control chart pattern recognition using a novel hybrid intelligent method, Appl. Soft Comput. 11 (2011) 2676-2686.
[8] C. J. Lu, Y. E. Shao, P. H. Li, Mixture control chart patterns recognition using independent component analysis
and support vector machine, Neurocomputing.74 (2011) 1908-1914.
[9] T. F. Li, S. Hu, Z. Y. Wei, Y. J. Han, PCA-SVM for control chart recognition of genetic optimization, Appl. Res. Comput. 29 (2012) 4538-4541.
[10] Q. Xiang, L. Xu, B. Liu, Z. J. Lv, J. G. Yang, Processing Anomaly Detection Based on Rough Set and Support
Vector Machine, Comput. Integr Manuf. 21 (2015) 2467-2474.
[11] S. M. T. Fatemi Ghomi, S. A. Lesany, A. Koochakzadeh, Recognition of Unnatural Patterns in Process Control
Charts Through Combining Two Types of Neural Networks, Appl. Soft. Comput. 11 (2011) 5444-5456.
[12] C. S. Cheng, K. K. Huang, P. W. Chen, Recognition of Control Chart Patterns Using a Neural Network-Based
Pattern Recognizer with Features Extracted from Correlation Analysis, Pattern Anal Applic, 18 (2015) 75-86.
[13] Y. M. Zhao, Z. He, S. G. He, M. Zhang, Support vector machine based on particle swarm optimization for
monitoring mean shift signals in multivariate control charts, J. Tianjin. Univ. 46 (2013) 469-475.
[14] M. Zhang, W. Cheng, Recognition of Mixture Control Chart Pattern Using Multiclass Support Vector Machine
and Genetic Algorithm Based on Statistical and Shape Features, Math. Probl. Eng. 2015 (2015) 1-10.
[15] S. J. Lee, H. Zhao, Recognition of Control Chart Patterns Based on Feature Fusion with Support Vector
Machine, Appl. Res. Comput. 31 (2014) 937-941.
[16] S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323-2326.
[17] Y. H. Liu, Y. S. Zhang, Z. W. Yu, M. Zeng, Incremental supervised locally linear embedding for machinery
fault diagnosis, Eng. Appl. Artif. Intel. 50 (2016) 60-70.
[18] F. Hu, C. T. Wang, Y. C. Wu, L. Z. Fan, Fault features extraction based on improved supervised locally linear
embedding, J. Vib. Shock. 34 (2015) 119-123.
[19] J. A. Lee, M. Verleysen, Quality assessment of dimensionality reduction: Rank-based criteria, Neurocomputing,
72 (2009) 1431-1443.
[20] F. Hu, X. Su, W. Liu, Y. C. Wu, L. Z. Fan, Fault feature extraction based on improved locally linear embedding,
J. Vib. Shock. 34 (2015) 201-204.
[21] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 888-905.
