Multivariate Statistical Process Control in Chromatography


Chemometrics and Intelligent Laboratory Systems 38 (1997) 51-62, Elsevier

Multivariate statistical process control in chromatography


A. Nijhuis a,*, S. de Jong a, B.G.M. Vandeginste a,b
a Unilever Research Laboratorium, P.O. Box 114, 3130 AC Vlaardingen, The Netherlands
b Department of Organic Chemistry, University of Gent, Krijgslaan 281, B-9000 Gent, Belgium
Received 5 December 1996; revised 2 May 1997; accepted 5 May 1997

Abstract

The need for multivariate statistical process control (MSPC) to check the performance of processes is becoming more important as the number of variables that can be measured increases. In this paper Hotelling's T² statistic based on PCA is used for the development of multivariate control charts. The significant principal components (PCs) are used to develop the T²-chart and the remaining PCs contribute to the √SPE-chart. Results of applying multivariate control charts in chromatography are presented. The application concerns a capillary gas chromatography analysis of the fatty acid composition in BCR162 (soya-maize oil). © 1997 Elsevier Science B.V.

Keywords: Multivariate statistical process control; Hotelling's T² statistic; Principal component analysis; Chromatography

1. Introduction

In recent years the interest in quality control has strongly increased. Quality can be improved by implementing quality requirements to systems with the help of statistical tools. A well-known statistical tool is the Shewhart control chart, a simple plot of the quality variable on the vertical axis and time on the horizontal axis. Control limits indicate the range of variation of the quality variable. The most important concept of Shewhart's statistical approach is that a controlled process only varies due to 'normal' causes. If the process is not in control, additional variation is possible through 'special' causes. These special causes can be identified and eliminated to make the process predictable [1].

In chromatography, process variables such as flow and pressure can be of significant importance for the performance of a certain chromatographic method. Also the ageing of a column or, for example, a contaminated detector can take the system out of statistical control. Therefore in chromatography, univariate control charts are used to check whether an analytical method is in statistical control or not. The contribution of this paper is to show how to apply multivariate control charts, based on Hotelling's T² statistic and principal component analysis, in chromatography. Results of a multivariate process control are also discussed to illustrate the performance of such a process control.

* Corresponding author. Fax: +31-10-4605671.
0169-7439/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0169-7439(97)00054-3

2. Theory

To control the performance of a chromatographic method one has to check the chromatographic system. A way to do this is to analyse a check sample
together with the unknown samples at regular intervals. The measurements of the check sample are presented in control charts. With univariate control the concentration of each compound of interest is investigated with a control chart of its own. A univariate control chart usually consists of two warning limits (target value ±2σ) and two control limits (target value ±3σ). Different rules apply to interpret the control charts in order to distinguish between out-of-control and in-control situations.
(1) One or more observations outside of the control limits.
(2) A run of at least 8 observations, where the type of run could be either a run up or down, or a run above or below the centre line.
(3) Two or more consecutive observations outside the warning limits but still inside the control limits.
(4) Four or five consecutive observations beyond the 1σ limits.
(5) An unusual or nonrandom pattern in the data.

The most important rule and a basic criterion for Shewhart charts is rule number 1. The supplementary criteria are applied to increase the sensitivity of the univariate control charts. A big disadvantage of the univariate control charts appears as the number of variables to be measured increases. As mentioned before, each variable is monitored with its own control chart. In case of several variables an equal number of univariate control charts is required. In practice it demands much work from the analyst to check the process with these univariate charts and the probability to make mistakes is larger when several control charts should be checked [2].

Another disadvantage is that out-of-control situations can be missed due to correlation in the data set. This means that a measurement can be in control in all univariate charts whereas in fact it is an out-of-control situation [3]. The detection and correction of such situations is of paramount importance for keeping the within-laboratory variation under control. In case of chromatography they indicate that the correlation between the analyte concentrations has changed. For instance, it may appear that when the concentration of a given analyte is at the high side, the concentration of another analyte is usually at the low side. It may then be abnormal that at a certain instance both are at the high side or both are at the low side. This information is obtained by considering the two control charts simultaneously in what is called a multivariate control chart, which is based on principal component analysis (PCA). A PCA can express correlations between variables in a relatively small number of 'latent variables' (principal components) which are uncorrelated.

In Fig. 1 different in-control and out-of-control observations are given in combination with univariate (3σ, i.e. α = 0.003) and multivariate control limits (α = 0.003). There are areas in this plot where an observation can be multivariate in control and univariate out-of-control or vice versa. For example, observation A is in control both for the univariate and multivariate case. However, observation B (low x1, low x2) is only in control for the multivariate case due to the correlation structure of the data. On the other hand, observation C (high x1, low x2) is only in control for the univariate case. For the multivariate case, observation C is clearly out-of-control since a high x1 should be allocated with a high x2. Observation D is out-of-control for both the univariate and the multivariate case.

Fig. 1. Different in-control and out-of-control observations in combination with univariate and multivariate control limits (α = 0.003). Axes: x1 (horizontal), x2 (vertical).

An advantage of multivariate process control is that only two control charts are needed, the so-called T²-chart and a √SPE chart. The T²-chart is based on the significant principal components while the √SPE chart represents the error of the model based on the remaining principal components. The T²-chart has its
origin in the work of Hotelling and is supported by the Mahalanobis distance which takes the correlation in the data into account. For a new observation (y), Hotelling's T²-statistic [4] is given by

$$T^2 = (y - \tau)^{T} S^{-1} (y - \tau) \tag{1}$$

where S is the k × k covariance matrix of the data table with n observations and k variables and where τ is the target vector of k variables.

There is only one limit for the T²-chart instead of two for the univariate charts. This limit is the upper control limit calculated as

$$\mathrm{UCL}_{T^2} = \frac{(n-1)(n+1)k}{n(n-k)}\, F_{1-\alpha}(k, n-k) \tag{2}$$

Here, F_{1−α}(k, n − k) is the 100(1 − α) percentile of the F-distribution with k and n − k degrees of freedom [3].

In the presence of large correlations between the concentrations of the compounds in the control sample, the covariance matrix S in Eq. (1) can become nearly singular. Therefore it is common to calculate the T²-value by a principal component analysis. The T²-value is then estimated from the scores t of the multivariate measurement in the space defined by A (< k) significant principal components (first term of Eq. (3)),

$$T^2 = \sum_{i=1}^{A} \frac{t_i^2}{s_{t_i}^2} + \sum_{i=A+1}^{k} \frac{t_i^2}{s_{t_i}^2} \tag{3}$$

where t_i is the score of the measurement vector y on the ith principal component and s_{t_i}² is the variance of these scores.

The second part of Eq. (3) represents the contribution to the T² due to pure noise. Thus,

$$T^2 = T_A^2 + e \tag{4}$$

where T_A² is the value estimated from A principal components and e is the residual T².

Multivariate process control based on PCA therefore consists of two charts: a T_A²-chart, which monitors the multivariate distance of a new measurement from the target value in the reduced PCA space and a √SPE-chart, which monitors the deviation from the PCA model. The limits for the √SPE chart and the T_A²-chart are calculated from a learning data set. This data set should be cleared from abnormal situations, taking into account that statistically 5% of the data exceeds the 2σ warning limits. This means that 1 observation out of 20 observations will produce a false warning and should not be deleted. However, when a warning is produced by more than 5% of the data, abnormal observations must be deleted. After this correction the limits are recalculated.

Eq. (2) can be used for the calculation of the control limit. This equation makes use of the assumption that Hotelling's T² statistic can be approximated by an F-distribution. This is true if the ith observation y_i is independent of both τ and S. However, this assumption is not true for the start-up stage of a control chart. Gnanadesikan and Kettenring [5] have shown, based on a result of Wilks [6], that Eq. (2) can be simplified by the assumption that the UCL can be approximated by a beta-distribution.

$$\mathrm{UCL}_{T^2} = \frac{(m-1)^2}{m}\, B_{1-\alpha}\!\left(\frac{p}{2}, \frac{m-p-1}{2}\right) \tag{5}$$

where B_{1−α}(p/2, (m − p − 1)/2) is the 100(1 − α) percentile of the beta distribution with parameters p/2 and (m − p − 1)/2 [7]. This approximation is correct only when individual y_i values collected in the start-up stage of the process are checked to see whether they fall within the control limits [7].

For the √SPE chart the weighted chi-squared distribution (gχ²_h) of Box [8] is used, with the weight g and h degrees of freedom. Factors g and h are calculated from the mean value (m) and variance (v) of the values of the √SPE obtained for the learning set, as g = v/2m and h = 2m²/v. Using these approximations the control limit for the √SPE chart can be calculated as [9]:

$$\mathrm{UCL}_{\sqrt{SPE}} = \sqrt{\frac{v}{2m}\, \chi^2_{1-\alpha}\!\left(\frac{2m^2}{v}\right)} \tag{6}$$

Now multivariate control charts can be built without inverting the covariance matrix and with control limits that are quite easily calculated from the beta-distribution and the chi-squared distribution, respectively.

With these two charts, different types of deviations from normal operating conditions (NOC) can be defined [10]:
(1) The process is disturbed in one or more modelled variables, without changing the model relation. The process will be out-of-control in the T²-chart, but √SPE will remain in control (see blocked observations in Fig. 2). In chromatography, this means that there is a deviation from the target value for one or more check sample components. However, the correlation structure of the components remains the same (the model relation). For example, suppose that there is a strong increase in the peak area of component A and that this component A has a large correlation with component B. Due to this correlation also the peak area of component B has increased. In this case the correlation structure is maintained and there will be an out-of-control situation only in the T²-chart.
(2) When a new event occurs, which is not covered by the model, a significant deviation can be seen in the √SPE-chart. In this case the T²-chart will remain in control (see triangular observations in Fig. 2). A new event related to our example can be that the peak area of component A is increased while from component B the peak area is decreased. In normal conditions the second component also had to increase, so in this case the event is no longer covered by the model and there will be an out-of-control situation only in the √SPE-chart.
(3) Any combinatory effect of the first and the second type causes an out-of-control situation in both the T²- and √SPE-chart (the circles in Fig. 2). So in this case the correlation is disturbed by different deviations in component A and B, and there are large deviations from the target value.

Fig. 2. Multivariate control chart; √SPE plotted against PC1 and PC2 (adapted from Ref. [9], see text for explanation). The cylinder includes the normal operation conditions (NOC).

With this knowledge, an out-of-control situation can be assigned to a certain effect. However, when a more specific investigation is required, the effects of individual variables must be observed.

A critical problem is to find the correct number (A) of significant principal components. Because in chromatography the correlation structure can be very weak, the number of significant PCs may be difficult to find. There are a number of rules for deciding how many PCs have to be included in the model. A very obvious method is to select those PCs that together account for a large percentage, e.g. 80% or 90%, of total variance. Or choose all eigenvalues greater than the average eigenvalue. Another approach making use of the eigenvalues is the scree-graph. In this graph the eigenvalues are plotted against the PC number and then the PC must be found where the lines joining the points are 'steep' to the left and 'not steep' to the right [11]. Other tests such as the Bartlett test or the split-sample procedure are also available, but the success of these tests depends on the size of the data matrix [12]. In practice (with chromatography data) the size of the learning set will be too small for a proper use of these tests.

An alternative is the algorithm of Dijksterhuis and Heiser, who evaluated permutation tests in multivariate data analysis [13]. In the permutation test elements in the column of a matrix are permuted independently to remove the correlations from the data. The idea of this method is that the eigenvalues found for the permuted data matrices Xp are smaller than the eigenvalues found for the original data matrix X. The probability that an eigenvalue of the original data matrix X is exceeded by the corresponding eigenvalue for the permuted data, can be interpreted as the level of significance of that eigenvalue. Another possibility is the evaluation of the variation of the eigenvector during cross-validation. By deleting a number of rows randomly from the data matrix and recalculating the PCA, the stability of the eigenvectors (loadings) can be evaluated. This stability can be expressed by the variance of the loadings for each variable or by the variance of the angles of the eigenvectors in space. By plotting the variance of the loadings or the variance of the PC angles as a function of the
PC number, the stability of the eigenvectors as a function of the PC number can be evaluated. Ideally one would expect a stepwise decrease of the stability for noise vectors.

The algorithm, which is akin to Jackknifing [14], for the evaluation of the variance of the loadings is summarised in the flow-chart given in Fig. 3.

Fig. 3. Calculation of the variance of the PC loadings. Flow-chart steps: delete n objects from matrix X; perform PCA on matrix Xs (repeated 100×); calculate variance for each PC; plot log(variance) vs PC number.

The algorithm for the evaluation of the variance of the angles is divided into a few steps. First, all eigenvectors for each dimension must be aligned in the same direction. Eigenvectors with an angle of about 180° are essentially equal but for a different sign. To achieve the alignment, the inner products of all eigenvectors with the 'overall'-vector for the corresponding PC are calculated. The 'overall'-vectors are the vectors found for the complete data matrix X. If this inner product is negative then the sign of the eigenvector for the subset Xs must be inverted. After aligning all eigenvectors of a PC in more or less the same direction, a 'mean'-eigenvector can be calculated. Then the angle (φ) between the subset eigenvector p and the corresponding 'mean'-eigenvector is computed using the following equation:

$$\varphi = \arccos\!\left(\frac{\bar{p}^{T} p_i}{\lVert \bar{p} \rVert \cdot \lVert p_i \rVert}\right)$$

where p̄ is the 'mean'-eigenvector using all data, p_i the subset eigenvector for the ith subset, and ‖p‖ the norm of vector p.

For each dimension, the variance of the angles can be calculated. This variance is plotted against the PC number as we did before for the variance of the loadings. The flow-chart of the computation is given in Fig. 4.

Fig. 4. Calculation of the angle variation. Flow-chart steps: delete n objects from matrix X; perform PCA on matrix Xs (repeated 100×); calculate angle between eigenvector of Xs and mean eigenvector; plot angle variation vs PC number.

In addition, one can divide the variance of the angles (or the loadings) of each PC by the corresponding mean eigenvalue λ_i. Like this the variance is weighted by the length (importance) of the eigenvector.

3. Experimental

The chromatographic method used for the development of multivariate control charts is a method for the quantification of the total content of trans fatty acids in vegetable oils by capillary gas chromatography, according to the adapted standard AOCS method Ce 1c-89: 'Fatty acid composition by GLC, cis and trans isomers', revised 1990, 1991, updated 1992.

Analytical data on check sample BCR 162 (soya-maize oil) were collected. The control data comprise the fatty acid composition, obtained for 56 manual injections and 92 automatic injections by a HP prepstation, measured on a HP5890 gas chromatograph. Further 88 manual injections were analysed with a CP9000 gas chromatograph. All samples were measured in a period of about 10 months. Analytes that were selected for the control of the chromatographic system are C16:0, C18:0, C18:1, C18:2 and C18:3. For all data sets about a third part of the observations was used as a learning set to calculate the control limits and the remaining observations were used as an observation set to evaluate the results of the multivariate charts. In the example described later in this paper a learning set and an observation set are used (Tables 1 and 2), which were generated on the CP9000 gas chromatograph with manual injection.
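The angle-based stability measure described in Section 2 (sign alignment against the 'overall' vector, a 'mean' eigenvector, then the arccos of the normalised inner product) can be sketched for a single PC as follows. This is a minimal illustration under stated assumptions, not the authors' SAS implementation: the function names are hypothetical, the loading vectors are made-up numbers rather than values from the paper, and the PCA step that would produce the subset eigenvectors is left out.

```python
import math
from statistics import variance

def align(vec, reference):
    """Flip vec if its inner product with the reference vector is negative."""
    dot = sum(a * b for a, b in zip(vec, reference))
    return [-a for a in vec] if dot < 0 else list(vec)

def angle(u, v):
    """Angle in radians between two vectors, from the normalised inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def angle_variance(subset_vectors, overall):
    """Variance of the angles between aligned subset eigenvectors and their mean."""
    aligned = [align(p, overall) for p in subset_vectors]
    n = len(aligned)
    mean_vec = [sum(col) / n for col in zip(*aligned)]
    angles = [angle(mean_vec, p) for p in aligned]
    return variance(angles), angles

# Hypothetical loadings of one PC from four jackknife subsets (made-up numbers).
overall = [0.6, 0.8]
subsets = [[0.6, 0.8], [-0.6, -0.8], [0.8, 0.6], [-0.8, -0.6]]
var_angle, angles = angle_variance(subsets, overall)
```

In this toy input two of the four vectors differ from the others only by sign, so after alignment all four lie at the same angle from the mean vector and the variance of the angles is essentially zero. Repeating the calculation per PC and plotting the variance against the PC number would give the kind of stability plot the text describes.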
All calculations and graphs are made with an application developed to design multivariate control charts in chromatography. This application, written in SAS 6.11, uses the singular value decomposition (SVD) [15] instead of the principal component analysis (PCA), but the results are the same [16].

Table 1
Learning set CP9000 manual injection. Analyte peak area (%)
Obs.  C16:0  C18:0  C18:1  C18:2  C18:3
 1    10.88  2.85   24.17  56.37  4.68
 2    10.82  2.87   24.10  56.43  4.71
 3    10.76  2.85   24.11  56.37  4.78
 4    10.86  2.86   24.24  56.27  4.75
 5    10.80  2.89   24.29  56.29  4.69
 6    10.77  2.88   24.23  56.42  4.70
 7    10.68  2.85   24.39  56.29  4.77
 8    10.79  2.85   24.23  56.46  4.71
 9    10.79  2.89   24.28  56.22  4.74
10    10.75  2.84   24.19  56.38  4.79
11    10.75  2.89   24.23  56.26  4.74
12    10.75  2.87   24.10  56.40  4.79
13    10.78  2.85   24.11  56.41  4.76
14    10.80  2.88   24.13  56.36  4.74
15    10.77  2.88   24.08  56.40  4.79
16    10.74  2.90   24.12  56.39  4.76
17    10.75  2.89   24.10  56.34  4.81
18    10.76  2.90   24.09  56.36  4.81
19    10.72  2.89   24.11  56.38  4.81
20    10.73  2.95   24.38  56.15  4.69
21    10.73  2.93   24.33  56.18  4.73
22    10.68  2.89   24.18  56.39  4.77
23    10.65  2.92   24.20  56.53  4.83
24    10.55  2.88   24.26  56.51  4.74
25    10.64  2.89   24.31  56.34  4.76
26    10.61  2.87   24.22  56.40  4.83
27    10.66  2.87   24.14  56.47  4.79

Table 2
Observation set CP9000 manual injection. Analyte peak area (%)
Obs.  C16:0  C18:0  C18:1  C18:2  C18:3
 1    10.76  2.92   24.39  56.17  4.68
 2    10.73  2.89   24.20  56.42  4.72
 3    10.70  2.92   24.37  56.29  4.65
 4    10.77  2.88   24.33  56.23  4.68
 5    11.03  2.85   24.14  56.26  4.71
 6    10.73  2.88   24.36  56.27  4.66
 7    10.74  2.87   24.20  56.35  4.76
 8    10.72  2.88   24.23  56.36  4.77
 9    10.76  2.88   24.24  56.31  4.72
10    10.75  2.87   24.22  56.37  4.72
11    10.76  2.88   24.18  56.20  4.69
12    10.68  2.89   24.19  56.31  4.75
13    10.69  2.87   24.14  56.34  4.84
14    10.66  2.90   24.18  56.37  4.77
15    10.65  2.88   24.13  56.38  4.81
16    10.66  2.89   24.23  56.25  4.71
17    10.61  2.89   24.18  56.34  4.83
18    10.66  2.88   24.14  56.35  4.81
19    10.69  2.89   24.23  56.23  4.77
20    10.62  2.88   24.12  56.39  4.80
21    10.58  2.89   24.14  56.32  4.81
22    10.50  2.90   24.24  56.36  4.75
23    10.50  2.93   24.19  56.40  4.78

4. Results and discussion

4.1. Univariate control charts

In the univariate approach five univariate charts are used to statistically control the process. First, the control limits of the univariate charts must be calculated from the learning data set. Thus, the means (τ) and the standard deviations (σ) must be calculated to develop the warning and control limits of ±2σ, respectively, ±3σ, giving 20 control limits to be calculated (four control limits for each chart). For the learning set of our example (Table 1) the limits found are given in Table 3 (values are relative amounts).

Table 3
The control limits for the univariate charts
Limit  C16:0  C18:0  C18:1  C18:2  C18:3
-3s    10.52  2.80   23.92  56.09  4.63
-2s    10.59  2.83   24.01  56.18  4.67
+2s    10.89  2.93   24.38  56.54  4.84
+3s    10.96  2.96   24.47  56.64  4.89
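The calculation behind Table 3 can be illustrated for one analyte. The sketch below takes the C16:0 column of the learning set (Table 1) and computes the mean plus and minus two and three standard deviations. It assumes the sample standard deviation (n − 1 in the denominator), which the paper does not state explicitly; the function name `shewhart_limits` is illustrative only.

```python
from statistics import mean, stdev

# C16:0 column of the learning set (Table 1), observations 1-27.
c16_0_learning = [
    10.88, 10.82, 10.76, 10.86, 10.80, 10.77, 10.68, 10.79, 10.79,
    10.75, 10.75, 10.75, 10.78, 10.80, 10.77, 10.74, 10.75, 10.76,
    10.72, 10.73, 10.73, 10.68, 10.65, 10.55, 10.64, 10.61, 10.66,
]

def shewhart_limits(values):
    """Return the -3s, -2s, +2s and +3s limits around the learning-set mean."""
    centre = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {k: centre + k * s for k in (-3, -2, 2, 3)}

limits = shewhart_limits(c16_0_learning)
for k, value in limits.items():
    print(f"{k:+d}s: {value:.2f}")
```

With the Table 1 values this gives 10.52, 10.59, 10.89 and 10.96 when rounded to two decimals, which agrees with the C16:0 column of Table 3. The same function applied to the other four columns would yield the remaining 16 limits.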

Fig. 5. Univariate Shewhart control charts for five analytes; C16:0, C18:0, C18:1, C18:2 and C18:3. Each panel plots the analyte value against time together with the ±2s and ±3s limits.
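Reading one of these charts amounts to comparing each observation with the warning and control limits. The following sketch does this for the C16:0 chart, using the limits from Table 3 and the C16:0 column of the observation set (Table 2). The function name `classify` is illustrative, and only rule 1 and the warning limits are checked; the run rules (2) to (5) are not implemented.

```python
# Limits for C16:0 taken from Table 3.
LCL, LWL, UWL, UCL = 10.52, 10.59, 10.89, 10.96

# C16:0 column of the observation set (Table 2), observations 1-23.
c16_0_observed = [
    10.76, 10.73, 10.70, 10.77, 11.03, 10.73, 10.74, 10.72, 10.76, 10.75,
    10.76, 10.68, 10.69, 10.66, 10.65, 10.66, 10.61, 10.66, 10.69, 10.62,
    10.58, 10.50, 10.50,
]

def classify(values, lcl, lwl, uwl, ucl):
    """Split 1-based observation numbers into out-of-control and warning lists."""
    out_of_control, warning_obs = [], []
    for number, x in enumerate(values, start=1):
        if x < lcl or x > ucl:
            out_of_control.append(number)  # rule 1: outside the 3s control limits
        elif x < lwl or x > uwl:
            warning_obs.append(number)     # outside the 2s limits, inside the 3s limits
    return out_of_control, warning_obs

out_of_control, warning_obs = classify(c16_0_observed, LCL, LWL, UWL, UCL)
print(out_of_control, warning_obs)
```

For C16:0 this flags observations 5, 22 and 23 as out-of-control and observation 21 as a warning.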

These limits are used for the five univariate control charts to monitor the observations from the observation set (Fig. 5).

In those univariate charts a few observations are out-of-control. Analyte 1 (C16:0) is out-of-control for observations 5, 22, 23 and a warning is issued for observation 21. In the second chart (C18:0) only observation 23 exceeds the warning limit. Further, all observations are within the control limit. In the third univariate control chart (C18:1) observation 1 is detected outside the 2σ warning limit and this observation is also found outside the −2σ limit for the fourth univariate control chart (C18:2). Finally, in control chart number 5 (C18:3), observations 3, 6 and 13 are outside the 2σ warning limits. So overall, three observations are out-of-control with respect to the 3σ control limits and 5 observations are out-of-control with respect to the 2σ warning limits.

4.2. Multivariate control charts

First, the number of significant principal components must be investigated. The T²-chart consists of the significant PCs and the √SPE-chart is based on the remaining PCs. To determine the number of significant PCs the permutation algorithm of Dijksterhuis and Heiser has been chosen [13]. For the learning set (Table 1) this gives the results shown in Table 4.

Table 4
Investigation of the significant PCs with a permutation test
PC  Eigenvalue (λ)  Cumulative variance (Σ_{i=1}^{A} λ_i / Σ_{j=1}^{k} λ_j) · 100%  p-value
1   2.089            41.78                                                           0.000
2   1.515            72.09                                                           0.000
3   0.784            87.76                                                           0.990
4   0.517            98.10                                                           0.990
5   0.095           100.00                                                           1.000

The eigenvector stability was also evaluated on the chromatographic data set (Table 5). For the permutation algorithm two significant PCs are found. From the plots of the eigenvector stability a shift at the third PC can be noticed (Fig. 6).

Table 5
The number of significant PCs for three data sets with three different approaches
Gas chromatograph      Permutation test  Variance of loadings  Variance of angles
CP9000 (manual inj.)   2                 3                     3
HP5890 (manual inj.)   0                 - a                   - a
HP5890 (prepstat.)     1                 1                     1
a No decrease in stability was observed.

Fig. 6. Eigenvector stability; variance of the angles (A), variance of the loadings (B), each plotted against PC number. CP9000 data for manual injection.

The conclusion here is that three principal components are significant. Thus, in comparison to the permutation test one additional significant PC is found. This demonstrates a traditional difficulty when finding different numbers of significant PCs for the different methods.

For the data sets generated on the HP5890 with a HP prepstation, all approaches yield one significant
principal component. However, for the samples anal- After having determined the significant number of
ysed on the HP5890 and injected manually, the per- PCs we can start to develop the multivariate control
mutation test found no significant components, charts. Making use of Eq. (5) with m = 27, p = 5 and
whereas the two eigenvector stability approaches in- a = 0.05, the control limit for T 2 is 9.03. The same
dicate a uniform decrease in eigenvector stability for can be done for the ~ control limit but now
the f'lrst three components, followed by an increase in making use of Eq. (6). Substituting v = 9.38, m =
stability for the last component (Fig. 7). 2.89 and o~= 0.05 in Eq. (6), the control limit for
The fact that the two stability indicators yield the is 3.35. Now the T 2 and svrffp--E
- values for each
same result is not surprising, as the variance of the observation in the learning set must be calculated,
loadings is directly related to the variance of the an- using Eq. (3). The first part of the equation gives the
gles. Taking into account the values of the eigenval- values for the T2-chart and the second part the val-
ues, the permutation test seems to be the best ap- ues for the Sv~PE--chart. For this data set, using the
proach. Further, the permutation test (200 permuta- permutation test, only two PCs are significant so the
tions) produces stable results, where the two other scores of the first two PCs are used for the T2-values
approaches have a small variation in the plots when and the scores of the other PCs are used for the
the calculation is repeated with the same data. In -values. In Fig. 8 the results of the multivariate
conclusion, the permutation test is preferred when the control charts are plotted.
number of significant PCs has to be investigated. The multivariate control charts produced the fol-

14 •..0.8 ¸
A
? B

"~12" .~ --1.0

o -1.2
10-

o
8"
--1.6"
6"
--1,8"
4" --2.0"
I I ; I I I I I I I
1 2 3 4 6 1 2 3 4 5

PC PC

6- --1.0'
C I)
?
o

;> 3- ~ " --2.O

2- --2.6 ¸

1- --3.0 ¸

0- --3.6 ¸
I I I I I I I I I I

1 2 3 4 5 1 2 3 4 5

PC PC

Fig. 7. Eigenvector stability. HP5890 manual injection (A, B) and HP5890 prepstation (C, D). variance of the angles (A, C), variance of the
loadings (B, D).
60 A. Nijhuis et al. / Chemometrics and Intelligent Laboratory Systems 38 (1997) 51-62

15
A and 16 are in-control in the univariate charts but are
found to be out-of-control in the Sx/-SP-E--chart.A rea-
son for this discrepancy can be that relevant out-of-
UCL control observations are missed in the univariate
charts due to the correlation structure in the data (Fig.
1, observation C).

An interesting aspect is whether the selected num-
ber of significant eigenvectors is critical for flagging
0 , - , . , . , T , - r . l . , . , . , . , . , .
out-of-control situations. In the example discussed,
0 10 20 two PCs are found to be significant and six observa-
Observationrtr. tions are detected to be out-of-control in the multi-
variate control charts, one observation in the T2-chart
so J B and five observations in the sfsP--E--chart. If the cal-
culation of the control charts is repeated for only one
significant PC, the same six observations are found to
gaoq _ ll ft 1~cL be out-of-control (Fig. 9AB). However, in this case
all six out-of-control observations are found in the
sfsP--E--chart and in the T2-chart all observations are
in-control. Using three significant PCs the same re-
.0 " , ' , - , . , . , . , - , - , . , . , . , .
suits are found as for two significant PCs, one obser-
0 i0 20 vation is out-of-control in the T2-chart (observation
ObservaLionrtr.
number 5) and six observations in the ~ - c h a r t s
Fig. 8. The TE-chart (A) and the Svf~-chart (B) for the CP9000
check sample data based on two significant PCs.
lowing results for the observation set (Table 2). In the two control charts six objects are out-of-control. In the T²-chart, observation number 5 is out-of-control and in the √SPE-chart observation numbers 11, 16, 21, 22 and 23. Objects 5, 21, 22 and 23 were also found with the univariate control charts, so the results found with the univariate charts are partly confirmed. However, some warning situations in the univariate sense (1, 3, 6 and 13) are not confirmed in the multivariate charts. This indicates that the parallel use of several univariate charts increases the chance of false out-of-control situations, viz. $1 - (1 - \alpha)^{k}$ for $k$ independent charts that each have a false-alarm rate $\alpha$. In the extreme case, when using 20 charts on independent in-control processes, one may expect that each time one of these charts will produce a warning by accident. Furthermore, multivariate investigation of the univariate out-of-control observations revealed that they have a relatively high deviation from the target value, but the correlation structure is still present. In such cases, observations can be univariate out-of-control and multivariate in-control (Fig. 1, observation B). On the other hand, objects 11

(Fig. 9CD). From this example it appears that the method is robust to the number of significant PCs in the sense that out-of-control situations are correctly flagged. However, the assignment of the cause may be different.

The fact is that all PCs are used in the multivariate approach. No information is thrown away. If a PC is not used to calculate the T²-value, then the variation it describes is included in the √SPE and vice versa. A data vector is always a summary of all PCs, but divided into a T²-part and a √SPE-part (see Eq. (3)). Thus, when the number of PCs is underestimated (e.g., 1 PC instead of 2), T² will only flag out-of-control situations which occur in the direction of PC 1 and not in the direction of PC 2. In general, there will be less out-of-control flagging in the T² when the number of significant PCs is reduced.

This can also be seen in Fig. 10, where the scores of the first two principal components are plotted for the CP9000 data set. Observation 5 is found to lie separately from most of the other observations. However, when a T²-chart is used with 1 principal component, this observation will be in-control, because it is in the range defined by PC 1. When the T²-chart is based on the first two principal components, this
observation will be out-of-control because it is out of the range defined by PC 2.

Fig. 9. The T²-chart (A) and the √SPE-chart (B) for the CP9000 check sample data based on one significant PC. The T²-chart (C) and the √SPE-chart (D) for the CP9000 check sample data based on three significant PCs.

When no PCs are found to be significant, a multivariate control chart can still be developed. In this case there is only one control chart, the √SPE-chart. In this √SPE-chart all PC scores are included and the chart is simply a summary of all univariate control charts. For each measurement the √SPE-value is a representation of the sum of all distances from the value of a variable to a certain 'mean' value. The advantage of this multivariate chart compared to univariate charts is that the process can be checked with only one control chart.
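The way a data vector is divided between the T²-part and the √SPE-part can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes the commonly used definitions (T² as the sum of squared scores on the retained PCs divided by their eigenvalues, SPE as the sum of squared scores on the remaining PCs), mean-centred data without further scaling, and a synthetic three-variable reference set. The names `fit_pca`, `t2_and_sqrt_spe`, `X_ref` and `x_new` are hypothetical.

```python
import numpy as np

def fit_pca(X):
    """Fit PCA on mean-centred data (assumes more rows than columns, full rank)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)   # variance of the scores on each PC
    return mean, Vt, eigvals            # rows of Vt are the loading vectors

def t2_and_sqrt_spe(x, mean, Vt, eigvals, n_pcs):
    """Split one observation into a T2 value and a sqrt(SPE) value."""
    scores = Vt @ (x - mean)            # scores on all PCs
    t2 = float(np.sum(scores[:n_pcs]**2 / eigvals[:n_pcs]))   # retained PCs
    spe = float(np.sum(scores[n_pcs:]**2))                    # remaining PCs
    return t2, float(np.sqrt(spe))

# Synthetic reference data (an assumption, not the CP9000 data).
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.1])
mean, Vt, eigvals = fit_pca(X_ref)

# Hypothetical observation displaced 4 standard deviations along PC 2 only.
x_new = mean + 4.0 * np.sqrt(eigvals[1]) * Vt[1]

for n_pcs in (0, 1, 2):
    t2, sqrt_spe = t2_and_sqrt_spe(x_new, mean, Vt, eigvals, n_pcs)
    print(f"PCs retained: {n_pcs}  T2 = {t2:.3f}  sqrt(SPE) = {sqrt_spe:.3f}")
```

With one retained PC the deviation appears only in √SPE, with two retained PCs it moves to T², and with no retained PCs √SPE reduces to the Euclidean distance of the observation from the mean. Control limits are deliberately left out of this sketch.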
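The earlier remark that running several univariate charts in parallel inflates the chance of false out-of-control signals can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the charts are independent, all processes are in control, and each chart has the same false-alarm rate `alpha`. The value 0.05 is an assumed warning rate chosen for illustration, and the function names are hypothetical.

```python
def familywise_false_alarm(alpha, k):
    """Chance that at least one of k independent in-control charts signals."""
    return 1.0 - (1.0 - alpha) ** k

def expected_false_alarms(alpha, k):
    """Expected number of charts signalling by accident per observation."""
    return alpha * k

alpha = 0.05  # assumed per-chart false-alarm rate
for k in (1, 5, 20):
    p_any = familywise_false_alarm(alpha, k)
    n_exp = expected_false_alarms(alpha, k)
    print(f"k = {k:2d}  P(at least one false alarm) = {p_any:.3f}  expected = {n_exp:.2f}")
```

Under these assumptions, 20 charts give on average one accidental warning per observation, which is the extreme case described in the text.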
Fig. 10. Scores of the PCA for PC1 versus PC2 (CP9000).

5. Conclusion

This paper has presented an approach to MSPC with an application in chromatography. Including PCA in Hotelling's T² statistic produces a good alternative to the traditional calculation of T², where the covariance matrix has to be inverted. It was shown that it is possible to control a chromatographic system with a T²-chart and a √SPE-chart based on a multivariate statistical projection method like PCA.

A possible shortcoming of multivariate charts is that supplementary criteria for detecting out-of-control situations, as they are applied for the univariate charts, are not yet defined for the multivariate control charts. The only, but most important, criterion in the multivariate case is whether an observation lies inside or outside the control limit. It is not yet possible, for example, to detect a trend in the observations. However, in the future such sensitivity criteria may well be applied to multivariate charts. Of the three approaches to investigate the number of significant PCs, the permutation test seems to have the best performance. The two other approaches, PC stability and angle variation, produce more or less similar results. However, our experience is that the permutation test approach tends to be more conservative and more reliable.

Acknowledgements

We thank Professor Dr. P. Sandra (University of Gent, Department of Organic Chemistry) for fruitful discussions about the chromatographic aspects of this paper. Further, we thank the Standards, Measurements and Testing (SM&T) project for financial support.
