Robust Multivariate Control Charts Based On Birnbaum-Saunders Distributions
Carolina Marchant, Víctor Leiva, Francisco José A. Cysneiros & Shuangzhe Liu
To cite this article: Carolina Marchant, Víctor Leiva, Francisco José A. Cysneiros & Shuangzhe
Liu (2017): Robust multivariate control charts based on Birnbaum–Saunders distributions, Journal
of Statistical Computation and Simulation, DOI: 10.1080/00949655.2017.1381699
Download by: [Australian Catholic University] Date: 10 October 2017, At: 06:57
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2017
https://doi.org/10.1080/00949655.2017.1381699
Pernambuco, Recife, Brazil; e Faculty of Education, Science, Technology and Mathematics, University of
Canberra, Canberra, Australia
1. Introduction
Shewhart [1] introduced univariate control charts to monitor process quality through
a statistical sample. However, there is often a need to monitor several quality characteristics
of a process simultaneously. If these characteristics are correlated, then separate univariate
control charts for individual monitoring may not be adequate to assess changes in the overall
quality of the process. Thus, it is desirable to have tools that can jointly monitor all of these
characteristics. Such tools include multivariate control charts, which are commonly used for
this type of joint monitoring; see Alt [2]. These charts may take into account the global nature
of the control scheme and the correlation structure between the quality characteristics. The
main objective of a multivariate control chart is to detect the presence of special causes of
variation in a process. In addition, a control chart can be employed to identify multivariate
outliers, mean shifts, and other distributional deviations from the in-control distribution.
Hotelling [3] was the first to analyse correlated random variables in quality control. He
developed a procedure based on a statistical distance, which extends the Student-t (called
t hereafter) statistic to the multivariate case. This was later named the Hotelling $T^2$ statistic
in his honour. The standard $T^2$ statistic is a useful tool for multivariate process control
under normality. Specifically, it assumes that the vector summarizing the quality characteristics,
namely $X$, follows a multivariate normal distribution. Then, the standard $T^2$ statistic
is defined by $T_s^2 = n(\bar{X} - \mu)^\top S^{-1} (\bar{X} - \mu)$, where $n$ is the sample size,
$\bar{X}$ is the vector of the sample means, $\mu$ is the vector of population means and $S$ is the
sample variance–covariance matrix, with inverse $S^{-1}$. After Hotelling's [3] work, there was no
significant research done in this field until the last few decades, when interest in multivariate
statistical quality control was revived due to advances in computing. Since then, a number of
authors have studied multivariate quality control based on the normal distribution;
see, for example, Lowry and Montgomery [4].
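As a minimal sketch of the standard statistic defined above, the following Python code evaluates $T_s^2$ for a single bivariate subgroup using only the standard library. The function name `hotelling_t2`, the five observations and the target vectors are hypothetical and serve only to illustrate the formula; the sample variance–covariance matrix is assumed to use the usual divisor n − 1.

```python
def hotelling_t2(sample, mu):
    """Standard Hotelling statistic n (xbar - mu)' S^{-1} (xbar - mu), bivariate case."""
    n = len(sample)
    xbar = [sum(row[j] for row in sample) / n for j in range(2)]
    # Sample variance-covariance matrix S (2 x 2), assuming the divisor n - 1.
    s = [[sum((row[a] - xbar[a]) * (row[b] - xbar[b]) for row in sample) / (n - 1)
          for b in range(2)] for a in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    # Closed-form inverse of a 2 x 2 matrix.
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    d = [xbar[0] - mu[0], xbar[1] - mu[1]]
    quad = sum(d[a] * s_inv[a][b] * d[b] for a in range(2) for b in range(2))
    return n * quad


# Hypothetical subgroup of n = 5 bivariate observations.
subgroup = [(2.1, 1.0), (1.9, 1.2), (2.3, 0.9), (2.0, 1.1), (1.8, 0.8)]
t2_on_target = hotelling_t2(subgroup, mu=(2.02, 1.0))  # target equal to the sample mean
t2_shifted = hotelling_t2(subgroup, mu=(3.0, 2.0))     # target far from the sample mean
```

The statistic is essentially zero when the target coincides with the sample mean and grows as the target moves away from it, scaled by the estimated covariance structure.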
In standard control charts, it is usually supposed that the data follow a normal distribution.
However, there are many practical applications in which the normality assumption is
not fulfilled, because the data exhibit asymmetrical patterns or heavy tails. Some authors
have developed multivariate control charts under non-normality. Liu and Tang [5] proposed
that, when one is not sure of the data normality, the parametric bootstrap method may be
used to determine the control limits. They showed that obtaining the control limits by
means of bootstrapping is generally better than the normal approach based on the central
limit theorem. The non-parametric bootstrap method can also be applied to control charts,
thereby eliminating the traditional parametric assumption; see Liu and Tang [5]. Hence,
bootstrap methods may be considered when the distribution of the statistic employed to
monitor the process is not available. The limits of bootstrap-based $T^2$ control charts are
calculated using the quantiles of the distribution of the $T^2$ statistic derived from bootstrap
samples. Stoumbos and Sullivan [6] investigated the effects of non-normality on the statistical
performance of the exponentially weighted moving average chart, and its special case,
the Hotelling chart. Alfaro and Ortega [7,8] proposed robust Hotelling charts based on the
multivariate t distribution for individual observations.
Multivariate control charts under normality use mean vector and variance–covariance
matrix estimators, which are sensitive to outliers in Phase I; see details of Phases I and II of
a control chart in Section 2.4. A univariate outlier is defined as an observation that deviates
greatly from the other data points, indicating that this observation could have been generated by
a different mechanism; see Hawkins [9]. Multivariate outliers are atypical not because of the
value taken by a single random variable, but because of the values taken jointly by the whole
set of random variables; see Becker and Gather [10]. Multivariate outliers are more difficult to
identify than univariate outliers, since they may not appear as outliers when a single variable
is studied. Their presence is also more detrimental than in the univariate case, because they
distort not only the position (mean) or dispersion (variance) of the observations, but also the
correlations between the variables; see Rocke and Woodruff [11]. Multivariate outliers greatly
influence the resulting estimates and can cause an out-of-control status to remain undetected.
The identification of outliers in multivariate data is usually based on the Mahalanobis distance
(MD); see Marchant et al. [12]. However, sometimes outliers do not have a large MD, which is
known as the masking effect. This effect is due to the fact that the estimators based on the model
employed to generate the MD are statistically non-robust; see, for example, Rocke and Woodruff [11]
and Becker and Gather [10]. Masking effects occur when a group of extreme observations
distorts the estimates of the mean vector and/or variance–covariance matrix, resulting in
a small distance from the outlier to the mean.
Jensen et al. [13], Chenouri et al. [14], Chenouri and Variyath [15], and Alfaro and
Ortega [8] studied the behaviour of different robust alternatives for estimating the process
parameters in multivariate control charts. This allowed the researchers to avoid the
negative effect of outliers. Alfaro and Ortega [7] proposed a robust $T^2$ control chart to protect
it in the presence of outliers in Phase I, when the data are multivariate t distributed,
are computed assuming an in-control status from the bootstrap distribution of this statis-
tic; (d) the MD to check the model and to detect multivariate outliers; (e) the Monte Carlo
(MC) method to conduct the simulation study; and (f) the R software (www.R-project.org)
to implement the proposed methodology by a computational routine. We employ this rou-
tine to make an illustration with multivariate air quality data from the city of Santiago,
Chile, collected by the Chilean official environmental authority; see CONAMA [29].
The paper is organized as follows. In Section 2, we provide a background on multivariate
GBS and related distributions, the MD and control charts, whereas Section 3 derives the
proposed methodology for multivariate control charts. In Section 4, an MC simulation
study is carried out to evaluate the performance of this methodology and to compare it to a
standard methodology. In Section 5, we apply our methodology to real-world multivariate
air quality data. Finally, Section 6 discusses some conclusions and future studies related to
the topic of this paper.
2. Background
In this section, we provide some results on the multivariate GBS distribution and its log-
arithmic version, named log-GBS in short. In addition, the MD and general concepts on
multivariate quality control charts are also provided.
$$\Psi = D(\Sigma_0^{-1/2})\,\Sigma_0\,D(\Sigma_0^{-1/2}) = D(\Sigma^{-1/2})\,\Sigma\,D(\Sigma^{-1/2}), \qquad (1)$$
where
$$D(\Sigma^{-1/2}) = \mathrm{diag}(\sigma_{11}^{-1/2}, \ldots, \sigma_{pp}^{-1/2}), \quad D(\Sigma_0^{-1/2}) = \mathrm{diag}((c_0\sigma_{11})^{-1/2}, \ldots, (c_0\sigma_{pp})^{-1/2}),$$
$$f_{EC_p}(x; \Sigma, g^{(p)}) = c^{(p)}\,|\Sigma|^{-1/2}\, g^{(p)}(x^\top \Sigma^{-1} x), \quad x = (x_1, \ldots, x_p)^\top \in \mathbb{R}^p, \qquad (2)$$
where $c^{(p)} > 0$ is a normalizing constant and, as mentioned, $g^{(p)} > 0$ is a p-variate PDF
generator. Constants and PDF generators of the p-variate normal and t distributions are
presented in Table 1, where $\Gamma$ denotes the gamma function. The cumulative distribution
function (CDF) of $X$ is denoted by $F_{EC_p}$.
Let the p × 1 random vector $V = (V_1, \ldots, V_p)^\top$ have a p-variate GBS ($\mathrm{GBS}_p$) distribution
with p × 1 vectors of parameters $\alpha = (\alpha_1, \ldots, \alpha_p)^\top$ and $\beta = (\beta_1, \ldots, \beta_p)^\top$, EC
PDF generator $g^{(p)} > 0$ and p × p scale and correlation matrices $\Sigma = (\sigma_{rs})$ and $\Psi = (\psi_{rs})$,
respectively. Observe that, in this case, $\sigma_{kk} = 1$ for all k = 1, . . . , p, from which,
based on (1), $\Sigma = \Psi$, leading to the notation $V \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$. For more details
of multivariate GBS distributions, the interested reader is referred to Kundu et al. [27].
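$$f_Y(y; \alpha, \mu, \Psi, g^{(p)}) = f_{EC_p}(B; \Psi, g^{(p)}) \prod_{j=1}^{p} \frac{1}{\alpha_j} \cosh\left(\frac{y_j - \mu_j}{2}\right), \quad y \in \mathbb{R}^p,$$
where $f_{EC_p}$ is the p-variate EC PDF as given in (2). If $Y \sim \text{log-GBS}_p(\alpha, \mu, \Psi, g^{(p)})$, then: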
From Property (A1), for the parameter $\theta = (\alpha^\top, \mu^\top, \mathrm{svec}(\Psi)^\top)^\top$, with 'svec' denoting
Observe that $\theta$ does not include the degrees of freedom $\nu$ of the p-variate log-BS-t
($\text{log-BS-}t_p$) distribution due to a robustness aspect in the ML estimation procedure, which we
detail in Section 2.3. In addition, note that:
(i) $\mathrm{MD}_i(\theta) \sim \chi^2(p)$, if $g^{(p)}$ is the p-variate normal PDF generator; and
(ii) $\mathrm{MD}_i(\theta)/p \sim F(p, \nu)$, if $g^{(p)}$ is the p-variate t PDF generator, where $F(p, \nu)$ denotes
the Fisher distribution with p degrees of freedom in the numerator and $\nu$ in the
denominator.
When evaluated at the ML estimate of θ, the MD for the case i defined in (4) is useful,
as mentioned, for assessing multivariate outliers and model checking. For more details
of multivariate log-GBS distributions, the interested reader is referred to Marchant et al.
[12,26] and Garcia-Papani et al. [33].
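The distributional result in (i) suggests a simple screening rule for multivariate outliers, sketched below in Python for p = 2. The sketch assumes that the MD takes the quadratic form $B_i^\top \Psi^{-1} B_i$ (the definition in (4) is not reproduced here), uses the closed-form quantile of the $\chi^2(2)$ distribution, and relies on made-up transformed vectors; the function names are hypothetical.

```python
import math


def chi2_quantile_2df(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - prob)


def mahalanobis_sq(b, psi):
    """Quadratic form b' psi^{-1} b for a 2-vector b and a 2 x 2 matrix psi."""
    det = psi[0][0] * psi[1][1] - psi[0][1] * psi[1][0]
    inv = [[psi[1][1] / det, -psi[0][1] / det],
           [-psi[1][0] / det, psi[0][0] / det]]
    return sum(b[r] * inv[r][s] * b[s] for r in range(2) for s in range(2))


psi = [[1.0, 0.8], [0.8, 1.0]]  # correlation matrix (assumed known here)
# Made-up transformed vectors B_i; the third one contradicts the positive correlation.
transformed = [(0.3, 0.5), (-1.0, -0.7), (2.5, -2.5), (0.1, -0.2)]
cutoff = chi2_quantile_2df(0.99)
flags = [mahalanobis_sq(b, psi) > cutoff for b in transformed]
```

The third vector is flagged even though neither coordinate is extreme on its own, which illustrates why multivariate outliers can go unnoticed in univariate analyses.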
expressed as
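$$\ell(\theta) = \sum_{i=1}^{n} \ell_i(\theta) = \sum_{i=1}^{n} \left( \log(f_{EC_p}(B_i; \Psi, g^{(p)})) + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j} \cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right) \right), \qquad (5)$$
where $f_{EC_p}$ is given in (2) and $B_i = (B_{i1}, \ldots, B_{ip})^\top$, with elements obtained from (3) as
$$B_{ij} = \frac{2}{\alpha_j} \sinh\left(\frac{y_{ij} - \mu_j}{2}\right), \quad i = 1, \ldots, n, \; j = 1, \ldots, p. \qquad (6)$$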
From Table 1, if g (p) is the p-variate normal or t PDF generator, then the p-variate log-BS
(log-BSp ) and log-BS-tp distributions are obtained. Thus, by using (5), the corresponding
log-likelihood functions for θ in the case i are expressed respectively as
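$$\ell_i(\theta) = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log(|\Psi|) - \frac{1}{2} B_i^\top \Psi^{-1} B_i + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j}\cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right),$$
$$\ell_i(\theta) = -\log\Gamma\left(\frac{\nu}{2}\right) + \log\Gamma\left(\frac{\nu+p}{2}\right) - \frac{p}{2}\log(\nu\pi) + \frac{\nu+p}{2}\log(\nu) - \frac{1}{2}\log(|\Psi|) - \frac{\nu+p}{2}\log(\nu + B_i^\top \Psi^{-1} B_i) + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j}\cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right),$$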
where the elements of Bi are defined in (6). In order to compute the ML estimates of
the multivariate log-GBS distribution parameters, the log-likelihood function given in (5)
must be maximized. In this case, the corresponding likelihood equations must be solved by
a non-linear iterative procedure, such as the Broyden–Fletcher–Goldfarb–Shanno quasi-
Newton method. The function optim of the R software has implemented this iterative
procedure, whose initial values in our case are obtained as
Note that the $\nu$ parameter of the $\text{log-BS-}t_p$ distribution is not estimated but fixed at $\nu = 4$.
This is because, as pointed out by Lucas [34], Paula et al. [25], Marchant et al. [26] and references
therein, the influence function based on the t distribution is bounded only when $\nu$ is
fixed, producing robust parameter estimates. Thus, we work with a type of log-likelihood
function profiled at $\nu = 4$, a value that often maximizes the log-likelihood function.
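To make the normal-generator log-likelihood concrete, the following Python sketch evaluates it for p = 2, with the correlation matrix parameterized by a single correlation `rho`. It is only an illustration of (5) and (6) under that assumption, not the R routine used in the paper; the function name and the data values are hypothetical, and no optimization is performed.

```python
import math


def log_bs2_loglik(data, alpha, mu, rho):
    """Log-likelihood of a bivariate log-BS model (normal generator) with correlation rho."""
    det = 1.0 - rho * rho  # determinant of the 2 x 2 correlation matrix
    total = 0.0
    for y in data:
        # B_ij = (2 / alpha_j) * sinh((y_ij - mu_j) / 2), as in (6).
        b = [2.0 / alpha[j] * math.sinh((y[j] - mu[j]) / 2.0) for j in range(2)]
        # Quadratic form B_i' Psi^{-1} B_i for Psi = [[1, rho], [rho, 1]].
        quad = (b[0] ** 2 - 2.0 * rho * b[0] * b[1] + b[1] ** 2) / det
        # Jacobian term: sum over j of log((1 / alpha_j) * cosh((y_ij - mu_j) / 2)).
        jac = sum(math.log(math.cosh((y[j] - mu[j]) / 2.0) / alpha[j]) for j in range(2))
        # With p = 2, the constant -(p / 2) * log(2 * pi) equals -log(2 * pi).
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad + jac
    return total


alpha, rho = (0.4, 0.5), 0.8
ll_at_mode = log_bs2_loglik([(2.0, 1.0)], alpha, (2.0, 1.0), rho)
sample = [(2.1, 1.05), (1.9, 0.95), (2.0, 1.0)]  # made-up transformed data
ll_true = log_bs2_loglik(sample, alpha, (2.0, 1.0), rho)
ll_off = log_bs2_loglik(sample, alpha, (3.0, 2.0), rho)
```

A function of this kind could be passed, with its sign reversed, to a general-purpose quasi-Newton optimizer; here it is only evaluated at two candidate values of μ to show that the value close to the data gives the larger log-likelihood.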
(i) Definition of a centre line (CL), which represents a prefixed parameter value, usually
the mean vector, or another parameter of interest, called the target, of the quality
characteristics of the process to be monitored.
(ii) Establishment of lower and upper control limits (LCL and UCL, respectively), based
on subgroups of data from an in-control status of the underlying process, which set a
distance above and below the CL.
(iii) Plotting of points, each of them representing a future (new) subgroup of data, which
are not taken from an in-control status of the underlying process, unless a clear
indication of no changes in the process exists.
LCL and UCL provide a visual display for the expected amount of data dispersion. The
control limits are based on the actual behaviour of the process, not the desired behaviour
nor specification limits. A process can be in control and yet not be capable of meeting
requirements; see Leiva et al. [35]. Note that steps (i)–(iii) mentioned above,
used to construct a multivariate control chart, are divided into the following two phases
(see [2]):
Phase I: In this phase related to steps (i)–(ii), a data set of size N = k × n is taken from
an in-control status of the underlying process, where k ≥ 20 is the number of subgroups
and n > 1 is the size of each subgroup. This data set is used (a) to estimate the parameters
of interest, (b) to check the distributional assumption employing goodness-of-fit tools, (c)
to compute the control limits, and (d) to identify multivariate outliers.
Phase II: In this phase related to step (iii), the control limits obtained in Phase I are
utilized to assess if the data sample of a new subgroup from the underlying process is in
control or out of control. Therefore, Phase II consists of using these limits to detect any
departure of the data of a new subgroup in relation to a prefixed mean value, μ0 namely, or
another target parameter of interest. Recall that, in Phase II, the data are not taken from an
in-control process, unless no changes in the process exist. Thus, in Phase II, the monitoring
is done for each new subgroup from 1 to m.
The average run length (ARL) is the mean number of points that must be plotted before
one of them indicates an out-of-control status. The ARL can be used to evaluate the performance
of a control chart and is calculated as one divided by the probability that a plotted
point is out of control, that is, as
$$\mathrm{ARL} = \frac{1}{\mathrm{Prob}(\text{one point plotted is out of control})}.$$
An in-control ARL is denoted by ARL0 and expressed as ARL0 = 1/η, where η represents
the probability of type-I error. Note that ARL0 usually takes values in {200, 370.4, 500,
1000}. On the one hand, the probability that an observation is considered as out of control,
if the process is actually in control, indicates a false alarm rate (FAR), which is often set
in {0.001, 0.002, 0.0027, 0.005}. On the other hand, the probability of a true out-of-control
signal may be obtained from 1 − Prob(type-II error). An out-of-control ARL is denoted
by ARL1 and calculated as ARL1 = 1/(1 − γ ), where γ is the probability of type-II error.
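The relationship between the FAR and the in-control ARL can be checked with a few lines of Python. The helper name `average_run_length` and the type-II error probability of 0.8 are illustrative assumptions; the FAR values are those listed above.

```python
def average_run_length(signal_probability):
    """ARL as the reciprocal of the probability that a plotted point signals."""
    return 1.0 / signal_probability


# False alarm rates quoted in the text and their in-control ARLs (ARL0 = 1 / eta).
far_values = [0.001, 0.002, 0.0027, 0.005]
arl0_values = [average_run_length(eta) for eta in far_values]

# Hypothetical type-II error probability gamma = 0.8, so ARL1 = 1 / (1 - gamma).
arl1 = average_run_length(1.0 - 0.8)
```

The four FAR values map to in-control ARLs of 1000, 500, about 370.4 and 200, matching the set of ARL0 values mentioned above.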
tic. These charts are constructed once again by considering the relationship indicated in
Section 2.2 between the multivariate GBS and log-GBS distributions. In addition, Properties
(A1) and (A2) of multivariate log-GBS distributions and the MD defined in (4) are
used for such a construction. This allows us to get the adapted $T_a^2$ statistic as indicated
below. We employ the bootstrap method to determine the control limits in Phase I. Then,
we formulate these charts to monitor a process in Phase II.
where $\bar{B} = \sum_{i=1}^{n} B_i/n$ and $C = \sum_{i=1}^{n} B_i B_i^\top$. Note that, if $Y \sim \text{log-BS}_p(\alpha, \mu, \Psi)$, $T_a^2$ given
in (8) follows a Fisher distribution with p and n − p degrees of freedom, that is, $T_a^2 \sim
F(p, n - p)$; see Kundu [36]. However, for the wide family of multivariate log-GBS distributions,
this result is not valid. Particularly, for the multivariate t distribution, Kotz and
Nadarajah [37, pp. 199–200] mentioned that the distribution of the Hotelling statistic has no
closed form. This also occurs with the Hotelling statistic adapted to the multivariate log-BS-t
distribution. We
propose a multivariate GBS control chart based on the adaptation of the Hotelling statistic
given in (8), using the multivariate log-GBS distribution as an intermediary between the
multivariate GBS distribution and its adapted Hotelling statistic. In Section 3.2, we discuss
a manner to find the distribution of this statistic. Once such a distribution is obtained, we
can compute its quantiles and then the corresponding limits of the multivariate quality
control chart.
3.3. Phase I
As mentioned in Section 2.4, control limits must be obtained in Phase I. According to
Duncan [40], in this phase such limits must be constructed with data of several subgroups
from an in-control status of the process to be monitored. Thus, these control limits are useful
for monitoring the underlying process over time, for example based on the Hotelling
statistic. Algorithm 3 details how to compute the limits of multivariate GBS control charts
with the bootstrap distribution of the $T_a^2$ statistic defined in (8); see Section 3.2. In addition
to the construction of control limits, in Phase I, it is also necessary to check the distributional
assumption by using goodness-of-fit tools and to assess multivariate outliers with
suitable methods. Control charts often have both an LCL and a UCL, but sometimes only a
UCL is considered; see, for example, Alfaro and Ortega [7].
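The percentile step that turns bootstrap values of a statistic into control limits can be sketched as follows in Python. This is not Algorithm 3 itself: the bootstrap values are replaced by placeholder random draws, the nearest-rank percentile rule is an assumption, and the names `percentile_limits` and `boot_stats` are hypothetical.

```python
import math
import random


def percentile_limits(boot_stats, eta):
    """LCL and UCL as the 100*eta and 100*(1 - eta) percentiles (nearest-rank rule)."""
    ordered = sorted(boot_stats)
    size = len(ordered)
    # round() guards against floating-point noise before taking the ceiling.
    lower_rank = max(1, math.ceil(round(eta * size, 9)))
    upper_rank = max(1, math.ceil(round((1.0 - eta) * size, 9)))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# Placeholder draws standing in for 10,000 bootstrap values of the statistic.
rng = random.Random(2017)
boot_stats = [rng.expovariate(0.5) for _ in range(10000)]
lcl, ucl = percentile_limits(boot_stats, eta=0.0027)
```

With 10,000 values and η = 0.0027, the limits are the 27th and 9973rd ordered values, so 27 of the placeholder draws lie above the UCL.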
3.4. Phase II
As also mentioned in Section 2.4, multivariate GBS quality control charts, obtained in
Phase I, must be used in Phase II to test if the process to be monitored remains in control
when data of new subgroups are collected. In Phase II, the adapted Hotelling statistic
is denoted by $T^2_{a\,\mathrm{new}}$. Then, we employ multivariate GBS control charts to plot the sequence
of values for the $T^2_{a\,\mathrm{new}}$ statistic calculated as in (8), with the m subgroups employed in this
phase. Algorithm 4 describes how the p-variate control chart based on multivariate GBS
distributions is utilized to monitor the underlying process.
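The Phase II decision rule described above can be sketched in Python as follows. The limits are the values reported in Table 2 for the BS-t2 case with n = 5, whereas the statistic values and the function name `monitor` are made up for illustration.

```python
def monitor(stat_values, lcl, ucl):
    """Return (in_control, signals), where signals lists the subgroups outside [lcl, ucl]."""
    signals = [h for h, t in enumerate(stat_values, start=1) if t < lcl or t > ucl]
    return len(signals) == 0, signals


lcl, ucl = 0.0031, 6.7079                  # Table 2, BS-t2 column, n = 5
new_stats = [1.2, 0.4, 3.3, 7.5, 2.1]      # made-up values of the Phase II statistic
in_control, signals = monitor(new_stats, lcl, ucl)
```

Here the fourth subgroup exceeds the UCL, so the process would be declared out of control.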
Algorithm 4 Process monitoring using the multivariate GBS chart in Phase II.
1: Repeat Steps 1-2 of Algorithm 3, obtaining a new transformed data vector
$y_{hi} = (y_{hi1}, \ldots, y_{hip})^\top$, for h = 1, . . . , m and i = 1, . . . , n, but now the new original data are
not necessarily collected from an in-control process.
2: Calculate the $T^2_{a\,\mathrm{new}}$ statistic for each sample of new transformed data obtained in Step 1,
generated in the hth subgroup, with h = 1, . . . , m, for regular time intervals, getting
$t^2_{a\,\mathrm{new}_1}, \ldots, t^2_{a\,\mathrm{new}_m}$.
3: Plot the points $t^2_{a\,\mathrm{new}_1}, \ldots, t^2_{a\,\mathrm{new}_m}$ in the $\mathrm{GBS}_p$ control chart with limits generated by
Algorithm 3.
4: Declare the process as in control if all points $t^2_{a\,\mathrm{new}_1}, \ldots, t^2_{a\,\mathrm{new}_m}$ fall between the LCL and
UCL obtained in Algorithm 3; otherwise, that is, if any of the points $t^2_{a\,\mathrm{new}_1}, \ldots, t^2_{a\,\mathrm{new}_m}$
falls below the LCL or above the UCL, the process is in an out-of-control status.
4. Simulation study
In this section, by using MC simulations, we evaluate the proposed methodology and
a standard methodology both in Phase I and in Phase II. Specifically, we compare the
behaviour of the $T_a^2$ statistic under a multivariate GBS distribution with the $T_s^2$ statistic
under a multivariate normal distribution. We perturb the new subgroups in Phase II
considering three levels (low, moderate, and high) and evaluate the performance of the
multivariate GBS control chart by means of the detection rate of the new subgroup, which
is obtained as the proportion of values for the corresponding Hotelling statistic that are
located above the UCL. We use the R software to carry out this simulation.
4.1. Phase I
In this phase, we consider the following simulation scenario. Using Algorithms 1
and 2, we generate B = 10,000 bootstrap samples for k = 20 subgroups of sizes n ∈
{5, 10, 25, 50, 100} from multivariate log-BS and log-BS-t distributions. For each bootstrap
sample, we compute its $T_a^2$ statistic with the formula defined in (8) and the $T_s^2$ statistic,
obtaining $t^{2*}_{a,1}, \ldots, t^{2*}_{a,10000}$ and $t^{2*}_{s,1}, \ldots, t^{2*}_{s,10000}$, respectively. We employ an overall FAR
η = 0.0027 to compute the LCL and UCL based on the 0.27th and 99.73th percentiles of the
10,000 bootstrap values for the corresponding statistics, respectively. Table 2 provides the
LCL and UCL calculated with these statistics. We consider data generated from $\mathrm{BS}_p$ and
$\mathrm{BS-}t_p$ distributions and their corresponding logarithm transformations with p ∈ {2, 3, 4}
and true values for their parameters established respectively as
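(a) p = 2, $\alpha = (0.4, 0.5)^\top$, $\mu = (2, 1)^\top$ and
$$\Psi = \begin{pmatrix} 1.0 & 0.8 \\ 0.8 & 1.0 \end{pmatrix}.$$
(b) p = 3, $\alpha = (0.4, 0.5, 0.4)^\top$, $\mu = (2, 1, 5)^\top$ and
$$\Psi = \begin{pmatrix} 1.0 & 0.8 & 0.5 \\ 0.8 & 1.0 & 0.2 \\ 0.5 & 0.2 & 1.0 \end{pmatrix}.$$
(c) p = 4, $\alpha = (0.4, 0.5, 0.4, 0.5)^\top$, $\mu = (2, 1, 5, 3)^\top$ and
$$\Psi = \begin{pmatrix} 1.0 & 0.8 & 0.5 & 0.2 \\ 0.8 & 1.0 & 0.7 & 0.3 \\ 0.5 & 0.7 & 1.0 & 0.5 \\ 0.2 & 0.3 & 0.5 & 1.0 \end{pmatrix}.$$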
Table 2. Simulated limits for the GBSp control chart with η = 0.0027 for the indicated n, l, p, statistic, and
distribution.

         p = 2                            p = 3                            p = 4
  n      Ts2      Ta2 (BS2)  Ta2 (BS-t2)  Ts2      Ta2 (BS3)  Ta2 (BS-t3)  Ts2      Ta2 (BS4)  Ta2 (BS-t4)
UCL
  5      10.5221  11.6027    6.7079       9.3715   14.8541    8.2944       11.5399  16.0207    9.3704
  10     10.0075  11.1691    6.0518       8.6043   13.6852    7.4770       9.6871   15.9451    8.8128
  25     9.5291   10.7939    5.8128       8.5067   13.7052    7.1973       9.6420   15.7863    8.6813
  50     8.7918   10.9625    5.6462       7.8080   13.5828    6.8775       8.7755   15.5876    7.9276
  100    7.3866   10.9106    5.3870       7.3844   13.4448    6.6943       8.3706   15.5746    7.8409
LCL
  5      0.0029   0.0041     0.0031       0.0219   0.0472     0.0265       0.0634   0.1810     0.0578
The values presented in (a)–(c) above are chosen following the criteria: (i) $\alpha_j \leq 0.5$
according to Marchant et al. [26]; (ii) $\mu_j \in \{1, 2, 3, 5\}$ based on Alfaro and Ortega [7]; and
(iii) values for $\Psi$ corresponding to small, medium, and large correlations. We use $\nu = 4$
for the multivariate log-BS-t distribution according to the robustness aspects mentioned
in Section 2.3. From Table 2, as n increases, the control limits become narrower for both
statistics, namely $T_a^2$ and $T_s^2$, with the UCL decreasing progressively. Note that the LCLs
are very close to zero, which is a reason why this control limit is often not calculated and
is set to zero. For a fixed n, the control limits of $T_a^2$ based on the multivariate log-BS-t
distribution are narrower than those based on the multivariate log-BS distribution. In addition,
the control limits of $T_a^2$ based on the multivariate log-BS-t distribution are narrower than
those of the $T_s^2$ statistic under normality. This can be attributed to the non-robustness to
outliers of the ML estimation of parameters in the cases of multivariate log-BS and normal
distributions, possibly affecting the detection of out-of-control status in Phase II.
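Data for scenario (a) can be generated by inverting (6): if $B$ is bivariate normal with correlation matrix $\Psi$, then $y_j = \mu_j + 2\,\mathrm{arcsinh}(\alpha_j B_j/2)$ follows the bivariate log-BS model. The Python sketch below uses this inversion with the standard library only; the function name, the seed and the sample size are arbitrary choices, and the sketch is not the simulation routine used in the paper.

```python
import math
import random


def simulate_log_bs2(n, alpha, mu, rho, rng):
    """Draw n observations from a bivariate log-BS model (normal generator)."""
    rows = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated standard normals via the 2 x 2 Cholesky factor of [[1, rho], [rho, 1]].
        b = (e1, rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        # Invert B_j = (2 / alpha_j) * sinh((y_j - mu_j) / 2).
        rows.append(tuple(mu[j] + 2.0 * math.asinh(alpha[j] * b[j] / 2.0) for j in range(2)))
    return rows


rng = random.Random(1)
rows = simulate_log_bs2(5000, alpha=(0.4, 0.5), mu=(2.0, 1.0), rho=0.8, rng=rng)
mean_1 = sum(r[0] for r in rows) / len(rows)
mean_2 = sum(r[1] for r in rows) / len(rows)
cov_12 = sum((r[0] - mean_1) * (r[1] - mean_2) for r in rows) / (len(rows) - 1)
```

Because the transformation is odd around $\mu_j$, the simulated coordinates are centred near μ = (2, 1), and the positive correlation of the underlying normal vector carries over to the simulated data.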
4.2. Phase II
In this phase, we generate m = 30 new subgroups using the same scenario of Phase I. We
employ the function T2.2(data, estat, n) of an R package named IQCC to calculate
the $T_s^2$ statistic for multivariate observations at Phase II; see
https://cran.r-project.org/web/packages/IQCC/IQCC.pdf. We compute $T^2_{a\,\mathrm{new}}$ with Algorithm 4 and generate
M = 5000 MC replications. We perturb one observation in each of l of the m new subgroups
and consider three perturbation levels (low, moderate, and high),
corresponding to one (l = 1), five (l = 5), and ten (l = 10) perturbed subgroups. This
perturbation is given by $y_{ij}^* = y_{ij} + 4 S_{Y_j}$, where $S_{Y_j}$ represents the standard deviation of the
quality characteristic $Y_j$, for i = 1, . . . , n and j = 1, . . . , p. When no subgroup is
perturbed, l = 0. The performance of the multivariate GBS control chart is judged in terms of
the detection rate of the new subgroup, which is obtained as the proportion of values for the
corresponding Hotelling statistic that are located above the UCL. The results of this simulation
are shown in Table 3. From this table, note that the multivariate BS-t control chart performs
well in detecting out-of-control status in Phase II. Observe that the low and moderate
perturbations are well detected by the control chart. Also, notice that the $T_a^2$ statistic based
on the multivariate BS-t distribution has a better performance than the $T_s^2$ statistic, as
the number p of quality characteristics increases. In addition, as the level of perturbation
increases, the detection of out-of-control status decreases due to the masking effect, especially
when 10 (l = 10) new subgroups are perturbed. However, this effect is attenuated in
the case of the multivariate BS-t chart due to the robustness of its estimation procedure
in relation to the BS chart (omitted here) and to the chart based on the $T_s^2$ statistic under
normality. In general, when the subgroup size n increases, the performance to detect an
out-of-control status also improves for both statistics under study, namely $T_a^2$ and $T_s^2$.

Table 3. Out-of-control detection rate in Phase II for the indicated n, l, p, statistic and target in the BS-tp
distribution.

             p = 2, μ0 = (2, 1)    p = 3, μ0 = (2, 1, 5)    p = 4, μ0 = (2, 1, 5, 3)
  n     l    Ts2      Ta2          Ts2      Ta2             Ts2      Ta2
  5     0    0.1300   0.4320       0.5330   0.9790          0.4940   1.0000
        1    0.1490   0.4960       0.5300   0.9810          0.4550   1.0000
        5    0.0358   0.2278       0.1530   0.8526          0.1200   1.0000
        10   0.0133   0.1597       0.0746   0.5922          0.0635   0.9875
  10    0    0.1770   0.8210       0.6360   0.9940          0.7680   0.9950
        1    0.1860   0.8960       0.6620   0.9980          0.7500   0.9960
        5    0.0396   0.7706       0.2146   0.9768          0.2666   0.9704
        10   0.0207   0.6235       0.1092   0.9145          0.1352   0.9037
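The perturbation scheme and the detection rate used in this section can be sketched in Python as follows. The subgroup, the standard deviations and the statistic values are made up, the function names are hypothetical, and the UCL is the Table 2 value for the BS-t2 case with n = 5.

```python
def perturb_one_observation(subgroup, sd_by_column, row=0, shift=4.0):
    """Shift one observation of a subgroup by shift * S_Yj in every coordinate j."""
    out = [list(obs) for obs in subgroup]
    out[row] = [out[row][j] + shift * sd_by_column[j] for j in range(len(sd_by_column))]
    return out


def detection_rate(stat_values, ucl):
    """Proportion of statistic values located above the UCL."""
    return sum(t > ucl for t in stat_values) / len(stat_values)


subgroup = [(2.0, 1.0), (2.2, 1.1), (1.8, 0.9)]   # made-up transformed data
sds = (0.4, 0.5)                                  # hypothetical values of S_Y1 and S_Y2
perturbed = perturb_one_observation(subgroup, sds)
rate = detection_rate([1.0, 7.2, 3.1, 9.9], ucl=6.7079)
```

In this toy example the first observation moves from (2.0, 1.0) to about (3.6, 3.0), and half of the made-up statistic values exceed the UCL.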
5. Data analysis
In this section, we apply the proposed methodology to real-world air quality data. This
methodology was implemented by a computational routine in R code.
than 10 μm are called PM10, whereas those less than 2.5 μm are called PM2.5. As size
decreases, there is a higher possibility for PM to penetrate deeper into smaller alveoli and
airways. In particular, various effects are produced from exposure to PM, but the nature of
those induced effects varies according to the PM composition. Indeed, there is evidence of
an increase in the risk of cardiovascular diseases and mortality from exposure to PM2.5,
which occurs even after short time periods, such as hours or weeks. For more details about
PM concentrations, the interested reader is referred to Marchant et al. [41]. Because of
a combination of anthropogenic, meteorological, and topographic factors, Santiago, the
capital city of Chile, endures bad atmospheric ventilation increasing the accumulation of
PM and gaseous contaminants, mainly from 1 April to 31 August of each year. Gaseous
contaminants, such as carbon monoxide (CO), NO2 and SO2, together with PM, are the main
contributors to air quality problems in Santiago, Chile. In addition, NO2 and SO2 are precursors
of PM; see Marchant et al. [41]. The current official methodology used by the authority in
Santiago, Chile, for predicting PM10 concentrations is based on a multiple linear regression.
It helps to forecast the maximum value of the 24-h average concentration of PM10
in μg per normalized cubic metre (μg/m3N) for the period from 00:00 to 24:00 hours of the
following day. Furthermore, in the year 2015, through Supreme Decree number 15/2015
and resolution number 9664/2015, the Chilean Ministry of Health was instructed to
declare a sanitary alert based on PM2.5 concentrations. This decree establishes the authority
to manage critical PM2.5 events as a complementary measure. It leads to the need
to develop multivariate tools in order to model and monitor PM10 and PM2.5 concentrations
simultaneously, predicting critical periods of contamination adequately. In our
illustration, we use the following variables: (X1) PM2.5 in μg/m3N; and (X2) PM10 in
μg/m3N. The data to be considered were collected by the Chilean Metropolitan Environmental
Health Service. We utilize these data for our analysis, which are available at
http://sinca.mma.gob.cl. The data were collected in 2003 as 1-h (hourly) average values, at
monitoring stations: (a) Las Condes and (b) Pudahuel, located in Santiago. We select data
from these stations mainly because they are more suitable to conditions of low and high
stability, allowing us to analyse different pollution patterns. Chilean guidelines for air quality
are established by its Ministry of the Environment. The maximum concentrations (in
μg/m3N) according to these guidelines are 50 and 150, over 24 h, for PM2.5 and PM10,
respectively. These values are considered as the targets in our illustration.
5.3. Phase I
We initially use concentrations for the months of January and February to calculate the
control limits according to Algorithm 3 with k = 59, n = 24, N = 1416, B = 10,000, and
FAR η = 0.0027. We utilize these months since their air quality is stable (considered as an
in-control status), because the meteorological and topographical conditions favour no
saturation of PM concentrations. We employ the transformed MD with the Wilson–Hilferty
approximation to obtain a normal distribution; see Marchant et al. [26]. Then, a
goodness-of-fit technique is considered to check step 2 of Algorithm 3; see Marchant et al. [12].
Figure 2 shows the corresponding theoretical probability versus empirical probability (PP)
plots with acceptance bands for a significance level of 5% in Pudahuel station based on BS2
and BS-t2 distributions (for Las Condes station, the results are similar, so they are omitted
here). From this figure, we corroborate the good fit of the BS2 and BS-t2 distributions
to the data in Phase I, which is supported by the p-values 0.245 and 0.227, respectively, of
the Kolmogorov–Smirnov (KS) test associated with these PP-plots; see Marchant et al. [26].
We use ν = 4 for the log-BS-t2 distribution according to the robustness aspects mentioned
in Section 2.3.
Figure 1. Scatter-plots with their corresponding correlations of the indicated variables for (left) Las
Condes and (right) Pudahuel stations.
Figure 2. PP-plots with KS acceptance regions at 5% for transformed MDs with (left) BS2 and (right)
BS-t2 distributions based on Pudahuel data.
5.4. Phase II
We use the control limits obtained in Phase I (see Section 5.3) to monitor April of 2003;
see Marchant et al. [41]. For the control chart of this month, the number of subgroups
and the subgroup size are m = 30 days and n = 24 hours, respectively, giving a total of 720
observations. We employ the transformed MD to assess which distribution fits these data
best. Figure 3 displays the PP-plots with acceptance bands at a significance level of 5%.
From this figure, we observe that the BS-t2 distribution provides a
better fit than the BS2 distribution for both stations, which is corroborated by the KS
p-values associated with these PP-plots: 0.901 (Las Condes/BS-t) and 0.520 (Pudahuel/BS-t)
versus 0.095 (Las Condes/BS) and 0.238 (Pudahuel/BS). We employ the MD as
a measure to detect multivariate outliers. Figure 4 (left) and (right) depicts index plots of
the MD for both stations with the BS-t2 distribution. From this figure, note that 19 April (labelled as
19) is detected as a multivariate outlier. However, this observation does not influence the
control charts shown in Figure 5 (left) and (right) for both stations due to the robustness
of the ML estimation when the BS-t2 distribution is used. It is known that the Las Condes
station is less contaminated than other stations (in 2003), due to better ventilation at its
Figure 3. PP-plots with 5% KS acceptance regions for transformed MDs with (first panel) BS2 and
(second panel) BS-t2 distributions for (left) Las Condes and (right) Pudahuel based on pollutant data.
Figure 4. Index-plots of MDs for (left) Las Condes and (right) Pudahuel based on pollutant data using
the BS-t2 distribution.
Figure 5. Bivariate BS-t control charts for (left) Las Condes and (right) Pudahuel stations in April 2003
based on pollutant data.
high altitude; see Marchant et al. [41]. This may be the reason why this station has no
points outside the limits. In contrast, 10 April (labelled as 10) exceeds the limit
at the Pudahuel station. Therefore, an out-of-control status is signalled and an
environmental alert must be declared for the next day. Note that, if at least one of the air
quality monitoring stations in Santiago presents a PM level that is dangerous for human
health, then an out-of-control status must be declared. The interested reader is referred to
CONAMA [29] for details of the official decree of the Ministry of Environment of the
Chilean government that establishes this regulation. Observe that our criterion is coherent
with the official information provided by the Chilean Ministry of Health, which declared
environmental alerts for 11 April
2003; see www.seremisaludrm.cl/sitio/pag/aire/indexjs3airee001.asp. We recommend this
methodology based on multivariate GBS control charts because it is useful for alerting
about episodes of extreme air contamination. Such monitoring of pollutants helps to prevent
adverse effects on human health for the population of Santiago, Chile.
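The decision rule described in this section can be sketched as follows: a day signals when its monitoring statistic exceeds the Phase I control limit at any station, and the alert applies to the next day. The function names, station labels, limits, and values below are made up for illustration and are not the data or code of this study.

```python
def out_of_control_days(stats, ucl):
    """Return 1-based day indices whose monitoring statistic exceeds the UCL."""
    return [day for day, t in enumerate(stats, start=1) if t > ucl]

def alert_days(stations, limits):
    """Days on which an alert applies: the day after any station signals."""
    signals = set()
    for name, stats in stations.items():
        signals.update(out_of_control_days(stats, limits[name]))
    return sorted(day + 1 for day in signals)

# Made-up daily statistics and limits, for illustration only.
stations = {
    "station_a": [1.2, 0.8, 2.1, 1.5],
    "station_b": [0.9, 3.7, 1.1, 1.0],
}
limits = {"station_a": 3.0, "station_b": 3.0}
print(alert_days(stations, limits))  # [3]
```

In this toy example only the second station signals, on day 2, so the alert falls on day 3, mirroring how a signal on 10 April would correspond to an alert for 11 April.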
6. Concluding remarks
In this paper, we proposed a robust methodology for multivariate control charts in Phases
I and II with subgroups based on GBS distributions and the Hotelling statistic. In addi-
tion, we considered the MD to detect multivariate outliers and evaluate the adequacy of
the distributional assumption. The proposed methodology estimates the multivariate BS
and BS-t parameters with the ML method. For data following a multivariate BS-t model,
the distribution of the associated Hotelling statistic is not known in closed form. Therefore,
the methodology uses the bootstrap method to approximate this distribution. Once such a
distribution is available, the proposed methodology computes its quantiles to construct the
control limits of the multivariate chart. A Monte Carlo simulation study was conducted to
evaluate the proposed methodology in Phases I and II, comparing its behaviour with the
Hotelling statistic under normality. We concluded by means of this simulation study that
the multivariate BS-t control chart performed well in the detection of out-of-control status
in Phase II, performing better than the multivariate BS control chart in both phases. In
addition, the Hotelling statistic under normality does not perform well in the detection of
out-of-control status in Phase II. We illustrated the proposed methodology with real-world
air quality data of Santiago, Chile, through two monitoring stations with different climatic
conditions and pollution levels. This illustration showed that our methodology is useful
for alerting episodes of extreme air pollution and for preventing adverse effects on human
health for the population of Santiago, Chile. In addition, we demonstrated empirically the
coherence between our criterion and real-world situations in which the Chilean health
authority declared environmental alerts. As future research, we plan to compare the proposed
multivariate control charts with others based on statistics proposed in the literature, for
example, control charts for individual observations or with autocorrelation structures over
time, as well as charts for parameters other than the mean.
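As a rough illustration of the bootstrap step summarized above, the following sketch estimates an upper control limit as the empirical (1 − η) quantile of a Hotelling-type statistic over simulated in-control subgroups. It is only a sketch under stated assumptions: a correlated bivariate normal generator stands in for the fitted BS or BS-t model, the statistic is the classical T² for the subgroup mean, and all function names and settings are hypothetical rather than taken from the paper.

```python
import random

def hotelling_t2(sample, mu):
    """Hotelling T^2 of a bivariate subgroup against target mean mu."""
    n = len(sample)
    m1 = sum(x for x, _ in sample) / n
    m2 = sum(y for _, y in sample) / n
    s11 = sum((x - m1) ** 2 for x, _ in sample) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in sample) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in sample) / (n - 1)
    d1, d2 = m1 - mu[0], m2 - mu[1]
    det = s11 * s22 - s12 ** 2
    # Quadratic form d' S^{-1} d, written out for the 2 x 2 case.
    return n * (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det

def bootstrap_ucl(draw_subgroup, mu, eta, reps, rng):
    """Empirical (1 - eta) quantile of T^2 over simulated in-control subgroups."""
    t2 = sorted(hotelling_t2(draw_subgroup(rng), mu) for _ in range(reps))
    k = min(reps - 1, int((1.0 - eta) * reps))
    return t2[k]

def draw_normal_subgroup(rng, n=24, rho=0.5):
    """Stand-in generator (assumption): correlated standard bivariate normal."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((z1, rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2))
    return out

rng = random.Random(2017)
ucl = bootstrap_ucl(draw_normal_subgroup, (0.0, 0.0), eta=0.0027, reps=5000, rng=rng)
print(ucl)
```

With η = 0.0027 only a handful of replicates fall beyond the limit, so a larger number of replicates than used here would be advisable in practice; replacing the stand-in generator with draws from the fitted model would give the model-based limit.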
Acknowledgements
The authors thank the editors and two referees for their constructive comments on an earlier version of
the present manuscript.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This research was supported partially by the Chilean Council for Scientific and Technology Research,
grant FONDECYT 1160868 (V. Leiva) and fellowship ‘Becas-Conicyt’ (C. Marchant), as well as by
the Brazilian National Council for Scientific and Technological Development (CNPq) and Brazilian
Coordination for the Improvement of Higher Level Personnel (CAPES).
ORCID
Carolina Marchant http://orcid.org/0000-0003-1832-4444
Víctor Leiva http://orcid.org/0000-0003-4755-3270
Francisco José A. Cysneiros http://orcid.org/0000-0001-6757-6969
Shuangzhe Liu http://orcid.org/0000-0002-4858-2789
References
[1] Shewhart WA. Economic control of quality of manufactured product. New York: D. Van
Nostrand Company; 1931.
[2] Alt FB. Multivariate quality control. In: Kotz S, Johnson NL, and Read CB, editors. The
encyclopedia of statistical sciences. Vol. 6. New York (NY): Wiley; 1985. p. 110–112.
[3] Hotelling H. Multivariate quality control. In: Eisenhart C, Hastay M, and Wallis WA, editors.
Techniques of statistical analysis. New York (NY): McGraw-Hill; 1947. p. 111–184.
[4] Lowry CA, Montgomery DC. A review of multivariate control charts. IIE Trans.
1995;27:800–810.
[5] Liu RY, Tang J. Control charts for dependent and independent measurements based on
bootstrap methods. J Am Stat Assoc. 1996;91:1694–1700.
[6] Stoumbos GZ, Sullivan JH. Robustness to non-normality of the multivariate EWMA control
charts. J Qual Technol. 2002;34:304–315.
[7] Alfaro JL, Ortega JF. Robust Hotelling’s T² control charts under non-normality: the case of
2017;31:105–124.
[34] Lucas A. Robustness of the Student t based M-estimator. Commun Stat Theory Methods.
1997;26:1165–1182.
[35] Leiva V, Marchant C, Saulo H, et al. Capability indices for Birnbaum–Saunders processes
applied to electronic and food industries. J Appl Stat. 2014;41:1881–1902.
[36] Kundu D. Bivariate log-Birnbaum–Saunders distribution. Statistics. 2015;49:900–917.
[37] Kotz S, Nadarajah S. Multivariate t-distributions and their applications. New York (NY):
Cambridge University Press; 2004.
[38] Hall P. The bootstrap and Edgeworth expansion. New York (NY): Springer; 2013.
[39] Leiva V, Sanhueza A, Sen PK, et al. Random number generators for the generalized Birn-
baum–Saunders distribution. J Stat Comput Simul. 2008;78:1105–1118.
[40] Duncan A. Quality control and industrial statistics. Homewood (IL): Irwin; 1986.
[41] Marchant C, Leiva V, Cavieres M, et al. Air contaminant statistical distributions with applica-
tion to PM10 in Santiago, Chile. Rev Environ Contam Toxicol. 2013;223:1–31.