
Journal of Statistical Computation and Simulation

ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20

Robust multivariate control charts based on Birnbaum–Saunders distributions

Carolina Marchant, Víctor Leiva, Francisco José A. Cysneiros & Shuangzhe Liu

To cite this article: Carolina Marchant, Víctor Leiva, Francisco José A. Cysneiros & Shuangzhe
Liu (2017): Robust multivariate control charts based on Birnbaum–Saunders distributions, Journal
of Statistical Computation and Simulation, DOI: 10.1080/00949655.2017.1381699

To link to this article: http://dx.doi.org/10.1080/00949655.2017.1381699

Published online: 09 Oct 2017.

Carolina Marchant a , Víctor Leiva b,c , Francisco José A. Cysneiros d and
Shuangzhe Liu e
a Faculty of Basic Sciences, Universidad Católica del Maule, Talca, Chile; b School of Industrial Engineering,
Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile; c Faculty of Administration, Accounting and
Economics, Universidade Federal de Goiás, Goiania, Brazil; d Department of Statistics, Universidade Federal de
Pernambuco, Recife, Brazil; e Faculty of Education, Science, Technology and Mathematics, University of
Canberra, Canberra, Australia

ABSTRACT
Multivariate control charts are powerful and simple visual tools for
monitoring the quality of a process. This multivariate monitoring is
carried out by considering simultaneously several correlated quality
characteristics and by determining whether these characteristics are
in control or out of control. In this paper, we propose a robust
methodology using multivariate quality control charts for subgroups
based on generalized Birnbaum–Saunders distributions and an adapted
Hotelling statistic. This methodology is constructed for Phases I and II
of control charts. We estimate the corresponding parameters with the
maximum likelihood method and use parametric bootstrapping to obtain
the distribution of the adapted Hotelling statistic. In addition, we
consider the Mahalanobis distance to detect multivariate outliers and
use it to assess the adequacy of the distributional assumption. A Monte
Carlo simulation study is conducted to evaluate the proposed methodology
and to compare it with a standard methodology. This study reports the
good performance of our methodology. An illustration with real-world air
quality data of Santiago, Chile, is provided. This illustration shows
that the methodology is useful for alerting early episodes of extreme
air pollution, thus preventing adverse effects on human health.

ARTICLE HISTORY
Received 29 June 2017
Accepted 15 September 2017

KEYWORDS
Average run length; bootstrapping; Hotelling statistic; Mahalanobis
distance; maximum likelihood method; Monte Carlo simulation; multivariate
non-normal distributions; R software

JEL CLASSIFICATION
C12; C13; C15; C16; Q53

1. Introduction
Shewhart [1] introduced univariate control charts to monitor the process quality through
a statistical sample. However, there is often a need to monitor several quality characteris-
tics of a process simultaneously. If these characteristics are correlated, then using separate
univariate control charts for individual monitoring may not be adequate in order to assess
changes in the overall quality of the process. Thus, it is desirable to have tools that can
jointly monitor all of these characteristics. Such tools include multivariate control charts,
which are commonly used for this type of joint monitoring; see Alt [2]. These charts may
take into account the global nature of the control scheme and the correlation structure

CONTACT Víctor Leiva victorleivsanchez@gmail.com www.victorleiva.cl


© 2017 Informa UK Limited, trading as Taylor & Francis Group
between the quality characteristics. The main objective of a multivariate control chart is to
detect the presence of special causes of variation in a process. In addition, a control chart
can be employed to identify multivariate outliers, mean shifts, and other distributional
deviations from the in-control distribution.
Hotelling [3] was the first to analyse correlated random variables in quality control. He
developed a procedure based on a statistical distance, which extends the Student-t (called
t hereafter) statistic to the multivariate case. This was later named the Hotelling $T^2$
statistic in his honour. The standard $T^2$ statistic is a useful tool for multivariate process
control under normality. Specifically, it assumes that the vector summarizing the quality
characteristics, denoted by $X$, follows a multivariate normal distribution. Then, the standard
$T^2$ statistic is defined by $T_s^2 = n(\bar{X} - \mu)^\top S^{-1} (\bar{X} - \mu)$, where $n$ is
the sample size, $\bar{X}$ is the vector of the sample means, $\mu$ is the vector of population
means and $S^{-1}$ is the inverse of the sample variance-covariance matrix $S$.
After Hotelling’s [3] work, there was no significant research done in
this field until the last few decades, when interest in multivariate statistical quality control
was revived due to advances in computing. Since then, a number of authors have done
some studies in the area of multivariate quality control based on the normal distribution;
see, for example, Lowry and Montgomery [4].
In standard control charts, it is usually supposed that the data follow a normal distribu-
tion. However, there are many practical applications in which the normality assumption is
not fulfilled, because the data exhibit asymmetrical patterns or heavy tails. Some authors
have developed multivariate control charts under non-normality. Liu and Tang [5] proposed
that, when one is not sure of the data normality, the parametric bootstrap method may be
used to determine the control limits. They showed that obtaining the control limits by
means of bootstrapping is generally better than the normal approach based on the central
limit theorem. The non-parametric bootstrap method can also be applied to control charts,
thereby eliminating the traditional parametric assumption; see Liu and Tang [5]. Hence,
bootstrap methods may be considered when the distribution of the statistic employed to
monitor the process is not available. The limits of bootstrap-based $T^2$ control charts are
calculated using the quantiles of the distribution of the $T^2$ statistic derived from bootstrap
samples. Stoumbos and Sullivan [6] investigated the effects of non-normality on the statis-
tical performance of the exponentially weighted moving average chart, and its special case,
the Hotelling chart. Alfaro and Ortega [7,8] proposed robust Hotelling charts based on the
multivariate t distribution for individual observations.
Multivariate control charts under normality use mean vector and variance–covariance
matrix estimators, which are sensitive to outliers in Phase I; see details of Phases I and II of
a control chart in Section 2.4. A univariate outlier is defined as an observation that devi-
ates greatly from other data points indicating that this observation could be generated by
a different mechanism; see Hawkins [9]. Multivariate outliers are atypical not because of
the value taken by a single random variable, but because of the values taken jointly by the
whole set of random variables; see Becker and Gather [10]. Multivariate outliers are more
difficult to identify than univariate outliers, since they may not appear as outliers when
each variable is studied separately. Their presence is also more detrimental than in the
univariate case, because they distort not only the position (mean) and dispersion
(variance) of the observations, but also the correlations between the variables;
see Rocke and Woodruff [11]. Multivariate outliers greatly influence the resulting estimates
and cause any out-of-control status to remain undetected. The identification of outliers in
multivariate data is usually based on the Mahalanobis distance (MD); see Marchant et al.
[12]. However, sometimes outliers do not have a large MD, which is known as the masking
effect. This effect is due to the fact that the estimators based on the model employed to
generate the MD are statistically non-robust; see, for example, Rocke and Woodruff [11]
and Becker and Gather [10]. Masking effects occur when a group of extreme observations
distorts the estimates of the mean vector and/or variance–covariance matrix, resulting in
a small distance from the outlier to the mean.
Jensen et al. [13], Chenouri et al. [14], Chenouri and Variyath [15], and Alfaro and
Ortega [8] studied the behaviour of different robust alternatives for estimating the pro-
cess parameters in multivariate control charts. This allowed the researchers to avoid the
negative effect of outliers. Alfaro and Ortega [7] proposed a robust $T^2$ control chart that
is protected against outliers in Phase I when the data are multivariate t distributed,
thus improving the behaviour in Phase II.


The univariate Birnbaum–Saunders (BS) distribution is asymmetrical (skewed to the
right) and unimodal, having two parameters which modify its shape and scale. One of
these parameters is also its median, making the BS distribution an alternative to the nor-
mal distribution in an asymmetric framework; see more details in Saulo et al. [16]. The BS
distribution has been largely studied and applied in different areas, including engineering
and environmental sciences; see Leiva et al. [17] and Leiva [18]. Because some monitored
processes have data with asymmetric behaviour, distributions that have this pattern, such as
the BS distribution, are adequate for analysing the quality characteristics of this type of process.
For studies on univariate quality control charts based on the BS distribution, see Lio and
Park [19] and Leiva et al. [20]. The BS distribution originates from the cumulative dam-
age law related to fatigue of materials; see Birnbaum and Saunders [21]. In contrast to its
original derivation, Leiva et al. [20] formalized the BS distribution as an adequate model
to describe environmental data using the proportionate-effect law. Based on its origins,
every random variable following a BS distribution can be considered as a transformation of
another random variable following a standard normal distribution; see Johnson et al. [22,
p. 651–663], Kotz et al. [23], and Balakrishnan et al. [24]. Then, due to the relationship
with the normal distribution, the parameter estimates of the BS distribution obtained from
the maximum likelihood (ML) method are sensitive to atypical observations. In order to
attenuate this sensitivity, and employing the relationship between the normal and BS dis-
tributions, one may generate a BS distribution based on the t distribution (BS-t). Thus, ML
estimates of the BS-t distribution parameters attribute smaller weights to atypical observa-
tions than the BS distribution, producing robust parameter estimators in the sense of the
MD; see Paula et al. [25] and Marchant et al. [12]. BS and BS-t distributions are members
of a wider family of distributions generated from elliptically contoured (EC) distributions,
known as generalized BS (GBS) distributions; see Marchant et al. [26]. Multivariate GBS
distributions, as well as their estimation and modelling, have been analysed by Kundu et al.
[27], Lemonte et al. [28], and Marchant et al. [12,26]. There are few works on multivariate
control charts for quality characteristics with asymmetric patterns, heavy tails and pres-
ence of atypical data. Particularly, no studies on multivariate control charts based on GBS
distributions have been reported, which consider asymmetry, heavy tails, and robustness to
outliers. In addition, multivariate data following asymmetric distributions with heavy tails
are often present in environmental sciences, where monitoring is required. Thus, there is
a need to derive multivariate control charts based on asymmetric distributions with heavy
tails for environmental monitoring. The use of these distributions in environmental
monitoring should be justified theoretically, as demonstrated in the case of the BS
distribution by Leiva et al. [20].
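
The link between the BS and standard normal distributions mentioned above can be illustrated with a short sketch. It assumes the usual representation from the BS literature, $T = \beta[\alpha Z/2 + \{(\alpha Z/2)^2 + 1\}^{1/2}]^2$ with $Z$ standard normal, which is not written out in this section; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def bs_from_normal(z, alpha, beta):
    """Map a standard normal value z to a BS(alpha, beta) value (assumed representation)."""
    half = alpha * z / 2.0
    return beta * (half + math.sqrt(half * half + 1.0)) ** 2

def normal_from_bs(t, alpha, beta):
    """Inverse map: recover the standard normal value from a BS value t > 0."""
    return (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha

rng = random.Random(2017)
alpha, beta = 0.5, 2.0  # illustrative shape and scale values (assumptions)
sample = [bs_from_normal(rng.gauss(0.0, 1.0), alpha, beta) for _ in range(20000)]
sample_median = statistics.median(sample)  # should lie close to beta, the BS median
```

Because $z = 0$ maps to $\beta$, the scale parameter is also the median, which is the property the text uses to motivate the BS distribution as an asymmetric counterpart of the normal model.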
The objectives of this paper are: (i) to propose a robust methodology for multivariate
control charts in Phases I–II with subgroups based on GBS distributions and an adapted
Hotelling statistic, denoted by $T_a^2$; (ii) to provide tools for assessing multivariate outliers
and the adequacy of the distributional assumption in these charts; (iii) to evaluate the
performance of the proposed methodology by means of simulations; and (iv) to apply
the methodology to multivariate real-world data. To meet these objectives, we use the
following: (a) the ML method to estimate the parameters of multivariate BS and BS-t dis-
tributions; (b) the parametric bootstrap method to obtain the distribution of the adapted
Hotelling statistic; (c) multivariate control charts to carry out the monitoring, whose limits
are computed assuming an in-control status from the bootstrap distribution of this statis-
tic; (d) the MD to check the model and to detect multivariate outliers; (e) the Monte Carlo
(MC) method to conduct the simulation study; and (f) the R software (www.R-project.org)
to implement the proposed methodology by a computational routine. We employ this rou-
tine to make an illustration with multivariate air quality data from the city of Santiago,
Chile, collected by the Chilean official environmental authority; see CONAMA [29].
The paper is organized as follows. In Section 2, we provide a background on multivariate
GBS and related distributions, the MD and control charts, whereas Section 3 derives the
proposed methodology for multivariate control charts. In Section 4, an MC simulation
study is carried out to evaluate the performance of this methodology and to compare it to a
standard methodology. In Section 5, we apply our methodology to real-world multivariate
air quality data. Finally, Section 6 discusses some conclusions and future studies related to
the topic of this paper.

2. Background
In this section, we provide some results on the multivariate GBS distribution and its log-
arithmic version, named log-GBS in short. In addition, the MD and general concepts on
multivariate quality control charts are also provided.

2.1. Multivariate GBS distributions


Let the $p \times 1$ random vector $X = (X_1, \ldots, X_p)^\top$ have an EC distribution, with
$p \times 1$ null location vector $0_{p \times 1}$, $p \times p$ scale matrix $\Sigma = (\sigma_{rs})$
of rank $\mathrm{rk}(\Sigma) = p$, and p-variate probability density function (PDF) generator
$g^{(p)}$. This is denoted by $X \sim \mathrm{EC}_p(0_{p \times 1}, \Sigma, g^{(p)})$. The matrix
$\Sigma$ allows us to obtain the variance–covariance matrix of $X$ by $\Sigma_0 = c_0 \Sigma$,
where $c_0 = \mathrm{E}(X^\top X)$, with $X^\top X \sim G\chi^2(p, g^{(p)})$, that is, the generalized
chi-squared distribution with $p$ degrees of freedom and p-variate PDF generator $g^{(p)}$. For
details about EC and $G\chi^2$ distributions, the interested reader is referred to Fang et al. [30],
Fang and Zhang [31], and Gupta et al. [32]. Thus, the $p \times p$ correlation matrix
$\Psi = (\psi_{rs})$ of $X$, with $\mathrm{rk}(\Psi) = p$, is given by

$$\Psi = D(\Sigma_0^{-1/2})\, \Sigma_0\, D(\Sigma_0^{-1/2}) = D(\Sigma^{-1/2})\, \Sigma\, D(\Sigma^{-1/2}), \qquad (1)$$
Table 1. Normalizing constant and PDF generator of the indicated p-variate distribution.

Distribution    $c^{(p)}$                                                   $g^{(p)}(u)$, $u > 0$
Normal          $(2\pi)^{-p/2}$                                             $\exp(-u/2)$
t               $\Gamma((\nu + p)/2) / \{(\nu\pi)^{p/2}\, \Gamma(\nu/2)\}$    $(1 + u/\nu)^{-(\nu + p)/2}$

where

$$D(\Sigma^{-1/2}) = \mathrm{diag}(\sigma_{11}^{-1/2}, \ldots, \sigma_{pp}^{-1/2}), \quad D(\Sigma_0^{-1/2}) = \mathrm{diag}((c_0 \sigma_{11})^{-1/2}, \ldots, (c_0 \sigma_{pp})^{-1/2}),$$

with $\Sigma_0 = (c_0 \sigma_{rs})$. The PDF of $X$ is defined as

$$f_{\mathrm{EC}_p}(x; \Sigma, g^{(p)}) = c^{(p)} |\Sigma|^{-1/2} g^{(p)}(x^\top \Sigma^{-1} x), \quad x = (x_1, \ldots, x_p)^\top \in \mathbb{R}^p, \qquad (2)$$

where $c^{(p)} > 0$ is a normalizing constant and, as mentioned, $g^{(p)} > 0$ is a p-variate PDF
generator. Constants and PDF generators of the p-variate normal and t distributions are
presented in Table 1, where $\Gamma$ denotes the gamma function. The cumulative distribution
function (CDF) of $X$ is denoted by $F_{\mathrm{EC}_p}$.
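
As a small numerical illustration of Equation (2) and Table 1, the following sketch evaluates the normalizing constant and the PDF generator for the normal and t cases. The function names are hypothetical, and the caller is assumed to supply the quadratic form $x^\top \Sigma^{-1} x$ and the determinant $|\Sigma|$ directly.

```python
import math

def normalizing_constant(p, nu=None):
    """c^(p) from Table 1: normal case if nu is None, otherwise t with nu degrees of freedom."""
    if nu is None:
        return (2.0 * math.pi) ** (-p / 2.0)
    return math.gamma((nu + p) / 2.0) / ((nu * math.pi) ** (p / 2.0) * math.gamma(nu / 2.0))

def generator(u, p, nu=None):
    """g^(p)(u) from Table 1."""
    if nu is None:
        return math.exp(-u / 2.0)
    return (1.0 + u / nu) ** (-(nu + p) / 2.0)

def ec_pdf(quad_form, det_sigma, p, nu=None):
    """Equation (2), given quad_form = x' Sigma^{-1} x and det_sigma = |Sigma|."""
    return normalizing_constant(p, nu) * det_sigma ** -0.5 * generator(quad_form, p, nu)

# Sanity checks in the univariate case (p = 1, Sigma = 1)
normal_at_1 = ec_pdf(1.0, 1.0, p=1)      # standard normal density at x = 1
t4_at_0 = ec_pdf(0.0, 1.0, p=1, nu=4)    # Student-t density with 4 degrees of freedom at x = 0
```

For $p = 1$ the two cases reduce to the familiar standard normal and Student-t densities, which gives a quick way to check the reconstructed entries of Table 1.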
Let the $p \times 1$ random vector $V = (V_1, \ldots, V_p)^\top$ have a p-variate GBS (GBS$_p$)
distribution with $p \times 1$ vectors of parameters $\alpha = (\alpha_1, \ldots, \alpha_p)^\top$ and
$\beta = (\beta_1, \ldots, \beta_p)^\top$, EC PDF generator $g^{(p)} > 0$ and $p \times p$ scale and
correlation matrices $\Sigma = (\sigma_{rs})$ and $\Psi = (\psi_{rs})$, respectively. Observe that,
in this case, $\sigma_{kk} = 1$ for all $k = 1, \ldots, p$, so that, based on (1), $\Sigma = \Psi$,
which leads to the notation $V \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$. For more details
of multivariate GBS distributions, the interested reader is referred to Kundu et al. [27].

2.2. Log-GBS distributions and their MD


The log-GBS distribution is often used in diverse statistical aspects related to GBS distribu-
tions; see, for example, Marchant et al. [26]. In order to derive our methodology, we employ
the multivariate log-GBS distribution as an intermediate step between the multivariate GBS
distribution and the MD.
If $V = (V_1, \ldots, V_p)^\top \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$, then
$Y = (Y_1, \ldots, Y_p)^\top$ has a p-variate log-GBS (log-GBS$_p$) distribution, with
$Y_j = \log(V_j)$, for $j = 1, \ldots, p$. Its parameters correspond to a $p \times 1$ shape vector
$\alpha = (\alpha_1, \ldots, \alpha_p)^\top$, a $p \times 1$ location vector
$\mu = (\mu_1, \ldots, \mu_p)^\top$, where $\mu = \mathrm{E}(Y) = (\mathrm{E}(Y_1), \ldots, \mathrm{E}(Y_p))^\top$,
with $\mathrm{E}(Y_j) = \log(\beta_j)$, for $j = 1, \ldots, p$, a $p \times p$ correlation matrix
$\Psi$, and an EC PDF generator $g^{(p)}$. This setting is denoted by
$Y \sim \text{log-GBS}_p(\alpha, \mu, \Psi, g^{(p)})$. The CDF of $Y$ is defined as

$$F_Y(y; \alpha, \mu, \Psi, g^{(p)}) = F_{\mathrm{EC}_p}(B; \Psi, g^{(p)}), \quad y = (y_1, \ldots, y_p)^\top \in \mathbb{R}^p,$$

where $F_{\mathrm{EC}_p}$ is the CDF of an EC distribution and
$B = B(y; \alpha, \mu) = (B_1, \ldots, B_p)^\top$, with

$$B_j = \frac{2}{\alpha_j} \sinh\left(\frac{y_j - \mu_j}{2}\right), \quad j = 1, \ldots, p. \qquad (3)$$
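The PDF of $Y$ is given by

$$f_Y(y; \alpha, \mu, \Psi, g^{(p)}) = f_{\mathrm{EC}_p}(B; \Psi, g^{(p)}) \prod_{j=1}^{p} \frac{1}{\alpha_j} \cosh\left(\frac{y_j - \mu_j}{2}\right), \quad y \in \mathbb{R}^p,$$

where $f_{\mathrm{EC}_p}$ is the p-variate EC PDF as given in (2). If
$Y \sim \text{log-GBS}_p(\alpha, \mu, \Psi, g^{(p)})$, then:

(A1) $B^\top(Y; \alpha, \mu)\, \Psi^{-1} B(Y; \alpha, \mu) \sim G\chi^2(p, g^{(p)})$; and
(A2) $D(\alpha) B(Y; \alpha, \mu) \sim \mathrm{EC}_p(0_{p \times 1}, D(\alpha) \Psi D(\alpha), g^{(p)})$, where $D(\alpha) = \mathrm{diag}(\alpha_1, \ldots, \alpha_p)$.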

From Property (A1), for the parameter $\theta = (\alpha^\top, \mu^\top, \mathrm{svec}(\Psi)^\top)^\top$,
with ‘svec’ denoting vectorization of a symmetric matrix, we obtain the MD for the case $i$ as

$$\mathrm{MD}_i(\theta) = B^\top(Y_i; \alpha, \mu)\, \Psi^{-1} B(Y_i; \alpha, \mu), \quad i = 1, \ldots, n. \qquad (4)$$

Observe that $\theta$ does not include the degrees of freedom $\nu$ of the p-variate log-BS-t
(log-BS-t$_p$) distribution due to a robustness aspect in the ML estimation procedure, which we
detail in Section 2.3. In addition, note that:

(i) $\mathrm{MD}_i(\theta) \sim \chi^2(p)$, if $g^{(p)}$ is the p-variate normal PDF generator; and
(ii) $\mathrm{MD}_i(\theta)/p \sim F(p, \nu)$, if $g^{(p)}$ is the p-variate t PDF generator, where
$F(p, \nu)$ denotes the Fisher distribution with $p$ degrees of freedom in the numerator and
$\nu$ in the denominator.

When evaluated at the ML estimate of θ, the MD for the case i defined in (4) is useful,
as mentioned, for assessing multivariate outliers and model checking. For more details
of multivariate log-GBS distributions, the interested reader is referred to Marchant et al.
[12,26] and Garcia-Papani et al. [33].
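
The MD in Equation (4) can be sketched as follows for a single observation on the log scale. This is only an illustration: the function names and the parameter values are hypothetical, the linear system is solved with a small Gaussian elimination routine written for the example, and the cutoff uses the closed-form $\chi^2(2)$ quantile, which applies only to the normal generator with $p = 2$.

```python
import math

def b_vector(y, alpha, mu):
    """Elements B_j = (2 / alpha_j) sinh((y_j - mu_j) / 2), as in Equation (3)."""
    return [2.0 / a * math.sinh((yj - m) / 2.0) for yj, a, m in zip(y, alpha, mu)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis(y, alpha, mu, psi):
    """MD(theta) = B' Psi^{-1} B, as in Equation (4)."""
    b = b_vector(y, alpha, mu)
    return sum(bi * xi for bi, xi in zip(b, solve(psi, b)))

alpha, mu = [0.5, 0.8], [0.0, 1.0]        # illustrative parameter values (assumptions)
psi = [[1.0, 0.3], [0.3, 1.0]]            # illustrative correlation matrix (assumption)
md_centre = mahalanobis(mu, alpha, mu, psi)           # observation at the location vector
md_point = mahalanobis([0.4, 1.6], alpha, mu, psi)
cutoff_99 = -2.0 * math.log(1.0 - 0.99)   # chi-squared(2) quantile, valid for p = 2 only
is_outlier = md_point > cutoff_99
```

In practice the parameters would be replaced by their ML estimates, and a different reference distribution would be needed for the t generator, as stated in item (ii) above.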

2.3. ML estimation in multivariate log-GBS distributions


As mentioned, we employ the multivariate log-GBS distribution as an intermediary to con-
struct multivariate control charts based on multivariate GBS distributions. By using the
relationship defined in Section 2.2 between the multivariate log-GBS and GBS distribu-
tions, we estimate the log-GBSp parameters and then we obtain the GBSp parameters. Next,
we detail the corresponding ML estimation.
Let $V_1, \ldots, V_n$ be a sample from the GBS$_p$ distribution, that is,
$V_i = (V_{i1}, \ldots, V_{ip})^\top \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$, for
$i = 1, \ldots, n$. Furthermore, let $v_1, \ldots, v_n$ be the observed values of
$V_1, \ldots, V_n$, with $v_i = (v_{i1}, \ldots, v_{ip})^\top$. Recall that, if
$V_i \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$, then
$Y_i = (Y_{i1}, \ldots, Y_{ip})^\top \sim \text{log-GBS}_p(\alpha, \mu, \Psi, g^{(p)})$, where
$Y_{ij} = \log(V_{ij})$, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$. Note that, before carrying
out the ML method, a logarithmic transformation must be applied to the original data, that is,
we work with the observations $y_i = (y_{i1}, \ldots, y_{ip})^\top$ of the random vector $Y_i$,
where $y_{ij} = \log(v_{ij})$. Then, the log-likelihood function for $\theta$ is expressed as

$$\ell(\theta) = \sum_{i=1}^{n} \ell_i(\theta) = \sum_{i=1}^{n} \left( \log(f_{\mathrm{EC}_p}(B_i; \Psi, g^{(p)})) + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j} \cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right) \right), \qquad (5)$$

where $f_{\mathrm{EC}_p}$ is given in (2) and $B_i = (B_{i1}, \ldots, B_{ip})^\top$, with elements
obtained from (3) as

$$B_{ij} = \frac{2}{\alpha_j} \sinh\left(\frac{y_{ij} - \mu_j}{2}\right), \quad i = 1, \ldots, n, \ j = 1, \ldots, p. \qquad (6)$$

From Table 1, if $g^{(p)}$ is the p-variate normal or t PDF generator, then the p-variate log-BS
(log-BS$_p$) and log-BS-t$_p$ distributions are obtained. Thus, by using (5), the corresponding
log-likelihood functions for $\theta$ in the case $i$ are expressed respectively as

$$\ell_i(\theta) = -\frac{p}{2} \log(2\pi) - \frac{1}{2} \log(|\Psi|) - \frac{1}{2} B_i^\top \Psi^{-1} B_i + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j} \cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right),$$

$$\ell_i(\theta) = -\log\Gamma\left(\frac{\nu}{2}\right) + \log\Gamma\left(\frac{\nu + p}{2}\right) - \frac{p}{2} \log(\nu\pi) + \frac{\nu + p}{2} \log(\nu) - \frac{1}{2} \log(|\Psi|) - \frac{\nu + p}{2} \log(\nu + B_i^\top \Psi^{-1} B_i) + \sum_{j=1}^{p} \log\left(\frac{1}{\alpha_j} \cosh\left(\frac{y_{ij} - \mu_j}{2}\right)\right),$$
αj 2

where the elements of Bi are defined in (6). In order to compute the ML estimates of
the multivariate log-GBS distribution parameters, the log-likelihood function given in (5)
must be maximized. In this case, the corresponding likelihood equations must be solved by
a non-linear iterative procedure, such as the Broyden–Fletcher–Goldfarb–Shanno quasi-
Newton method. The optim function of the R software implements this iterative
procedure, whose initial values in our case are obtained as

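(i) $\hat{\mu}_j^{(0)} = \log(\beta_j^{(0)}) = \log(\mathrm{med}(\exp(y_{1j}), \ldots, \exp(y_{nj})))$, for $j = 1, \ldots, p$, where “med” is
the sample median of the indicated data;
(ii)
$$\hat{\alpha}_j^{(0)} = \left( \frac{4}{n} \sum_{i=1}^{n} \sinh^2\left(\frac{y_{ij} - \hat{\mu}_j^{(0)}}{2}\right) \right)^{1/2}, \quad j = 1, \ldots, p,$$
where $\hat{\mu}_j^{(0)}$ is given in (i); and
(iii) $\hat{\Psi}^{(0)} = D(\hat{\Sigma}^{(0)})^{-1/2}\, \hat{\Sigma}^{(0)}\, D(\hat{\Sigma}^{(0)})^{-1/2}$, where
$$\hat{\Sigma}^{(0)} = \frac{1}{n} \sum_{i=1}^{n} \hat{B}_i^{(0)} (\hat{B}_i^{(0)})^\top,$$
with $\hat{B}_i^{(0)}$ having elements as in (6) expressed as
$$\hat{B}_{ij}^{(0)} = \frac{2}{\hat{\alpha}_j^{(0)}} \sinh\left(\frac{y_{ij} - \hat{\mu}_j^{(0)}}{2}\right), \quad i = 1, \ldots, n, \ j = 1, \ldots, p,$$
and $\hat{\alpha}_j^{(0)}$ being computed as in (ii).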
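
The initial values in (i)–(iii) can be computed directly from the log-transformed data, as in the following sketch. The function name and the small data set are hypothetical, and $D(\hat{\Sigma}^{(0)})$ is assumed to denote the diagonal matrix formed with the diagonal of $\hat{\Sigma}^{(0)}$; the resulting values would then be passed to an iterative optimizer as its starting point.

```python
import math
import statistics

def initial_values(y):
    """y: list of n observations, each a list of p log-transformed values."""
    n, p = len(y), len(y[0])
    cols = [[row[j] for row in y] for j in range(p)]
    # (i) mu_j^(0) = log(med(exp(y_1j), ..., exp(y_nj)))
    mu0 = [math.log(statistics.median([math.exp(v) for v in col])) for col in cols]
    # (ii) alpha_j^(0) = sqrt((4 / n) * sum_i sinh^2((y_ij - mu_j^(0)) / 2))
    alpha0 = [math.sqrt(4.0 / n * sum(math.sinh((v - m) / 2.0) ** 2 for v in col))
              for col, m in zip(cols, mu0)]
    # B_ij^(0) = (2 / alpha_j^(0)) sinh((y_ij - mu_j^(0)) / 2), as in Equation (6)
    b = [[2.0 / alpha0[j] * math.sinh((row[j] - mu0[j]) / 2.0) for j in range(p)]
         for row in y]
    # Sigma^(0) = (1 / n) * sum_i B_i^(0) (B_i^(0))'
    sigma0 = [[sum(bi[r] * bi[s] for bi in b) / n for s in range(p)] for r in range(p)]
    # (iii) Psi^(0): rescale Sigma^(0) by its diagonal to obtain a correlation matrix
    psi0 = [[sigma0[r][s] / math.sqrt(sigma0[r][r] * sigma0[s][s]) for s in range(p)]
            for r in range(p)]
    return mu0, alpha0, psi0

# Hypothetical log-scale data with n = 5 and p = 2
y_demo = [[0.1, 1.2], [-0.3, 0.8], [0.5, 1.5], [0.0, 1.0], [0.2, 0.9]]
mu0, alpha0, psi0 = initial_values(y_demo)
```

With $\hat{\alpha}_j^{(0)}$ defined as in (ii), the diagonal of $\hat{\Sigma}^{(0)}$ already equals one, so the rescaling in (iii) mainly guards against rounding error.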


Note that the ν parameter of the log-BS-t$_p$ distribution is not estimated but fixed at ν =
4. This is because, as pointed out by Lucas [34], Paula et al. [25], Marchant et al. [26] and
references therein, the influence function based on the t distribution is bounded only when ν is
fixed, producing robust parameter estimates. Thus, we work with a type of log-likelihood
function profiled at ν = 4, a value that often maximizes the log-likelihood function.
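
To make the two case-wise log-likelihood expressions of this section concrete, the sketch below evaluates $\ell_i(\theta)$ for the normal generator and for the t generator with ν held fixed. It is a minimal illustration rather than the routine used in the paper: the function names are hypothetical, and a hand-written Cholesky factorization is used to obtain both $\log(|\Psi|)$ and the quadratic form $B_i^\top \Psi^{-1} B_i$.

```python
import math

def cholesky(psi):
    """Lower triangular L with psi = L L' (psi assumed symmetric positive definite)."""
    p = len(psi)
    low = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for s in range(r + 1):
            acc = psi[r][s] - sum(low[r][k] * low[s][k] for k in range(s))
            low[r][s] = math.sqrt(acc) if r == s else acc / low[s][s]
    return low

def loglik_case(y, alpha, mu, psi, nu=None):
    """Case-wise log-likelihood: log-BS_p if nu is None, log-BS-t_p with fixed nu otherwise."""
    p = len(y)
    b = [2.0 / alpha[j] * math.sinh((y[j] - mu[j]) / 2.0) for j in range(p)]
    low = cholesky(psi)
    # Forward substitution: solve L w = b, so that B' Psi^{-1} B = w'w
    w = []
    for r in range(p):
        w.append((b[r] - sum(low[r][k] * w[k] for k in range(r))) / low[r][r])
    quad = sum(v * v for v in w)
    log_det = 2.0 * sum(math.log(low[r][r]) for r in range(p))
    jac = sum(math.log(math.cosh((y[j] - mu[j]) / 2.0) / alpha[j]) for j in range(p))
    if nu is None:
        core = -0.5 * p * math.log(2.0 * math.pi) - 0.5 * quad
    else:
        core = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * p * math.log(nu * math.pi)
                + 0.5 * (nu + p) * (math.log(nu) - math.log(nu + quad)))
    return core - 0.5 * log_det + jac

# Illustrative evaluation with nu fixed at 4, as in the text (other values are assumptions)
ll_t4 = loglik_case([0.3, 1.4], [0.5, 0.8], [0.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], nu=4)
ll_normal = loglik_case([0.3, 1.4], [0.5, 0.8], [0.0, 1.0], [[1.0, 0.3], [0.3, 1.0]])
```

Summing `loglik_case` over the $n$ observations gives the objective in (5), which an optimizer would maximize over α, μ and Ψ while ν stays fixed.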

2.4. Multivariate quality control charts


In general, the construction of a multivariate control chart consists of the following
three steps:

(i) Definition of a centre line (CL), which represents a prefixed parameter value, usu-
ally the mean vector, or another parameter of interest, called the target, of the quality
characteristics of the process to be monitored.
(ii) Establishment of lower and upper control limits (LCL and UCL, respectively), based
on subgroups of data from an in-control status of the underlying process, which set a
distance above and below the CL.
(iii) Plotting of points, each of them representing a future (new) subgroup of data, which
are not taken from an in-control status of the underlying process, unless a clear
indication of no changes in the process exists.

LCL and UCL provide a visual display for the expected amount of data dispersion. The
control limits are based on the actual behaviour of the process, not the desired behaviour
nor specification limits. A process can be in control and yet not be capable of meet-
ing requirements; see Leiva et al. [35]. Note that the steps (i)–(iii), above mentioned,
used to construct a multivariate control chart, are divided in the following two phases
(see [2]):
Phase I: In this phase related to steps (i)– (ii), a data set of size N = k × n is taken from
an in-control status of the underlying process, where k ≥ 20 is the number of subgroups
and n > 1 is the size of each subgroup. This data set is used (a) to estimate the parameters
of interest, (b) to check the distributional assumption employing goodness-of-fit tools, (c)
to compute the control limits, and (d) to identify multivariate outliers.
Phase II: In this phase related to step (iii), the control limits obtained in Phase I are
utilized to assess if the data sample of a new subgroup from the underlying process is in
control or out of control. Therefore, Phase II consists of using these limits to detect any
departure of the data of a new subgroup in relation to a prefixed mean value, μ0 namely, or
another target parameter of interest. Recall that, in Phase II, the data are not taken from an
in-control process, unless no changes in the process exist. Thus, in Phase II, the monitoring
is done for each new subgroup from 1 to m.
The average run length (ARL) is the mean number of points that must be plotted before
one of them indicates an out-of-control status. The ARL can be used to evaluate the
performance of a control chart and it is calculated as one divided by the probability that
a plotted point is out of control, that is, as

$$\mathrm{ARL} = \frac{1}{\mathrm{Prob}(\text{one point plotted is out of control})}.$$
An in-control ARL is denoted by ARL0 and expressed as ARL0 = 1/η, where η represents
the probability of type-I error. Note that ARL0 usually takes values in {200, 370.4, 500,
1000}. On the one hand, the probability that an observation is considered as out of control
when the process is actually in control is the false alarm rate (FAR), which is often set
in {0.001, 0.002, 0.0027, 0.005}. On the other hand, the probability of a true out-of-control
signal may be obtained from 1 − Prob(type-II error). An out-of-control ARL is denoted
by ARL1 and calculated as ARL1 = 1/(1 − γ ), where γ is the probability of type-II error.
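
The correspondence between the FAR values and the in-control ARL values quoted above can be checked with a few lines. The function name and the type-II error probability used for the out-of-control case are hypothetical.

```python
def average_run_length(prob_signal):
    """ARL = 1 / Prob(one plotted point signals out of control)."""
    if not 0.0 < prob_signal <= 1.0:
        raise ValueError("prob_signal must lie in (0, 1]")
    return 1.0 / prob_signal

# In-control case: ARL0 = 1 / eta, with eta the type-I error probability (FAR)
far_values = [0.001, 0.002, 0.0027, 0.005]
arl0_values = [average_run_length(eta) for eta in far_values]

# Out-of-control case: ARL1 = 1 / (1 - gamma), with gamma the type-II error probability
gamma = 0.8  # illustrative value (assumption)
arl1 = average_run_length(1.0 - gamma)
```

The FAR set {0.001, 0.002, 0.0027, 0.005} maps onto the ARL0 set {1000, 500, 370.4, 200} after rounding, which matches the values listed in the text.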

3. GBS multivariate quality control charts


In this section, we derive a methodology for multivariate control charts in Phases I–II
with subgroups based on multivariate GBS distributions and an adapted Hotelling statis-
tic. These charts are constructed once again by considering the relationship indicated in
Section 2.2 between the multivariate GBS and log-GBS distributions. In addition, Prop-
erties (A1) and (A2) of multivariate log-GBS distributions and the MD defined in (4) are
used for such a construction. This allows us to get the adapted $T_a^2$ statistic as indicated
below. We employ the bootstrap method to determine the control limits in Phase I. Then,
we formulate these charts to monitor a process in Phase II.

3.1. An adapted $T^2$ statistic


Let $V_i = (V_{i1}, \ldots, V_{ip})^\top \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$, for all
$i = 1, \ldots, n$, where $V_i$ contains $p$ quality characteristics of the process to be
monitored. These characteristics must be observed, generating the data
$v_{hi} = (v_{hi1}, \ldots, v_{hip})^\top$, corresponding to the ith item in the hth subgroup, for
$h = 1, \ldots, k$ and $i = 1, \ldots, n$. Note that $n > 1$ is the size of each subgroup and
$k \geq 20$ is the number of subgroups available from an in-control status for the underlying
process, that is, such data are used for Phase I of the proposed multivariate GBS control chart.
Furthermore, suppose that the process is stable and the vectors $V_i$ are independent over time.
Recall that, if $V_i = (V_{i1}, \ldots, V_{ip})^\top \sim \mathrm{GBS}_p(\alpha, \beta, \Psi, g^{(p)})$,
then $Y_i = (Y_{i1}, \ldots, Y_{ip})^\top \sim \text{log-GBS}_p(\alpha, \mu, \Psi, g^{(p)})$, where
$Y_{ij} = \log(V_{ij})$, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$. Observe that, before using
our methodology, a logarithmic transformation must be applied to the original data, that is, we
work with the observations $y_{hi} = (y_{hi1}, \ldots, y_{hip})^\top$ for the hth subgroup of the
random vector $Y_i$, where $y_{hij} = \log(v_{hij})$, with $h = 1, \ldots, k$, $i = 1, \ldots, n$
and $j = 1, \ldots, p$. We are interested in testing the hypotheses

$$H_0: \mu = \mu_0 = (\mu_{01}, \ldots, \mu_{0p})^\top \quad \text{versus} \quad H_1: \mu \neq \mu_0, \qquad (7)$$

where $\mu_0$ is the target mean vector of an in-control process, whose elements are
$\mu_{0j} = \log(\beta_{0j})$, for $j = 1, \ldots, p$, with $\beta_{0j}$ being the jth element of the
parameter vector $\beta$ corresponding to the GBS$_p$ distribution. An adaptation of the Hotelling
$T^2$ statistic presented in Gupta et al. [32, p. 201–216] can be used for testing the hypotheses
in (7) as follows. From Property (A2) and considering that
$B_i = (2\sinh((Y_{i1} - \mu_{01})/2), \ldots, 2\sinh((Y_{ip} - \mu_{0p})/2))^\top$ has a p-variate EC
distribution, that is, $B_i \sim \mathrm{EC}_p(0_{p \times 1}, D(\alpha) \Psi D(\alpha), g^{(p)})$, for
$i = 1, \ldots, n$, we obtain the Hotelling $T_a^2$ statistic adapted for log-GBS$_p$ distributions as

$$T_a^2 = n(n - 1)\, \bar{B}^\top C^{-1} \bar{B}, \qquad (8)$$
where $\bar{B} = \sum_{i=1}^{n} B_i / n$ and $C = \sum_{i=1}^{n} B_i B_i^\top$. Note that, if
$Y \sim \text{log-BS}_p(\alpha, \mu, \Psi)$, $T_a^2$ given in (8) follows a Fisher distribution with
$p$ and $n - p$ degrees of freedom, that is, $T_a^2 \sim F(p, n - p)$; see Kundu [36]. However,
for the wide family of multivariate log-GBS distributions, this result is not valid.
Particularly, for the multivariate t distribution, Kotz and
Nadarajah [37, p. 199–200] mentioned that the Hotelling statistic has no closed form. This
also occurs with the Hotelling statistic adapted to the multivariate log-BS-t distribution. We
propose a multivariate GBS control chart based on the adaptation of the Hotelling statistic
given in (8), using the multivariate log-GBS distribution as an intermediary between the
multivariate GBS distribution and its adapted Hotelling statistic. In Section 3.2, we discuss
a manner to find the distribution of this statistic. Once such a distribution is obtained, we
can compute its quantiles and then the corresponding limits of the multivariate quality
control chart.
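
A minimal sketch of the statistic in (8) for one subgroup of log-transformed data is given next. The function names and the data are hypothetical. The matrix $C$ is taken as the uncentred sum $\sum_i B_i B_i^\top$, exactly as written in the text; since a classical Hotelling statistic is usually built on the centred scatter matrix, the sketch exposes a `centred` flag so that both readings can be compared, and this flag is an assumption of the example rather than part of the source.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def t_a2(y, mu0, centred=False):
    """T_a^2 = n (n - 1) B_bar' C^{-1} B_bar for a subgroup y of n log-scale observations."""
    n, p = len(y), len(mu0)
    # B_i = (2 sinh((y_i1 - mu_01) / 2), ..., 2 sinh((y_ip - mu_0p) / 2))'
    b = [[2.0 * math.sinh((row[j] - mu0[j]) / 2.0) for j in range(p)] for row in y]
    b_bar = [sum(bi[j] for bi in b) / n for j in range(p)]
    shift = b_bar if centred else [0.0] * p  # uncentred C follows the text as written
    c = [[sum((bi[r] - shift[r]) * (bi[s] - shift[s]) for bi in b) for s in range(p)]
         for r in range(p)]
    return n * (n - 1) * sum(u * v for u, v in zip(b_bar, solve(c, b_bar)))

# Hypothetical subgroup with n = 6 and p = 2, and a hypothetical target vector
subgroup = [[0.1, 1.2], [-0.3, 0.8], [0.5, 1.5], [0.0, 1.0], [0.2, 0.9], [-0.1, 1.1]]
mu0 = [0.0, 1.0]
stat = t_a2(subgroup, mu0)
stat_centred = t_a2(subgroup, mu0, centred=True)
```

The statistic does not depend on α, because rescaling each component of $B_i$ leaves the quadratic form unchanged, which is why $B_i$ can be used here without the factor $1/\alpha_j$.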
3.2. A bootstrap distribution for the $T_a^2$ statistic


As mentioned, for data following a multivariate log-BS-t model, the distribution of the
associated $T_a^2$ statistic is not known in closed form. We use the parametric bootstrap
method to determine this distribution; see Hall [38]. In order to generate random vectors
from multivariate log-BS and log-BS-t distributions, and then to obtain the bootstrap
distribution of the $T_a^2$ statistic given in (8), we use Algorithms 1 and 2, respectively; see
Leiva et al. [39] for the generation of random numbers in univariate GBS and log-GBS
distributions.

Algorithm 1 Generation of random vectors from p-variate log-BS distributions.

1: Compute the Cholesky decomposition of $\Sigma$ as $\Sigma = LL^\top$, where $L$ is a lower triangular matrix with real and positive diagonal entries.
2: Generate $p$ independent standard normal random numbers, namely $W = (W_1, \ldots, W_p)^\top$.
3: Compute $Z = LW = (Z_1, \ldots, Z_p)^\top$.
4: Fix values for $\mu_j$ and $\alpha_j$, with $j = 1, \ldots, p$, of the log-BS$_p$ distribution.
5: Obtain $Y$ with elements $Y_j = \mu_j + 2\,\mathrm{arcsinh}(\alpha_j Z_j/2)$, for $j = 1, \ldots, p$.
6: Repeat Steps 1-5 until the required vector of data is generated.
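
The steps of Algorithm 1 can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not the R routine used by the authors. The function names `cholesky` and `rlogbs` are hypothetical, the link in Step 5 is taken to be the inverse hyperbolic sine of the log-BS (sinh-normal) model, and the Cholesky factor is computed once because $\Sigma$ does not change between draws.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with sigma = L L^T (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def rlogbs(size, alpha, mu, sigma, rng):
    """Draw `size` vectors following Steps 1-5 of Algorithm 1."""
    p = len(mu)
    L = cholesky(sigma)  # Step 1 (done once)
    draws = []
    for _ in range(size):
        w = [rng.gauss(0.0, 1.0) for _ in range(p)]  # Step 2
        z = [sum(L[j][k] * w[k] for k in range(j + 1)) for j in range(p)]  # Step 3
        # Step 5: inverse hyperbolic sine link
        draws.append([mu[j] + 2.0 * math.asinh(alpha[j] * z[j] / 2.0) for j in range(p)])
    return draws


# Setting (a) of the simulation study in Section 4.1 (p = 2)
sample = rlogbs(5, alpha=[0.4, 0.5], mu=[2.0, 1.0],
                sigma=[[1.0, 0.8], [0.8, 1.0]], rng=random.Random(2017))
for y in sample:
    print([round(v, 4) for v in y])
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient when the generator is reused inside a bootstrap loop.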

Algorithm 2 Generation of random vectors from log-BS-t$_p$ distributions.

1: Repeat Steps 1-3 of Algorithm 1.
2: Generate random numbers from $U \sim \mathrm{Gamma}(\nu/2, \nu/2)$.
3: Fix values for $\mu_j$ and $\alpha_j$, with $j = 1, \ldots, p$, of the log-BS-t$_p$ distribution.
4: Obtain $Y$ with elements $Y_j = \mu_j + 2\,\mathrm{arcsinh}(\alpha_j Z_j/(2U^{1/2}))$, for $j = 1, \ldots, p$.
5: Repeat Steps 1-4 until the required vector of data is generated.
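
Algorithm 2 differs from Algorithm 1 only in the gamma mixing variable. The sketch below (Python, standard library only; the names `cholesky` and `rlogbst` are hypothetical) makes two assumptions that the algorithm does not state explicitly: Gamma(ν/2, ν/2) is read as shape ν/2 and rate ν/2, so that U has mean one, and a single U is shared by all p coordinates of a vector, as in the usual construction of the multivariate t distribution.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with sigma = L L^T (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def rlogbst(size, alpha, mu, sigma, nu, rng):
    """Draw `size` vectors following Steps 1-4 of Algorithm 2."""
    p = len(mu)
    L = cholesky(sigma)
    draws = []
    for _ in range(size):
        w = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(L[j][k] * w[k] for k in range(j + 1)) for j in range(p)]
        # random.gammavariate takes (shape, scale); rate nu/2 means scale 2/nu
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        root_u = math.sqrt(u)
        draws.append([mu[j] + 2.0 * math.asinh(alpha[j] * z[j] / (2.0 * root_u))
                      for j in range(p)])
    return draws


# Setting (a) of Section 4.1 with nu = 4, the value used in the paper
sample = rlogbst(5, alpha=[0.4, 0.5], mu=[2.0, 1.0],
                 sigma=[[1.0, 0.8], [0.8, 1.0]], nu=4, rng=random.Random(2017))
for y in sample:
    print([round(v, 4) for v in y])
```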

3.3. Phase I
As mentioned in Section 2.4, control limits must be obtained in Phase I. According to Duncan [40], in this phase such limits must be constructed with data of several subgroups from an in-control status of the process to be monitored. These control limits are then used to monitor the underlying process over time, for example based on the Hotelling statistic. Algorithm 3 details how to compute the limits of multivariate GBS control charts with the bootstrap distribution of the $T_a^2$ statistic defined in (8); see Section 3.2. In addition to the construction of control limits, in Phase I it is also necessary to check the distributional assumption by using goodness-of-fit tools and to assess multivariate outliers with suitable methods. Control charts often have both LCL and UCL, but sometimes only a UCL is considered; see, for example, Alfaro and Ortega [7].

Algorithm 3 Construction of multivariate GBS control limits in Phase I.

1: Collect the data vector $v_{hi} = (v_{hi1}, \ldots, v_{hip})^\top$ containing the observations of $p$ quality characteristics $V_i = (V_{i1}, \ldots, V_{ip})^\top \sim \mathrm{GBS}_p(\alpha, \beta, \Sigma, g^{(p)})$, with $h = 1, \ldots, k$ and $i = 1, \ldots, n$, in $k \geq 20$ subgroups of size $n > 1$ from an in-control process.
2: Obtain the transformed data vector $y_{hi} = (y_{hi1}, \ldots, y_{hip})^\top$ containing the logarithms of the data $v_{hi} = (v_{hi1}, \ldots, v_{hip})^\top$ collected in Step 1, that is, $y_{hij} = \log(v_{hij})$, where $y_{hi}$ can be considered as an observation of $Y_i = (Y_{i1}, \ldots, Y_{ip})^\top \sim \text{log-GBS}_p(\alpha, \mu, \Sigma, g^{(p)})$, with $h = 1, \ldots, k$, $i = 1, \ldots, n$ and $j = 1, \ldots, p$.
3: Compute the ML estimates of $\alpha$, $\mu$ and $\Sigma$ using the data transformed by logarithm in Step 2, from the pooled sample of size $N = k \times n$, and check the distributional assumption, as well as the presence of multivariate outliers.
4: Generate a parametric bootstrap sample $(y_1^*, \ldots, y_n^*)$ of size $n$ from a log-GBS$_p$ distribution employing the ML estimates obtained in Step 3 as the distribution parameters.
5: Compute $T_a^2$ defined in (8) with the bootstrap sample $(y_1^*, \ldots, y_n^*)$ generated in Step 4, which is denoted by $T_a^{2*}$, assuming a target $\mu_0$.
6: Repeat Steps 4-5 a large number of times (for example, $B = 10{,}000$) and compute $B$ values of the bootstrap statistic of $T_a^2$, denoted by $t_a^{2*1}, \ldots, t_a^{2*B}$.
7: Fix $\eta$ as the desired FAR of the control chart.
8: Use the $B$ values of the bootstrap statistic obtained in Step 6 to find the $100 \times (\eta/2)$th and $100 \times (1 - \eta/2)$th quantiles of the distribution of $T_a^2$, which are the LCL and UCL for the multivariate GBS control chart of FAR $\eta$, respectively.
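
Steps 4-8 of Algorithm 3 amount to a percentile bootstrap. The sketch below (Python, standard library only) illustrates the idea under stated assumptions: `hotelling_p2` is the classical Hotelling form for p = 2 and is only a stand-in for the adapted statistic in (8), `draw_sample` is a placeholder sampler with independent margins rather than the fitted log-GBS_p model, `mu_hat` holds hypothetical estimates, and the order-statistic rule inside `quantile` is one of several possible quantile conventions.

```python
import math
import random


def hotelling_p2(sample, mu0):
    """Classical statistic n (ybar - mu0)^T S^{-1} (ybar - mu0) for p = 2.

    Stand-in only: the adapted statistic in (8) would be plugged in here.
    """
    n = len(sample)
    m = [sum(y[j] for y in sample) / n for j in range(2)]
    s11 = sum((y[0] - m[0]) ** 2 for y in sample) / (n - 1)
    s22 = sum((y[1] - m[1]) ** 2 for y in sample) / (n - 1)
    s12 = sum((y[0] - m[0]) * (y[1] - m[1]) for y in sample) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d0, d1 = m[0] - mu0[0], m[1] - mu0[1]
    return n * (s22 * d0 * d0 - 2.0 * s12 * d0 * d1 + s11 * d1 * d1) / det


def bootstrap_limits(draw_sample, statistic, n, B, eta):
    """Steps 4-8: return (LCL, UCL, sorted bootstrap values)."""
    values = sorted(statistic(draw_sample(n)) for _ in range(B))  # Steps 4-6

    def quantile(q):  # simple order-statistic rule
        return values[min(B - 1, int(q * B))]

    return quantile(eta / 2.0), quantile(1.0 - eta / 2.0), values  # Step 8


rng = random.Random(42)
mu_hat = [2.0, 1.0]  # hypothetical estimates standing in for the output of Step 3


def draw_sample(n):
    # Placeholder sampler with independent margins (alpha = 0.4 and 0.5)
    return [[mu_hat[0] + 2.0 * math.asinh(0.20 * rng.gauss(0.0, 1.0)),
             mu_hat[1] + 2.0 * math.asinh(0.25 * rng.gauss(0.0, 1.0))]
            for _ in range(n)]


lcl, ucl, _ = bootstrap_limits(draw_sample, lambda s: hotelling_p2(s, mu_hat),
                               n=5, B=2000, eta=0.0027)
print(round(lcl, 4), round(ucl, 4))
```

With a FAR as small as 0.0027, only a handful of bootstrap values lie beyond each limit, which is one reason for choosing B as large as 10,000 in practice.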

3.4. Phase II
As also mentioned in Section 2.4, the multivariate GBS quality control charts obtained in Phase I must be used in Phase II to test whether the process to be monitored remains in control when data of new subgroups are collected. In Phase II, the adapted Hotelling statistic is denoted by $T_{a\,\mathrm{new}}^2$. We then employ multivariate GBS control charts to plot the sequence of values of the $T_{a\,\mathrm{new}}^2$ statistic, calculated as in (8), for the $m$ subgroups employed in this phase. Algorithm 4 describes how the p-variate control chart based on multivariate GBS distributions is utilized to monitor the underlying process.

4. Simulation study
In this section, by using MC simulations, we evaluate the proposed methodology and a standard methodology both in Phase I and in Phase II. Specifically, we compare the behaviour of the $T_a^2$ statistic under a multivariate GBS distribution with that of the $T_s^2$ statistic under a multivariate normal distribution. We perturb the new subgroups in Phase II considering three levels (low, moderate, and high) and evaluate the performance of the multivariate GBS control chart by means of the detection rate of the new subgroup, which is obtained as the proportion of values for the corresponding Hotelling statistic that are located above the UCL. We use the R software to carry out this simulation.

Algorithm 4 Process monitoring using the multivariate GBS chart in Phase II.

1: Repeat Steps 1-2 of Algorithm 3, obtaining a new transformed data vector $y_{hi} = (y_{hi1}, \ldots, y_{hip})^\top$, for $h = 1, \ldots, m$ and $i = 1, \ldots, n$, but now the new original data are not necessarily collected from an in-control process.
2: Calculate the $T_{a\,\mathrm{new}}^2$ statistic for each sample of new transformed data obtained in Step 1, generated in the $h$th subgroup, with $h = 1, \ldots, m$, at regular time intervals, getting $t_{a\,\mathrm{new}_1}^2, \ldots, t_{a\,\mathrm{new}_m}^2$.
3: Plot the points $t_{a\,\mathrm{new}_1}^2, \ldots, t_{a\,\mathrm{new}_m}^2$ in the GBS$_p$ control chart with limits generated by Algorithm 3.
4: Declare the process as in control if all points $t_{a\,\mathrm{new}_1}^2, \ldots, t_{a\,\mathrm{new}_m}^2$ fall between the LCL and UCL obtained in Algorithm 3; otherwise, that is, if any of the points $t_{a\,\mathrm{new}_1}^2, \ldots, t_{a\,\mathrm{new}_m}^2$ falls below the LCL or above the UCL, declare the process as out of control.
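
The decision rule in Step 4 of Algorithm 4 reduces to a comparison of each new statistic with the Phase I limits. A minimal Python sketch follows; the function name `monitor` and the five statistic values are hypothetical, while the limits are the BS_2 entries of Table 2 for n = 5.

```python
def monitor(t_new, lcl, ucl):
    """Return the 1-based indices of subgroups whose statistic lies outside [lcl, ucl]."""
    return [h for h, t in enumerate(t_new, start=1) if t < lcl or t > ucl]


# Hypothetical statistic values for m = 5 new subgroups
signals = monitor([1.32, 12.75, 0.0020, 4.08, 9.91], lcl=0.0041, ucl=11.6027)
print("in control" if not signals else f"out of control at subgroups {signals}")
# prints: out of control at subgroups [2, 3]
```

Returning the indices rather than a single flag makes it easy to label the signalling subgroups on the chart.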

4.1. Phase I
In this phase, we consider the following simulation scenario. Using Algorithms 1 and 2, we generate $B = 10{,}000$ bootstrap samples for $k = 20$ subgroups of sizes $n \in \{5, 10, 25, 50, 100\}$ from multivariate log-BS and log-BS-t distributions. For each bootstrap sample, we compute its $T_a^2$ statistic with the formula defined in (8) and the $T_s^2$ statistic, obtaining $t_a^{2*1}, \ldots, t_a^{2*10000}$ and $t_s^{2*1}, \ldots, t_s^{2*10000}$, respectively. We employ an overall FAR $\eta = 0.0027$ to compute the LCL and UCL based on the 0.27th and 99.73th quantiles of the 10,000 bootstrap values for the corresponding statistics, respectively. Table 2 provides the LCL and UCL calculated with these statistics. We consider data generated from BS$_p$ and BS-t$_p$ distributions and their corresponding logarithm transformations with $p \in \{2, 3, 4\}$ and true values for their parameters established respectively as

(a) $p = 2$, $\alpha = (0.4, 0.5)^\top$, $\mu = (2, 1)^\top$ and $\Sigma = \begin{pmatrix} 1.0 & 0.8 \\ 0.8 & 1.0 \end{pmatrix}$.

(b) $p = 3$, $\alpha = (0.4, 0.5, 0.4)^\top$, $\mu = (2, 1, 5)^\top$ and $\Sigma = \begin{pmatrix} 1.0 & 0.8 & 0.5 \\ 0.8 & 1.0 & 0.2 \\ 0.5 & 0.2 & 1.0 \end{pmatrix}$.

(c) $p = 4$, $\alpha = (0.4, 0.5, 0.4, 0.5)^\top$, $\mu = (2, 1, 5, 3)^\top$ and $\Sigma = \begin{pmatrix} 1.0 & 0.8 & 0.5 & 0.2 \\ 0.8 & 1.0 & 0.7 & 0.3 \\ 0.5 & 0.7 & 1.0 & 0.5 \\ 0.2 & 0.3 & 0.5 & 1.0 \end{pmatrix}$.

Table 2. Simulated limits for the GBS_p control chart with η = 0.0027 for the indicated n, p, statistic, and distribution.

                  p = 2                         p = 3                         p = 4
          T_s^2     T_a^2     T_a^2     T_s^2     T_a^2     T_a^2     T_s^2     T_a^2     T_a^2
    n                BS_2    BS-t_2                BS_3    BS-t_3                BS_4    BS-t_4
UCL
    5   10.5221   11.6027    6.7079    9.3715   14.8541    8.2944   11.5399   16.0207    9.3704
   10   10.0075   11.1691    6.0518    8.6043   13.6852    7.4770    9.6871   15.9451    8.8128
   25    9.5291   10.7939    5.8128    8.5067   13.7052    7.1973    9.6420   15.7863    8.6813
   50    8.7918   10.9625    5.6462    7.8080   13.5828    6.8775    8.7755   15.5876    7.9276
  100    7.3866   10.9106    5.3870    7.3844   13.4448    6.6943    8.3706   15.5746    7.8409
LCL
    5    0.0029    0.0041    0.0031    0.0219    0.0472    0.0265    0.0634    0.1810    0.0578
   10    0.0053    0.0054    0.0031    0.0249    0.0504    0.0246    0.0638    0.1727    0.0795
   25    0.0190    0.0050    0.0024    0.0251    0.0480    0.0218    0.0762    0.1509    0.0813
   50    0.0051    0.0052    0.0030    0.0243    0.0530    0.0245    0.0696    0.1676    0.0672
  100    0.0027    0.0059    0.0031    0.0214    0.0596    0.0252    0.0754    0.1576    0.0775

The values presented in (a)–(c) above are chosen following the criteria: (i) $\alpha_j \leq 0.5$ according to Marchant et al. [26]; (ii) $\mu_j \in \{1, 2, 3, 5\}$ based on Alfaro and Ortega [7]; and (iii) values for $\Sigma$ corresponding to small, medium, and large correlations. We use $\nu = 4$ for the multivariate log-BS-t distribution according to the robustness aspects mentioned in Section 2.3. From Table 2, as $n$ increases, the control limits tend to become narrower for both statistics, $T_a^2$ and $T_s^2$, with the UCL decreasing progressively. Note that the LCLs are very close to zero, which is a reason why this control limit is often not calculated and is set to zero. For a fixed $n$, the control limits of $T_a^2$ based on the multivariate log-BS-t distribution are narrower than those based on the multivariate log-BS distribution. In addition, the control limits of $T_a^2$ based on the multivariate log-BS-t distribution are narrower than those of the $T_s^2$ statistic under normality. This can be attributed to the non-robustness to outliers of the ML estimation of parameters in the cases of multivariate log-BS and normal distributions, possibly affecting the detection of out-of-control status in Phase II.

4.2. Phase II
In this phase, we generate $m = 30$ new subgroups using the same scenario of Phase I. We employ the function T2.2(data, estat, n) of an R package named IQCC to calculate the $T_s^2$ statistic for multivariate observations at Phase II; see https://cran.r-project.org/web/packages/IQCC/IQCC.pdf. We compute $T_{a\,\mathrm{new}}^2$ with Algorithm 4 and generate $M = 5000$ MC replications. We perturb a number $l$ of the $m$ new subgroups, with one perturbed observation in each of them, and consider three perturbation levels (low, moderate, and high), corresponding to one ($l = 1$), five ($l = 5$), and ten ($l = 10$) perturbed subgroups. This perturbation is given by $y_{ij}^* = y_{ij} + 4S_{Y_j}$, where $S_{Y_j}$ represents the standard deviation of the quality characteristic $Y_j$, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$. When no subgroup is perturbed, $l = 0$. The performance of the multivariate GBS control chart is judged in terms of the detection rate of the new subgroup, which is obtained as the proportion of values for the corresponding Hotelling statistic that are located above the UCL. The results of this simulation are shown in Table 3. From this table, note that the multivariate BS-t control chart performs well in detecting out-of-control status in Phase II. Observe that the low and moderate perturbations are well detected by the control chart. Also, notice that the $T_a^2$ statistic based on the multivariate BS-t distribution performs better than the $T_s^2$ statistic as the number $p$ of quality characteristics increases. In addition, as the level of perturbation increases, the detection of out-of-control status decreases due to the masking effect, especially when ten ($l = 10$) new subgroups are perturbed. However, this effect is attenuated in the case of the multivariate BS-t chart due to the robustness of its estimation procedure in relation to the BS chart (omitted here) and to the chart based on the $T_s^2$ statistic under normality. In general, when the subgroup size $n$ increases, the performance in detecting an out-of-control status also improves for both statistics under study, $T_a^2$ and $T_s^2$.

Table 3. Out-of-control detection rate in Phase II for the indicated n, l, p, statistic and target in the BS-t_p distribution.

                  p = 2               p = 3               p = 4
              μ0 = (2, 1)⊤       μ0 = (2, 1, 5)⊤    μ0 = (2, 1, 5, 3)⊤
    n    l     T_s^2     T_a^2     T_s^2     T_a^2     T_s^2     T_a^2
    5    0    0.1300    0.4320    0.5330    0.9790    0.4940    1.0000
         1    0.1490    0.4960    0.5300    0.9810    0.4550    1.0000
         5    0.0358    0.2278    0.1530    0.8526    0.1200    1.0000
        10    0.0133    0.1597    0.0746    0.5922    0.0635    0.9875
   10    0    0.1770    0.8210    0.6360    0.9940    0.7680    0.9950
         1    0.1860    0.8960    0.6620    0.9980    0.7500    0.9960
         5    0.0396    0.7706    0.2146    0.9768    0.2666    0.9704
        10    0.0207    0.6235    0.1092    0.9145    0.1352    0.9037
   25    0    0.2310    0.8500    0.6770    0.9900    0.7630    0.9880
         1    0.2320    0.9040    0.6590    0.9960    0.7790    0.9930
         5    0.0544    0.6736    0.2276    0.9784    0.2910    0.9406
        10    0.0261    0.5983    0.1141    0.9189    0.1444    0.8158
   50    0    0.3110    0.8540    0.8040    0.9970    0.8930    0.9980
         1    0.3210    0.8880    0.7910    0.9980    0.8880    0.9990
         5    0.0736    0.5538    0.2926    0.9680    0.3810    0.9636
        10    0.0331    0.4558    0.1535    0.8525    0.1948    0.8424
  100    0    0.5450    0.8580    0.8350    1.0000    0.9260    1.0000
         1    0.5180    0.8720    0.8450    1.0000    0.9180    1.0000
         5    0.1454    0.4794    0.3622    0.9768    0.4522    0.9796
        10    0.0767    0.3793    0.1783    0.8345    0.2301    0.8536
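
The perturbation scheme and the detection rate used in this phase can be sketched as follows (Python, standard library only). The names `perturb` and `detection_rate` are hypothetical, and two details are assumptions because the text does not fix them: the first l subgroups are the perturbed ones, and within each of them the first observation receives the shift of four standard deviations.

```python
import copy


def perturb(subgroups, l, sd, shift=4.0):
    """Return a copy in which one observation of each of the first l subgroups is shifted.

    subgroups[h][i][j] is characteristic j of observation i in subgroup h;
    sd[j] is the standard deviation of characteristic j.
    """
    out = copy.deepcopy(subgroups)
    for h in range(l):
        # Assumption: the first observation of the subgroup is the perturbed one
        out[h][0] = [y + shift * s for y, s in zip(out[h][0], sd)]
    return out


def detection_rate(t_values, ucl):
    """Proportion of statistic values located above the UCL."""
    return sum(1 for t in t_values if t > ucl) / len(t_values)


# Toy data: m = 3 subgroups, n = 2 observations, p = 2 characteristics
data = [[[2.0, 1.0], [2.1, 0.9]] for _ in range(3)]
shifted = perturb(data, l=1, sd=[0.4, 0.5])
print(shifted[0][0])  # first observation of the first subgroup, shifted by 4 sd
print(detection_rate([1.0, 5.0, 9.0, 12.0], ucl=8.0))
```

Working on a deep copy leaves the unperturbed data available, so the l = 0 baseline and the perturbed scenarios can be evaluated on the same generated subgroups.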

5. Data analysis
In this section, we apply the proposed methodology to real-world air quality data. This
methodology was implemented by a computational routine in R code.

5.1. Description of the problem and data


Several studies show an association between adverse health effects and the concentrations of pollutants such as particulate matter (PM), ozone (O3), nitrogen dioxide (NO2), and sulphur dioxide (SO2). These pollutants produce premature deaths and diverse cardio-respiratory diseases in children and adults. PM is classified according to its diameter, because particle size determines the sites of deposition within the respiratory tract. Coarser particles (those with a diameter over 10 μm) do not penetrate into the airways; instead, they are deposited in the upper respiratory tract and are cleared by cilia action. Inhalable particles measuring less than 10 μm are called PM10, whereas those less than 2.5 μm are called PM2.5. As size decreases, there is a higher possibility for PM to penetrate deeper into smaller alveoli and airways. Various effects are produced by exposure to PM, but the nature of these effects varies according to the PM composition. Indeed, there is evidence of an increase in the risk of cardiovascular diseases and mortality from exposure to PM2.5, which occurs even after short time periods, such as hours or weeks. For more details about PM concentrations, the interested reader is referred to Marchant et al. [41].

Because of a combination of anthropogenic, meteorological, and topographic factors, Santiago, the capital city of Chile, endures bad atmospheric ventilation, which increases the accumulation of PM and gaseous contaminants, mainly from 1 April to 31 August of each year. PM and gaseous contaminants, such as carbon monoxide (CO), NO2, and SO2, are the main contributors to air quality problems in Santiago. In addition, NO2 and SO2 are precursors of PM; see Marchant et al. [41]. The current official methodology used by the authority in Santiago for predicting PM10 concentrations is based on a multiple linear regression. It helps to forecast the maximum value of the 24-h average concentration of PM10, in μg per normalized cubic metre (μg/m3N), for the period from 00:00 to 24:00 hours of the following day. Furthermore, in 2015, through Supreme Decree number 15/2015 and resolution number 9664/2015, the Chilean Ministry of Health instructed that sanitary alerts be declared employing PM2.5 concentrations. This decree establishes the authority to manage critical events caused by PM2.5 as a complementary measure. This leads to the need to develop multivariate tools to model and monitor PM10 and PM2.5 concentrations simultaneously, predicting critical periods of contamination adequately.

In our illustration, we use the following variables: (X1) PM2.5 in μg/m3N; and (X2) PM10 in μg/m3N. The data were collected by the Chilean Metropolitan Environmental Health Service and are available at http://sinca.mma.gob.cl. They were recorded in 2003 as 1-h (hourly) average values at two monitoring stations located in Santiago: (a) Las Condes and (b) Pudahuel. We select data from these stations mainly because they are more suitable to conditions of low and high stability, allowing us to analyse different pollution patterns. Chilean guidelines for air quality are established by its Ministry of the Environment. According to these guidelines, the maximum 24-h concentrations (in μg/m3N) are 50 for PM2.5 and 150 for PM10. These values are considered as the targets in our illustration.

5.2. Exploratory data analysis


First, we conduct an exploratory data analysis by computing correlations between X1 and
X2 in both stations. Figure 1 displays the scatter-plots of these variables and their cor-
responding correlations for (left) Las Condes and (right) Pudahuel. From this figure, we
detect that there are large (0.86) and medium (0.60) correlations between X1 and X2 for Las
Condes and Pudahuel stations, respectively. Exploratory data analyses for each variable at
the two stations were conducted and marginal GBS distributions seem good candidates for
describing these data.

5.3. Phase I
Figure 1. Scatter-plots with their corresponding correlations of the indicated variables for (left) Las Condes and (right) Pudahuel stations.

Figure 2. PP-plots with KS acceptance regions at 5% for transformed MDs with (left) BS$_2$ and (right) BS-t$_2$ distributions based on Pudahuel data.

We initially use concentrations for the months of January and February to calculate the control limits according to Algorithm 3 with $k = 59$, $n = 24$, $N = 1416$, $B = 10{,}000$, and FAR $\eta = 0.0027$. We utilize these months since their air quality is stable (considered as an in-control status), because the meteorological and topographical conditions favour no saturation of PM concentrations. We employ the transformed MD with the Wilson–Hilferty approximation to obtain a normal distribution; see Marchant et al. [26]. Then, a goodness-of-fit technique is considered to check the distributional assumption stated in Step 2 of Algorithm 3; see Marchant et al. [12]. Figure 2 shows the corresponding theoretical probability versus empirical probability (PP) plots with acceptance bands for a significance level of 5% in Pudahuel station based on BS$_2$ and BS-t$_2$ distributions (for Las Condes station, the results are similar and are omitted here). From this figure, we corroborate the good fit of the BS$_2$ and BS-t$_2$ distributions to the data in Phase I, which is supported by the p-values 0.245 and 0.227, respectively, of the Kolmogorov–Smirnov (KS) test associated with these PP-plots; see Marchant et al. [26]. We use $\nu = 4$ for the log-BS-t$_2$ distribution according to the robustness aspects mentioned in Section 2.3.
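
The Wilson–Hilferty step mentioned in this section can be illustrated with its classical chi-square version, which maps a chi-square value with k degrees of freedom to an approximately standard normal value. The Python sketch below assumes that the squared MD approximately follows a chi-square distribution with p degrees of freedom, as is usual for a normal kernel; the exact transformation used in [26], in particular for the BS-t case, may differ and is not reproduced here. The function name `wilson_hilferty` is hypothetical.

```python
import math


def wilson_hilferty(x, k):
    """Map a chi-square(k) value x to an approximately standard normal value."""
    c = 2.0 / (9.0 * k)
    return ((x / k) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


# Hypothetical squared distances for p = 2 quality characteristics
for d2 in [0.2, 1.386, 2.0, 5.99, 9.21]:
    print(d2, round(wilson_hilferty(d2, 2), 3))
```

The transformed values can then be compared with the standard normal distribution, for instance through a PP-plot or a KS test.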

5.4. Phase II
We use the control limits obtained in Phase I (see Section 5.3) to monitor April of 2003; see Marchant et al. [41]. For the control chart of this month, the number of subgroups and the subgroup size are $m = 30$ days and $n = 24$ hours, respectively, giving a total of 720 observations. We employ the transformed MD to assess the fit of the most appropriate distribution to these data. Figure 3 displays the PP-plots with acceptance bands for a significance level of 5%. From this figure, we detect that the BS-t$_2$ distribution provides a better fit than the BS$_2$ distribution for both stations, which is corroborated by the p-values 0.901 (Las Condes/BS-t) and 0.520 (Pudahuel/BS-t) versus 0.095 (Las Condes/BS) and 0.238 (Pudahuel/BS) of the KS test associated with these PP-plots. We employ the MD as a measure to detect multivariate outliers. Figure 4 (left and right) depicts index-plots of the MDs for both stations with the BS-t$_2$ distribution. From this figure, note that 19 April (labelled as 19) is detected as a multivariate outlier. However, this observation does not influence the control charts shown in Figure 5 (left and right) for both stations due to the robustness of the ML estimation when the BS-t$_2$ distribution is used. It is known that the Las Condes station was less contaminated than other stations in 2003, due to better ventilation at its high altitude; see Marchant et al. [41]. This may be the reason why this station does not have points outside of the limits. In contrast, 10 April (labelled as 10) exceeds the limit in the Pudahuel station. Therefore, an out-of-control status is detected and an environmental alert must be declared for the next day. Note that, if at least one of the air quality monitoring stations presents a dangerous PM level for human health in Santiago, then an out-of-control status must be declared. The interested reader is referred to CONAMA [29] for details of the official decree of the Ministry of Environment of the Chilean government that indicates such a regulation. Observe that our criterion is coherent with the official information provided by the Chilean Ministry of Health, which established environmental alerts for 11 April 2003; see www.seremisaludrm.cl/sitio/pag/aire/indexjs3airee001.asp. We recommend this methodology based on multivariate GBS control charts because these charts are useful for alerting about episodes of extreme air contamination. Thus, pollutants must be monitored to prevent adverse effects on human health for the population of Santiago, Chile.

Figure 3. PP-plots with 5% KS acceptance regions for transformed MDs with (first panel) BS$_2$ and (second panel) BS-t$_2$ distributions for (left) Las Condes and (right) Pudahuel based on pollutant data.

Figure 4. Index-plots of MDs for (left) Las Condes and (right) Pudahuel based on pollutant data using the BS-t$_2$ distribution.

Figure 5. Bivariate BS-t control charts for (left) Las Condes and (right) Pudahuel stations in April 2003 based on pollutant data.

6. Concluding remarks
In this paper, we proposed a robust methodology for multivariate control charts in Phases I and II with subgroups based on GBS distributions and the Hotelling statistic. In addition, we considered the MD to detect multivariate outliers and evaluate the adequacy of the distributional assumption. The proposed methodology estimates the multivariate BS and BS-t parameters with the ML method. For data following a multivariate BS-t model, the distribution of the associated Hotelling statistic is not known in closed form, so the methodology uses the bootstrap method to obtain this distribution. Once such a distribution is obtained, the methodology computes its quantiles to construct the control limits of the multivariate chart. A Monte Carlo simulation study was conducted to evaluate the proposed methodology in Phases I and II, comparing its behaviour with the Hotelling statistic under normality. We concluded by means of this simulation study that the multivariate BS-t control chart performed well in the detection of out-of-control status in Phase II, performing better than the multivariate BS control chart in both phases. In addition, the Hotelling statistic under normality did not perform well in the detection of out-of-control status in Phase II. We illustrated the proposed methodology with real-world air quality data of Santiago, Chile, through two monitoring stations with different climatic conditions and pollution levels. This illustration showed that our methodology is useful for alerting about episodes of extreme air pollution and for preventing adverse effects on human health for the population of Santiago, Chile. In addition, we demonstrated empirically the coherence between our criterion and real-world situations in which the Chilean health authority declared environmental alerts. As future research, we plan to compare the proposed multivariate control charts with others based on statistics proposed in the literature, for example, control charts for individual observations or with autocorrelation structures over time, as well as charts for parameters other than the mean.

Acknowledgements
The authors thank editors and two referees for their constructive comments on an earlier version of
the present manuscript.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research was supported partially by the Chilean Council for Scientific and Technology Research,
grant FONDECYT 1160868 (V. Leiva) and fellowship ‘Becas-Conicyt’ (C. Marchant), as well as by
the Brazilian National Council for Scientific and Technological Development (CNPq) and Brazilian
Coordination for the Improvement of Higher Level Personnel (CAPES).

ORCID
Carolina Marchant http://orcid.org/0000-0003-1832-4444
Víctor Leiva http://orcid.org/0000-0003-4755-3270
Francisco José A. Cysneiros http://orcid.org/0000-0001-6757-6969
Shuangzhe Liu http://orcid.org/0000-0002-4858-2789

References
[1] Shewhart WA. Economic control of quality of manufactured product. New York: D. Van
Nostrand Company; 1931.
[2] Alt FB. Multivariate quality control. In: Kotz S, Johnson NL, Read CB, editors. The encyclopedia of statistical sciences. Vol. 6. New York (NY): Wiley; 1985. p. 110–112.
[3] Hotelling H. Multivariate quality control. In: Eisenhart C, Hastay M, and Wallis WA, editors.
Techniques of statistical analysis. New York (NY): McGraw-Hill; 1947. p. 111–184.
[4] Lowry CA, Montgomery DC. A review of multivariate control charts. IIE Trans. 1995;27:800–810.
[5] Liu RY, Tang J. Control charts for dependent and independent measurements based on bootstrap methods. J Am Stat Assoc. 1996;91:1694–1700.
[6] Stoumbos GZ, Sullivan JH. Robustness to non-normality of the multivariate EWMA control
charts. J Qual Technol. 2002;34:304–315.
[7] Alfaro JL, Ortega JF. Robust Hotelling's $T^2$ control charts under non-normality: the case of t-Student distribution. J Stat Comput Simul. 2012;83:1437–1447.


[8] Alfaro JL, Ortega JF. A new control chart in contaminated data of t-Student distribution for
individual observations. Appl Stoch Models Bus Ind. 2013;29:79–91.
[9] Hawkins DM. Identification of outliers. New York (NY): Chapman and Hall; 1980.
[10] Becker C, Gather U. The masking breakdown point of multivariate outlier identification rules.
J Am Stat Assoc. 1999;94:947–955.
[11] Rocke DM, Woodruff DL. Identification of outliers in multivariate data. J Am Stat Assoc.
1996;91:1047–1061.
[12] Marchant C, Leiva V, Cysneiros FJA, et al. Diagnostics in multivariate generalized Birn-
baum–Saunders regression models. J Appl Stat. 2016b;43:2829–2849.
[13] Jensen WA, Birch JB, Woodall WH. High breakdown estimation methods for phase I multi-
variate control charts. Qual Reliab Eng Int. 2007;23:615–629.
[14] Chenouri S, Variyath AM, Steiner SH. A multivariate robust control chart for individual
observations. J Qual Technol. 2009;41:259–271.
[15] Chenouri S, Variyath AM. A comparative study of phase II robust multivariate control charts
for individual observations. Qual Reliab Eng Int. 2011;27:857–865.
[16] Saulo H, Leão J, Leiva V, et al. Birnbaum–Saunders autoregressive conditional duration models applied to high-frequency financial data. Stat Pap. 2017. Available at http://dx.doi.org/10.1007/s00362-017-0888-6.
[17] Leiva V, Athayde E, Azevedo C, et al. Modeling wind energy flux by a Birnbaum–Saunders
distribution with unknown shift parameter. J Appl Stat. 2011;38:2819–2838.
[18] Leiva V. The Birnbaum–Saunders distribution. New York (NY): Academic Press; 2016.
[19] Lio YL, Park C. A bootstrap control chart for Birnbaum–Saunders percentiles. Qual Reliab Eng
Int. 2008;24:585–600.
[20] Leiva V, Marchant C, Ruggeri F, et al. A criterion for environmental assessment using Birn-
baum–Saunders attribute control charts. Environmetrics. 2015;26:463–476.
[21] Birnbaum ZW, Saunders SC. A new family of life distributions. J Appl Probab. 1969;6:
319–327.
[22] Johnson NL, Kotz S, Balakrishnan N. Continuous univariate distributions, vol. 2. New York
(NY): Wiley; 1995.
[23] Kotz S, Leiva V, Sanhueza A. Two new mixture models related to the inverse Gaussian
distribution. Methodol Comput Appl Probab. 2010;12:199–212.
[24] Balakrishnan N, Gupta R, Kundu D, et al. On some mixture models based on the
Birnbaum–Saunders distribution and associated inference. J Stat Plan Inf. 2011;141:
2175–2190.
[25] Paula GA, Leiva V, Barros M, et al. Robust statistical modeling using the Birnbaum–Saunders-t
distribution applied to insurance. Appl Stoch Models Bus Ind. 2012;28:16–34.
[26] Marchant C, Leiva V, Cysneiros FJA. A multivariate log-linear model for Birnbaum–Saunders
distributions. IEEE Trans Reliab. 2016a;65:816–827.

[27] Kundu D, Balakrishnan N, Jamalizadeh A. Generalized multivariate Birnbaum–Saunders distributions and related inferential issues. J Multivar Anal. 2013;116:230–244.
[28] Lemonte A, Martínez-Flores G, Moreno-Arenas G. Multivariate Birnbaum–Saunders distribu-
tion: properties and associated inference. J Stat Comput Simul. 2015;85:374–392.
[29] CONAMA. Establishment of primary quality guideline for PM10 that regulates environmen-
tal alerts (Technical Report Decree 59). Ministry of Environment (CONAMA) of the Chilean
Government, Santiago, Chile. 1998.
[30] Fang KT, Kotz S, Ng KW. Symmetric multivariate and related distributions. London: Chapman
and Hall; 1990.
[31] Fang KT, Zhang YT. Generalized multivariate analysis. Berlin: Springer; 1990.
[32] Gupta AK, Varga T, Bodnar T. Elliptically contoured models in statistics and portfolio theory.
New York (NY): Springer; 2013.
[33] Garcia-Papani F, Uribe-Opazo MA, Leiva V, et al. Birnbaum–Saunders spatial modelling and diagnostics applied to agricultural engineering data. Stoch Environ Res Risk Assess. 2017;31:105–124.
[34] Lucas A. Robustness of the student t based M-estimator. Commun Stat Theory Methods.
1997;41:1165–1182.
[35] Leiva V, Marchant C, Saulo H, et al. Capability indices for Birnbaum–Saunders processes
applied to electronic and food industries. J Appl Stat. 2014;41:1881–1902.
[36] Kundu D. Bivariate log-Birnbaum–Saunders distribution. Statistics. 2015;49:900–917.
[37] Kotz S, Nadarajah S. Multivariate t-distributions and their applications. New York (NY):
Cambridge University Press; 2004.
[38] Hall P. The bootstrap and Edgeworth expansion. New York (NY): Springer; 2013.
[39] Leiva V, Sanhueza A, Sen PK, et al. Random number generators for the generalized Birn-
baum–Saunders distribution. J Stat Comput Simul. 2008;78:1105–1118.
[40] Duncan A. Quality control and industrial statistics. Homewood (IL): Irwin; 1986.
[41] Marchant C, Leiva V, Cavieres M, et al. Air contaminant statistical distributions with applica-
tion to PM10 in Santiago, Chile. Rev Environ Contam Toxicol. 2013;223:1–31.
