Adaptive high learning rate probabilistic disruption predictors from scratch for the next generation of tokamaks
Abstract
The development of accurate real-time disruption predictors is a pre-requisite to any mitigation action. Present theoretical models of disruptions do not yet cope reliably with the disruption prediction problem. This article deals with data-driven predictors, and a review of existing machine learning techniques, from both physics and engineering points of view, is provided. All these methods need large training datasets to develop successful predictors. However, ITER or DEMO cannot wait for hundreds of disruptions to have a reliable predictor. So far, the attempts to extrapolate predictors between different tokamaks have not shown satisfactory results. In addition, it is not clear how valid this approach can be between present devices and ITER/DEMO, due to the differences in their respective scales and possibly underlying physics. Therefore, this article analyses the requirements to create adaptive predictors from scratch that learn from the data of an individual machine from the beginning of operation. A particular algorithm based on probabilistic classifiers has been developed and applied to the database of the first three ITER-like wall campaigns of JET (1036 non-disruptive and 201 disruptive discharges). The predictions start from the first disruption, and only 12 re-trainings have been necessary, one after each of the 12 missed disruptions. Almost 10 000 different predictors have been developed (they differ in their features) and, after the chronological analysis of the 1237 discharges, the predictors recognize 94% of all disruptions with an average warning time (AWT) of 654 ms. This percentage corresponds to the sum of tardy detections (11%), valid alarms (76%) and premature alarms (7%). The false alarm rate is 4%. If only valid alarms are considered, the AWT is 244 ms and the standard deviation is 205 ms. The average probability interval, which qualifies the reliability and accuracy of all the individual predictions, is 0.811 ± 0.189.
Nucl. Fusion 54 (2014) 123001 J. Vega et al
signals and the results show a success rate of 94%, a false alarm rate of about 4% and an average warning time of 654 ms. The global average probability of each prediction is 0.811 ± 0.189, which provides a high degree of confidence in the classifier.

At this point, the meaning of several terms used in the article with regard to disruptive discharges should be specified. The term tardy detection is applied to those disruptions that are recognized with a warning time shorter than 30 ms. The term premature alarm is reserved for predictions whose warning time is longer than 1 s (this threshold will be justified later). Valid alarm refers to predictions with warning times between 30 ms and 1 s. The term success rate is used as the rate of recognizing disruptive behaviours regardless of the warning times, i.e. the success rate is the sum of the tardy detection rate, the premature alarm rate and the valid alarm rate. This definition of success rate is necessary to be able to compare the Venn predictor results with previously published predictors from scratch that use the term success as a synonym of disruptive behaviour recognition.

This paper is structured in nine sections. Section 2 introduces the machine learning terminology used in the article. Section 3 reviews recent data-driven predictors for JET and AUG. Section 4 summarizes the results of pioneering predictors from scratch, also tested with JET and AUG data respectively, and section 5 discusses the general requirements to be met by high learning rate predictors from scratch. Section 6 explains the mathematical aspects of the Venn prediction framework (as it is the base of the probabilistic predictor developed in this article) and the specific taxonomy used for the DPFS. Section 7 describes the specific implementation of the predictor and section 8 shows the Venn predictor results with JET campaigns. Finally, section 9 is devoted to a short discussion of the outcomes of the paper and of the proposals for future work in the line of predictors from scratch.

2. Machine learning terminology

This section summarizes the general concepts and terminology that will be used in the article.

In automatic classification problems, a sample is represented by an ordered pair (xi, yi), where xi ∈ R^m is the feature vector (i.e. the set of m features that characterize the sample i) and yi is its corresponding label (yi ∈ {C1, C2, ..., CL}, where Cj, j = 1, ..., L are the existing classes). For disruption prediction the classifiers are binary (L = 2), which corresponds to the labels 'disruptive' and 'non-disruptive'. Henceforth, all references to classifiers will mean binary classifiers unless otherwise noted.

Any automatic classification system requires a training process. In the case of supervised classifiers, the training sample labels are known. The prediction process is made up of two steps: induction and deduction (figure 2). The first one is an inductive phase by means of which the real learning process is carried out. The training samples, (xi, yi), i = 1, ..., N, are used to obtain a general rule (also called model or decision function). From that moment, the deduction step is used to make predictions. Given any new sample represented by the pair (x, y), with known feature vector x but unknown label y, the objective is to estimate the label. The feature vector is used as input to the general rule and the model output is the predicted label. Common methods used for the inductive step in supervised classification problems are artificial neural networks [13], support vector machines (SVM) [14], k-nearest neighbours [15], self-organizing maps [16] and generative topographic maps [17], among others.

Figure 2. A classifier has to be understood as consisting of two steps: induction and deduction. Both of them are indissolubly linked together in the notion of classical classifiers.

After finishing the training process, a test dataset with known labels is used to estimate the classifier quality. The feature vectors of the test dataset are used as inputs to the model and the predictions are compared with the real labels. The results obtained in terms of success rate are assumed to be of the same quality for all future feature vectors that will be input to the model. Although this is not necessarily true, it is a standard hypothesis based on the assumption that the samples are independent and identically distributed (the i.i.d. assumption).

Finally, it should be noted that the selection of the feature vector components is a key aspect for the success of a prediction task. The feature vectors have to provide crucial information about the configuration space of the problem, and the set of components can be seen as a 'parametrization' of such a space. Unfortunately, there are no mathematical theorems to ensure optimal choices and, therefore, the feature selection process is a trial and error procedure that is typically guided by the knowledge of the expert about the problem at hand.

Focusing the attention on disruptions, it is important to emphasize that, to deal with disruption prediction, the 'parametrization' of the configuration space depends not only on the physical quantities but also on their specific representations in both data formats (waveforms, time series data, images and, in general, any abstract form) and signal domains (time/space, frequency/wave number or both). In this sense, even linear models can be difficult (or impossible) to interpret from a physics point of view. But it should be taken into account that the main objective of a data-driven disruption predictor can be to recognize an incoming disruption, even when the physics nature of the disruption is not clear.

3. Review of recent data-driven predictors

Disruptions have been of primary interest in medium and large fusion devices. A well-known scenario that causes a disruption is associated with the plasma MHD activity. A rotating island starts to grow, is slowed down and finally locks, continuing to grow until the final disruption. The physical mechanism responsible for mode locking is the braking effect of error fields or MHD modes (such as resistive wall modes) that slow down and ultimately stop the plasma rotation [18].
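The induction/deduction scheme described in section 2 can be sketched with a toy k-nearest-neighbours classifier (one of the inductive methods cited there). The feature vectors and labels below are synthetic illustrations only, not JET data or the paper's predictor:

```python
import math

def induce(training_samples):
    """Induction step: for k-NN the 'general rule' is simply the stored
    training set of (feature vector, label) pairs."""
    return list(training_samples)

def deduce(rule, x, k=3):
    """Deduction step: estimate the label of a new feature vector x by
    majority vote among its k nearest training samples."""
    nearest = sorted(rule, key=lambda s: math.dist(s[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Synthetic binary problem with the two labels used in the article.
training = [((0.10, 0.20), 'non-disruptive'),
            ((0.20, 0.10), 'non-disruptive'),
            ((0.90, 0.80), 'disruptive'),
            ((0.80, 0.90), 'disruptive'),
            ((0.85, 0.75), 'disruptive')]

model = induce(training)
print(deduce(model, (0.15, 0.15)))  # prints: non-disruptive
```

Classical classifiers couple these two steps: once `induce` has produced the rule, the training data could in principle be discarded, which is precisely what the Venn predictors of section 6 do not do.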
All the above predictors use as much information as possible from past discharges to recognize in advance an incoming disruption. The main objective is to detect the disruptive event, although the recognition mechanism does not rely on physics. However, it is important to mention two recent works based on physics data-driven approaches with large amounts of past shots that are founded on statistical data analysis.

In the first one [30], a set of simple predictive criteria, each optimized to predict up to four types of AUG disruptions (vertical instability, edge cooling, impurity accumulation and beta limit), is analysed. Disruptions that occurred in the period 2005–2009 were divided into the four types according to the last physical mechanisms leading to them, and particular attention was paid to the dominant precursors. Only those disruptions which took place in the flat-top phase or within the first 100 ms of the plasma ramp-down phase (whenever both the plasma current is above 0.7 MA and the plasma elongation is greater than 1.5) were considered. Discriminant analysis techniques were used to select the most significant variables and to derive the discriminant function. A rigorous performance analysis of the four types of disruptions can be found in [30].

The second work [31] describes the prediction of disruptions based on diagnostic data of the high-β spherical torus NSTX. To this end, the disruptive threshold values of many signals are examined. The paper explains the combination of multiple thresholds to produce a total failure rate of 6.5% when applied to a database of about 2000 disruptions during the plasma current flat top.

4. Review of predictors from scratch

All the methodologies about disruption prediction that have been reviewed in section 3 absolutely depend on a large database of past discharges to create models able to predict disruptions. However, as mentioned in the introduction, tokamaks such as ITER and DEMO can suffer irreversible damage as a consequence of disruptions and, therefore, disruption prediction from scratch will be an essential part of their operation.

Recently, two different works have dealt with the development of adaptive data-driven predictors from scratch that learn from the data of an individual machine. Neither predictor is based on plasma physics behaviours; both rely on discovering relations between signals to identify a close disruption.

The first work is related to APODIS. During the first three ILW campaigns, APODIS has proven to achieve high success rates, low rates of false alarms, early enough disruption prediction and no 'ageing' effect (figure 3 and table 1). The word 'ageing' refers to the deterioration of a predictor derived from operating the device in operational regions different from those used for the training [23]. Due to these good results, APODIS has been used to predict from scratch and to estimate the minimum number of disruptions needed to have a reliable predictor [32]. 1237 JET discharges (1036 non-disruptive and 201 disruptive) corresponding to the JET ILW campaigns have been used in chronological order. The first predictor is created after the first disruption and re-trainings are carried out after each missed alarm. The main result is that APODIS reproduces its good prediction capabilities and low rate of false alarms after including in the training process about 40 disruptions [32]. With this set-up, the success rate is 93.5% and the rate of false alarms is 2.3%. However, in spite of this important reduction in the training requirements (to be compared with the 246 disruptions needed to train APODIS in JET), the number of 40 disruptive discharges could be too large for the ITER needs, and higher learning rate classifiers from scratch are necessary.

A second predictor from scratch [33] makes reference to fault detection and isolation (FDI) techniques [34]. In [33], the disruption prediction is formalized as a fault detection problem, where the pulses which are correctly terminated (non-disruptive pulses) are assumed to represent the normal operation conditions and the disruptions are assumed to be fault states. The normal operation conditions model was built with safe pulses from AUG, and the dynamic structure of the data was estimated through the fitting of a multivariate ARX (AutoRegressive with eXogenous inputs) model. The datasets are composed of time series of the radiated fraction of the total input power, the internal inductance and the poloidal beta. The disruption prediction system is based on the analysis of residuals in the multidimensional space of the selected variables. The discrepancy between the outputs provided by the ARX model and the actual measurements is an indication of a process fault (disruption). The predictor was applied to AUG data between 2002 and 2009. Results are promising, but lower false alarm rates are needed. It should be noted that the methodology is not applied in 'from scratch' conditions, but the technique is amenable to such a development. However, thinking of ITER, the authors state that, perhaps, the method cannot be applied during the very first pulses of ITER due to the need for a sufficient number of safely landed pulses.

5. General requirements of a high learning rate predictor from scratch

As mentioned previously, disruption predictors can be essential systems in ITER and DEMO. Any disruption predictor in these devices has to be ready to work from the beginning of the operations and has to be able to recognize an incoming disruption regardless of both the root cause and the discharge phase. As was mentioned, nowadays, plasma theoretical models are not completely satisfactory for the prediction and, therefore, data-driven models are the only viable option.

It is important to remark again that a disruption predictor is a pre-requisite for any mitigation system. This fundamental characteristic determines the list of general requirements to be met by any data-driven DPFS that learns from the own experimental data of a fusion device. At least, the following operational requirements should be taken into account: learning from scratch, real-time operation, high success rate, high learning rate, early recognition of disruptions, low rate of false alarms, controlled 'ageing' effect, predictor simplicity, fast training process and reliable predictions.

R1. Learning from scratch: data-driven models require a training set of samples (the greater the better) with well-known labels to distinguish among different classes. However, in ITER and DEMO, there will be a complete absence of previous experimental data and the training has to be accomplished in
an adaptive way as the discharges are produced. This lack of previous information is not only limited to new fusion devices, but it also happens in existing devices when significant changes are implemented (for example, in JET after the installation of the metallic wall).

R2. Real-time operation: real-time means a deterministic response to meet the time needs of the fusion device. In general, the induction step of figure 2 can be very expensive in terms of computational resources (computers and CPU time) and it is carried out on an off-line basis. However, the real-time operation requirement is related to the deduction step of figure 2.

R3. High success rate: this is equivalent to demanding a low rate of missed alarms. This is a consequence of the negative implications that a missed alarm can have in a device. ITER needs a success rate of at least 95%.

R4. High learning rate: the previous requirement is not enough for a DPFS that has to be used from the beginning of the device operation. The development of predictors with low learning rates is of low practical interest. The requirement R3 has to be achieved in the least possible time and this means that the predictor has to demonstrate a high learning rate.

R5. Early recognition of disruptions: this requirement is essential in the perspective of mitigation actions. In any disruption mitigation technique, a delay between the predictor alarm and the start of the effective mitigation action (reaction time) is inevitable. One of the objectives of a mitigation system is to minimize this time, and it is clear that the reaction time has to be smaller than the warning time (figure 4).

Figure 4. If the reaction time is larger than the warning time, the mitigation action is late.

R6. Low rate of false alarms: obviously, it is always possible to design an extremely sensitive predictor that triggers an alarm when the minimum sign of disruptive behaviour appears. Such a level of sensitivity can be incompatible with the standard operation of the devices, because all discharges could end in a premature way, rendering the operation very tedious and possibly preventing the achievement of high performance. Therefore, a trade-off is necessary between the disruption risk and the operation interruption, in such a way that the false alarms are minimized.

It is important to note that in an on-line implementation of a real-time disruption predictor, the concept of false alarm does not make sense. When a predictor triggers an alarm, the tokamak control system has to activate mitigation actions and it is not possible to know whether or not the alarm was false. In this article, the JET database of all discharges corresponding to the past ILW campaigns C28–C30 is analysed and, therefore, information about false alarms can be obtained.

R7. Controlled 'ageing' effect: the term 'ageing' was defined in section 4. However, the 'ageing' effect in the context of a high learning rate predictor from scratch means that the predictor has to be an adaptive predictor from the beginning. A continuous re-training strategy is fundamental to maintain an updated system able to work properly at any time.

R8. Simplicity: this means developing the simplest possible classifier to distinguish between disruptive and non-disruptive behaviours at any moment. To this end, the disruption configuration space will have to be characterized with a reduced number of features compatible with the best possible generalization capability.

R9. Fast training process: due to the fact that an adaptive classifier is needed to continuously incorporate new relevant information, the training process should be fast enough (from a computational point of view) to allow inter-shot trainings when necessary.

R10. Reliable predictions: any training process with a low number of samples is an issue. For this reason, each individual prediction should be qualified with an estimation of its reliability.

In general, the level of suitability of a candidate as predictor from scratch depends on its compliance with the previous requirements. But together with the above operational requirements, certain training specificities have to be considered in the development of general high learning rate predictors from scratch. These specificities are related to the inductive step of figure 2 and can be seen as constraints to take into account in the training process: chronological order of training data, need of general predictors regardless of disruption types, independency of plasma scenarios and, finally, predictor applicability from plasma breakdown to extinction.

Chronological order of training data: in predictors from scratch, the available information for training purposes is very limited and it depends crucially on the chronological order of the discharges. This means that the experimental program of a new device has to envisage a discharge sequence that takes into account the necessary continuous learning.

Need of general predictors regardless of disruption types: the main objective of the training process is to obtain the most general possible predictor, independent of disruption root causes but valid for all of them.

Independency of plasma scenarios: the training set has to incorporate details about any type of discharge, thereby avoiding a predictor specialized in specific plasma currents, magnetic fields, densities, temperatures and so on.

Predictor applicability from plasma breakdown to extinction: the successive predictors have to work during the total duration of a discharge and, therefore, the training with past discharges has to cover not only plateau regimes but also ramp-up, ramp-down and plasma transient regimes.
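The from-scratch training logic described in sections 4 and 5 (chronological processing, a first predictor once one example of each class exists, re-training only after a missed alarm) can be sketched as follows. The single-feature threshold 'classifier' and the shot values are hypothetical placeholders for illustration only, not the predictor developed in this article:

```python
def train(disruptive, non_disruptive):
    """Hypothetical trainer: alarm threshold halfway between the class
    means of a single scalar feature (a stand-in for a real classifier)."""
    mean = lambda values: sum(values) / len(values)
    return 0.5 * (mean(disruptive) + mean(non_disruptive))

def process_shots(shots):
    """Chronological from-scratch loop over (feature, is_disruptive)
    pairs. Returns (number of missed alarms, number of trainings)."""
    disruptive, non_disruptive = [], []
    threshold = None                  # no predictor before the first disruption
    missed = trainings = 0
    for feature, is_disruptive in shots:
        # A disruption with no alarm raised counts as a missed alarm.
        missed_now = (threshold is not None and is_disruptive
                      and feature <= threshold)
        missed += missed_now
        (disruptive if is_disruptive else non_disruptive).append(feature)
        # Train when no predictor exists yet, or re-train after a miss.
        if disruptive and non_disruptive and (threshold is None or missed_now):
            threshold = train(disruptive, non_disruptive)
            trainings += 1
    return missed, trainings

shots = [(0.2, False), (0.3, False), (0.9, True),   # first training here
         (0.25, False), (0.8, True),                # alarm raised in time
         (0.55, True),                              # missed -> re-training
         (0.6, True)]                               # detected again
print(process_shots(shots))  # prints: (1, 2)
```

The point of the sketch is the re-training policy: the training set only grows, and a new predictor is built exclusively when the current one fails, which is the closest criterion to real on-line operation.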
This article describes the design and development of a particular DPFS. Its initial objective is to provide a trustworthy disruption predictor in situations with an absolute lack of previous information. The aim is to learn in a continuous way and to start the predictions as soon as possible. In this case, the first predictor will include features of both disruptive and non-disruptive discharges. This means that the first predictor can be developed after having information about at least one disruptive discharge and one non-disruptive discharge.

All general requirements (R1–R10) have been taken into account to create the predictor. However, requirement R10 (reliable predictions) has had a major impact on the search for mathematical approaches to qualify each prediction with a measure of reliability.

Probabilistic predictors are particularly suited to these purposes. However, taking into account that the first predictions will be based on a very reduced number of training examples, it is important to note that just a probability could be misleading. It is necessary to determine a probability interval for each prediction. The probability interval will give a clear idea about the efficiency of the prediction: the smaller the prediction interval, the more efficient the prediction. For example, a classification with a prediction interval [0, 1] is completely uninformative, and a prediction interval [0.61, 0.62] is better than a prediction interval [0.6, 0.7].

This section is split into two subsections. The first one is devoted to briefly reviewing potential probabilistic classifiers. The second one provides a general description of the probabilistic classifier chosen for the particular DPFS implemented here. This implementation is valid for ITER and DEMO and has been tested with JET discharges after the installation of the metallic wall.

6.1. Review of potential probabilistic classifiers

Several probabilistic classifiers can be proposed. A classifier based on Bayes decision theory has been considered. In this framework, given a classification task with two classes, ω1 and ω2 (disruptive and non-disruptive), and a sample to classify (x, y), with known feature vector x but unknown label y, the Bayes rule states

P(ωk|x) = p(x|ωk) P(ωk) / p(x),    k = 1, 2.

This Bayes formula gives the posterior probability that the unknown pattern belongs to the respective class ωk, given that the corresponding feature vector takes the value x. The probability density function (pdf) p(x|ωk) is the likelihood function of ωk with respect to x, P(ωk) is the a priori probability of class ωk and p(x) is the pdf of x, given by

p(x) = Σ_{i=1}^{2} p(x|ωi) P(ωi).

In order to apply the Bayes rule, the likelihood and the prior probability of each class must be known. However, in the particular case of disruption predictors from scratch, the estimation of the likelihood is an issue. The main problem resides in the fact that there is no evidence of parametric forms for the likelihood. In these circumstances, to avoid unjustified assumptions about the form of the pdf, non-parametric approaches are used. Among the non-parametric probability density estimators, the Parzen window method [35] is the most popular. It is based on kernel methods, but it needs a minimum number of samples to produce reliable estimations. Due to the fact that the first predictors will not have a sufficient number of samples available, Bayesian classifiers have been discarded.

Other probabilistic classifiers, such as SVM binning, SVM isotonic regression or the Platt method, are known not to always provide well calibrated outputs (in other words, they do not provide a frequentist justification) [36]. Therefore, they have not been considered.

Finally, Venn predictors are machine learning algorithms that provide a probability prediction interval for each prediction, and they also provide well calibrated outputs under the i.i.d. assumption [12]. The well calibrated probabilistic outputs mean that the accuracy of the Venn predictors is expected to fall within the lower and upper probability intervals and, therefore, the accuracy rates of the predictions do not drop below the minimum probabilities given by the Venn predictors (up to statistical fluctuations). This kind of predictor has been the choice for the DPFS.

6.2. Formulation of a Venn predictor

Venn predictors belong to the family of conformal predictors [37, 38]. Conformal predictors, unlike other state-of-the-art machine learning methods, provide information about their own accuracy and reliability. Conformal predictors do not follow the inductive/deductive steps of figure 2. Instead of using a batch of old samples (training set) to produce a prediction rule (model) that is applied to new samples, conformal predictors use an alternative framework that makes predictions sequentially, basing each new prediction on all the previous samples. In this way, to make a prediction with a new sample, a shortcut is taken in figure 2 and the classification process moves from the training samples directly to the prediction (figure 5). From a mathematical point of view, conformal predictors are always valid. This means that the long-run frequency of errors at confidence level 1 − ε is ε or less.

Figure 5. Conformal predictors do not follow the induction/deduction steps. All training samples are used with each new prediction.

Focusing the attention on the formulation of a Venn predictor [12], the starting point is a training set whose only assumption about the examples is that they satisfy the i.i.d. hypothesis. Let {z1, ..., zn−1} be the training set, where each zi ∈ Z = X × Y is a pair (xi, yi) consisting of the sample or feature vector xi and its class yi. Given a new feature vector xn, the objective of a Venn predictor is to estimate the probability of it belonging to one class Yj from a set of J classes Yj ∈ {Y1, ..., YJ}.
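The Venn scheme just formulated can be sketched for the binary case. The taxonomy below (assigning each example to the nearest class centroid) is a hypothetical choice and a deliberate simplification for illustration, not the taxonomy used for the DPFS: each candidate label is tried for the new sample, every example is placed in a category, and the empirical frequencies of the target label in the new sample's category give the lower/upper probability bounds.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def venn_interval(training, x_new, target, labels):
    """Probability interval for `target` being the label of x_new.
    For each candidate label y, x_new is tentatively labelled y, every
    example is assigned to a category (nearest class centroid -- a toy
    taxonomy) and the frequency of `target` in x_new's category is taken."""
    freqs = []
    for y in labels:
        extended = training + [(x_new, y)]
        cents = {lab: centroid([x for x, l in extended if l == lab])
                 for lab in labels}
        category = lambda x: min(cents, key=lambda lab: math.dist(x, cents[lab]))
        cat_new = category(x_new)
        members = [l for x, l in extended if category(x) == cat_new]
        freqs.append(members.count(target) / len(members))
    return min(freqs), max(freqs)

training = [((0.10, 0.20), 'non-disruptive'), ((0.20, 0.10), 'non-disruptive'),
            ((0.80, 0.90), 'disruptive'),     ((0.90, 0.80), 'disruptive')]
lo, hi = venn_interval(training, (0.85, 0.85), 'disruptive',
                       ('non-disruptive', 'disruptive'))
print(f"[{lo:.3f}, {hi:.3f}]")  # prints: [0.667, 1.000]
```

With only four training examples the interval is wide; as the 'training set' grows, well calibrated Venn intervals narrow, which is the behaviour that requirement R10 calls for.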
has to be detailed. In machine learning, a ‘training set’ is Table 2. List of signals to characterize the disruptive/non-disruptive
typically a dataset made up of examples with well-known status of JET plasmas.
labels. In the case of non-conformal predictors, the ‘training Signal name Acronym Units
set’ is used to obtain a model (figure 2) and from that moment,
Plasma current Ip A
the training set can be ignored because only the model is Mode locked amplitude ML T
necessary to make predictions. However, with Venn predictors, Plasma internal inductance LI
there is not a model and, therefore, the ‘training set’ cannot Plasma density Ne m−3
be ignored. The ‘training set’ is always used together with Stored diamagnetic energy time derivative dW /dt W
each new sample to classify (figure 5). Due to this reason, in Radiated power Pout W
Total input power Pin W
the following, the term ‘model’ is no longer used. Instead,
‘training set’ is utilized.
Focusing the attention on the Venn predictor of this paper, applied to a database of JET discharges and, therefore, the
it should be noted that the prediction of the disruptive/non- predictor performances are assessed after the analysis of the
disruptive behaviour of a discharge has to be carried out at predictions.
regular time intervals during the execution of the discharge. In Requirements R8 and R9 are discussed together due
each time interval, a feature vector is generated with several to their close relation in the DPFS implementation. First
plasma quantities and this feature vector is used together with of all, it is important to remember that a Venn predictor
the ‘training set’ to make the prediction. The ‘training set’ is does not use the deductive/inductive steps of figure 2. To
the dataset that is generated after each re-training. Therefore, make predictions, Venn predictors jointly use all previous
it is important to establish that in the case of a DPFS based on samples and the new sample to classify. Consequently,
a Venn predictor, the term ‘re-training’ means to create a new depending on both the total number of samples and the
‘training set’. feature vectors dimensionality, the Venn predictor framework
The following subsections are devoted to discussing the specific implementation details of the Venn DPFS. Once an explicit selection of a probabilistic predictor has been performed under requirement R10, section 7.1 discusses the particular aspects of requirements R1–R9. Sections 7.2, 7.3 and 7.4 describe the precise choices made to optimize the real-time response of the DPFS. Finally, section 7.5 presents the implementation of the predictor from scratch developed in this article.

7.1. Analysis of requirements R1–R9

To deal with the issue of learning from scratch (requirement R1), an important consideration about the training examples is needed. In machine learning, it is very difficult in most cases to achieve good results without presenting examples of all the classes. This means that the first training process of the DPFS requires at least one disruptive example and one non-disruptive example. It is important to emphasize that the term 'example' is not necessarily equivalent to 'discharge': one discharge can provide many examples and, conversely, one example can summarize data corresponding to several discharges.

Concerning the real-time response of the DPFS (requirement R2), a Venn predictor has been chosen. However, it is well known that the major issue of any conformal predictor is usually the computation time. To overcome this crucial difficulty, three points are essential: first, the selection of a reduced number of features; second, the concentration of knowledge about disruptive and non-disruptive behaviours; and third, the choice of a proper Venn prediction taxonomy. These three aspects are discussed in sections 7.2, 7.3 and 7.4 respectively.

The ageing effect of a classifier from scratch (requirement R7) has to be tackled from the perspective of defining a re-training strategy to maintain an updated system. In the present implementation of the DPFS, the policy has been to re-train only after a missed alarm. This criterion is the closest one to real on-line operation, in which it is not possible to determine whether or not an alarm is true. In this paper, the DPFS is an adaptive system whose training can be computationally very expensive and, therefore, the computational needs can be an issue under real-time constraints. This implies handling the least possible number of samples with a reduced dimensionality. All aspects related to this are treated in sections 7.2, 7.3 and 7.4.

The requirements R3–R6 are final objectives and they cannot be used to impose 'a priori' implementation constraints. Results about the success rate (R3), the number of disruptions needed to have a reliable classifier (R4), the average warning time (R5) and the false alarm rate (R6) can only be assessed after analysing a particular implementation of a DPFS. If the predictor evaluation is not good enough in terms of requirements R3–R6, the specific choices performed for requirements R1, R2 and R7–R9 will have to be reviewed.

7.2. Use of a reduced number of features in the DPFS

To satisfy the requirement of simplicity (R8), the predictor architecture consists of just one single classifier instead of a multi-layer architecture such as APODIS. This has been decided to avoid the extra computation of several classifiers plus their combination into a single output.

The starting point has been to select a reduced number of signals to identify disruptive and non-disruptive behaviours. Due to the characteristic of 'starting from scratch', only those quantities that are considered essential diagnostics from the start of a fusion device have been considered. In particular, the same signals that are used by APODIS in JET (table 2) have been chosen.

After this decision, it is necessary to take advantage of some knowledge derived from the use of APODIS. APODIS has shown that mean values (in the time domain) and standard deviations (in the frequency domain) corresponding to temporal windows of 32 ms provide a powerful parameterization of the disruption configuration space. Therefore, the new predictor will also use 32 ms long temporal windows (with sampling periods of 1 ms for all the signals) as basic time segments inside which the mean values and the standard deviations of the Fourier spectra are computed.
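The window parameterization just described (mean value in the time domain and standard deviation of the Fourier spectrum with the dc component removed, over 32 ms windows sampled every 1 ms) can be sketched as follows. This is an illustrative reconstruction in Python, not the authors' real-time code, and the function name is ours:

```python
import numpy as np

WINDOW = 32  # samples per window: 32 ms at a 1 ms sampling period

def window_features(signal):
    """Return (mean, std of |FFT| with dc removed) for each full window.

    `signal` is a 1-D array sampled every 1 ms; trailing samples that do
    not fill a complete 32 ms window are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // WINDOW
    feats = []
    for k in range(n_windows):
        w = signal[k * WINDOW:(k + 1) * WINDOW]
        spectrum = np.abs(np.fft.fft(w))[1:]  # drop the dc component
        feats.append((w.mean(), spectrum.std()))
    return feats

# A constant signal has no spectral content outside dc: its time-domain
# feature is the constant itself and its frequency-domain feature is
# numerically zero.
feats = window_features(np.ones(64))
assert len(feats) == 2
assert feats[0][0] == 1.0 and feats[0][1] < 1e-9
```

In a real implementation one such pair would be computed per signal per window, the pairs being concatenated into the feature vector of table 3.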
Nucl. Fusion 54 (2014) 123001                                J. Vega et al
Figure 9. The example that describes a safe behaviour in the first training set is a row vector whose components are the mean values of the different features between the first non-disruptive discharge and the last one previous to the first disruption.

7.3. Concentration of knowledge about disruptive and non-disruptive behaviours

When a new 'training set' is generated after a missed alarm, it incorporates, in addition to the previous feature vectors, knowledge about both the missed alarm and the non-disruptive discharges since the first 'training set'. The information to include relative to the missed alarm is the last feature vector previous to the missed disruption (see figure 8). With regard to the new feature vector corresponding to non-disruptive behaviours, an 'average behaviour' feature vector, whose components are the mean values of the components of all non-disruptive discharges from the preceding training up to the last safe discharge previous to the missed disruption, is added. This criterion ensures balanced training sets between disruptive and non-disruptive samples.

However, it is necessary to note that this balance can be altered under a specific condition. If two or more consecutive discharges are disruptive and the predictor fails in their recognition, there are no safe discharges between the contiguous disruptive ones and, therefore, only the feature vectors of each disruptive discharge previous to the disruptions will be included.

7.4. Venn predictor taxonomy

The next aspect to consider is the choice of the Venn taxonomy for the DPFS. Ideally, the disruptive and non-disruptive behaviours should be condensed into two regions of the parameter space that are well separated from each other. This condition aims at achieving enough generalization capability with the predictor, independently of the number of feature vectors in the training set. To concentrate as much as possible the information of both behaviours in just two points, the Venn predictor uses the NCT that was described in section 6.2.

7.5. DPFS algorithm

In relation to the DPFS, it is necessary to emphasize the importance of making predictions as soon as possible, i.e. with the minimum number of examples. Taking into account the decision of using balanced 'training sets', ideally the first predictor should be put into operation after the first disruption (typically, when a tokamak starts operation, a number k1 of non-disruptive discharges are produced before the first disruptive one). These k1 + 1 discharges will be used to generate the initial training set. The Venn predictor is used for the first time with discharge k1 + 2. During this discharge, feature vectors to assess the plasma behaviour are calculated every 32 ms and each feature vector, together with the initial 'training set', is used to make predictions. The initial 'training set', with both disruptive and non-disruptive discharges, remains valid until it misses an alarm. At this moment, it is necessary to re-train the predictor, which means generating a new 'training set'. Algorithm 1 represents in pseudo-code the DPFS working scheme from the beginning of the tokamak operation.

Algorithm 1: pseudo-code of the DPFS predictor. The Venn prediction output depends on both the 'training set' and the 'feature vector' to classify at each time instant. When a new 'training set' is set up, it includes all the previous samples plus knowledge about the last missed alarm and the safe discharges after the previous training.

    Start of tokamak operation
    k1 non-disruptive discharges
    First disruptive discharge
    First 'training set'
    Prediction loop
        Start of discharge
        While Ip > 750 kA, every 32 ms
            Form 'feature vector'
            prediction = Venn predictor output
                • 'training set'
                • 'feature vector'
            If prediction == 'disruptive'
                break
            End of If
        End of While
        End of discharge
        If prediction == 'non-disruptive' and it is false (the discharge disrupted)
            Missed alarm
            New 'training set'
                • Previous training set
                • Knowledge about
                    ◦ the missed alarm
                    ◦ safe discharges since the last training
        End of If
    End of Prediction loop
    Evaluation of success rate and false alarm rate

To conclude this section, it should be emphasized that the selections made in this article about the training samples, the number of features and the NCT allow computing each individual prediction in less than 32 ms. This is necessary to ensure that there are no pending predictions when a new feature vector is ready to be classified. In fact, the computation time for each prediction is less than 0.1 ms [43].
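As a rough executable illustration of the Algorithm 1 scheme, the sketch below replays the from-scratch loop on a chronological list of discharges. It is a simplification under stated assumptions: a nearest-example rule stands in for the Venn predictor output (the real DPFS produces multiprobabilistic predictions through the NCT), the 750 kA current condition is omitted, at least one safe discharge is assumed before the first disruption, and all names are ours:

```python
import numpy as np

def nearest_label(training_set, x):
    # Toy stand-in for the Venn predictor output: the label of the closest
    # training example. Not the real multiprobabilistic Venn machinery.
    vec, label = min(training_set, key=lambda ex: np.linalg.norm(ex[0] - x))
    return label

def run_from_scratch(discharges):
    """Replay the Algorithm 1 working scheme over chronological discharges.

    Each discharge is (feature_vectors, is_disruptive); the last vector of
    a disruptive discharge plays the role of the pre-disruption example.
    Returns (detected, missed_indices, false_alarms).
    """
    safe_pool = []      # safe vectors accumulated since the last training
    training = None     # current 'training set': list of (vector, label)
    detected, missed, false_alarms = 0, [], 0

    for idx, (feats, is_disruptive) in enumerate(discharges):
        if training is None:
            # No predictor yet: wait for the first disruption, then build
            # the first balanced 'training set' (one example per class).
            if is_disruptive:
                training = [(np.mean(safe_pool, axis=0), 'non-disruptive'),
                            (feats[-1], 'disruptive')]
                safe_pool = []
            else:
                safe_pool.extend(feats)
            continue

        # Prediction loop: classify every 32 ms vector; a single disruptive
        # classification raises the alarm for the whole discharge.
        alarm = any(nearest_label(training, x) == 'disruptive' for x in feats)
        if is_disruptive:
            if alarm:
                detected += 1
            else:
                # Missed alarm: re-train with the last pre-disruption vector
                # plus an 'average behaviour' safe vector (section 7.3).
                missed.append(idx)
                training.append((feats[-1], 'disruptive'))
                if safe_pool:
                    training.append((np.mean(safe_pool, axis=0),
                                     'non-disruptive'))
                safe_pool = []
        elif alarm:
            false_alarms += 1  # vectors of this discharge are never reused
        else:
            safe_pool.extend(feats)
    return detected, missed, false_alarms

# Toy run: safe discharges cluster near 0, disruptive endings near 1.
z, o = np.zeros(2), np.ones(2)
result = run_from_scratch([([z, z], False), ([z, o], True),
                           ([z, z], False), ([z, o], True)])
assert result == (1, [], 0)
```

Note that, as in the article, the first disruption only builds the initial training set and is not counted as a detection.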
Table 3. Feature identification. Acronyms are related to table 2. mean(.) represents the mean value during the time window of 32 ms. std(|fft(.)|) signifies the standard deviation of the Fourier spectrum during the time window of 32 ms (the dc component has been removed).

    Feature id   Definition
    1            mean(Ip)
    2            std(|fft(Ip)|)
    3            mean(ML)
    4            std(|fft(ML)|)
    5            mean(LI)
    6            std(|fft(LI)|)
    7            mean(Ne)
    8            std(|fft(Ne)|)
    9            mean(dW/dt)
    10           std(|fft(dW/dt)|)
    11           mean(Pout)
    12           std(|fft(Pout)|)
    13           mean(Pin)
    14           std(|fft(Pin)|)

Table 4. nfeat: number of features in the predictors. C14,nfeat: number of predictors developed with 14 features taken nfeat at a time. ASR: average success rate corresponding to all predictors with nfeat features. AFAR: average false alarm rate corresponding to all classifiers with nfeat features.

    nfeat   C14,nfeat   ASR (%)        AFAR (%)
    2       91          96.68 ± 2.18   70.30 ± 30.68
    3       364         96.79 ± 2.13   67.82 ± 31.37
    4       1001        96.37 ± 1.97   58.19 ± 30.68
    5       2002        96.34 ± 1.76   54.94 ± 27.81
    6       3003        96.38 ± 1.56   52.97 ± 24.67
    7       3432        96.47 ± 1.39   51.53 ± 21.51

8. Results

Algorithm 1 has been implemented and applied to a database of 1237 JET discharges (1036 safe discharges and 201 unintentional disruptions) corresponding to the first three JET experimental campaigns after the installation of the metallic wall. The first predictor is obtained after the first disruption and, from that moment, all discharges are analysed in chronological order. Each discharge is analysed by simulating real-time data processing. Feature vectors are created every 32 ms and they are classified as disruptive or non-disruptive. After a missed alarm, a new 'training set' is created to incorporate new knowledge, as previously explained.

Table 3 shows the 14 features used with the Venn predictors. All possible combinations of between 2 and 7 features have been tested (9893 predictors have been developed). Given a specific combination of features, algorithm 1 is executed for the whole dataset of discharges. The first predictor is generated after the first disruption and it is used with all posterior discharges, subject to re-training after every missed alarm. The statistics presented in this section correspond to the evaluation of the predictors after the analysis of the 1237 discharges. In other words, the success and false alarm rates are the cumulative results of algorithm 1 after 1237 discharges. It is important to note that the rates are expressed in terms of discharges and not in terms of the number of feature vectors classified. This means that one discharge is classified as safe when all its feature vectors are classified as non-disruptive. Also, one discharge is recognized as disruptive when just one feature vector is classified as disruptive. Finally, a single feature vector classified as disruptive in a safe discharge is enough to identify the event as a false alarm.

Table 4 shows the average success and false alarm rates corresponding to the predictors developed with different numbers of features. Globally, the success rates are above 96% but the false alarm rates are too high. The false alarm rate decreases with the number of features, but not sufficiently. According to these results, the selection of a specific predictor should not be based on the success rate (quite high in all cases) but on a combination of a high success rate and a low false alarm rate. Table 5 provides the best two predictors for each number of features (different cell backgrounds allow the distinction of the number of features).

The first result to emphasize is the achievement of the same success rate, 94%, for almost all predictors. The lower bound of false alarms is 4.21% and the average prediction probability is high (the minimum value of the error bar is 0.604 and the maximum one is always 1). This high prediction probability gives enough confidence about the reliability of the predictors in spite of the low number of training samples that have been used. The use of probabilistic classifiers has been contemplated precisely to validate the prediction results as long as the prediction probabilities are high enough.

According to table 5, the best candidates for predictions are the ones with false alarm rates of 4.21%. Among the five candidates, the predictors with the lower number of features are preferred (simplicity requirement R8). This reduces the possibilities to only two options. The final selection between them is favourable to the predictor with the smaller probability error bar (the predictor with features 2, 3, 4 and 5) because it means a more accurate prediction.

Figure 10 corresponds to the results of the predictor with features 2, 3, 4 and 5. Figure 10(a) shows the evolution of the success rate with the number of disruptions. It is important to observe the fast learning rate achieved in the learning process, as a success rate of 90.91% is reached with a training set made up of only 2 disruptive examples and 2 non-disruptive examples. The squares in the figure show the missed alarms. Figure 10(b) represents the evolution of the false alarm rate, which tends to increase slightly with the number of disruptions. Last but not least, figure 10(c) gives the evolution of the prediction probability as the system learns. The prediction probability increases and the error bars diminish (which means more accurate predictions) with the number of disruptions.

The comparison of these results with the APODIS version from scratch described in [32] is favourable to the Venn predictors. The Venn predictors do not need tens of disruptions to achieve stable and good enough success and false alarm rates, as is the case for APODIS from scratch [32] (figure 11). One important issue of the APODIS from scratch version is the need of about 40 disruptions to provide good performance. This problem is avoided with the Venn predictors, as can be seen by comparing the success and false alarm rates of figures 10(a), 10(b) and 11.

From table 5 it is clear that both the stored diamagnetic energy time derivative and the total input power do not play any role in the prediction from scratch. The most important signals are the plasma current, the mode lock and the plasma internal inductance.
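The discharge-level accounting rules described above can be stated compactly. The following sketch is an illustrative reading of those rules (function names are ours, not from the original implementation):

```python
def classify_discharge(vector_labels):
    # A discharge is recognized as disruptive as soon as a single 32 ms
    # feature vector is classified as disruptive; otherwise it is safe.
    return 'disruptive' if 'disruptive' in vector_labels else 'non-disruptive'

def cumulative_rates(discharges):
    """discharges: list of (vector_labels, truly_disruptive) pairs.

    Returns (success_rate, false_alarm_rate), both expressed per
    discharge rather than per classified feature vector.
    """
    n_disr = sum(1 for _, d in discharges if d)
    n_safe = len(discharges) - n_disr
    hits = sum(1 for labels, d in discharges
               if d and classify_discharge(labels) == 'disruptive')
    false_alarms = sum(1 for labels, d in discharges
                       if not d and classify_discharge(labels) == 'disruptive')
    return hits / n_disr, false_alarms / n_safe

# One disruptive vector is enough to flag a discharge: the disruption below
# is detected, one safe discharge is clean, the other is a false alarm.
rates = cumulative_rates([
    (['non-disruptive', 'disruptive'], True),
    (['non-disruptive', 'non-disruptive'], False),
    (['disruptive'], False),
])
assert rates == (1.0, 0.5)
```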
Table 5. Success rate (SR), false alarm rate (FA) and average prediction probability (AVP) for several combinations of feature vectors. The numbers identifying the features correspond to those of table 3.

    Features                 SR (%)   FA (%)   AVP
    3, 4                     94.00    4.70     0.813 ± 0.187
    2, 4                     92.50    5.09     0.831 ± 0.169
    2, 3, 4                  94.00    4.31     0.809 ± 0.191
    3, 4, 11                 94.00    4.70     0.813 ± 0.187
    2, 3, 4, 5               94.00    4.21     0.811 ± 0.189
    2, 3, 4, 11              94.00    4.21     0.810 ± 0.190
    2, 3, 4, 5, 11           94.00    4.21     0.811 ± 0.189
    2, 3, 4, 7, 8            94.00    4.21     0.803 ± 0.197
    2, 3, 4, 7, 8, 11        94.00    4.21     0.803 ± 0.197
    2, 3, 4, 5, 11, 12       94.00    4.31     0.810 ± 0.190
    2, 3, 4, 5, 7, 8, 11     94.00    4.31     0.802 ± 0.198
    2, 3, 4, 7, 8, 11, 12    94.00    4.31     0.802 ± 0.198
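As a consistency check, the predictor counts C14,nfeat reported in table 4 are plain binomial coefficients, and all combinations of between 2 and 7 of the 14 features add up to the 9893 predictors quoted in section 8:

```python
from math import comb

# Number of predictors built from 14 features taken nfeat at a time
# (the C14,nfeat column of table 4).
counts = {n: comb(14, n) for n in range(2, 8)}
assert counts == {2: 91, 3: 364, 4: 1001, 5: 2002, 6: 3003, 7: 3432}

# All combinations between 2 and 7 features: 9893 predictors in total.
assert sum(counts.values()) == 9893
```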
Table 6. The column 'Features' identifies the features from table 3. AWT is the average warning time of the different predictors and AF30 is the accumulative fraction of detected disruptions 30 ms before the disruption.

    Features                 AWT (ms)   AF30 (%)
    3, 4                     654.564    83.0
    2, 4                     732.281    79.0
    2, 3, 4                  641.798    83.0
    3, 4, 11                 654.564    83.0
    2, 3, 4, 5               654.394    83.0
    2, 3, 4, 11              653.884    83.0
    2, 3, 4, 5, 11           654.394    83.0
    2, 3, 4, 7, 8            672.947    83.5
    2, 3, 4, 7, 8, 11        672.947    83.5
    2, 3, 4, 5, 11, 12       707.330    83.0
    2, 3, 4, 5, 7, 8, 11     673.628    83.5
    2, 3, 4, 7, 8, 11, 12    725.883    83.5

Figure 12. Warning times from the Venn predictor with features 2, 3, 4 and 5.

Figure 12 shows that the warning times leave enough time in JET to perform mitigation actions. This figure should be compared to figure 1, which shows the results with APODIS in JET.

The plots of warning times for the rest of the predictors that appear in table 5 are very similar to the one shown in figure 12. It should be emphasized that 7% of detected disruptions have a warning time greater than 1 s. However, we do not think that these predictions are premature alarms. A preliminary analysis of this 7% of discharges has revealed that, in the last second before the disruptions, the quality of the plasmas has degraded significantly, basically due to the influxes of W from the divertor. This observation has two main consequences. First of all, this degradation of the plasma performance is an indication that the predictor is really detecting a physical evolution of the plasma and, therefore, these cases cannot simply be considered premature alarms. Second, terminating the discharges at the moment of the predictor alarm would not cause any loss of valuable experimental time, given the already very poor quality of the discharges at the moment of the alarm itself. However, if these considerations are not taken into account, the Venn predictors would show a premature alarm rate of 7%. The threshold of 1 s to define potential premature alarms in JET is based on both figures 1 and 12: a warning time of 1 s defines a clear frontier between the two slopes of the accumulative fraction of detected disruptions.

Finally, by considering that the tardy detection rate is 11% (figure 12) and taking into account the premature alarm rate of 7%, a valid alarm rate of 76% is obtained. The AWT corresponding to the valid alarms is 244 ms and its standard deviation is 205 ms. It should be emphasized that these results correspond to an approach from scratch in which the first predictor starts to operate from the first disruption and only 12 disruptions are missed after 1237 discharges (1036 non-disruptive and 201 disruptive), i.e. 12 re-trainings have been performed.

9. Discussion

This article has shown the viability of developing high learning rate disruption predictors from scratch. In the particular implementation carried out in the article, the prediction probabilities are high enough to guarantee the reliability of the predictors. After applying algorithm 1 to 1237 JET discharges (1036 safe and 201 disruptive) and using only four features from three different signals (plasma current, mode lock and plasma internal inductance), the results of predicting from scratch give a success rate of 94%, a false alarm rate of 4.21% and a warning time of 654 ms. If the success rate is expressed in terms of the tardy detection rate, the valid alarm rate and the premature alarm rate, the corresponding values are 11%, 76% and 7% respectively.

As discussed in section 8, we do not believe that the above 7% of premature predictions are really premature alarms. However, if they are considered as such, the predictor results can be summarized as 76% of true positive detections and 4% of false positive detections. To assess these numbers, several facts have to be taken into account:

(a) The predictions start from scratch with just one disruptive example and one non-disruptive example. The predictor is re-trained after each missed alarm. Table 7 summarizes the generation of training sets after the missed alarms. It is important to note that the first predictor (grey background in the table) is generated after the first disruption.
(b) All discharges are considered in chronological order without filtering any type of discharge.
(c) The training data cover not only plateau regimes but also ramp-up, ramp-down and plasma transient regimes.
(d) All discharges are analysed from the plasma breakdown to the end.
(e) All disruptive discharges are processed regardless of the type and temporal location of the disruption within the discharge.
(f) Only three signals are necessary (plasma current, mode lock and plasma internal inductance), which are standard signals in any tokamak.
(g) Only real-time versions of the signals are used.

Therefore, according to these facts, the true positive and false positive rates are comparable to those of other predictors that (1) are not trained from scratch (a large dataset of past shots is typically used), (2) carry out some kind of discharge selection (in both disruptive and non-disruptive shots) for training/testing purposes, (3) are based on processed versions of signals (free of real-time issues) and (4) use a larger set of signals.

An important aspect in training any kind of disruption predictor is the problem of distinguishing between disruptive and non-disruptive samples to assign a proper label. Generally, in a disruptive discharge, a time interval before a disruption is assumed as 'disruptive' and all feature vectors inside this
Table 7. The column shows the numbers of the disruptions, in chronological order, with missed alarms. The row corresponding to disruption number 1 is not a missed alarm; it allows the generation of the first classifier to start making predictions.

    #Disruption
    1
    2
    13
    25
    27
    40
    85
    93
    101
    122
    128
    161
    183

interval are classified as 'disruptive'. Of course, the interval length selection is somewhat arbitrary and could introduce ambiguous information into the model construction. However, in the present DPFS, the potential ambiguity in assigning 'disruptive' and 'non-disruptive' labels does not exist. Four points should be emphasized in this respect. Firstly, only one feature vector per missed disruptive discharge has been considered: the feature vector just before the disruption. Secondly, when a disruptive discharge is recognized by the predictor, no feature vectors from it (of either the disruptive or the non-disruptive class) are taken into account for re-trainings. Thirdly, when the predictor triggers a false alarm, no feature vectors from that discharge are used for re-trainings either. Last but not least, feature vectors of class 'non-disruptive' are formed with mean values of samples that belong to the 'non-disruptive' class. Therefore, it is clear that, in the predictor described in this article, the choice of a time interval to assign labels has been avoided. In this way, the issue of including ambiguous information in the model creation is overcome.

The results obtained with the Venn predictor from scratch are absolutely comparable to the ones obtained with APODIS. However, there exists an important difference. APODIS was trained with 246 disruptive discharges and 4070 non-disruptive discharges. Taking into account that 3 feature vectors per disruptive discharge were used with label 'Disruptive', 738 disruptive samples were necessary to train APODIS. In addition, about 500 feature vectors per safe discharge were used with label 'Non-disruptive', which gives 2 035 000 safe samples in total. For a fair comparison with the present Venn predictors, these numbers of safe and disruptive training samples, 2 035 000 and 738 respectively, have to be compared with the samples used in the adaptive Venn predictor. The first Venn predictor uses just one disruptive feature vector and one non-disruptive feature vector, and a very limited number of feature vectors is included during the posterior re-trainings after the missed alarms. According to table 7, only 12 re-trainings were needed and the final 'training set' after 12 missed alarms consists of just 13 disruptive feature vectors and 13 non-disruptive feature vectors.

The good results obtained with so few training feature vectors have to be interpreted as a combination of several factors. Firstly, the feature selection (mean values and standard deviations of the power spectrum after removing the dc component) is good enough to describe disruptive and non-disruptive behaviours. This fact is not new and it has been used from the first versions of APODIS. Secondly, the particular choice of feature vectors should be mentioned. The feature vector previous to a missed alarm as the disruptive example, together with the selection of an average vector to summarize the non-disruptive behaviour, allows condensing the disruptive/non-disruptive information in a convenient way. Thirdly, the use of an NCT has provided a high generalization capability to distinguish between both types of plasma behaviours.

However, the increase of the false alarm rate with the number of disruptions has to be commented on. One would expect the false alarm rate to decrease (or stay constant) as the training samples increase. However, this does not happen. Figure 10(b) shows that the false alarm rate is about 2% until disruption #40 (approximately). From this moment, it shows a positive slope tendency. This behaviour is similar in all the predictors shown in table 5. Therefore, the problem does not seem to be related to the parameter space dimensionality, but to the number of training examples. A low number of training examples is not enough to obtain an accurate separation in the parameter space between the disruptive and non-disruptive behaviours. It is important to recall that just after the fifth missed alarm (disruption #40), the training set is made up of only six disruptive examples and six non-disruptive examples. At the end of the 1237 discharges, the training set contains 13 disruptive examples and 13 non-disruptive examples. These training sets have shown to be efficient in the recognition of disruptive behaviours but, apparently, need to incorporate more information about non-disruptive examples to better delimit the separation frontier between both behaviours. These numbers of training examples (13 safe and 13 disruptive) have to be compared with the ones used by APODIS: 2 035 000 safe and 738 disruptive.

It should also be mentioned that the union of the Venn predictors with APODIS from scratch is a valid possibility to predict without previous information. Venn predictors can be the first predictors up to about 50 disruptions. Figure 10(b) shows a false alarm rate of 2% for these predictors. From this moment, APODIS from scratch can be used because, with 50 disruptions, there is enough information to provide high success rates and low false alarm rates (figure 11).

The Venn predictors described in this article have been the first adaptive predictors from scratch that are reliable from the first disruption and provide good results in terms of success rate, false alarm rate and warning time at least during the first 40 disruptions, but more research is needed. The several predictors developed with different numbers of features satisfy all operational requirements discussed in section 5. However, it should be noted that the results obtained for the first three ILW campaigns of JET do not ensure similar results for ITER.

The prediction from scratch is an open research line in nuclear fusion in the perspective of ITER/DEMO and important efforts should be devoted to it. As a first suggestion in the line of this article, the need of improving the predictions should be mentioned, not only to increase the valid alarm rates but also to reduce the false alarm rates. In any case, it seems that the numbers of disruptive and non-disruptive examples have to be larger for a better knowledge of
the operational space (disruptive and non-disruptive regions). This new selection of training examples can be carried out by applying active learning techniques [44–46] in order to select the most informative examples and so learn in the fastest way. As a second point, it is important to note that ITER is going to have different time-scales from JET and, therefore, an analysis of the influence of the sampling times on the warning times is necessary and, perhaps, it could show a significant effect. Finally, another very important aspect is the validation of the approach with different isotopic compositions of the main plasma. Indeed, it must be considered that ITER operation will include different fuel mixtures, from H and He to full D–T, and the performance of adaptive predictors in these varying conditions should be carefully assessed.

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness under Projects No ENE2012-38970-C04-01 and ENE2012-38970-C04-03. This work, supported by the European Communities under the contract of Association between EURATOM/CIEMAT, was carried out within the framework of the European Fusion Development Agreement. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

References

[1] Esposito B. et al and FTU, ECRH and ASDEX Upgrade Teams 2009 Nucl. Fusion 49 065014
[2] Hender T.C. et al and the ITPA MHD, Disruption and Magnetic Control Topical Group 2007 Progress in the ITER physics basis: chapter 3. MHD stability, operational limits and disruptions Nucl. Fusion 47 S128–202
[3] Reux C., Bucalossi J., Saint-Laurent F., Gil C., Moreau P. and Maget P. 2010 Nucl. Fusion 50 095006
[4] Commaux N., Baylor L.R., Jernigan T.C., Hollmann E.M., Parks P.B., Humphreys D.A., Wesley J.C. and Yu J.H. 2010 Nucl. Fusion 50 112001
[5] Bakhtiari M., Olynyk G., Granetz R., Whyte D.G., Reinke M.L., Zhurovich K. and Izzo V. 2011 Nucl. Fusion 51 063007
[6] Commaux N. et al 2011 Nucl. Fusion 51 103001
[7] Lehnen M. et al and JET EFDA Contributors 2011 Nucl. Fusion 51 123010
[8] Boozer A.H. 2012 Phys. Plasmas 19 058101
[9] Windsor C.G., Pautasso G., Tichmann C., Buttery R.J., Hender T.C., JET-EFDA Contributors and the ASDEX Upgrade Team 2005 Nucl. Fusion 45 337–50
[10] Vega J., Dormido-Canto S., López J.M., Murari A., Ramírez J.M., Moreno R., Ruiz M., Alves D., Felton R. and JET-EFDA Contributors 2013 Results of the JET real-time disruption predictor in the ITER-like wall campaigns Fusion Eng. Des. 88 1228–31
[11] de Vries P.C., Johnson M.F., Segui I. and JET-EFDA Contributors 2009 Nucl. Fusion 49 055011
[12] Vovk V., Gammerman A. and Shafer G. 2005 Algorithmic Learning in a Random World (Berlin: Springer)
[13] Bishop C.M. 2004 Neural Networks for Pattern Recognition (Oxford: Oxford University Press)
[14] Cherkassky V. and Mulier F. 2007 Learning from Data 2nd edn (New York: Wiley)
[15] Duda R.O., Hart P.E. and Stork D.G. 2001 Pattern Classification 2nd edn (New York: Wiley-Interscience)
[16] Kohonen T. 1998 The self-organizing map Neurocomputing 21 1–6
[17] Bishop C.M., Svensén M. and Williams C.K.I. 1998 The generative topographic mapping Neural Comput. 10 215–34
[18] Buttery R.J. et al and the DIII-D Team 2012 Phys. Plasmas 19 056111
[19] Wesson J.A. et al 1989 Nucl. Fusion 29 641–66
[20] Nave M.F.F. and Wesson J.A. 1990 Nucl. Fusion 30 2575–83
[21] Cannas B., Fanni A., Sonato P., Zedda M.K. and JET-EFDA Contributors 2007 Nucl. Fusion 47 1559–69
[22] de Vries P.C., Johnson M.F., Alper B., Buratti P., Hender T.C., Koslowski H.R., Riccardo V. and JET-EFDA Contributors 2011 Nucl. Fusion 51 053018
[23] Rattá G.A., Vega J., Murari A., Vagliasindi G., Johnson M.F., de Vries P.C. and JET-EFDA Contributors 2010 An advanced disruption predictor for JET tested in a simulated real time environment Nucl. Fusion 50 025005
[24] Cannas B., Fanni A., Marongiu E. and Sonato P. 2004 Nucl. Fusion 44 68–76
[25] Sengupta A. and Ranjan P. 2000 Forecasting disruptions in the ADITYA tokamak using neural networks Nucl. Fusion 40 1993–2008
[26] Cannas B., Fanni A., Pautasso G., Sias G. and Sonato P. 2010 Nucl. Fusion 50 075004
[27] Cannas B., Fanni A., Pautasso G., Sias G. and the ASDEX Upgrade Team 2011 Fusion Eng. Des. 86 1039–44
[28] Murari A., Vagliasindi G., Arena P., Fortuna L., Barana O., Johnson M. and JET-EFDA Contributors 2008 Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees Nucl. Fusion 48 035010
[29] Vagliasindi G., Murari A., Arena P., Fortuna L., Johnson M., Howell D. and JET-EFDA Contributors 2008 A disruption predictor based on fuzzy logic applied to JET database IEEE Trans. Plasma Sci. 36 253–62
[30] Zhang Y., Pautasso G., Kardaun O., Tardini G., Zhang X.D. and the ASDEX Upgrade Team 2011 Nucl. Fusion 51 063039
[31] Gerhardt S.P., Darrow D.S., Bell R.E., LeBlanc B.P., Menard J.E., Mueller D., Roquemore A.L., Sabbagh S.A. and Yuh H. 2013 Nucl. Fusion 53 063021
[32] Dormido-Canto S., Vega J., Ramírez J.M., Murari A., Moreno R., López J.M., Pereira A. and JET-EFDA Contributors 2013 Nucl. Fusion 53 113001
[33] Aledda R., Cannas B., Fanni A., Sias G., Pautasso G. and ASDEX Upgrade Team 2013 Multivariate statistical models for disruption prediction at ASDEX Upgrade Fusion Eng. Des. 88 1297–301
[34] Patton R.J., Frank P. and Clark R. 1989 Fault Diagnosis in Dynamic Systems: Theory and Applications (Control Engineering Series) (Englewood Cliffs, NJ: Prentice-Hall)
[35] Duda R.O., Hart P.E. and Stork D.G. 2001 Pattern Classification 2nd edn (New York: Wiley-Interscience)
[36] Lambrou A., Papadopoulos H., Nouretdinov I. and Gammerman A. 2012 Reliable probability estimates based on support vector machines for large multiclass datasets AIAI 2012 Workshops (Halkidiki, Greece) IFIP AICT vol 382 (Berlin: Springer) pp 182–91
[37] Vovk V., Gammerman A. and Saunders C. 1999 Machine learning applications of algorithmic randomness Proc. 16th Int. Conf. on Machine Learning (Bled, Slovenia) vol 2 (San Francisco, CA: Morgan Kaufmann) pp 444–53
[38] Saunders C., Gammerman A. and Vovk V. 1999 Transduction with confidence and credibility Proc. 16th Int. Joint Conf. on Artificial Intelligence vol 2 (San Francisco, CA: Morgan Kaufmann) pp 722–6
[39] Papadopoulos H. 2013 Reliable probabilistic classification with neural networks Neurocomputing 107 59–68
[40] Nouretdinov I. et al 2012 Multiprobabilistic Venn predictors with logistic regression AIAI 2012 Workshops (Halkidiki, Greece) IFIP AICT vol 382 (Berlin: Springer) pp 224–33
[41] Dashevskiy M. and Luo Z. 2008 Reliable probabilistic classification and its application to internet traffic ICIC 2008 (Shanghai, China) (Lecture Notes in Computer Science vol 5226) ed D.-S. Huang et al (Heidelberg: Springer) pp 380–8
[42] de Vries P.C. et al and JET EFDA Contributors 2012 The impact of the ITER-like wall at JET on disruptions Plasma Phys. Control. Fusion 54 124032
[43] Acero A., Vega J., Dormido-Canto S., Guinaldo M., Murari A. and JET-EFDA Contributors 2014 Assessment of probabilistic Venn machines as real-time disruption predictors from scratch: application to JET with a view on ITER Conf. Record of the 19th IEEE Real-Time Conf. (Nara, Japan, 26–30 May 2014) http://rt2014.rcnp.osaka-u.ac.jp/AbstractsRT2014.pdf
[44] Ho S.-S. and Wechsler H. 2003 Transductive confidence machine for active learning Proc. Int. Joint Conf. on Neural Networks (Portland, OR, 2003) vol 2 pp 1435–40
[45] Ho S.-S. and Wechsler H. 2008 Query by transduction IEEE Trans. Pattern Anal. Mach. Intell. 30 1557–71
[46] Balasubramanian V.N., Chakraborty S. and Panchanathan S. 2009 Generalized query by transduction for online active learning 12th IEEE Int. Conf. on Computer Vision Workshops (Kyoto, Japan, 2009) pp 1378–85