Towards Big Industrial Data Mining Through Explainable Automated Machine Learning
Research Article
Keywords: Machine learning, AutoML, Explainable AI, Data analysis, Decision-support systems, Industry
4.0
DOI: https://doi.org/10.21203/rs.3.rs-755783/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License
Version of Record: A version of this preprint was published at The International Journal of Advanced
Manufacturing Technology on February 10th, 2022. See the published version at
https://doi.org/10.1007/s00170-022-08761-9.
Abstract Industrial systems are capable of producing large amounts of data. These data are often heterogeneous in format and distributed, yet they provide the means to mine information that can enable the deployment of intelligent management tools for production activities. For this purpose, it is necessary to implement knowledge extraction and prediction processes using Artificial Intelligence (AI) models, but the selection and configuration of suitable AI models tend to be increasingly complex for a non-expert user. In this paper, we present an approach and a software platform that allow industrial actors, who are usually not familiar with AI, to select and configure algorithms optimally adapted to their needs. The approach is essentially based on automated machine learning. The resulting platform effectively enables a better choice among combinations of AI algorithms and hyper-parameter configurations. It also provides explainability features for the resulting algorithms and models, thus increasing the acceptability of these models among their community of users. The proposed approach has been applied in the field of predictive maintenance. Current tests are based on the analysis of more than 360 databases from the subject field.

Keywords Machine learning · AutoML · Explainable AI · Data analysis · Decision-support systems · Industry 4.0

1 Introduction

Data-driven decision-making may be defined as the set of practices aiming to make decisions based on data analysis rather than on intuitive insights [1]. Business entities that have deployed data-driven decision-making activities have been observed to be more profitable and productive than traditional ones [2]. Nowadays, decision-making tools are mainly based on the results of current AI research [3]. The success of AI-based tools is largely due to advances in machine learning approaches [4]. This is particularly stimulated by the availability of large datasets covering various real-world features [3] and by the increase in computational power generally attributed to powerful GPU cards [5].
Moncef Garouani
E-mail: moncef.garouani@etu.univ-littoral.fr
1 Univ. Littoral Cote d'Opale, UR 4491, LISIC, Laboratoire d'Informatique Signal et Image de la Cote d'Opale, F-62100 Calais, France
2 CCPS Laboratory, ENSAM, University of Hassan II, Casablanca, Morocco
3 Study and Research Center for Engineering and Management (CERIM), HESTIM, Casablanca, Morocco

The manufacturing area is one of those generating huge amounts of data, gathered by means of Cyber-Physical System (CPS) devices. The availability of such data, combined with the knowledge of manufacturing experts, may be an opportunity to build AI-based processes and models providing high-value insights and assets for decision makers. Nevertheless, building such processes and models requires AI and data science skills and expertise that are not always available in manufacturing workbenches and laboratories.

The work presented in this paper aims to bridge the gap between AI expertise and manufacturing experts. We believe that automated machine learning (AutoML) [6]
is one of the most powerful approaches that can effectively deal with this problem, notably those proposed in [7-10]. Instead of searching for appropriate Machine Learning (ML) algorithms and manually tuning their hyper-parameters in a virtually infinite space, AutoML automatically and iteratively configures the hyper-parameters of multiple machine learning algorithms in order to optimize them over a predefined search space.

The use of AutoML approaches has gained significant attention in the research community. A plethora of AutoML systems, such as Auto-sklearn [7], TPOT [8], Auto-WEKA [9], ATM [10], and Google Cloud AutoML (https://cloud.google.com/automl), have been developed in recent years.

The interest in building complex AI models able to achieve unprecedented performance levels has gradually been replaced by a growing concern for alternative design factors leading to improved usability of the resulting tools. Indeed, in many application areas, complex AI models are of limited practical utility [11]. The major reason is that AI models are often designed to focus on performance factors, leaving aside other important and sometimes crucial aspects such as confidence, transparency, fairness, or accountability. The absence of explanations for predictions makes AI models black boxes that expose only their inputs and outputs while concealing the inherent associations between them. Such a lack of transparency is best avoided in real-life applications such as industrial manufacturing processes: since these applications may involve critical decisions, it is desirable to have justifications for the individual predictions produced by an AI algorithm, particularly in an automated environment. Acceptance of, and trust in, an AutoML system is therefore highly dependent on the transparency of its recommendations.

Because of the lack of transparency in AutoML systems as Decision Support Systems (DSS), users tend to question the validity of automatic results: did the AutoML run long enough? Did it miss some suitable models? Did it sufficiently explore the search space? Does the recommended configuration over- or under-fit? Such questions may make users reluctant to apply the results of AutoML in critical situations [12]. Meanwhile, when AutoML provides unsatisfactory results, users are unable to reason about them and thus cannot improve them; they can only increase the computational budget (e.g., the run-time) as much as possible, which constitutes a barrier to AutoML effectiveness.

It is therefore a preliminary objective of the current work to make the outcome of such well-performing AutoML systems transparent, interpretable, and self-explainable. This shall make AutoML support systems more reliable and operational through a set of different visual summary levels of the provided models and configurations, rendering the AutoML system more transparent and controllable and hence increasing its acceptance.

In the current work, we propose a transparent and auto-explainable AutoML system that recommends the most adequate ML configuration for a given problem and explains the rationale behind a recommendation. It further allows analyzing the predictive results in an interpretable and reliable manner. In the proposed approach, end users can explore the AutoML process at different levels, as described in the following:

– The AutoML-oriented level (i.e. exploring the AutoML process from recommendation to refinement).
– The Data-oriented level (i.e. exploring data properties through different visualization levels).
– The Model-oriented level (i.e. exploring the models provided by the AutoML system: model performance, what-if analysis, decision path, etc.).

The system consists of two integrated modules: the Automated Machine Learning module and the Automated Machine Learning explainer (AMLExplainer). The latter is not system- or algorithm-specific; it is interoperable with a variety of AutoML frameworks. The main contributions of this work are summarized as follows:

1. Develop a premier transparent and self-explained AutoML system.
2. Provide an assisted traceability of the reasoning behind the AutoML recommendation generation process.
3. Develop a module that can explain the predictions of any recommendation through linked visual summaries and/or textual information.
4. Provide a multi-level interactive visualization tool that facilitates model operation and performance inspection to address "trusting the model".
5. Provide reliable guidance, when AutoML returns unsatisfying results, to improve the expected performances by assessing the importance of an algorithm's hyper-parameters.

The rest of the paper is organized as follows: section 2 discusses the closely related works in respect
of ML-based data analytics solutions and the need for transparency to gain trust in AI models. Section 3 briefly describes the different types of explanations, their respective information content, and their use in practice. Section 4 provides an overview of the proposed framework and discusses how its components collaborate to achieve the pursued goals. Finally, the paper is concluded in section 5 along with outlines of future perspectives.

2 Related works

The available literature testifies to considerable progress in automated machine learning methodologies and their application in multi-disciplinary areas. We have studied some of the most relevant works, particularly recommendation-based decision support systems in the manufacturing industry. We particularly considered the works dealing with transparency and explainability of automated machine learning and those related to big industrial data mining.

Through advances in high-tech sensing and the widespread use of applications such as electronic manufacturing records, mobile sensors, and Industrial Internet of Things (IIoT) tools, manufacturing data are being accumulated at an exponentially growing annual rate. According to a recent research report [13], the global big data market size will grow from USD 138.9 billion in 2020 to USD 229.4 billion by 2025 (an increase from Petabytes to Exabytes). Machine learning is a key technology to transform large manufacturing data sets, or "big industrial data", into actionable knowledge.

In this section, we observe the limitations of available research work on explainable AI/AutoML systems as DSS that primarily motivate the work described in this paper. The current literature in this regard can be summarized as an overlapping overview of three major research areas, as described in the following subsections.

Despite its countless benefits and advances, building a machine learning pipeline is still a challenging task, partly because of the difficulty of manually selecting an effective combination of an algorithm and hyper-parameter values for a given task or problem.

Owing to the development of open-source ML packages and the active research in the ML field, there are dozens of machine learning algorithms, each with two types of model parameters: (1) ordinary parameters that are automatically optimized or learned during the model training phase; and (2) hyper-parameters (categorical and continuous) that are typically set manually by the user before training the model (as shown in Table 1).

Table 1 Configuration space of some classification algorithms.

ML Algorithm             Number of ordinary parameters   Number of hyper-parameters
Support Vector Machine   2                               5
Decision Tree            1                               3
Random Forest            2                               4
Logistic Regression      4                               6

To achieve the desired performance for a particular problem, users typically try a set of models and configurations based on their understanding of the algorithms and their observation of the data, since no algorithm performs well on all possible problems (i.e., No Free Lunch [17]). Then, based on feedback about how the learned models performed, the practitioner may adjust the configuration to check whether the performance can be improved. Such a trial-and-error process terminates once a desired performance is achieved or the computational budget runs out (as shown in Figure 1).
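The trial-and-error loop described above is, in practice, often delegated to a randomized search over a bounded configuration space. A minimal scikit-learn sketch follows; the dataset and the search ranges are illustrative choices, not the configuration spaces used in this paper:

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hyper-parameter ranges are set by the user before training,
# unlike ordinary parameters, which are learned from the data.
search_space = {
    "n_estimators": randint(10, 200),
    "max_depth": randint(2, 12),
    "min_samples_split": randint(2, 10),
}

# Random search samples 20 configurations and keeps the best one
# found under 5-fold cross-validated accuracy.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    search_space, n_iter=20, cv=5,
    scoring="accuracy", random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The search stops after a fixed budget (`n_iter`), mirroring the "computational budget runs out" termination criterion above.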
2.2 Automated machine learning

Automated Machine Learning, or AutoML [18], is among the rapidly emerging sub-fields of ML; it attempts to address the theoretical and algorithmic challenges of fully automating the ML process, as well as the development and deployment of systems in this regard. AutoML has two main goals: (1) democratizing the application of ML for non-experts in data analysis by providing them with "off-the-shelf" solutions, and (2) enabling knowledgeable practitioners to save time and effort.

Over the last few years, a plethora of AutoML systems have been developed providing partial or complete ML automation, such as Auto-sklearn [7], TPOT [8], Auto-WEKA [9], and ATM [10], as well as commercial systems such as Google AutoML, RapidMiner (https://rapidminer.com), DarwinAI (https://darwinai.com/), and DataRobot (https://www.datarobot.com/). These tools range from automatic data preprocessing [19, 20] and automatic feature engineering [21, 22] to automatic model selection [18, 23] and automatic hyper-parameter tuning [24, 25]. Some approaches attempt to automatically and simultaneously choose a learning algorithm and optimize its hyper-parameters; this is known as the Combined Algorithm Selection and Hyper-parameter optimization (CASH) problem [7-9, 25-28]. Table 2 compares some of the most popular AutoML tools in terms of training framework, supported ML tasks, automatic feature engineering, user interface, and process transparency.

The interest in developing complex AI models, despite their revolutionary capability of achieving unprecedented levels of performance, has been progressively declining; the concern has shifted toward alternative design factors, particularly making such models more usable in practice [3]. These systems offer useful assistance, but their usability is greatly limited by their inability to provide detailed analysis of the recommended configurations and of the inner working of the black-box models [12]. The reason is that AI models are often designed with performance as their only goal, ignoring other important matters such as privacy awareness, confidence, transparency, and accountability, which makes them untrustworthy black boxes [12, 29].

2.3 The need for transparency to trust in AI and in AutoML

Black-box AI systems have been used in various areas. Their deployment in domains such as power consumption forecasting or supply chain management (e.g., to analyze brand trends or consumer sentiment) usually gives less attention to quality features such as transparency and explainability than to the system's overall performance. When such systems fail, however, a Quality Control system is mostly unable to detect the failure, and an Equipment Failure Prevention system is unlikely to identify the exact cause of the failure and generally produces false or inaccurate predictions. In critical industrial applications, the lack of transparency of ML techniques can thus be a disqualifying factor: a single wrong decision can endanger the entire production line (e.g., failure of a critical unit) and cause significant financial losses (e.g., product non-conformity). Relying on an incomprehensible black-box data-driven system would therefore not be the best option. The lack of transparency is among the most relevant reasons for questioning the adoption of AI models in the manufacturing industry; stakeholders are more cautious there than in the consumer entertainment or e-commerce industries.

Explaining the reasoning behind one's decisions or actions is an important part of human interaction in the social dimension [30]. As explanations help to build trust in human-to-human relationships, they should also be part of human-to-machine interactions [3]. In this work, we investigate the contributions and feasibility of a process designed to make such powerful DSS transparent, interpretable, and self-explainable in order to foster trust, both in situations where the AI system has a supportive role (e.g., production planning) and in those where it provides directions and decision-making (e.g., Quality Control, predictive maintenance, or autonomous driving). In the former cases, explanations provide extra information that helps the human in the loop gain an overall view of the situation or problem at hand in order to take decisions. Just as an expert has to provide a detailed report explaining his or her findings, a supportive AI system should explain its decisions in detail instead of providing only a prediction or a decision.
Table 2 Summary of the main features of some state-of-the-art AutoML tools. (*): commercialized tools.

Table 3 Properties of XAI state-of-the-art tools. Level is the interpretability coverage: local or global. Dependency specifies the necessary inputs.

                     Level            Dependency
XAI method           Local   Global   Data   Model   Requires pre-configuration
LIME [12]            •       ◦        ?      ◦       •
ANCHORS [33]         •       ◦        •      ◦       •
Node-Link Vis [34]   •       •        ◦      •       •
SHAP [35]            •       •        ◦      ◦       •
DeepTaylor [36]      •       ◦        •      •       •
Occlusion [37]       •       ◦        •      •       •
AMLExplainer         •       •        ◦      ◦       ◦
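The local/global distinction in Table 3 can be illustrated with plain scikit-learn, independently of any of the listed tools: permutation importance gives a global, model-agnostic view of feature relevance, while the decision path of a tree explains one individual prediction. The dataset and model below are illustrative choices only:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global explanation: how much does shuffling each feature
# degrade the model's score over the whole dataset?
global_imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print("global importances:", global_imp.importances_mean.round(3))

# Local explanation: which decision nodes does one sample traverse?
node_indicator = clf.decision_path(X[:1])
visited = list(node_indicator.indices)
print("nodes visited by sample 0:", visited)
```

Tools such as SHAP or LIME generalize this idea to arbitrary black-box models; the sketch only shows the two coverage levels the table distinguishes.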
Fig. 3 The global architecture of the proposed white-box AutoML (AMLBID and AMLExplainer).
manufacturing levels, generating more than 3 million different ML configurations (pipelines). Each pipeline consists of a choice of machine learning model and its hyper-parameter configuration. By exploring the interactions between datasets and pipeline topology, the system is able to identify effective pipelines without performing computationally expensive analysis.

4.1.1 The learning phase

During the learning phase, we evaluate different classification algorithms with multiple hyper-parameter configurations on a large collection of various datasets. Then, we generate meta-features that are used to train a meta-model able to recommend promising classification pipelines for a given dataset and performance metric. The entire training phase is illustrated in Figure 4.

Building a meta-learning based system to deal with the algorithm selection and configuration problem requires a meta-knowledge base for the learning process. This involves collecting datasets, choosing machine learning algorithms, extracting meta-features (dataset and pipeline characteristics), and determining the performance of the algorithm configurations according to different evaluation measures (e.g. Accuracy, Precision, Recall). Then, when a new problem is presented to the system, its meta-features are extracted, and a recommendation mechanism that makes use of the meta-knowledge base provides a ranking of the pipeline(s) (algorithms and configurations) for the unseen problem according to the desired performance measure.
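A compressed sketch of this learning phase is given below. It loops over datasets and candidate configurations, records simple meta-features together with cross-validated performance, and accumulates the rows of a toy meta-knowledge base. The datasets and the configuration grid are illustrative stand-ins, far smaller than the collection described in this paper:

```python
import numpy as np
from sklearn.datasets import load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

datasets = {"iris": load_iris(return_X_y=True),
            "wine": load_wine(return_X_y=True)}
configs = [{"n_estimators": n, "max_depth": d}
           for n in (10, 50) for d in (3, None)]

meta_knowledge = []  # one row per (dataset, pipeline) evaluation
for name, (X, y) in datasets.items():
    # Simple meta-features: instances, attributes, classes.
    meta_features = (X.shape[0], X.shape[1], len(np.unique(y)))
    for cfg in configs:
        score = cross_val_score(
            RandomForestClassifier(random_state=0, **cfg), X, y, cv=5
        ).mean()
        meta_knowledge.append({"dataset": name, "meta": meta_features,
                               "config": cfg, "accuracy": score})

best = max(meta_knowledge, key=lambda r: r["accuracy"])
print(best["dataset"], best["config"], round(best["accuracy"], 3))
```

At scale, each such row relates a task's meta-features and a pipeline configuration to an observed performance, which is exactly the material a meta-model can later learn from.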
The datasets: The datasets are collected from the popular UCI (https://archive.ics.uci.edu/), OpenML (https://www.openml.org/), Kaggle (https://www.kaggle.com/), and KEEL (https://sci2s.ugr.es/keel/) repositories, among other real-world scenarios. These datasets cover various tasks with respect to their size, number of attributes, composition, and class imbalance. The dataset characteristics are illustrated in Table 4.

Table 4 Datasets' dimensions.

      Number of classes   Number of attributes   Number of instances
Min   2                   5                      1800
Max   18                  1000                   105908

The meta-features: The core paradigm of meta-learning is to relate the performance of learning algorithms and configurations to data characteristics (meta-features). Meta-features are common characteristics of several problems whose aim is to identify structural similarities and differences among problems. These characteristics can be divided into three categories [28]:

Simple: based on general measures, such as the number of instances, attributes, and classes, and the dataset dimensionality. They are designed, to some extent, to measure the complexity of the underlying problem.
Statistical: based on statistical measures obtained from the dataset attributes, such as means, standard deviations, class entropy, and correlations.
Landmark: characterize datasets by running basic machine learning algorithms (with default configurations) on them. The landmark characteristics used in our system include the performance of the linear discriminant analysis (LDA), Gaussian Naive Bayes (GNB), Decision Tree (DT), and K-Nearest Neighbor (KNN) landmarkers.

The pipelines generation: To build the meta-knowledge base, we used 8 classifiers from the popular Python-based machine learning library Scikit-learn: AdaBoost, Support Vector Classifier (SVC), Extra Trees, Gradient Boosting, Decision Tree, Logistic Regression, Random Forest, and Stochastic Gradient Descent (SGD). A detailed description of the algorithms and their tuned hyper-parameters is given in Tables A1-A7 in the Appendix.

For each run of a classifier C over a dataset D, we generated 1000 different combinations of its hyper-parameter configurations. This process resulted in an average of 8000 pipelines per dataset. In particular, for each classifier, we generated a list of all possible and reasonable combinations and conducted, for each dataset, a random search among them [42].

During the training phase, we used a 5-fold stratified cross-validation strategy to construct our meta-datasets. As a result, our knowledge base consists of more than 3 million evaluated classification pipelines. Note that, due to the differing numbers of algorithm hyper-parameters, not every algorithm had the same number of configurations/evaluations.

Measures: As part of our core idea, we aim to recommend high-performing ML pipelines for a given combination of dataset and evaluation measure. Unlike most state-of-the-art systems, the proposed system supports various classification performance measures to evaluate the performance of the ML pipelines (ML algorithms and their hyper-parameter configurations). Table 5 gives the details of the supported measures.

4.1.2 The recommending phase

The recommending phase is initiated when a new dataset to be analyzed arrives. At this point, the user selects a predictive analytic metric to be used for the analysis (e.g. Accuracy, Recall, F1 score), and the system then automatically recommends a set of machine learning algorithms and their related hyper-parameter configurations to be applied, such that the predictive performance is first-rate. To do so, the system first extracts the dataset characteristics (meta-features). The extracted meta-features are then fed to the meta-model to provide the candidate pipelines. Finally, the suggestion engine ranks the pipelines, according to the meta-knowledge base, with respect to the provided analytic metric. The recommending process is shown in Figure 5.

Meta-model: After having generated a meta-dataset with all the necessary metadata, the goal is to build a predictive meta-model that can learn the complex relationship between a task's meta-features and the utility of specific ML pipelines, in order to recommend the most useful configuration Θnew given the meta-features M of a new task tnew.
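The meta-model idea can be sketched as a regression from (meta-features, encoded configuration) pairs to observed performance, followed by ranking candidate configurations for a new task by predicted performance. All numbers and the configuration encoding below are hypothetical placeholders, not the paper's actual meta-dataset or meta-model:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical meta-knowledge rows: (meta-features, encoded config) -> score.
# Columns: n_instances, n_attributes, n_classes, n_estimators, max_depth.
rows = np.array([
    [150, 4, 3, 10, 3], [150, 4, 3, 100, 8],
    [1800, 20, 2, 10, 3], [1800, 20, 2, 100, 8],
])
scores = np.array([0.91, 0.95, 0.83, 0.88])

# The meta-model learns the relationship between task meta-features,
# pipeline configuration, and the resulting performance.
meta_model = RandomForestRegressor(random_state=0).fit(rows, scores)

# Recommending phase: extract the new task's meta-features, then
# rank candidate configurations by predicted performance.
new_task = [500, 10, 2]
candidates = [[10, 3], [100, 8]]
preds = meta_model.predict([new_task + c for c in candidates])
best = candidates[int(np.argmax(preds))]
print("recommended configuration:", best)
```

Because recommendation is a lookup into a trained model rather than a new search, its cost is independent of how many pipelines the meta-knowledge base records.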
model clearly performs better than the random forest classifier based meta-learner according to the accuracy metric.

4.1.3 The evaluation of robustness

In this evaluation, we investigate the performance that can be achieved by using the proposed recommending module on various manufacturing-related problems. We evaluate its ability to predict the ML algorithms, with related hyper-parameter configurations, that shall provide the best result of the analysis. We benchmark on a highly varied selection of 30 datasets covering binary and multi-class classification problems from different Industry 4.0 levels, with sample sizes from 1000 to 90000 instances (a common range for real-world datasets).

The proposed system uses the meta-model to rate all the pipelines in the meta-knowledge base with respect to the analyzed dataset and then returns its top-ranked pipelines according to the provided performance criteria. The top-ranked pipelines are then fitted on the dataset, which is split into train and test sets using a 70%/30% ratio. The results of this evaluation were used to compare the performance of AMLBID to that of the TPOT and Auto-sklearn state-of-the-art frameworks. It is important to stress again that the majority of state-of-the-art frameworks evaluate a set of pipelines by running them on the dataset at hand before the recommendation step, which demands a considerable computational budget that is not always available and takes a huge amount of time in most cases, making them impractical for real-world problems such as industrial ones. In contrast, AMLBID immediately produces a ranked list of potentially top pipeline configurations using its meta-model and meta-knowledge base at an imperceptible computational cost (in terms of time and computational resources). The evaluation results are presented in Table 6.

The recommendations of AMLBID perform better than those of the baselines even though it does not run any pipeline on the dataset prior to the recommendation.

Table 7 Number of datasets for each baseline AutoML system for which the optimal algorithm configuration has been recommended.

System        Number of datasets with top performance
Autosklearn   3 (10%)
TPOT          6 (20%)
AMLBID        21 (70%)

As state-of-the-art systems support only predictive accuracy as the performance measure of the recommended configuration, further robustness comparisons on other performance measures such as Recall, F1 score, and Precision could not be made, whereas in some concrete cases Recall or Precision may be more important and informative than predictive accuracy. The proposed system is therefore the first AutoML system to support different predictive performance measures (Precision, Recall, Accuracy, and F1 score).

Beyond their black-box nature, one of the main shortcomings of AutoML solutions is their computational complexity, which requires a huge budget of time and resources. By contrast, AMLBID has the advantage of O(1) computational complexity, generating its recommendation in a negligible amount of time. Table 8 presents the running times of AMLBID, TPOT and Auto-sklearn on the benchmark datasets.
Table 8 The running time of AMLBID and the baseline AutoML frameworks on the benchmark datasets. The running time format is HH:MM:SS.

Dataset       Ds size   AMLBID     TPOT       Autosklearn
[43]          95        00:00:05   00:08:14   01:23:47
[44]          61000     00:05:29   03:42:09   04:19:05
[45]          2000      00:00:12   00:13:57   01:49:21
APS Failure   60000     00:05:39   05:23:35   03:58:39
Higgs         110000    00:06:16   05:43:24   07:37:55
CustSat       76020     00:04:06   04:09:36   05:07:03

4.2 The explainer module

AMLExplainer is implemented as a client-server tool integrated with the recommender module. The server coordinates with the AutoML support system; as the client, the visual interface provides graphical interaction with the AutoML results and maps the summary data to visualizations through a set of visual summary levels of the recommended models. AMLExplainer users can explore the models provided by the AutoML process at four main levels of detail (i.e. AutoML Overview, Recommendation-level View, What-if analysis-level View, and Refinement-level View). Meanwhile, AMLExplainer provides end users with guidance, when AutoML returns unsatisfying results, to improve the predictive performance, which increases the transparency, controllability, and acceptance of AutoML. The tool documentation and a detailed list of features with an illustrative example are available in the GitHub repository (https://github.com/LeMGarouani/AMLBID).

The workflow of the proposed auto-explanatory AutoML system consists of two main components:

– The AutoML component, which shows the high-level view of the AutoML process from recommendations to refinements.
– The recommended configuration component, which allows users to inspect the recommended model's inner working and decision generation process (including the Recommendation-level and What-if analysis-level views).

The AutoML Overview: the AutoML overview level (Figure 7) summarizes high-level information about the AutoML process. Users are able to compare and choose between the top K recommended configurations. They can focus their analysis on the top model configuration in the next-level view, which highlights the corresponding algorithm in the detail views.

The Recommendation-level View: the recommendation-level view enables users to inspect recommendations with respect to performance distribution. As shown in Figure 8, a detailed explanation of the top-performing recommendation is generated through multiple granularity levels, such as statistics about the configuration performances (Figure 8(A)) and a tree-based explanation of the conducted predictions (Figure 8(B)). By providing intelligible explanations of the process and reasoning behind an individual prediction, as illustrated in Figure 8, the decision-maker, whether a manufacturing engineer or a machine learning practitioner, is much better positioned to make decisions, since he or she usually has prior knowledge about the data and the application domain that can be used to trust, accept, or reject a prediction once the reasoning behind it is well explained.

The What-if analysis-level View: this view (Figure 9) is designed to investigate the machine learning models. It enables end users to understand models by investigating attribution values of individual input features in relation to model predictions. Explaining the inner working of the model helps users gain an understanding of what the model does and does not do. This is important so that they can develop an intuition for when the model is likely missing information and may have to be overruled, and can thus explore scenarios, test and validate business assumptions, and gain intuition for modifications.

The Refinement-level View: this view (Figure 10) shows the correlation between the performance and hyper-parameters of a recommended algorithm. To accomplish this, we take as input performance data gathered with different hyper-parameter settings of the algorithm (from the recommender module's knowledge base), fit a random forest to capture the relationship between hyper-parameters and performance, and then apply functional analysis of variance (ANOVA [46]) to assess how important each hyper-parameter, and each low-order interaction of hyper-parameters, is to performance. Guided by this in-depth analysis, end users have guidance, when AutoML returns unsatisfying results, to improve the predictive performance.

4.3 Demonstration test case: Application to Manufacturing Quality Prediction

Our auto-explainer module works for any predictive modeling problem where machine learning is used. As our goal is to show the feasibility of achieving the maximum possible performance for a specific predictive modeling problem while automatically explaining the results of any machine learning predictive model, we evaluated our automatic explanation method on a Manufacturing Quality Prediction use case. The data contains
1, 2, 3
12 Moncef Garouani et al.
187,156 historical records covering one year of operation of a production unit. Among these records, 74.39% were diagnosed as compliant products.

[…] module and the explanatory one). Based on their feedback, we summarize two main appreciations of the proposed tool:
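The refinement-level analysis described earlier (fit a random forest on hyperparameter/performance records, then quantify each hyperparameter's contribution to performance) can be sketched as follows. This is a minimal illustration on synthetic records with hypothetical hyperparameter names; for simplicity it uses scikit-learn's impurity-based feature importances as a stand-in for the functional ANOVA of [46]:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical evaluation records: each row is one hyperparameter setting
# (n_estimators, max_depth, min_samples_split, rescaled to [0, 1]),
# the target is the accuracy achieved with that setting.
X = rng.uniform(size=(200, 3))
y = 0.7 + 0.2 * X[:, 1] + 0.02 * rng.normal(size=200)  # max_depth dominates here

# Fit a random forest to capture the hyperparameter -> performance relationship.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank hyperparameters by importance (proxy for the fANOVA decomposition).
names = ["n_estimators", "max_depth", "min_samples_split"]
importance = dict(zip(names, forest.feature_importances_))
print(importance)  # max_depth is expected to carry most of the importance mass
```

In the actual system, the performance records would come from the recommender module's knowledge base, and fANOVA would additionally quantify the low-order interactions between hyperparameters, which plain impurity-based importances do not separate out.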
There has been significant progress in democratizing the application of ML to non-experts in data analysis by providing them with "off-the-shelf" solutions. However, these powerful support systems fail to provide detailed instructions about the recommended configurations and the inner workings of the resulting models, thus making them less trustworthy, highly performant black boxes. In this work, we presented a novel transparent and self-explaining AutoML system along with an interactive visualization module that supports machine learning experts and neophytes in analyzing the automatic results of an AutoML DSS.

To our knowledge, the proposed system is the first application of general explanation methods to AutoML systems used as decision-support systems. We explore several levels of explanation, ranging from individual decisions to the entire model's recommendations and predictions. The explanations of the prediction models and the what-if analysis proved to be an effective support for manufacturing-related problems. A set of evaluations demonstrates the utility and usability of AMLBID in a real-world manufacturing problem. We show how powerful black-box ML systems can be made transparent and can help domain experts iteratively evaluate and update their beliefs. Based on the promising findings presented in this paper, further validation of the proposed framework in other real-world applications, with a larger and more diverse group of users, should improve the visualization and presentation of explanations. We plan to provide the proposed system as an open-source Python package, which we are currently in the process of publishing.

Acknowledgements The authors would like to thank all the participants involved in the system evaluation for their constructive discussions and valuable suggestions.

Funding This work is supported, in part, by the School of Engineering and Business Sciences and Technics (HESTIM), Casablanca, Morocco, CNRST Morocco, and the Université du Littoral Côte d'Opale, Calais, France.

Availability of data and materials All data generated or analyzed during this study are included in this paper.

Consent for publication All authors agree to publish this article in the International Journal of Advanced Manufacturing Technology.

Conflict of interest The authors declare that they have no conflict of interest.

References

[1] F. Provost and T. Fawcett. "Data Science and Its Relationship to Big Data and Data-Driven Decision Making". In: Big Data 1.1 (2013), pp. 51–59. doi: 10.1089/big.2013.1508.
[2] E. Brynjolfsson, L. M. Hitt, and H. H. Kim. Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? SSRN Scholarly Paper ID 1819486. Rochester, NY: Social Science Research Network, 2011. doi: 10.2139/ssrn.1819486.
[3] W. Samek and K.-R. Müller. "Towards Explainable Artificial Intelligence". In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Ed. by W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K.-R. Müller. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, pp. 5–22. doi: 10.1007/978-3-030-28954-6.
[4] Y. LeCun, Y. Bengio, and G. Hinton. "Deep Learning". In: Nature 521.7553 (2015), pp. 436–444. doi: 10.1038/nature14539.
[5] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. "NVIDIA Tesla: A Unified Graphics and Computing Architecture". In: IEEE Micro 28.2 (2008), pp. 39–55. doi: 10.1109/MM.2008.31.
[6] M. V. Nural, H. Peng, and J. A. Miller. "Using Meta-Learning for Model Type Selection in Predictive Big Data Analytics". In: 2017 IEEE International Conference on Big Data (Big Data). 2017, pp. 2027–2036. doi: 10.1109/BigData.2017.8258149.
[7] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter. "Efficient and Robust Automated Machine Learning". In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. NIPS'15. Cambridge, MA, USA: MIT Press, 2015, pp. 2755–2763.
[8] R. S. Olson and J. H. Moore. "TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning". In: Automated Machine Learning: Methods, Systems, Challenges. Ed. by F. Hutter, L. Kotthoff, and J. Vanschoren. The Springer Series on Challenges in Machine Learning. Cham: Springer International Publishing, 2019, pp. 151–160. doi: 10.1007/978-3-030-05318-5_8.
[9] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. "Auto-WEKA: Automatic Model Selection and Hyperparameter Optimization in WEKA". In: Automated Machine Learning: Methods, Systems, Challenges. Ed. by F. Hutter, L. Kotthoff, and J. Vanschoren. The Springer Series on Challenges in Machine Learning. Cham: Springer International Publishing, 2019, pp. 81–95. doi: 10.1007/978-3-030-05318-5_4.
[10] T. Swearingen, W. Drevo, B. Cyphers, A. Cuesta-Infante, A. Ross, and K. Veeramachaneni. "ATM: A Distributed, Collaborative, Scalable System for Automated Machine Learning". In: 2017 IEEE International Conference on Big Data (Big Data). 2017, pp. 151–162. doi: 10.1109/BigData.2017.8257923.
[11] F. Xu, H. Uszkoreit, Y. Du, W. Fan, D. Zhao, and J. Zhu. "Explainable AI: A Brief Survey on History, Research Areas, Approaches and Challenges". In: Natural Language Processing and Chinese Computing. Ed. by J. Tang, M.-Y. Kan, D. Zhao, S. Li, and H. Zan. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, pp. 563–574. doi: 10.1007/978-3-030-32236-6_51.
[12] M. T. Ribeiro, S. Singh, and C. Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 1135–1144. doi: 10.1145/2939672.2939778.
[13] R. a. M. ltd. Big Data Market by Component, Deployment Mode, Organization Size, Business Function (Operations, Finance, and Marketing and Sales), Industry Vertical (BFSI, Manufacturing, and Healthcare and Life Sciences), and Region - Global Forecast to 2025.
[14] M. Cuartas, E. Ruiz, D. Ferreño, J. Setién, V. Arroyo, and F. Gutiérrez-Solana. "Machine Learning Algorithms for the Prediction of Non-Metallic Inclusions in Steel Wires for Tire Reinforcement". In: Journal of Intelligent Manufacturing (2020). doi: 10.1007/s10845-020-01623-9.
[15] R. Medina, J. C. Macancela, P. Lucero, D. Cabrera, R.-V. Sánchez, and M. Cerrada. "Gear and Bearing Fault Classification under Different Load and Speed by Using Poincaré Plot Features and SVM". In: Journal of Intelligent Manufacturing (2020). doi: 10.1007/s10845-020-01712-9.
[16] A. Jalali, C. Heistracher, A. Schindler, B. Haslhofer, T. Nemeth, R. Glawar, W. Sihn, and P. De Boer. "Predicting Time-to-Failure of Plasma Etching Equipment Using Machine Learning". In: 2019 IEEE International Conference on Prognostics and Health Management (ICPHM). 2019, pp. 1–8. doi: 10.1109/ICPHM.2019.8819404.
[17] D. Wolpert and W. Macready. "No Free Lunch Theorems for Optimization". In: IEEE Transactions on Evolutionary Computation 1.1 (1997), pp. 67–82. doi: 10.1109/4235.585893.
[18] M. Reif, F. Shafait, M. Goldstein, T. Breuel, and A. Dengel. "Automatic Classifier Selection for Non-Experts". In: Pattern Analysis and Applications 17.1 (2014), pp. 83–96. doi: 10.1007/s10044-012-0280-z.
[19] B. Bilalli, A. Abelló, T. Aluja-Banet, and R. Wrembel. "Automated Data Pre-Processing via Meta-Learning". In: Model and Data Engineering. Ed. by L. Bellatreche, Ó. Pastor, J. M. Almendros Jiménez, and Y. Aït-Ameur. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016, pp. 194–208. doi: 10.1007/978-3-319-45547-1_16.
[20] B. Bilalli, A. Abelló, T. Aluja-Banet, R. F. Munir, and R. Wrembel. "PRESISTANT: Data Pre-Processing Assistant". In: Information Systems in the Big Data Era. Ed. by J. Mendling and H. Mouratidis. Lecture Notes in Business Information Processing. Cham: Springer International Publishing, 2018, pp. 57–65. doi: 10.1007/978-3-319-92901-9_6.
[21] U. Khurana, H. Samulowitz, and D. Turaga. "Feature Engineering for Predictive Modeling Using Reinforcement Learning". In: arXiv e-prints 1709 (2017), arXiv:1709.07150.
[22] F. Nargesian, H. Samulowitz, U. Khurana, E. B. Khalil, and D. Turaga. "Learning Feature Engineering for Classification". In: (2017), pp. 2529–2535.
[23] R. Vainshtein, A. Greenstein-Messica, G. Katz, B. Shapira, and L. Rokach. "A Hybrid Approach for Automatic Model Recommendation". In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. CIKM '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 1623–1626. doi: 10.1145/3269206.3269299.
[24] M. Feurer, J. T. Springenberg, and F. Hutter. "Initializing Bayesian Hyperparameter Optimization via Meta-Learning". In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI'15. Austin, Texas: AAAI Press, 2015, pp. 1128–1135.
[25] B. Komer, J. Bergstra, and C. Eliasmith. "Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn". In: Proceedings of the 13th Python in Science Conference (2014), pp. 32–37. doi: 10.25080/Majora-14bd3278-006.
[26] H. Jin, Q. Song, and X. Hu. "Auto-Keras: An Efficient Neural Architecture Search System". In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 1946–1956. doi: 10.1145/3292500.3330648.
[27] M. Garouani, A. Ahmad, M. Bouneffa, A. Lewandowski, G. Bourguin, and M. Hamlich. "Towards the Automation of Industrial Data Science: A Meta-learning based Approach". In: Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS. INSTICC. SciTePress, 2021, pp. 709–716. doi: 10.5220/0010457107090716.
[28] M. Maher and S. Sakr. SmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms. 2019. doi: 10.5441/002/edbt.2019.54.
[29] D. Shin and Y. J. Park. "Role of Fairness, Accountability, and Transparency in Algorithmic Affordance". In: Computers in Human Behavior 98 (2019), pp. 277–284. doi: 10.1016/j.chb.2019.04.019.
[30] R. L. Heath and J. Bryant. Human Communication Theory and Research: Concepts, Contexts, and Challenges. 2nd edition. Mahwah, NJ: Routledge, 2000.
[31] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang. "XAI—Explainable Artificial Intelligence". In: Science Robotics 4.37 (2019). doi: 10.1126/scirobotics.aay7120.
[32] D. Castelvecchi. "Can We Open the Black Box of AI?" In: Nature News 538.7623 (2016), p. 20. doi: 10.1038/538020a.
[33] M. T. Ribeiro, S. Singh, and C. Guestrin. "Anchors: High-Precision Model-Agnostic Explanations". In: Proceedings of the AAAI Conference on Artificial Intelligence (2018).
[34] A. W. Harley. "An Interactive Node-Link Visualization of Convolutional Neural Networks". In: Advances in Visual Computing. Ed. by G. Bebis et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015, pp. 867–877. doi: 10.1007/978-3-319-27857-5_77.
[35] S. M. Lundberg et al. "From Local Explanations to Global Understanding with Explainable AI for Trees". In: Nature Machine Intelligence 2.1 (2020), pp. 56–67. doi: 10.1038/s42256-019-0138-9.
[36] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller. "Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition". In: Pattern Recognition 65 (2017), pp. 211–222. doi: 10.1016/j.patcog.2016.11.008.
[37] M. D. Zeiler and R. Fergus. "Visualizing and Understanding Convolutional Networks". In: Computer Vision – ECCV 2014. Ed. by D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2014, pp. 818–833. doi: 10.1007/978-3-319-10590-1_53.
[38] J. Müller, M. Stoehr, A. Oeser, J. Gaebel, M. Streit, A. Dietz, and S. Oeltze-Jafra. "A Visual Approach to Explainable Computerized Clinical Decision Support". In: Computers & Graphics 91 (2020), pp. 1–11. doi: 10.1016/j.cag.2020.06.004.
[39] T. Spinner, U. Schlegel, H. Schäfer, and M. El-Assady. "explAIner: A Visual Analytics Framework for Interactive and Explainable Machine Learning". In: IEEE Transactions on Visualization and Computer Graphics 26.1 (2020), pp. 1064–1074. doi: 10.1109/TVCG.2019.2934629.
[40] Q. Wang, Y. Ming, Z. Jin, Q. Shen, D. Liu, M. J. Smith, K. Veeramachaneni, and H. Qu. "ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning". In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. CHI '19. New York, NY, USA: Association for Computing Machinery, 2019.
Appendices
Table A2 Random Forest & Extra Trees Hyperparameters tuned in the experiments.
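The table entries themselves are not reproduced here. As an illustration of what a tuned Random Forest / Extra Trees search space looks like, the sketch below declares one and samples a configuration from it; the parameter names follow scikit-learn conventions, but the value ranges are hypothetical and are not the actual values of Table A2:

```python
import random

# Hypothetical hyperparameter grid for Random Forest / Extra Trees.
# NOTE: illustrative ranges only, not the values tuned in the paper.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 500, 1000],
    "max_depth": [None, 10, 20, 50],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2", None],
    "bootstrap": [True, False],
}

def sample_configuration(space, seed=None):
    """Draw one random configuration (one value per hyperparameter)."""
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in space.items()}

config = sample_configuration(SEARCH_SPACE, seed=42)
print(config)
```

An AutoML recommender would evaluate many such sampled configurations and record the resulting performances in its knowledge base, which is exactly the data the refinement-level view consumes.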